NLP & LLMs, Machine Learning

English Text Detoxification System

Modular NLP system that rewrites toxic content into safe alternatives using explainability-driven masking and multi-objective reranking.

The Problem

Toxic Content at Scale

Toxic content proliferates across social platforms, but simple filtering removes context. The challenge was to rewrite harmful text into safe alternatives while preserving original meaning and fluency.

The Approach

Modular NLP Pipeline

Built a modular pipeline and evaluated 11 configurations across three stages: explainability-driven masking (DecompX), LLM-based infilling (Mistral-7B, T5-base), and a novel multi-objective reranking algorithm (Global Reranking) that balances toxicity, semantic similarity, and fluency.
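As a rough illustration of the masking and reranking stages, the sketch below masks tokens whose attribution score exceeds a threshold, then chooses among candidate rewrites using a weighted score over toxicity, similarity, and fluency. Everything specific here is an assumption made for illustration: the function names (`mask_tokens`, `score`, `rerank`), the threshold, the weights, the `<mask>` placeholder, and the example scores. In the project these values would come from DecompX, the XLM-R toxicity scorer, the similarity model, and GPT-2 perplexity, and the actual Global Reranking formula may differ from this simple weighted sum.

```python
from dataclasses import dataclass

MASK = "<mask>"  # hypothetical placeholder; the real mask token depends on the infilling model


def mask_tokens(tokens, attributions, threshold=0.5):
    """Replace tokens whose toxicity attribution exceeds the threshold."""
    if len(tokens) != len(attributions):
        raise ValueError("tokens and attributions must have the same length")
    return [MASK if a > threshold else t for t, a in zip(tokens, attributions)]


@dataclass
class Candidate:
    text: str
    toxicity: float    # in [0, 1], lower is better
    similarity: float  # in [0, 1], higher is better
    fluency: float     # in [0, 1], higher is better (e.g. a normalised inverse perplexity)


def score(c, w_tox=0.5, w_sim=0.3, w_flu=0.2):
    """Weighted sum of the three objectives (the weights are assumed, not from the project)."""
    return w_tox * (1.0 - c.toxicity) + w_sim * c.similarity + w_flu * c.fluency


def rerank(candidates):
    """Return candidates sorted from best to worst combined score."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    tokens = ["you", "are", "a", "stupid", "person"]
    attributions = [0.05, 0.02, 0.01, 0.93, 0.10]  # made-up attribution values
    print(" ".join(mask_tokens(tokens, attributions)))

    candidates = [
        Candidate("you are a stupid person", 0.90, 1.00, 0.90),  # unchanged input
        Candidate("you are a person", 0.05, 0.90, 0.85),         # detoxified rewrite
        Candidate("nice weather today", 0.01, 0.10, 0.95),       # safe but off-topic
    ]
    print(rerank(candidates)[0].text)
```

With these made-up scores, the unchanged input loses on toxicity and the off-topic rewrite loses on similarity, so the detoxified rewrite ranks first. Scoring all three objectives together is what keeps the reranker from rewarding a rewrite that is safe but has lost the original meaning.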

Technologies & Methods

Python, PyTorch, Hugging Face Transformers, T5-base, Mistral-7B, RoBERTa, BERTScore, DecompX, MaRCo BART, XLM-R Toxicity Scoring, LaBSE Semantic Similarity, GPT-2 Perplexity

The Results

75% Toxicity Reduction

The best configuration (T5-base + Global Reranking) reduced toxicity from a baseline of 0.208 to 0.051, a 75% reduction, while maintaining 93.6% semantic similarity (BERTScore). Global Reranking reduced toxicity relative to the baseline in all 11 configurations.
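The 75% figure can be checked from the two reported toxicity scores. The helper name `relative_reduction` is hypothetical, and the sketch assumes the percentage is a relative decrease against the baseline score.

```python
def relative_reduction(baseline, value):
    """Fractional decrease of `value` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - value) / baseline


# Reported scores: 0.208 baseline, 0.051 for T5-base + Global Reranking.
reduction = relative_reduction(0.208, 0.051)
print(f"{reduction:.1%}")  # about 75.5%
```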

Key Result

75% toxicity reduction (0.208→0.051) while maintaining 93.6% semantic similarity
