English Text Detoxification System
Modular NLP system that rewrites toxic content into safe alternatives using explainability-driven masking and multi-objective reranking.

The Problem
Toxic Content at Scale
Toxic content proliferates across social platforms, but simply filtering it out also removes the surrounding context. The challenge was to rewrite harmful text into safe alternatives while preserving the original meaning and fluency.
The Approach
Modular NLP Pipeline
Developed a modular three-stage pipeline and evaluated 11 configurations of it. The stages are explainability-driven masking (DecompX), LLM-based infilling (Mistral-7B, T5-base), and a novel multi-objective reranking algorithm (Global Reranking) that balances toxicity, similarity, and fluency.
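To make the masking and reranking stages concrete, here is a minimal Python sketch. It is not the project's DecompX integration or its Global Reranking algorithm, whose exact formulation is not given here. It assumes that per-token toxicity attributions and per-candidate scores (all in [0, 1]) have already been computed, and it swaps in a plain threshold mask and a weighted-sum reranker. The names `mask_tokens`, `Candidate`, `combined_score`, and `rerank`, the threshold, and the weights are all hypothetical.

```python
from dataclasses import dataclass


def mask_tokens(tokens, attributions, threshold=0.5, mask="<mask>"):
    """Replace tokens whose toxicity attribution meets the threshold.

    Assumption: `attributions` holds one score per token, e.g. from an
    explainability method; the threshold value is illustrative only.
    """
    return [mask if a >= threshold else t for t, a in zip(tokens, attributions)]


@dataclass(frozen=True)
class Candidate:
    text: str
    toxicity: float    # in [0, 1], lower is better
    similarity: float  # in [0, 1], higher is better
    fluency: float     # in [0, 1], higher is better


def combined_score(c, w_tox=0.5, w_sim=0.3, w_flu=0.2):
    """Weighted sum of the three objectives (hypothetical weights)."""
    return w_tox * (1.0 - c.toxicity) + w_sim * c.similarity + w_flu * c.fluency


def rerank(candidates, **weights):
    """Return candidates sorted from best to worst combined score."""
    return sorted(candidates, key=lambda c: combined_score(c, **weights), reverse=True)


# Masking: only the highly attributed token is replaced.
masked = mask_tokens(["that", "idea", "is", "awful"], [0.05, 0.02, 0.01, 0.90])

# Reranking: in a real pipeline the candidates would come from the infilling model.
candidates = [
    Candidate("rewrite A", toxicity=0.40, similarity=0.95, fluency=0.90),
    Candidate("rewrite B", toxicity=0.05, similarity=0.90, fluency=0.85),
    Candidate("rewrite C", toxicity=0.02, similarity=0.40, fluency=0.80),
]
best = rerank(candidates)[0]
```

In this toy data, the candidate with the lowest toxicity is not selected because it loses too much similarity, which is the kind of trade-off a multi-objective reranker is meant to handle.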
The Results
75% Toxicity Reduction
The best configuration (T5-base + Global Reranking) reduced toxicity from a baseline of 0.208 to 0.051, a 75% reduction, while maintaining 93.6% semantic similarity as measured by BERTScore. The Global Reranking algorithm reduced toxicity relative to the baseline in all 11 configurations.
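The headline percentage follows directly from the two toxicity scores reported above. A short check, where the helper name `relative_reduction` is only illustrative:

```python
def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Scores taken from the results above: 0.208 baseline, 0.051 best configuration.
reduction = relative_reduction(0.208, 0.051)
print(f"{reduction:.1%}")  # prints 75.5%
```

The headline figure of 75% is this value rounded to the nearest five percent.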