English Text Detoxification System
Modular NLP system that rewrites toxic content into safe alternatives using explainability-driven masking and multi-objective reranking.

The Problem
Context & Challenge
Toxic content proliferates across social platforms, but simply filtering it out also discards the surrounding context. The challenge was to rewrite harmful text into safe alternatives while preserving the original meaning and fluency.
The Approach
Architecture & Implementation
Developed a modular three-stage pipeline and tested 11 configurations of it. The stages are explainability-driven masking (DecompX), LLM-based infilling (Mistral-7B, T5-base), and a novel multi-objective reranking algorithm that balances toxicity, similarity, and fluency.
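To make the reranking stage concrete, here is a minimal sketch of one way to balance the three objectives: a weighted sum over per-candidate scores. The `Candidate` class, the weight values, and the example scores are all hypothetical and chosen for illustration. The project's actual scoring function is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One rewritten sentence with scores assumed to lie in [0, 1]."""
    text: str
    toxicity: float    # lower is better
    similarity: float  # higher is better (similarity to the source text)
    fluency: float     # higher is better


def rerank_score(c: Candidate, w_tox: float = 0.5,
                 w_sim: float = 0.3, w_flu: float = 0.2) -> float:
    # Hypothetical weights: toxicity is inverted so that every term rewards
    # a desirable property, then the three terms are combined linearly.
    return w_tox * (1.0 - c.toxicity) + w_sim * c.similarity + w_flu * c.fluency


def rerank(candidates: list[Candidate], **weights: float) -> list[Candidate]:
    # Highest combined score first.
    return sorted(candidates, key=lambda c: rerank_score(c, **weights), reverse=True)


# Illustrative scores only, not measurements from the project.
candidates = [
    Candidate("rewrite A", toxicity=0.40, similarity=0.98, fluency=0.90),
    Candidate("rewrite B", toxicity=0.05, similarity=0.93, fluency=0.88),
    Candidate("rewrite C", toxicity=0.02, similarity=0.60, fluency=0.85),
]

ranked = rerank(candidates)
best = ranked[0]
print(best.text)
```

With these weights, the candidate that is slightly less faithful but far less toxic ("rewrite B") outranks both the most faithful and the least toxic options, which is the kind of trade-off a multi-objective reranker is meant to resolve.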
The Results
Impact & Metrics
Achieved a 75% reduction in toxicity score (0.208→0.051) while maintaining 93.6% semantic similarity (BERTScore). Found that the choice of reranking strategy affects safety 3-5× more than generation quality does.
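The headline reduction follows directly from the two toxicity scores quoted above. The short check below shows the arithmetic. The function name `relative_reduction` is only an illustrative helper, not part of the project.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Scores taken from the results above: 0.208 before, 0.051 after.
reduction = relative_reduction(0.208, 0.051)
print(f"{reduction:.1%}")  # prints 75.5%
```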