MLOps · Machine Learning
End-to-End ML API on AWS EKS
Production-grade ML inference API with Redis caching, Kubernetes HPA autoscaling, and Istio service mesh on AWS EKS.

The Problem
From Notebook to Production
ML models in notebooks don't serve real users. The challenge was to deploy a sentiment analysis model to production with high availability, low latency, and automatic scaling under load.
The Approach
Kubernetes + Redis Architecture
Containerized a DistilBERT model with FastAPI and Docker. Deployed on AWS EKS with Kubernetes HPA autoscaling (1→70 pods), Redis caching (95% hit rate), Istio service mesh for traffic routing, and Grafana dashboards for observability.
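The caching layer follows a cache-aside pattern: hash the input text, check Redis, and only run inference on a miss. A minimal sketch of that path — the key scheme, TTL, and function names here are illustrative assumptions, not the project's actual code:

```python
# Cache-aside sketch for the inference path. Any client exposing
# get/setex (e.g. redis.Redis) works as `cache`; `run_model` stands in
# for the actual DistilBERT inference call.
import hashlib
import json

CACHE_TTL = 3600  # seconds; assumed value


def cache_key(text: str) -> str:
    # Hash the input so arbitrary-length text maps to a fixed-size key.
    return "sentiment:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def predict_with_cache(text: str, cache, run_model) -> dict:
    key = cache_key(text)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)            # cache hit: skip inference
    result = run_model(text)              # cache miss: run the model
    cache.setex(key, CACHE_TTL, json.dumps(result))
    return result
```

In the deployed service this logic would sit inside the FastAPI route handler, with `cache` being a `redis.Redis` client pointed at the in-cluster Redis service; a 95% hit rate means only 1 in 20 requests actually reaches the model.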
Technologies & Methods
Python · FastAPI · Docker · Kubernetes (EKS) · AWS · Redis · Istio · Grafana · k6 · DistilBERT · Health Checks · Reliability/Performance Testing · Command line (bash)
The Results
Zero Downtime at Scale
Sustained 70-100 req/s with 0% error rate across 28K+ requests. System autoscaled from 1 to 70 pods while maintaining sub-second P99 latency, with Redis achieving a 95% cache hit rate.
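The 1→70 pod range described above maps directly to an HPA resource. A minimal sketch, assuming a CPU-utilization target — the deployment name, metric, and threshold are illustrative, as the source does not state them:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sentiment-api            # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sentiment-api
  minReplicas: 1                 # idle baseline
  maxReplicas: 70                # ceiling observed under load
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed scaling threshold
```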