MLOps · Machine Learning
End-to-End ML API on AWS EKS
Production-grade ML inference API with Redis caching, Kubernetes HPA autoscaling, and Istio service mesh on AWS EKS.

The Problem
Context & Challenge
ML models in notebooks don't serve real users. The challenge was to deploy a sentiment analysis model to production with high availability, low latency, and automatic scaling under load.
The Approach
Architecture & Implementation
Containerized a DistilBERT sentiment-analysis model with FastAPI and Docker, then deployed it on AWS EKS with the Kubernetes Horizontal Pod Autoscaler (HPA) scaling from 1 to 70 pods, Redis response caching (95% hit rate), an Istio service mesh for traffic routing, and Grafana dashboards for observability.
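The Redis layer sits in front of the model: identical inputs are answered from cache, which is what drives the 95% hit rate under repetitive load. A minimal sketch of that caching pattern is below; the function and class names are illustrative, and an in-memory stub stands in for the real Redis client (any object with `get`/`setex`, such as `redis.Redis`, can be passed in).

```python
import hashlib
import json

def make_cached_predictor(model, cache, ttl=3600):
    """Wrap `model` (text -> dict) with a Redis-style cache.

    `cache` only needs get() and setex(), so a real redis.Redis
    client or a stub can be supplied.
    """
    def predict(text):
        # Hash the input so the cache key is fixed-length and safe
        key = "sent:" + hashlib.sha256(text.encode()).hexdigest()
        hit = cache.get(key)
        if hit is not None:                        # cache hit: skip the model
            return json.loads(hit)
        result = model(text)                       # e.g. DistilBERT sentiment head
        cache.setex(key, ttl, json.dumps(result))  # expire after `ttl` seconds
        return result
    return predict

# Illustrative stand-in exposing the two Redis methods used above
class DictCache:
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value

calls = []
def stub_model(text):
    calls.append(text)
    return {"label": "POSITIVE", "score": 0.99}

predict = make_cached_predictor(stub_model, DictCache())
predict("great product")   # miss: runs the model
predict("great product")   # hit: served from cache, model not called again
```

In the deployed service the wrapped `predict` would back a FastAPI `POST` route, with the model call being the expensive step the cache avoids.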
The Results
Impact & Metrics
Sustained 70-100 req/s with a 0% error rate across 28K+ requests. Achieved sub-second P99 latency at steady state and 100% uptime during k6 load testing.
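P99 means 99% of requests completed at or under the reported latency, so a handful of slow outliers do not hide in an average. A short sketch of the nearest-rank percentile calculation k6-style tooling applies to latency samples; the sample values here are illustrative, not the project's raw data.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(samples)
    # ceil(n * p / 100) - 1, clamped to a valid index
    rank = max(0, -(-len(ordered) * p // 100) - 1)
    return ordered[int(rank)]

# Illustrative latency samples in milliseconds
latencies = [120, 95, 110, 240, 980, 130, 105, 115, 100, 90]
p99 = percentile(latencies, 99)   # worst-case tail, still under 1000 ms
```

Reporting the tail this way is why "sub-second P99" is a stronger claim than a sub-second mean.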
Technologies & Methods
Python · FastAPI · Docker · Kubernetes (EKS) · AWS · Redis · Istio · Grafana · k6 · DistilBERT