MLOps · Machine Learning
End-to-End ML API on AWS EKS
Production-grade ML inference API with Redis caching, Kubernetes HPA autoscaling, and Istio service mesh on AWS EKS.

The Problem
Context & Challenge
ML models in notebooks don't serve real users. The challenge was to deploy a sentiment analysis model to production with high availability, low latency, and automatic scaling under load.
The Approach
Architecture & Implementation
Containerized a DistilBERT sentiment-analysis model with FastAPI and Docker, then deployed it on AWS EKS with the Kubernetes Horizontal Pod Autoscaler (HPA) scaling from 1 to 70 pods, Redis response caching (95% hit rate), an Istio service mesh for traffic routing, and Grafana dashboards for observability.
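The Redis layer sits in front of the model: identical inputs are answered from cache, which is what drives the 95% hit rate under repetitive load. A minimal sketch of that caching pattern is below; the function and class names are illustrative, and an in-memory stub stands in for the real Redis client (any object with `get`/`setex`, such as `redis.Redis`, can be passed in).

```python
import hashlib
import json

def make_cached_predictor(model, cache, ttl=3600):
    """Wrap `model` (text -> dict) with a Redis-style cache.

    `cache` only needs get() and setex(), so a real redis.Redis
    client or a stub can be supplied.
    """
    def predict(text):
        # Hash the input so the cache key is fixed-length and safe
        key = "sent:" + hashlib.sha256(text.encode()).hexdigest()
        hit = cache.get(key)
        if hit is not None:                        # cache hit: skip the model
            return json.loads(hit)
        result = model(text)                       # e.g. DistilBERT sentiment head
        cache.setex(key, ttl, json.dumps(result))  # expire after `ttl` seconds
        return result
    return predict

# Illustrative stand-in exposing the two Redis methods used above
class DictCache:
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value

calls = []
def stub_model(text):
    calls.append(text)
    return {"label": "POSITIVE", "score": 0.99}

predict = make_cached_predictor(stub_model, DictCache())
predict("great product")   # miss: runs the model
predict("great product")   # hit: served from cache, model not called again
```

In the deployed service the wrapped `predict` would back a FastAPI `POST` route, with the model call being the expensive step the cache avoids.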
The Results
Impact & Metrics
Sustained 70-100 req/s with a 0% error rate across 28K+ requests. Achieved sub-second P99 latency at steady state and 100% uptime during k6 load testing.
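P99 means 99% of requests completed at or under the reported latency, so a handful of slow outliers do not hide in an average. A short sketch of the nearest-rank percentile calculation k6-style tooling applies to latency samples; the sample values here are illustrative, not the project's raw data.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(samples)
    # ceil(n * p / 100) - 1, clamped to a valid index
    rank = max(0, -(-len(ordered) * p // 100) - 1)
    return ordered[int(rank)]

# Illustrative latency samples in milliseconds
latencies = [120, 95, 110, 240, 980, 130, 105, 115, 100, 90]
p99 = percentile(latencies, 99)   # worst-case tail, still under 1000 ms
```

Reporting the tail this way is why "sub-second P99" is a stronger claim than a sub-second mean.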
Technologies & Methods
Python · FastAPI · Docker · Kubernetes (EKS) · AWS · Redis · Istio · Grafana · k6 · DistilBERT