Machine Learning

Heart Failure Survival Prediction

Clinical ML system predicting patient survival using GMM-based synthetic augmentation and ensemble classifiers.

The Problem

Small Clinical Dataset

Heart failure has high mortality, but clinical datasets are small. The challenge was building an accurate survival prediction model despite limited training data (299 patients).

The Approach

Synthetic Augmentation + Ensemble

Tested 21 model configurations across Logistic Regression, KNN, Random Forest, Gradient Boosting, and Neural Networks. Addressed small-sample limitations with synthetic data augmentation based on a Gaussian mixture model (GMM), generating 5,299 samples, and tuned the decision threshold to minimize missed high-risk cases.
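As a rough illustration of GMM-based augmentation, the sketch below fits one Gaussian mixture per class with scikit-learn and samples synthetic rows from each. It is a minimal example under assumptions, not the project's actual code: the function name gmm_augment, the number of mixture components, the proportional split between classes, and the random demo data are all hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_augment(X, y, n_new, n_components=2, seed=0):
    """Fit one GMM per class and draw synthetic rows for each class.

    Hypothetical helper: n_components and the proportional class split
    are illustrative choices, not values taken from the project.
    """
    X_parts, y_parts = [], []
    classes, counts = np.unique(y, return_counts=True)
    for cls, count in zip(classes, counts):
        # Keep the original class balance in the synthetic set.
        n_cls = int(round(n_new * count / len(y)))
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="full",
            random_state=seed,
        )
        gmm.fit(X[y == cls])
        X_syn, _ = gmm.sample(n_cls)  # sample() returns (rows, component ids)
        X_parts.append(X_syn)
        y_parts.append(np.full(n_cls, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Demo on random stand-in data (NOT the clinical dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(299, 5))
y_demo = (rng.random(299) < 0.32).astype(int)
X_syn, y_syn = gmm_augment(X_demo, y_demo, n_new=1000)
print(X_syn.shape, np.bincount(y_syn))
```

Two caveats apply to this kind of sketch. The mixtures should be fitted on the training split only, so that synthetic rows do not leak information from the test set. GMM samples are also continuous, so any binary or integer-valued columns would need rounding or clipping after sampling.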

Technologies & Methods

Python, TensorFlow/Keras, scikit-learn, Gradient Boosting, Random Forest, GMM, Logistic Regression, KNN, K-Means Clustering, Decision Tree, AdaBoost, MLP Neural Networks, Hyperparameter Tuning

The Results

85% Prediction Accuracy

Achieved 85% test accuracy with GMM-augmented Gradient Boosting and cut test loss by 42% (0.99 to 0.27). Tuned the decision threshold (0.25–0.30) to prioritize recall, reaching up to 75% recall and reducing missed high-risk patients.
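Threshold tuning of this kind can be sketched as a sweep over candidate cutoffs, comparing recall and precision for the high-risk class at each one. The example below is a hedged illustration: the helper name sweep_thresholds and the labels and probabilities are made up for the demo and are not the project's results. It assumes the positive class (1) marks the high-risk outcome.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def sweep_thresholds(y_true, p_pos, thresholds):
    """Return (threshold, recall, precision) for each candidate cutoff.

    Hypothetical helper: a row is predicted positive (high risk)
    when its predicted probability is at or above the threshold.
    """
    rows = []
    for t in thresholds:
        y_pred = (p_pos >= t).astype(int)
        rows.append((
            float(t),
            recall_score(y_true, y_pred, zero_division=0),
            precision_score(y_true, y_pred, zero_division=0),
        ))
    return rows


# Toy labels and probabilities, invented for illustration only.
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
p_pos = np.array([0.9, 0.7, 0.6, 0.35, 0.28, 0.4, 0.27, 0.2, 0.1, 0.05])

results = sweep_thresholds(y_true, p_pos, [0.5, 0.30, 0.25])
for t, rec, prec in results:
    print(f"threshold={t:.2f}  recall={rec:.2f}  precision={prec:.2f}")
```

Lowering the cutoff flags more patients as high risk, so recall rises while precision typically falls. In a clinical setting the threshold would normally be chosen on a validation split, keeping the test set for the final report.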


Key Result

85% test accuracy with 42% loss reduction via synthetic data augmentation

