Abstract: Cardiovascular diseases remain the leading cause of mortality worldwide, which makes accurate early prediction important. This paper proposes HybridBoost, a novel XGBoost–SMOTE ensemble framework that addresses class imbalance in the UCI Heart Disease dataset through oversampling and feature selection. The methodology combines Recursive Feature Elimination (RFE) and SMOTE oversampling (1:1 class ratio) before classification with XGBoost (n_estimators = 100, max_depth = 6). Evaluated with 5-fold cross-validation, HybridBoost attains 96.8% accuracy, an F1-score of 0.96, and an AUC-ROC of 0.98, compared with approximately 92–94% accuracy for existing approaches. Comparative analysis against Random Forest, AutoML, and SMOTE-ENN-XGBoost shows a 3–5% improvement in performance. Feature importance analysis identifies chest pain type (cp), maximum heart rate (thalach), and ST depression (oldpeak) as the primary predictors of heart disease, consistent with established clinical indicators reported in previous studies. HybridBoost targets binary heart disease classification, in contrast to multiclass approaches, heart failure-specific models, and generic ensemble methods. The results indicate its potential for clinical decision support and future deployment in healthcare environments.
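
The SMOTE step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python rendering of the basic SMOTE idea (interpolating between a minority sample and one of its k nearest minority neighbours) for a binary problem, and the function names `smote_oversample` and `balance_one_to_one`, the choice k = 5, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # the new row lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_one_to_one(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary problem to a 1:1 ratio."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote_oversample(minority, n_new, k=k, seed=seed) if n_new else []
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy, made-up data: 6 majority rows (label 0) and 3 minority rows (label 1)
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = balance_one_to_one(X, y)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

When oversampling is combined with cross-validation, it is usually applied to the training folds only, so that synthetic rows do not leak into the evaluation split; the abstract does not state how this was handled. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used in place of this sketch.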

Keywords: Heart disease prediction, XGBoost, SMOTE, ensemble learning, class imbalance, UCI dataset, feature selection, cardiovascular risk assessment


DOI: 10.17148/IARJSET.2026.13388

How to Cite:

[1] B. Rajalingam, Dr. B. Aysha Banu, R. Sathiyasri, R. Rifqua Fathima, A. Rifqua Fathima, S. Mufeena, "HybridBoost: An XGBoost-SMOTE Ensemble for Precise Heart Disease Prediction," International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2026.13388
