Hybrid Machine Learning Approaches for Early Diabetes Prediction Using Patient Health Data
Abstract: Diabetes mellitus, a pervasive chronic metabolic disorder, frequently evades early detection until irreversible complications (cardiovascular disease, nephropathy, neuropathy, and retinopathy) manifest. Conventional diagnostics that rely on laboratory assays and clinical expertise remain constrained by accessibility and cost. This investigation introduces a machine learning-driven diabetes risk prediction system built on the Pima Indians Diabetes Dataset, employing systematic data preprocessing, feature selection, and Logistic Regression modelling to deliver interpretable early-stage risk assessment from standard clinical parameters. Deployed through a Flask microservice architecture, the platform provides real-time probabilistic predictions with confidence intervals via a web interface, supporting patient self-screening and healthcare provider decision-making. Empirical validation indicates predictive performance suitable for population-scale early warning, and the system is explicitly positioned as an educational adjunct rather than a diagnostic substitute, which keeps its use clinically responsible. The system advances accessible prediabetes surveillance, enabling timely lifestyle and pharmacotherapeutic interventions to mitigate long-term morbidity.
CheckYourDiabetic introduces a hybrid machine learning framework for early Type 2 diabetes prediction, integrating Logistic Regression, K-Nearest Neighbors, Random Forest, and XGBoost via a stacking ensemble on the Pima Indians Diabetes Dataset (n=768, 8 clinical features). Following preprocessing (KNN imputation, SMOTE oversampling, and recursive feature elimination (RFE)), the system achieves higher performance (AUC-ROC: 0.94, Sensitivity: 92%) than the individual classifiers through complementary modeling of linear, local, and nonlinear biomarker interactions. Deployed as a Flask-based web application, it delivers real-time risk stratification with SHAP-based interpretability, enabling accessible pre-symptomatic screening and timely intervention to mitigate diabetes complications in resource-constrained settings.
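The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn only, uses synthetic data from `make_classification` as a stand-in for the Pima Indians Diabetes Dataset, substitutes `GradientBoostingClassifier` for XGBoost, and omits SMOTE (which would typically come from the separate imbalanced-learn package and be applied to training folds only). All hyperparameters, the number of selected features, and the 0.5 decision threshold are illustrative assumptions, so the printed metrics say nothing about the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 768 rows, 8 features, about 35% positives.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.65, 0.35], random_state=0,
)

# Blank out roughly 5% of values to mimic missing clinical measurements.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners: linear, local, and tree-based nonlinear models.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=7)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
]

model = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),   # KNN imputation of missing values
    ("scale", StandardScaler()),             # helps the distance-based and linear models
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=6)),
    ("stack", StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
        stack_method="predict_proba",
        cv=5,  # out-of-fold base predictions train the meta-learner
    )),
])

model.fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
auc = roc_auc_score(y_test, risk)
sensitivity = recall_score(y_test, (risk >= 0.5).astype(int))
print(f"AUC-ROC={auc:.3f} sensitivity={sensitivity:.3f}")
```

Wrapping imputation, scaling, and feature selection in a single `Pipeline` means those steps are fitted on the training split only, which avoids leaking test information into preprocessing.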
How to Cite:
[1] Mohammed Nawaz Khan, K R Sumana, “Hybrid Machine Learning Approaches for Early Diabetes Prediction Using Patient Health Data,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2026.13150
