Abstract: Diabetes mellitus, a pervasive chronic metabolic disorder, frequently evades early detection until irreversible complications (cardiovascular disease, nephropathy, neuropathy, and retinopathy) manifest. Conventional diagnostics that rely on laboratory assays and clinical expertise remain constrained by accessibility and cost. This investigation introduces a machine learning-driven diabetes risk prediction system built on the Pima Indians Diabetes Dataset. It applies systematic data preprocessing, feature selection, and Logistic Regression modelling to deliver interpretable early-stage risk assessment from standard clinical parameters. Deployed through a Flask microservice architecture, the platform provides real-time probabilistic predictions with confidence intervals via a web interface, supporting patient self-screening and healthcare provider decision support. Empirical validation confirms robust predictive performance suitable for population-scale early warning, while the system's explicit positioning as an educational adjunct, rather than a diagnostic substitute, ensures clinical responsibility. The system advances accessible prediabetes surveillance, enabling timely lifestyle and pharmacotherapeutic interventions to mitigate long-term morbidity.

CheckYourDiabetic introduces a hybrid machine learning framework for early Type 2 diabetes prediction that integrates Logistic Regression, K-Nearest Neighbors, Random Forest, and XGBoost in a stacking ensemble trained on the Pima Indians Diabetes Dataset (n=768, 8 clinical features). Following preprocessing with KNN imputation, SMOTE oversampling, and RFE feature selection, the system outperforms the individual classifiers (AUC-ROC: 0.94, Sensitivity: 92%) through complementary modeling of linear, local, and nonlinear biomarker interactions. Deployed as a Flask-based web application, it delivers real-time risk stratification with SHAP-based interpretability, enabling accessible pre-symptomatic screening and timely intervention to mitigate diabetes complications in resource-constrained settings.
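For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how sensitivity and AUC-ROC are conventionally computed from labels and predicted risk scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy data are assumptions made purely for illustration, and the numbers it produces are unrelated to the reported results.

```python
def sensitivity(y_true, y_pred):
    """Share of actual positive cases that were predicted positive (recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive cases")
    return tp / (tp + fn)


def auc_roc(y_true, scores):
    """AUC-ROC via its rank interpretation: the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the Pima dataset): 1 = diabetic, 0 = not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.91, 0.20, 0.65, 0.40, 0.35, 0.55, 0.80, 0.10]

# Assumed decision threshold of 0.5 to turn risk scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("sensitivity:", sensitivity(y_true, y_pred))
print("AUC-ROC:", auc_roc(y_true, scores))
```

Sensitivity depends on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is why the two figures are usually reported together for a screening tool.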


Downloads: PDF | DOI: 10.17148/IARJSET.2026.13150

How to Cite:

[1] Mohammed Nawaz Khan, K R Sumana, "Hybrid Machine Learning Approaches for Early Diabetes Prediction Using Patient Health Data," International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2026.13150
