Abstract: Diabetes is a metabolic disease that results in high blood sugar. A hormone imbalance is one of the main causes of this metabolic disorder; the hormone affected is insulin, which regulates sugar in the blood. In a diabetic patient, the body either does not make sufficient insulin or cannot use the insulin it makes effectively. The disease is responsible for the death of 1.6 million people every year. Despite medical advances and the body's natural resilience, cases of diabetes have risen in recent decades. In the current age of information, a surplus of data is available to feed data-hungry machine learning algorithms. The medical data of diabetic patients show similar patterns, which makes it possible to predict diabetes at an early stage and thus contribute to the fight against the disease. This paper presents seven machine learning classifiers implemented on the early-stage diabetes risk prediction dataset. Three evaluation metrics, namely classification accuracy, F-score and ROC value, are used to evaluate the performance of the algorithms on the validation set. The results clearly favour the Random Forest classifier on this average-sized dataset.
Keywords: Decision Tree Classifier, Random Forest Classifier, Support Vector Machines, Multi-Layer Perceptron, K-Nearest Neighbours Classifier, Naïve Bayes Classifier, Logistic Regression, Binary Classification, Diabetes.
DOI: 10.17148/IARJSET.2021.8228
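
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions made purely for illustration, and the ROC value is computed here as the area under the ROC curve using its pairwise-ranking interpretation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    positive sample is scored higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = diabetic) and classifier scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f}  F-score={f1:.4f}  ROC AUC={auc:.4f}")
```

Accuracy and F-score depend on the chosen threshold, whereas the ROC value depends only on how the classifier ranks the samples, which is one reason for reporting all three together.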