ABSTRACT: Early risk assessment and identification are the most important steps in preventing the onset and progression of many diseases. To estimate disease risk factors, researchers typically use statistical comparative analysis or stepwise feature-selection methods based on regression. These methods assess each risk factor separately. However, disease development is more likely to be influenced by a combination of factors than by any single one. Genetic algorithms (GA) can efficiently search for the combination of factors that gives the fastest diagnosis with the highest accuracy, especially when the problem involves a large number of complex and poorly understood components, as in disease prediction. Our proposed model demonstrates the potential of using GA to diagnose disease and to predict diagnostic accuracy. Our proposed ensemble model showed that combining a small, selected subset of input features gives better results than using each of the significant features individually. The model not only identifies the optimal feature sets and predicts their accuracy but also handles missing values in the dataset. Variables that LR selects more frequently may be more relevant to predicting disease development and to the accuracy achieved by the GA.
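The abstract does not specify the chromosome encoding, genetic operators, or fitness function, so the following is only a minimal sketch of GA-based feature subset selection under stated assumptions. Each chromosome is assumed to be a bitmask over the input features, and its fitness is assumed to be the validation accuracy of a logistic regression (LR) trained on the selected features, minus a small size penalty added here to favor a limited feature set. All function names, hyperparameters, and the synthetic data are hypothetical, and the paper's ensemble and missing-value handling are not reproduced.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, epochs=100, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(w, b, X, y):
    # Fraction of samples whose thresholded prediction (p >= 0.5) matches the label.
    hits = sum((sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(y)

def subset(X, mask):
    # Keep only the columns whose mask bit is 1.
    return [[x for x, keep in zip(row, mask) if keep] for row in X]

def make_fitness(X_tr, y_tr, X_va, y_va, penalty=0.01):
    cache = {}  # memoize fitness per mask so each subset is trained only once
    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            if not any(mask):
                cache[key] = 0.0  # an empty feature set cannot predict anything
            else:
                w, b = train_logreg(subset(X_tr, mask), y_tr)
                acc = accuracy(w, b, subset(X_va, mask), y_va)
                # Assumed parsimony penalty: prefer smaller feature subsets.
                cache[key] = acc - penalty * sum(mask) / len(mask)
        return cache[key]
    return fitness

def ga_select(fitness, n_features, pop_size=20, generations=20, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [ranked[0][:], ranked[1][:]]  # elitism: carry the two best forward
        while len(nxt) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

def synthetic_data(n, n_features=6, seed=1):
    # Hypothetical toy data: only features 0 and 1 determine the label.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y

X_tr, y_tr = synthetic_data(150, seed=1)
X_va, y_va = synthetic_data(100, seed=2)
best = ga_select(make_fitness(X_tr, y_tr, X_va, y_va), n_features=6)
print("selected features:", [i for i, g in enumerate(best) if g])
```

On this toy data the search should retain features 0 and 1, because any subset missing either one loses roughly a quarter of its accuracy; this mirrors, under the assumptions above, the abstract's point that a combination of factors can outperform factors considered one at a time.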

Keywords: Data Mining, Logistic Regression (LR), Genetic Algorithm (GA), Feature Selection (FS), Decision Tree (DT), Random Forest (RF)

DOI: 10.17148/IARJSET.2021.8817
