Using Pearson Correlation and Mutual Information (PC-MI) to Select Features for Accurate Breast Cancer Diagnosis Based on a Soft Voting Classifier
Abstract
Breast cancer is one of the most common and serious diseases affecting people around the world. It is considered a leading cause of death worldwide, and early detection is difficult. In healthcare, early diagnosis supported by machine learning (ML) can help save patients’ lives, so better-performing diagnostic procedures are crucial, and ML models have been used to improve the effectiveness of early diagnosis. In this paper, we propose a new feature selection method that combines two filter methods, Pearson correlation and mutual information (PC-MI), to analyse the correlation amongst features and select the important ones before passing them to a classification model. For early breast cancer prediction, the method relies on a soft voting classifier that combines three ML models (decision tree, logistic regression and support vector machine) into a single model that carries the strengths of its components, yielding the best prediction accuracy. We evaluate our work on the Wisconsin Diagnostic Breast Cancer dataset. The proposed methodology outperforms previous work, achieving 99.3% accuracy, an F1 score of 0.9922, a recall of 0.9846, a precision of 1 and an AUC of 0.9923. Furthermore, the accuracy under 10-fold cross-validation is 98.2%.
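To make the two ideas in the abstract concrete, the following is a minimal pure-Python sketch of a PC-MI style filter followed by soft voting. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the Pearson and mutual information scores are combined, so the greedy rule here (rank features by mutual information with the label, then skip any feature whose absolute Pearson correlation with an already kept feature reaches a threshold) is one plausible reading. The feature names, toy values, `corr_threshold`, `top_k`, `bins` and the three probability vectors are all hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_information(x, y, bins=4):
    """Mutual information (nats) between a feature and class labels.

    Assumption: the continuous feature is discretised into equal-width bins.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    pxy, px, py = Counter(zip(xb, y)), Counter(xb), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def select_features(columns, labels, corr_threshold=0.9, top_k=2):
    """Assumed PC-MI rule: rank by MI, drop features redundant with kept ones."""
    mi = {name: mutual_information(col, labels) for name, col in columns.items()}
    kept = []
    for name in sorted(mi, key=mi.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < corr_threshold
               for k in kept):
            kept.append(name)
        if len(kept) == top_k:
            break
    return kept


def soft_vote(prob_rows):
    """Average per-class probabilities across models and pick the top class."""
    n_models = len(prob_rows)
    avg = [sum(p[i] for p in prob_rows) / n_models
           for i in range(len(prob_rows[0]))]
    return avg.index(max(avg)), avg


# Hypothetical toy data: "perimeter" is almost a linear function of "radius".
columns = {
    "radius": [1, 2, 3, 4, 5, 6, 7, 8],
    "perimeter": [2.1, 4.0, 6.2, 8.1, 9.9, 12.2, 14.0, 16.1],
    "texture": [5, 1, 4, 2, 6, 3, 8, 7],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
kept = select_features(columns, labels)
print("selected features:", kept)

# Hypothetical [P(class 0), P(class 1)] outputs of three models for one sample.
label, avg = soft_vote([[0.9, 0.1], [0.45, 0.55], [0.45, 0.55]])
print("soft-vote class:", label, "averaged probabilities:", avg)
```

In the toy example, only one of the two nearly collinear features survives the filter. The voting example also shows how soft voting can differ from majority voting: two of the three models narrowly favour class 1, but the averaged probabilities favour class 0 because the first model is far more confident. In practice a library implementation such as scikit-learn's `VotingClassifier` with `voting="soft"` would normally be used instead of hand-written code.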