Type 2 diabetes mellitus is a chronic disease that poses a serious challenge to human health worldwide. Globally, about 8.3% of the population is diagnosed with the disease. Applications of predictive analytics to the diagnosis of diabetes are gaining significant momentum in medical research. The aim of this paper is to aid medical professionals in the early detection and efficient diagnosis of Type 2 diabetes. We apply bioinformatics theory and supervised machine learning techniques to improve the accuracy of diabetes prediction, based on the 8 clinical measurements in the widely used PIMA dataset. We outline our methodology and highlight the implementation steps, while reviewing prominent past work in the field. We also evaluate several established machine learning algorithms and provide a detailed comparison of the results obtained from each method. The gradient boosting algorithm with parameter tuning proves to be the most successful, with an F1 score of 0.853 and an out-of-sample accuracy of 89.94%. Our prediction model computes the probability of the onset of diabetes in an individual based on their clinical data. The main benefits of applying this research in the healthcare sector are its cost-effectiveness and its ability to yield an instant diagnosis. With this work, we intend to improve the process of diagnosing Type 2 diabetes and to inspire other researchers to use machine learning based techniques for further inquiry into diabetes prediction.
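To make the two reported metrics concrete, the following minimal Python sketch shows how an F1 score and an out-of-sample accuracy can be computed from predicted onset probabilities on held-out data. It is an illustration only, not the model or evaluation code used in this work: the labels and probabilities are made up, the function names are hypothetical, and the 0.5 decision threshold is an assumption.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted onset probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of held-out cases classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up held-out labels (1 = diabetic) and model probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
probs = [0.91, 0.12, 0.67, 0.38, 0.08, 0.55, 0.83, 0.21, 0.30, 0.74]

y_pred = classify(probs)
print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.80
print(f"F1 score = {f1_score(y_true, y_pred):.2f}")  # F1 score = 0.80
```

Accuracy counts every correct prediction equally, whereas F1 ignores true negatives and so is more informative when the diabetic and non-diabetic classes are imbalanced. Reporting both metrics therefore gives a fuller picture than either one alone.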