Prediction of tuberculosis cases based on sociodemographic and environmental factors in Gombak, Selangor, Malaysia: A comparative assessment of multiple linear regression and artificial neural network models

Int J Mycobacteriol. 2021 Oct-Dec;10(4):442-456. doi: 10.4103/ijmy.ijmy_182_21.


BACKGROUND: Early prediction of tuberculosis (TB) cases is crucial for prevention and control of the disease. This study aims to predict the number of TB cases in Gombak based on sociodemographic and environmental factors.

METHODS: Sociodemographic data on 3325 TB cases recorded in Gombak district from January 2013 to December 2017 were collected from the MyTB web and TB Information System database. Environmental data for July 2012 to December 2017 were obtained from the Department of Environment, Malaysia; the Department of Irrigation and Drainage, Malaysia; and the Malaysian Meteorological Department. Multiple linear regression (MLR) and artificial neural network (ANN) methods were used to develop prediction models of TB cases. Models that used sociodemographic variables as inputs are referred to as MLR1 and ANN1, models that used environmental variables as MLR2 and ANN2, and models that used both sociodemographic and environmental variables as MLR3 and ANN3.
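The naming scheme for the six models can be sketched as a small lookup. This is an illustration only: the variable lists include just the ten predictors named in the results for ANN3, their split into sociodemographic and environmental groups is an assumption, and `INPUT_SETS` and `model_inputs` are hypothetical names, not part of the study.

```python
# Predictors named in the abstract's results; the grouping and the
# identifier spellings below are assumptions made for illustration.
SOCIODEMOGRAPHIC = ["nationality", "residency", "income_status"]
ENVIRONMENTAL = [
    "CO", "NO2", "SO2", "PM10",
    "rainfall", "temperature", "atmospheric_pressure",
]

# Suffix 1 = sociodemographic only, 2 = environmental only, 3 = both.
INPUT_SETS = {
    1: SOCIODEMOGRAPHIC,
    2: ENVIRONMENTAL,
    3: SOCIODEMOGRAPHIC + ENVIRONMENTAL,
}


def model_inputs(model_name):
    """Return the input variables for a model name such as 'MLR1' or 'ANN3'."""
    family, suffix = model_name[:3], model_name[3:]
    if family not in ("MLR", "ANN") or suffix not in ("1", "2", "3"):
        raise ValueError(f"unknown model name: {model_name!r}")
    # Return a copy so callers cannot mutate the shared lists.
    return list(INPUT_SETS[int(suffix)])
```

Under this scheme, `model_inputs("MLR2")` and `model_inputs("ANN2")` return the same environmental variables, so the two model families differ only in the learning method, not in the inputs.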

RESULTS: The ANN models outperformed the MLR models in predicting TB cases, with adjusted coefficient of determination (R²) values ranging from 0.35 to 0.47, compared with 0.07 to 0.14 for MLR. The best TB prediction model, ANN3, was derived from nationality, residency, income status, CO, NO2, SO2, PM10, rainfall, temperature, and atmospheric pressure, and had the highest adjusted R² value of 0.47, errors below 6, and accuracies above 96%.
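For reference, the adjusted R² reported above penalizes the ordinary R² for the number of predictors in a model. A minimal sketch of the standard formula follows; the abstract gives neither the sample size nor the unadjusted R² values, so the numbers in the usage line are hypothetical and the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical inputs, not values from the study: an unadjusted R^2 of 0.5
# with 60 observations and 10 predictors.
example = adjusted_r2(0.5, n_samples=60, n_predictors=10)
```

Because the penalty grows with the number of predictors, a model with more inputs has to explain correspondingly more variance to achieve a higher adjusted R² than a smaller model.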

CONCLUSIONS: Applying the ANN algorithm to both sociodemographic and environmental factors may enable more accurate modeling for predicting TB cases.

PMID:34916466 | DOI:10.4103/ijmy.ijmy_182_21
