Information system for assessing the informativeness of an epidemic process features

DOI:

https://doi.org/10.20535/SRIT.2308-8893.2023.4.08

Keywords:

information system, epidemic process, informativeness of features, Shannon method, Kullback–Leibler method

Abstract

This study assesses the informativeness of parameters that influence epidemic processes using the Shannon and Kullback–Leibler methods. Both methods were chosen because they are grounded in information theory and are widely applied in machine learning, statistics, and related fields. The results obtained with the two methods were compared, and an information system was designed that lets users upload data samples and calculate the informativeness of factors affecting epidemic processes. For the data set examined, features such as “Chronic lung disease,” “Chronic kidney disease,” and “Weakened immunity” carried no significant information for further analysis and hindered forecasting. The developed information system supports the assessment of feature informativeness and the visualization of results, which aids comprehensive analysis of epidemic processes. The study contributes worked examples of applying the described algorithmic models, a comparison of the methods and their outcomes, and a supporting tool for analyzing epidemic processes.
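The abstract does not give the formulas used, so the following is only a minimal sketch of one common formulation of the two measures for categorical features, not the authors' implementation. It assumes the Shannon informativeness is computed as one minus the conditional entropy of the class given the feature (with the logarithm taken to the base of the number of classes), and the Kullback–Leibler informativeness as the symmetric divergence between the distributions of feature values in two classes. The function names, the `eps` smoothing constant, and the toy data are hypothetical.

```python
import math
from collections import Counter


def shannon_informativeness(feature, labels):
    """Assumed form: 1 - H(class | feature), with logs to base K (number of classes).

    Returns a value in [0, 1]; 0 means the feature says nothing about the class.
    """
    n = len(feature)
    classes = sorted(set(labels))
    k = len(classes)
    if k < 2:
        return 0.0  # a single class leaves nothing to distinguish
    value_counts = Counter(feature)
    joint_counts = Counter(zip(feature, labels))
    cond_entropy = 0.0
    for value, n_value in value_counts.items():
        for cls in classes:
            n_joint = joint_counts.get((value, cls), 0)
            if n_joint:
                p = n_joint / n_value  # P(class | feature value)
                cond_entropy -= (n_value / n) * p * math.log(p, k)
    return 1.0 - cond_entropy


def kullback_informativeness(feature, labels, class_a, class_b, eps=1e-9):
    """Assumed form: sum over values of (Pa - Pb) * log2(Pa / Pb).

    Pa and Pb are the shares of each feature value within class_a and class_b.
    eps (hypothetical smoothing) avoids log(0) for values absent from a class.
    """
    in_a = [f for f, y in zip(feature, labels) if y == class_a]
    in_b = [f for f, y in zip(feature, labels) if y == class_b]
    if not in_a or not in_b:
        raise ValueError("both classes must be present in the sample")
    count_a, count_b = Counter(in_a), Counter(in_b)
    total = 0.0
    for value in set(feature):
        p_a = max(count_a[value] / len(in_a), eps)
        p_b = max(count_b[value] / len(in_b), eps)
        total += (p_a - p_b) * math.log2(p_a / p_b)
    return total


# Toy data (hypothetical): 1 = severe outcome, 0 = mild outcome.
outcome = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # matches the outcome exactly
uninformative = [0, 1, 0, 1]  # evenly split within each outcome

for name, column in [("informative", informative), ("uninformative", uninformative)]:
    print(name,
          round(shannon_informativeness(column, outcome), 3),
          round(kullback_informativeness(column, outcome, 1, 0), 3))
```

Under these assumptions, a feature whose values are distributed identically across classes scores zero on both measures, which is the sense in which a feature would carry no significant information for further analysis.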

Author Biographies

Kseniia Bazilevych, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv

Candidate of Technical Sciences (Ph.D.), an associate professor at the Department of Mathematical Modeling and Artificial Intelligence of the National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine.

Olena Kyrylenko, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv

Student at the Department of Mathematical Modeling and Artificial Intelligence of the National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine.

Yurii Parfenyuk, V. N. Karazin Kharkiv National University, Kharkiv

Ph.D., a lecturer at the Department of Theoretical and Applied Computer Sciences of V. N. Karazin Kharkiv National University, Kharkiv, Ukraine.

Sergiy Yakovlev, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv

Doctor of Physical and Mathematical Sciences, a professor at the Department of Mathematical Modeling and Artificial Intelligence of the National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine.

Serhii Krivtsov, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv

Ph.D. student at the Department of Mathematical Modeling and Artificial Intelligence of the National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine.

Ievgen Meniailov, V. N. Karazin Kharkiv National University, Kharkiv

Acting head of the Department of Theoretical and Applied Computer Sciences of V. N. Karazin Kharkiv National University, Kharkiv, Ukraine.

Victoriya Kuznietcova, V. N. Karazin Kharkiv National University, Kharkiv

Candidate of Physical and Mathematical Sciences (Ph.D.), a senior lecturer at the Department of Higher Mathematics and Computer Sciences of V. N. Karazin Kharkiv National University, Kharkiv, Ukraine.

Dmytro Chumachenko, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv

Candidate of Technical Sciences (Ph.D.), an associate professor at the Department of Mathematical Modeling and Artificial Intelligence of the National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine.

Published

2023-12-26

Section

Problem- and function-oriented computer systems and networks