Disease informatics


Disease informatics (also infectious disease informatics) studies the production, sharing, modeling, and management of knowledge about infectious diseases. [1] It has become a more widely studied field for two reasons. It is a by-product of the rapid increase in widely available biomedical and clinical data, and it responds to the demand for useful analyses of that data. [1]


Because infectious diseases contribute to millions of deaths every year, the ability to identify and understand how diseases spread is crucial for society to apply control and prevention measures. [2] The knowledge gained by researchers in disease informatics can inform policymakers' decisions on issues such as raising public awareness, updating the training of health professionals, and purchasing vaccines. [2]

Beyond informing policy, the goals of disease informatics include:
- identifying more biomarkers of transmissibility,
- improving vaccine design,
- deepening the understanding of host-pathogen interactions, and
- optimizing antimicrobial development. [1]

Methods

Artificial intelligence

Artificial intelligence (AI) tools, such as machine learning and natural language processing (NLP), increase efficiency in disease informatics by automating and speeding up many data analysis processes. Advances in AI and greater access to data support both predictive modeling and public health surveillance. Predictive models examine large data sets to forecast future outcomes. This improves the ability to anticipate disease outbreaks and to guide public health interventions. [3] AI can also combine spatial modeling with geographic information system (GIS) data to uncover geographical patterns, such as disease clusters. These patterns support data-driven, local-level predictions of disease diffusion. [3] As AI continues to develop, further applications in disease informatics are expected.
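Real GIS analyses use dedicated methods such as spatial scan statistics. The minimal sketch below shows only the underlying idea of finding a disease cluster, under simplifying assumptions:
- case locations are binned into a square latitude/longitude grid, and
- cells are flagged when their incidence rate exceeds a multiple of the overall rate.

The function names, the 0.1-degree grid, the 2x ratio, and the population figures are all hypothetical.

```python
from collections import Counter


def grid_cell(lat, lon, size=0.1):
    """Map a coordinate to a square grid cell of `size` degrees (a crude stand-in for real GIS zones)."""
    return (int(lat // size), int(lon // size))


def flag_clusters(cases, population, size=0.1, ratio=2.0):
    """Return grid cells whose incidence rate exceeds `ratio` times the overall rate.

    cases: list of (lat, lon) case locations.
    population: dict mapping grid cell -> number of residents (hypothetical input).
    """
    counts = Counter(grid_cell(lat, lon, size) for lat, lon in cases)
    overall_rate = len(cases) / sum(population.values())
    return sorted(
        cell
        for cell, n in counts.items()
        if cell in population and n / population[cell] > ratio * overall_rate
    )


# Usage with made-up coordinates: three neighboring cells of 1,000 residents each.
population = {
    grid_cell(40.05, -79.95): 1000,
    grid_cell(40.25, -79.95): 1000,
    grid_cell(40.45, -79.95): 1000,
}
cases = [(40.05, -79.95)] * 9 + [(40.25, -79.95)]
print(flag_clusters(cases, population))  # only the first cell is flagged
```

A fixed grid is easy to compute but sensitive to where cell boundaries fall. That sensitivity is one reason epidemiologists prefer scan statistics over varying window sizes.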

Machine learning

Machine learning (ML) techniques support disease informatics through their ability to predict the spatial and temporal progression and transmission of infectious diseases. [2] In disease informatics, ML algorithms analyze large, complex data sets to identify patterns across varied types of data, such as demographics, electronic health records, and environmental conditions. [2] Commonly used ML techniques include decision trees, random forests, support vector machines, and deep learning networks. [2] Researchers apply these tools to data sets such as genomic data, social media posts, and health records. With them, they can:
- predict the potential sources of an outbreak,
- estimate the likelihood that an individual will contract a certain disease, and
- forecast the number of cases of a disease in a given region. [2]

According to numerous studies, ML models have proven as accurate as traditional statistical methods at predicting the spread and onset of diseases, especially when multiple ML models are used together. [2]
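A random forest combines many decision trees, and each tree is built from repeated splits like the one below. This minimal sketch fits a single-split regression tree (a "decision stump"). It forecasts next week's confirmed cases from this week's count of flu-like chief complaints. The data and function names are hypothetical. Real studies would typically use a library such as scikit-learn, with many features and proper validation.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (decision stump).

    Tries every threshold t between distinct x values and keeps the split whose
    two leaves (each predicting the mean y of its side) give the lowest squared error.
    Returns (threshold, left_prediction, right_prediction).
    """
    candidates = sorted(set(xs))[:-1]  # the largest value would leave the right leaf empty
    if not candidates:
        raise ValueError("need at least two distinct x values")
    best = None
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def predict(stump, x):
    """Predict with a fitted stump: the mean of whichever leaf x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical data: weekly flu-like chief complaints (x) vs confirmed cases the following week (y).
complaints = [1, 2, 3, 10, 11, 12]
next_week_cases = [0, 1, 1, 8, 9, 10]
stump = fit_stump(complaints, next_week_cases)
print(stump[0], predict(stump, 2), predict(stump, 11))
```

A single split is easy to read but underfits. Ensembles such as random forests average many deeper trees, each trained on resampled data, to trade some interpretability for accuracy.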

Text mining

Text mining has become a useful way to query large amounts of data to aid gene mapping and genome analysis. [1] It allows researchers to query medical databases for tasks such as genomic mapping. By integrating genomic and proteomic data, it can map genes and highlight their relationships with various diseases. [1] Targeted sequences can be retrieved in two ways: by similarity search or by keyword search. A similarity search, using software such as BLAST, takes a known sequence as the query and searches for sequences similar to it. A keyword search (public tools include SRS, Entrez, and ACNUC) uses annotations that describe gene features, such as sequence positions, to retrieve the desired gene sequences. [1]
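The two retrieval styles can be contrasted in a short sketch. It is only an illustration under stated assumptions: the records and sequences are toy strings invented for this example, and the function names are hypothetical. The keyword search matches annotation text. The similarity search is *not* BLAST, which uses seed-and-extend alignment with statistical scoring. Instead, it substitutes a much cruder technique: Jaccard similarity between sets of overlapping k-mers.

```python
# Toy records invented for illustration; real searches run against databases such as GenBank.
RECORDS = [
    {"id": "seqA", "annotation": "hemagglutinin gene, segment 4", "sequence": "ACGTACGGTT"},
    {"id": "seqB", "annotation": "neuraminidase gene, segment 6", "sequence": "TTTTGGGGCC"},
    {"id": "seqC", "annotation": "hemagglutinin gene, variant", "sequence": "ACGTACGGTA"},
]


def keyword_search(records, keyword):
    """Return ids of records whose annotation mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r["id"] for r in records if kw in r["annotation"].lower()]


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_search(records, query, k=3, min_score=0.3):
    """Rank records by Jaccard similarity of k-mer sets (a crude stand-in for BLAST)."""
    q = kmers(query, k)
    hits = []
    for r in records:
        s = kmers(r["sequence"], k)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        if score >= min_score:
            hits.append((r["id"], round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(keyword_search(RECORDS, "hemagglutinin"))
print(similarity_search(RECORDS, "ACGTACGG"))
```

Keyword search depends entirely on annotation quality, while similarity search needs no annotations but does need a query sequence. This is why the two approaches complement each other.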

Syndromic surveillance

Syndromic surveillance is a form of public health surveillance. Through it, data analysis methods can be used to predict potential disease outbreaks by detecting timely, pre-diagnosis health indicators. [4] Syndromic surveillance combines two kinds of data:
- demographic data (age, gender, ethnicity, etc.), and
- patient visit data (admission status, chief complaint, type of office visit, etc.).

These data can be processed with natural language processing to highlight potential predictors of an outbreak. [4] Because outbreak prediction is time-sensitive, chief complaint data are especially valuable: they are available much sooner than formal diagnosis data from physicians' offices. [4] The key to using surveillance data successfully in disease informatics is to draw on more than one source. Several other important sources are commonly used together. [4]
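The sketch below runs the chief-complaint pipeline described above in two steps, under simplifying assumptions:
1. It assigns free-text complaints to syndrome groups by keyword matching, a crude stand-in for real NLP classifiers.
2. It flags days whose count rises more than three standard deviations above the previous seven days, a simple control-chart rule.

The keyword lists, function names, and thresholds are hypothetical choices for illustration, not a published surveillance algorithm.

```python
from statistics import mean, stdev

# Hypothetical keyword lists; operational systems use curated, validated classifiers.
SYNDROMES = {
    "respiratory": ("cough", "shortness of breath", "sore throat"),
    "gastrointestinal": ("vomiting", "diarrhea", "nausea"),
}


def classify(complaint):
    """Assign a free-text chief complaint to every syndrome whose keywords it mentions."""
    text = complaint.lower()
    return [name for name, words in SYNDROMES.items() if any(w in text for w in words)]


def daily_counts(visits, syndrome):
    """Count visits per day for one syndrome; visits is a list of (day, complaint) pairs."""
    counts = {}
    for day, complaint in visits:
        if syndrome in classify(complaint):
            counts[day] = counts.get(day, 0) + 1
    return counts


def flag_spikes(series, baseline=7, z=3.0, min_sd=1.0):
    """Return indices where the count exceeds the mean of the previous `baseline`
    days by more than `z` standard deviations (sd floored at `min_sd` so a flat
    baseline does not trigger alerts on tiny changes)."""
    alerts = []
    for i in range(baseline, len(series)):
        window = series[i - baseline:i]
        threshold = mean(window) + z * max(stdev(window), min_sd)
        if series[i] > threshold:
            alerts.append(i)
    return alerts


visits = [("day1", "Cough and fever"), ("day1", "Nausea after eating"), ("day2", "Sore throat")]
print(daily_counts(visits, "respiratory"))
print(flag_spikes([4, 5, 3, 4, 6, 5, 4, 5, 4, 15]))
```

Only the final day's jump from a stable baseline of about 4 to 15 is flagged. In practice, counts from several data sources would be monitored side by side rather than relying on one stream.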

Limitations and future prospects

Accessibility concerns

The accuracy of these AI tools and techniques depends on high-quality, comprehensive data. Accessing and collecting such data remains a challenge. Most of the data collected is incomplete and noisy, and it contains human errors (e.g., grammatical errors, abbreviations, and misspellings), so it must be thoroughly cleaned before it can be used. [2] [4]
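As a small illustration of this cleaning step, the sketch below normalizes chief-complaint text. It lowercases the text, strips punctuation, corrects a few misspellings, and expands common clinical abbreviations. The lookup tables and function name are hypothetical. Production systems maintain much larger, curated dictionaries and often use fuzzy matching instead of exact lookups.

```python
import re

# Hypothetical lookup tables; real systems maintain much larger curated lists.
ABBREVIATIONS = {"sob": "shortness of breath", "n/v": "nausea and vomiting", "abd": "abdominal"}
MISSPELLINGS = {"caugh": "cough", "diarhea": "diarrhea", "feaver": "fever"}


def clean_complaint(text):
    """Normalize a free-text chief complaint for downstream analysis."""
    text = text.lower().strip()
    text = re.sub(r"[^a-z0-9/\s]", " ", text)  # drop punctuation, keeping '/' used in abbreviations
    out = []
    for token in text.split():
        token = MISSPELLINGS.get(token, token)         # fix known misspellings first
        out.append(ABBREVIATIONS.get(token, token))    # then expand abbreviations
    return " ".join(out)


print(clean_complaint("  SOB, feaver & caugh!! "))
print(clean_complaint("N/V, abd pain"))
```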

Data also come from numerous sources that use varying formats and software, owing to differences in data availability and governance. This creates a need for standardized infrastructure to integrate and manage data. [3] A standardized taxonomy for data analysis and predictive modeling would facilitate research collaboration, accelerate decision-making, and help researchers select appropriate predictive models. [3]
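One common integration step is mapping each source's records onto a shared schema. The sketch below assumes two hypothetical sources, a clinic and a hospital, that name fields differently and write dates in different formats. It converts both into one common representation: integer age, a single-letter sex code, and ISO 8601 dates. All source names and field names are illustrative assumptions.

```python
from datetime import datetime

# Hypothetical per-source field mappings and date formats.
SOURCE_SPECS = {
    "clinic_a": {"age": "patient_age", "sex": "sex", "date": "visit_date", "date_fmt": "%Y-%m-%d"},
    "hospital_b": {"age": "AGE_YRS", "sex": "GENDER", "date": "DATE", "date_fmt": "%d/%m/%Y"},
}
SEX_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def harmonize(record, source):
    """Convert one raw record from `source` into the shared schema."""
    spec = SOURCE_SPECS[source]
    return {
        "age": int(record[spec["age"]]),
        "sex": SEX_CODES.get(str(record[spec["sex"]]).strip().lower(), "U"),  # "U" = unknown
        "date": datetime.strptime(record[spec["date"]], spec["date_fmt"]).date().isoformat(),
        "source": source,
    }


print(harmonize({"patient_age": "34", "sex": "F", "visit_date": "2023-03-05"}, "clinic_a"))
print(harmonize({"AGE_YRS": 34, "GENDER": "female", "DATE": "05/03/2023"}, "hospital_b"))
```

Keeping the mappings in data rather than code means a new source can be added by writing one more specification entry. This is a small-scale version of the standardized infrastructure the paragraph above calls for.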

One approach in use is federated learning, which allows AI models to be trained across multiple centers without sharing raw data, so the data stay at their source. [3] However, differences in data formatting and software still make it difficult for models to converge under this approach, so algorithmic improvements are needed.
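Federated learning works as a repeated exchange between sites and a server:
1. Each site trains locally on its own data.
2. The server averages only the resulting model parameters, weighted by each site's sample size.

This is the idea behind the federated averaging (FedAvg) algorithm. The minimal sketch below applies it to a one-variable linear model. Both "centers" and their data are hypothetical, and the learning rate, epoch count, and number of rounds are illustrative choices.

```python
def local_update(weights, data, lr=0.05, epochs=50):
    """Run gradient descent on one site's private data for y = w*x + b (mean squared error)."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2 / n * sum(w * x + b - y for x, y in data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b), n


def federated_average(sites, rounds=20):
    """FedAvg-style training: only model weights and sample counts leave each site."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        total = sum(n for _, n in updates)
        global_weights = (
            sum(w * n for (w, _), n in updates) / total,
            sum(b * n for (_, b), n in updates) / total,
        )
    return global_weights


# Two hypothetical centers whose (private) data both follow y = 2x + 1.
site_a = [(x, 2 * x + 1) for x in range(0, 4)]
site_b = [(x, 2 * x + 1) for x in range(1, 6)]
print(federated_average([site_a, site_b]))
```

Here both sites share the same underlying relationship, so the average converges cleanly. When sites differ in data formats or distributions, local models can pull in different directions. That is the convergence problem noted above.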

Another concern is the potential for bias and overfitting in predictive models, which can lead to inaccurate predictions. [2] Human error can persist even when these tools automate tasks: if AI tools are trained incorrectly, they produce inaccurate results. One study suggests that combining AI with wearable devices and other emerging technologies could address some of these challenges by supplying real-time data to the models. This could make the raw data more accurate, reduce the time spent cleaning data, and allow the models to make more accurate predictions. [3]

Ethical concerns

A critical concern for using AI and predictive modeling in disease informatics is data security and privacy. The data sources used (electronic health records, demographics, etc.) contain highly sensitive information that must be protected for all parties involved. Any models or techniques used must comply with applicable regulations and laws, such as HIPAA in the United States. The data must also undergo rigorous anonymization and de-identification to protect patient privacy. [3]
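The sketch below illustrates a few typical de-identification steps:
- dropping direct identifiers,
- replacing the patient ID with a keyed-hash pseudonym so records can still be linked, and
- generalizing age and ZIP code.

It is not a compliance tool. The 90+ age grouping and three-digit ZIP truncation echo parts of the HIPAA Safe Harbor method, but real de-identification must handle many more identifier types and conditions. The field names and helper functions here are hypothetical.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers that are dropped outright.
DIRECT_IDENTIFIERS = {"patient_id", "name", "phone", "email", "street_address"}


def age_band(age):
    """Generalize an exact age to a 10-year band, grouping 90 and older together."""
    if age >= 90:
        return "90+"
    low = age // 10 * 10
    return f"{low}-{low + 9}"


def deidentify(record, secret_key):
    """Return a copy of `record` with identifiers removed or generalized."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # A keyed hash (HMAC-SHA256) yields a stable pseudonym, so records from the same
    # patient can still be linked without revealing the original identifier.
    out["pseudo_id"] = hmac.new(secret_key, record["patient_id"].encode(), hashlib.sha256).hexdigest()[:16]
    if "age" in record:
        out["age"] = age_band(record["age"])
    if "zip" in record:
        out["zip"] = record["zip"][:3] + "**"  # keep only the first three ZIP digits
    return out


record = {"patient_id": "P001", "name": "Jane Doe", "age": 47, "zip": "15213", "chief_complaint": "cough"}
print(deidentify(record, secret_key=b"keep-this-secret"))
```

Using a secret key, rather than a plain hash, matters. Without the key, anyone could rebuild the pseudonyms by hashing candidate patient IDs themselves.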

Wider use of explainable artificial intelligence (XAI) can help researchers and other stakeholders ensure transparency and accountability when applying data analysis and computational methods in disease informatics. XAI provides explanations of how the algorithms work, why they were chosen, and what knowledge they produce. [3]
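XAI covers many techniques, from post-hoc explainers for complex models to models that are interpretable by design. The sketch below shows the simplest case: a logistic regression risk score in which each feature's contribution to the log-odds is just its weight times its value. That contribution can be reported directly alongside the prediction. The weights, bias, and feature names are hypothetical and do not come from any trained model.

```python
import math

# Hypothetical coefficients of an already-trained logistic regression risk model.
WEIGHTS = {"fever": 1.2, "cough": 0.8, "recent_travel": 1.5, "age_over_65": 0.6}
BIAS = -3.0


def explain(features):
    """Return (risk, contributions ranked by absolute size) for one patient.

    For a linear model each feature's contribution to the log-odds is simply
    weight * value, so the explanation is exact rather than approximated.
    """
    contributions = {name: w * features.get(name, 0) for name, w in WEIGHTS.items()}
    log_odds = BIAS + sum(contributions.values())
    risk = 1 / (1 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return risk, ranked


risk, ranked = explain({"fever": 1, "cough": 1, "age_over_65": 1})
print(f"risk={risk:.2f}")
for name, contribution in ranked:
    print(f"{name:>14}: {contribution:+.2f}")
```

For nonlinear models such as deep networks, contributions are no longer exact. They must instead be estimated with approximate attribution methods, which is where much of current XAI research lies.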


References

  1. Sintchenko, Vitali, ed. (2010). "Infectious Disease Informatics". SpringerLink. doi:10.1007/978-1-4419-1327-2. ISBN 978-1-4419-1326-5.
  2. Santangelo, Omar Enzo; Gentile, Vito; Pizzo, Stefano; Giordano, Domiziana; Cedrone, Fabrizio (2023-02-01). "Machine Learning and Prediction of Infectious Diseases: A Systematic Review". Machine Learning and Knowledge Extraction. 5 (1): 175–198. doi:10.3390/make5010013. ISSN 2504-4990.
  3. Olawade, David B.; Wada, Ojima J.; David-Olawade, Aanuoluwapo Clement; Kunonga, Edward; Abaire, Olawale; Ling, Jonathan (2023-10-26). "Using artificial intelligence to improve public health: a narrative review". Frontiers in Public Health. 11. doi:10.3389/fpubh.2023.1196397. ISSN 2296-2565. PMC 10637620. PMID 37954052.
  4. Chen, Hsinchun; Zeng, Daniel; Yan, Ping (2010). Infectious Disease Informatics: Syndromic Surveillance for Public Health and Biodefense. Integrated Series in Information Systems. New York: Springer. ISBN 978-1-4419-1278-7.