Translational bioinformatics

Translational bioinformatics (TBI) is a field that emerged in the 2000s to study health informatics, focused on the convergence of molecular bioinformatics, biostatistics, statistical genetics and clinical informatics. Its focus is on applying informatics methodology to the increasing amount of biomedical and genomic data to formulate knowledge and medical tools that can be used by scientists, clinicians, and patients. [1] Furthermore, it involves applying biomedical research to improve human health through the use of computer-based information systems. [2] TBI employs data mining and the analysis of biomedical data to generate clinical knowledge for application. [3] Generating clinical knowledge includes finding similarities in patient populations, interpreting biological information to suggest therapies, and predicting health outcomes. [4]
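
As a rough illustration of "finding similarities in patient populations", the sketch below scores every pair of patients by the Jaccard similarity of their recorded features (shared features divided by all features either patient has). The patient IDs and feature codes are hypothetical, and Jaccard similarity is only one of many possible measures; it is not a method prescribed by the cited sources.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_pairs(patients: dict[str, set[str]],
                       top: int = 3) -> list[tuple[str, str, float]]:
    """Rank every pair of patients by the Jaccard similarity of their feature sets."""
    scored = [
        (p, q, jaccard(patients[p], patients[q]))
        for p, q in combinations(sorted(patients), 2)
    ]
    # Highest similarity first; ties broken by patient IDs for a stable order.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored[:top]


# Hypothetical patients described by diagnosis codes and genetic markers.
patients = {
    "P1": {"type2_diabetes", "hypertension", "variant_A"},
    "P2": {"type2_diabetes", "hypertension", "variant_B"},
    "P3": {"asthma", "variant_B"},
}

for p, q, score in most_similar_pairs(patients):
    print(f"{p}-{q}: {score:.2f}")
```

In practice, features would be weighted and drawn from much richer clinical and molecular data; the set-overlap measure is used here only because it is easy to verify by hand.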

History

Translational bioinformatics is a relatively young field within translational research. [5] [6] Google Trends indicates that use of the term "bioinformatics" has decreased since the mid-1990s, when it was suggested as a transformative approach to biomedical research. [6] The term was coined, however, close to ten years earlier. [7] TBI was then presented as a means to facilitate data organization, accessibility and improved interpretation of the available biomedical research. [6] [8] It was considered a decision support tool that could integrate biomedical information into decision-making processes from which it would otherwise have been omitted owing to the limits of human memory and thinking patterns. [8]

Initially, the focus of TBI was on ontology and vocabulary designs for searching the mass data stores. However, this effort was largely unsuccessful, as preliminary attempts at automation produced misinformation. TBI needed to develop a baseline for cross-referencing data with higher-order algorithms in order to link data, structures and functions in networks. [6] This went hand in hand with a focus on developing curricula for graduate-level programs and on capitalizing on growing public acknowledgement of TBI's potential to secure funding. [6]

When the first draft of the human genome was completed in the early 2000s, TBI continued to grow and gained prominence as a means to bridge biological findings with clinical informatics, shaping opportunities in both the biology and healthcare industries. [9] Expression profiling, text mining for trend analysis, population-based data mining for biomedical insights, and ontology development have been explored, defined and established as important contributions to TBI. [6] [10] Achievements of the field that have been used for knowledge discovery include linking clinical records to genomic data, linking drugs with ancestry, whole-genome sequencing of a group with a common disease, and semantics in literature mining. [10] There has been discussion of cooperative efforts to create cross-jurisdictional strategies for TBI, particularly in Europe. The past decade has also seen the development of personalized medicine and data sharing in pharmacogenomics. These accomplishments have solidified public interest, generated funds for investment in training and further curriculum development, increased demand for skilled personnel in the field and pushed ongoing TBI research and development. [6]

Benefits and opportunities

At present, TBI research spans multiple disciplines; however, its application in clinical settings remains limited. It is currently partially deployed in drug development, regulatory review, and clinical medicine. [8] The potential for applying TBI is much broader, as medical journals increasingly mention the term "informatics" and discuss bioinformatics-related topics. [2] TBI research draws on four main areas of discourse: clinical genomics, genomic medicine, pharmacogenomics, and genetic epidemiology. [9] A growing number of conferences and forums focus on TBI, creating opportunities for knowledge sharing and field development. General topics that appear in recent conferences include: (1) personal genomics and genomic infrastructure, (2) drug and gene research on adverse events, interactions and drug repurposing, (3) biomarkers and phenotype representation, (4) sequencing, science and systems medicine, (5) computational and analytical methodologies for TBI, and (6) applications bridging genetic research and clinical practice. [8] [10] [11]

With the help of bioinformaticians, biologists are able to analyze complex data, set up websites for experimental measurements, facilitate sharing of those measurements, and correlate findings with clinical outcomes. [2] Translational bioinformaticians studying a particular disease can draw on more sample data about that disease than an individual biologist studying it alone.

Since the completion of the human genome sequence, new projects are attempting to systematically analyze all the gene alterations in a disease such as cancer rather than focusing on a few genes at a time. In the future, large-scale data from different sources will be integrated in order to extract functional information. The availability of a large number of human genomes will allow statistical mining of their relation to lifestyle, drug interactions, and other factors. Translational bioinformatics is therefore transforming the search for disease genes and is becoming a crucial component of other areas of medical research, including pharmacogenomics. [12]
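
To make "statistical mining" of genomes more concrete, the sketch below runs a Pearson chi-square test on a 2×2 table of allele counts in cases versus controls, a common first-pass test in genetic association studies, and reports the allelic odds ratio. The counts are invented for illustration, and the function names are hypothetical; a real analysis would also correct for multiple testing across many variants and for population structure, which this sketch omits.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    on a 2x2 table of allele counts. Rows are cases/controls; columns are the
    alternative/reference allele. Returns (chi-square statistic, p-value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were independent of case status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def allelic_odds_ratio(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds of the alternative allele among case alleles relative to control alleles."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Hypothetical allele counts for a single variant (200 alleles per group).
counts = dict(case_alt=60, case_ref=140, ctrl_alt=40, ctrl_ref=160)
chi2, p = allelic_chi_square(**counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, OR = {allelic_odds_ratio(**counts):.2f}")
```

Repeating such a test across thousands of variants is what makes large genome collections statistically useful, and also why multiple-testing correction becomes essential.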

In a study evaluating the computational and economic characteristics of cloud computing for large-scale integration and analysis of genomic medicine data, cloud-based analysis had similar cost and performance to a local computational cluster. This suggests that cloud computing might be a valuable and economical technology for facilitating large-scale translational research in genomic medicine. [13]

Methodologies

Storage

Vast amounts of bioinformatic data are available, and the volume continues to increase. For instance, at the time of the cited 2008 review, the GenBank database, funded by the National Institutes of Health (NIH), held 82 billion nucleotides in 78 million sequences from 270,000 species. The equivalent of GenBank for gene expression microarrays, the Gene Expression Omnibus (GEO), held over 183,000 samples from 7,200 experiments, and this number was doubling or tripling each year. The European Bioinformatics Institute (EBI) has a similar database, ArrayExpress, which held over 100,000 samples from over 3,000 experiments. Altogether, TBI then had access to more than a quarter million microarray samples. [2]
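
The "quarter million" figure follows from adding the two repositories' sample counts quoted above. The sketch below does that sum and, purely to illustrate what annual doubling or tripling implies, projects the GEO count forward under a constant growth factor of 2 or 3. The projection and the `project` helper are assumptions for illustration, not reported figures, and the simple sum may overcount any samples deposited in both repositories.

```python
# Sample counts quoted in the text.
GEO_SAMPLES = 183_000
ARRAYEXPRESS_SAMPLES = 100_000

total = GEO_SAMPLES + ARRAYEXPRESS_SAMPLES
print(f"Combined microarray samples: {total:,}")


def project(count: int, growth_factor: float, years: int) -> int:
    """Hypothetical helper: project a count forward at a constant annual growth factor."""
    return round(count * growth_factor ** years)


for factor in (2, 3):
    print(f"GEO after 3 years at x{factor} per year: {project(GEO_SAMPLES, factor, 3):,}")
```

Even the lower growth rate would multiply the repository eightfold in three years, which is the storage pressure the following paragraphs address.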

To extract relevant data from large data sets, TBI employs various methods such as data consolidation, data federation, and data warehousing. In the data consolidation approach, data is extracted from various sources and centralized in a single database. This approach enables standardization of heterogeneous data and helps address interoperability and compatibility issues among data sets. However, users of this method often have difficulty updating their databases, as it is based on a single data model. In contrast, the data federation approach links databases together, extracts data on a regular basis, and then combines the data for queries. The benefit of this approach is that it gives the user access to real-time data through a single portal. Its limitation is that the collected data may not always be synchronized, as it is derived from multiple sources. Data warehousing provides a single unified platform for data curation. It integrates data from multiple sources into a common format and, in bioscience, is typically used exclusively for decision support. [14]
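
The difference between consolidation and federation can be sketched with two in-memory "sources". In the consolidation sketch, records are copied into one central table at load time and mapped to a common schema, so later changes at a source are not seen until the next reload; in the federation sketch, each query is forwarded to the live sources and the results are merged. The classes, field names and sources are all hypothetical stand-ins for real databases.

```python
from typing import Callable

Record = dict[str, str]
Source = Callable[[], list[Record]]  # hypothetical: returns a source's current records


def normalize(record: Record) -> Record:
    """Map source-specific field names onto one common schema."""
    return {
        "gene": record.get("gene") or record.get("symbol", ""),
        "sample": record.get("sample") or record.get("sample_id", ""),
    }


class ConsolidatedStore:
    """Consolidation: extract once, standardize, and store centrally."""

    def __init__(self, sources: list[Source]):
        self.sources = sources
        self.table: list[Record] = []
        self.reload()

    def reload(self) -> None:
        self.table = [normalize(r) for src in self.sources for r in src()]

    def query(self, gene: str) -> list[Record]:
        return [r for r in self.table if r["gene"] == gene]


class FederatedQuery:
    """Federation: leave data in place and query each source on demand."""

    def __init__(self, sources: list[Source]):
        self.sources = sources

    def query(self, gene: str) -> list[Record]:
        results: list[Record] = []
        for src in self.sources:
            results.extend(r for r in map(normalize, src()) if r["gene"] == gene)
        return results


# Two hypothetical sources that use different field names.
lab_a = [{"gene": "BRCA1", "sample": "A1"}]
lab_b = [{"symbol": "BRCA1", "sample_id": "B7"}]
sources = [lambda: list(lab_a), lambda: list(lab_b)]

central = ConsolidatedStore(sources)
federated = FederatedQuery(sources)

lab_b.append({"symbol": "BRCA1", "sample_id": "B8"})  # a new record arrives at source B
print(len(central.query("BRCA1")))    # stale until reload(): 2
print(len(federated.query("BRCA1")))  # sees the new record: 3
```

The trade-off mirrors the text: the central table is fast and uniform but must be refreshed, while the federated query is always current but depends on every source being reachable and consistent at query time.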

Analytics

Analytic techniques serve to translate biological data generated by high-throughput techniques into clinically relevant information. Numerous software tools and methodologies for querying data currently exist, and their number continues to grow as more studies are published in bioinformatics journals such as Genome Biology, BMC Bioinformatics, BMC Genomics, and Bioinformatics. To help identify the best analytical technique, tools such as Weka have been created to sift through the array of available software and select the most appropriate technique, abstracting away the need to know a specific methodology. [15]
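
Weka itself is a Java workbench, and the sketch below does not use its API. It only illustrates, in plain Python, the general idea of comparing candidate methods on the same data by k-fold cross-validation and keeping the one with the best held-out accuracy. The two toy classifiers, the synthetic dataset and all function names are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid(train, x):
    """Predict the label whose mean feature vector is closest to x."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [f for f, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def majority_class(train, x):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def cv_accuracy(classifier, data, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(classifier(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


def select_best(candidates, data):
    """Return the name of the best-scoring candidate and all scores."""
    results = {name: cv_accuracy(clf, data) for name, clf in candidates.items()}
    return max(results, key=results.get), results


# Hypothetical two-feature expression data for two well-separated classes.
rng = random.Random(1)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], "control") for _ in range(30)]
data += [([rng.gauss(4, 1), rng.gauss(4, 1)], "case") for _ in range(20)]

best, scores = select_best({"nearest_centroid": nearest_centroid,
                            "majority_class": majority_class}, data)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

The majority-class baseline is included deliberately: a candidate method is only worth adopting if it beats the trivial predictor on held-out data.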

Integration

Data integration involves developing methods that use biological information in the clinical setting. Integrating data equips clinicians with tools for data access, knowledge discovery, and decision support. Data integration serves to put the wealth of information available in bioinformatics to use in improving patient health and safety. One example of data integration is the use of decision support systems (DSS) based on translational bioinformatics. DSS used in this way identify correlations in patients' electronic medical records (EMR) and other clinical information systems to assist clinicians in diagnosis. [14]
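
A very small rule-based decision-support check is sketched below: it scans simplified patient records for a prescribed drug combined with a genotype-derived phenotype that a rule table flags. The clopidogrel/CYP2C19 pairing is a widely cited pharmacogenomic example (clopidogrel is a prodrug, and CYP2C19 poor metabolizers activate it less effectively), but the rule table, record format and alert text here are hypothetical. Unlike a DSS that mines EMR data for new correlations, this sketch only applies an association that is already known.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    patient_id: str
    medications: set[str] = field(default_factory=set)
    # Gene -> metabolizer phenotype, e.g. {"CYP2C19": "poor"} (simplified, hypothetical format).
    phenotypes: dict[str, str] = field(default_factory=dict)


# Hypothetical rule table: (drug, gene, phenotype) -> advisory text.
RULES = {
    ("clopidogrel", "CYP2C19", "poor"):
        "Reduced activation of clopidogrel expected; review therapy.",
}


def check_record(record: PatientRecord) -> list[str]:
    """Return an alert for every rule whose drug and phenotype both match the record."""
    alerts = []
    for (drug, gene, phenotype), advice in RULES.items():
        if drug in record.medications and record.phenotypes.get(gene) == phenotype:
            alerts.append(f"{record.patient_id}: {drug} + {gene} {phenotype} metabolizer. {advice}")
    return alerts


records = [
    PatientRecord("P1", {"clopidogrel", "aspirin"}, {"CYP2C19": "poor"}),
    PatientRecord("P2", {"clopidogrel"}, {"CYP2C19": "normal"}),
    PatientRecord("P3", {"metformin"}, {}),
]

for rec in records:
    for alert in check_record(rec):
        print(alert)
```

A production system would source its rules from curated clinical guidelines and read genotypes and prescriptions from the EMR's own data model rather than from hand-built records.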

Cost

Companies are now able to provide whole human genome sequencing and analysis as a simple outsourced service. Second- and third-generation versions of sequencing systems are planned to increase throughput to 80 genomes per day per instrument. According to Cliff Reid, CEO of Complete Genomics, the total worldwide market for whole human genome sequencing increased five-fold during 2009 and 2010 and was estimated at 15,000 genomes for 2011. Furthermore, he maintained that if the price were to fall to $1,000 per genome, the company would still be able to make a profit. The company is also working on process improvements to bring its internal cost down to around $100 per genome, excluding sample-preparation and labor costs. [16] [17]

According to the National Human Genome Research Institute (NHGRI), the cost to sequence an entire human genome decreased significantly, from over $95 million in 2001 to $7,666 in January 2012. Similarly, the cost of determining one megabase (a million bases) fell from over $5,000 in 2001 to $0.09 in 2012. In 2008, sequencing centers transitioned from Sanger-based (dideoxy chain termination) sequencing to 'second-generation' (or 'next-generation') DNA sequencing technologies, which caused a significant drop in sequencing costs. [18]
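
The scale of the decline can be checked with simple arithmetic on the NHGRI figures quoted above. The sketch assumes an interval of 11 years between the 2001 and January 2012 data points, which is an approximation, and treats "over $95 million" and "over $5,000" as exactly those values, so the fold reductions it prints are lower bounds.

```python
# Figures quoted in the text (NHGRI).
GENOME_2001, GENOME_2012 = 95_000_000, 7_666   # US dollars per genome
MEGABASE_2001, MEGABASE_2012 = 5_000, 0.09     # US dollars per megabase
YEARS = 11  # assumption: roughly 2001 to January 2012


def fold_reduction(before: float, after: float) -> float:
    """How many times cheaper the later figure is than the earlier one."""
    return before / after


def annual_factor(fold: float, years: float) -> float:
    """Constant yearly reduction factor that compounds to the overall fold change."""
    return fold ** (1 / years)


genome_fold = fold_reduction(GENOME_2001, GENOME_2012)
megabase_fold = fold_reduction(MEGABASE_2001, MEGABASE_2012)
print(f"Per-genome cost fell about {genome_fold:,.0f}-fold")
print(f"Per-megabase cost fell about {megabase_fold:,.0f}-fold")
print(f"Average yearly reduction factor (genome): {annual_factor(genome_fold, YEARS):.2f}")
```

A constant yearly factor is only an average: as the text notes, most of the drop came after the 2008 switch to second-generation sequencing rather than evenly across the period.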

Future directions

TBI has the potential to play a significant role in medicine; however, many challenges remain. The overarching goal for TBI is to "develop informatics approaches for linking across traditionally disparate data and knowledge sources enabling both the generation and testing of new hypotheses". [9] Current applications of TBI are hampered by a lack of standards, which results in diverse data collection methodologies. Furthermore, analytic and storage capabilities are strained by the large volumes of data in current research. This problem is projected to grow with personal genomics, which will create an even greater accumulation of data. [6] [9]

Challenges also exist in research on drugs and biomarkers, genomic medicine, protein design, metagenomics, infectious disease discovery, data curation, literature mining, and workflow development. [6] Continued belief in the opportunities and benefits of TBI justifies further funding for infrastructure, intellectual property protection and accessibility policies. [6] [19]

Available funding for TBI has increased over the past decade. [2] The demand for translational bioinformatics research is due in part to growth in numerous areas of bioinformatics and health informatics and in part to popular support for projects such as the Human Genome Project. [7] [9] [20] This growth and influx of funding has enabled the industry to produce assets such as repositories of gene expression data and genome-scale data, while also making progress towards the $1000 genome and completing the Human Genome Project. [9] [20] Some believe that TBI will cause a cultural shift in the way scientific and clinical information is processed within the pharmaceutical industry, regulatory agencies, and clinical practice. It is also seen as a means to shift clinical trial designs away from case studies and towards EMR analysis. [8]

Leaders in the field have made numerous predictions about the direction TBI is taking and should take. A selection of these predictions follows:

  1. Lesko (2012) states that a strategy must be developed in the European Union to bridge the gap between academia and industry in the following ways (directly quoted): [8]
    1. Validate and publish informatics data and technology models to accepted standards in order to facilitate adoption,
    2. Transform electronic health records to make them more accessible and interoperable,
    3. Encourage information sharing, engage regulatory agencies, and
    4. Encourage increasing financial support to grow and develop TBI
  2. Altman (2011), at the 2011 AMIA Summit on TBI, predicts that: [10]
    1. Cloud computing will contribute to major biomedical discovery.
    2. Informatics applications to stem cell science will increase
    3. Immune genomics will emerge as powerful data
    4. Flow cytometry informatics will grow
    5. Molecular and expression data will combine for drug repurposing
    6. Exome sequencing will persist longer than expected
    7. Progress in interpreting non-coding DNA variations
  3. Sarkar, Butte, Lussier, Tarczy-Hornoch and Ohno-Machado (2011) state that the future of TBI must establish a way to manage the large amount of available data and look to integrate findings from projects such as the eMERGE (Electronic Medical Records and Genomics) project funded by NIH, the Personal Genome Project, the Exome Project, the Million Veteran Program and the 1000 Genomes Project. [9]

"In an information-rich world, the wealth of information means a dearth of something else—a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that it might consume" (Herbert Simon, 1971).

Associations, conferences and journals

Below is a list of existing associations, conferences and journals specific to TBI. It is by no means all-inclusive and should be expanded as others are identified.

Associations
Conferences (websites change yearly)
Journals
Special Journal Issues on Translational Bioinformatics

Training and certification

A non-exhaustive list of training and certification programs specific to TBI is given below.

References

  1. "Translational Bioinformatics". American Medical Informatics Association. Retrieved 24 September 2014.
  2. Butte, A. J. (2008). "Translational bioinformatics: Coming of age". Journal of the American Medical Informatics Association. 15 (6): 709–714. doi:10.1197/jamia.M2824. PMC 2585538. PMID 18755990.
  3. Geospiza. "Translational bioinformatics". Archived from the original on May 28, 2011. Retrieved March 23, 2011.
  4. "When Healthcare and Computer Science Collide". University of Illinois at Chicago. 2014. Retrieved 18 September 2014.
  5. "Colorado Clinical and Translational Sciences Institute (CCTSI)". Retrieved November 16, 2012.
  6. Ouzounis, C. A. (2012). "Rise and demise of bioinformatics? Promise and progress". PLOS Computational Biology. 8 (4): 1–5. Bibcode:2012PLSCB...8E2487O. doi:10.1371/journal.pcbi.1002487. PMC 3343106. PMID 22570600.
  7. Shah, N. H.; Jonquet, C.; Lussier, Y. A.; Tarczy-Hornoch, P.; Ohno-Machado, L. (2009). "Ontology-driven indexing of public datasets for translational bioinformatics". BMC Bioinformatics. 10 (2): S1. doi:10.1186/1471-2105-10-S2-S1. PMC 2646250. PMID 19208184.
  8. Lesko, L. J. (2012). "Drug research and translational bioinformatics". Clinical Pharmacology & Therapeutics. 91 (6): 960–962. doi:10.1038/clpt.2012.45. PMID 22609906. S2CID 26762976.
  9. Sarkar, I. N.; Butte, A. J.; Lussier, Y. A.; Tarczy-Hornoch, P.; Ohno-Machado, L. (2011). "Translational bioinformatics: Linking knowledge across biological and clinical realms". J Am Med Inform Assoc. 18 (4): 345–357. doi:10.1136/amiajnl-2011-000245. PMC 3128415. PMID 21561873.
  10. Altman, R. B. (10 March 2011). "Translational bioinformatics: The year in review". Retrieved November 16, 2012.
  11. Mendonca, E. A. (2010). "Selected proceedings of the 2010 summit on translational bioinformatics". BMC Bioinformatics. 11 (9): 1–4. doi:10.1186/1471-2105-11-S9-S1. PMC 2967739. PMID 21044356.
  12. Kann, M. G. (2010). "Advances in translational bioinformatics: Computational approaches for the hunting of disease genes". Briefings in Bioinformatics. 11 (1): 96–110. doi:10.1093/bib/bbp048. PMC 2810112. PMID 20007728.
  13. Dudley, J. T. (2010). "Translational bioinformatics in the cloud: An affordable alternative". Genome Medicine. 2 (8): 51. doi:10.1186/gm172. PMC 2945008. PMID 20691073.
  14. Yan, Q. (2010). "Translational Bioinformatics and Systems Biology Approaches for Personalized Medicine". Systems Biology in Drug Discovery and Development. Methods in Molecular Biology. Vol. 662. pp. 167–178. doi:10.1007/978-1-60761-800-3_8. ISBN 978-1-60761-799-0. PMID 20824471.
  15. Butte, A. J. (2009). "Translational bioinformatics applications in genome medicine". Genome Med. 1 (6): 64. doi:10.1186/gm64. PMC 2703873. PMID 19566916.
  16. Heger, M. "Complete genomics targets 2015 for new instruments with capacity of 80 genomes per day". Retrieved November 1, 2012.
  17. "Complete genomics". Retrieved November 1, 2012.
  18. Wetterstrand, K. A. "DNA sequencing costs: Data from the NHGRI Genome sequencing program (GSP)". Retrieved November 3, 2012.
  19. Azuaje, F. J.; Heymann, M.; Ternes, A.; Wienecke-Baldacchino, A.; Struck, D.; Moes, D.; Schneider, R. (2012). "Bioinformatics as a driver, not a passenger, of translational biomedical research: Perspectives from the 6th Benelux bioinformatics conference" (PDF). Journal of Clinical Bioinformatics. 2 (7): 1–3. doi:10.1186/2043-9113-2-7. PMC 3323358. PMID 22414553.
  20. Butte, A. J.; Chen, R. (2006). "Finding disease-related genomic experiments within an international repository: First steps in translational bioinformatics". AMIA Annu Symp Proc. 2006: 106–110. PMC 1839582. PMID 17238312.
  21. "AMIA - American Medical Informatics Association". amia.org.