PrecisionFDA

precisionFDA
Developer(s): U.S. Food and Drug Administration
Programming language(s): Ruby, TypeScript
Status: Active
License: CC0 [1]

PrecisionFDA (stylized precisionFDA) [2] [3] is a secure, collaborative, high-performance computing platform that has built a growing community of experts around the analysis of biological datasets, with the aims of advancing precision medicine, informing regulatory science, and enabling improvements in health outcomes. The cloud-based platform is developed and operated by the United States Food and Drug Administration (FDA). [4] [5] PrecisionFDA connects experts, citizen scientists, and scholars from around the world and provides them with a library of computational tools, workflow features, and reference data. The platform allows researchers to upload data, compare it against reference genomes, and execute bioinformatics pipelines. [6] Its Variant Call Format (VCF) comparator tool also lets users compare their genetic test results against reference genomes. [7] The platform's code is open source and available on GitHub. [8]

The platform also uses a crowdsourcing model to sponsor community challenges that stimulate the development of innovative analytics for precision medicine and regulatory science. Community members from around the world take part in these scientific challenges by solving problems that demonstrate the effectiveness of their tools, testing the capabilities of the platform, sharing their results, and engaging the community in discussion. Globally, precisionFDA has more than 5,000 users.


The precisionFDA team collaborates with multiple FDA centers, the National Institutes of Health, and other government agencies in support of the vision and intent of the American Innovation and Competitiveness Act and the 21st Century Cures Act.

History

President Barack Obama announced the formation of the Precision Medicine Initiative during the State of the Union Address in January 2015. [6] [9] In August 2015, the FDA announced the launch of precisionFDA as part of the initiative. [10] [11] In November 2015, the FDA launched a "closed beta" version of the platform, giving select groups and individuals access. [6] [9] An open beta version of the platform was released in December 2015. [12] [13]

In February 2016, the FDA announced the first precisionFDA challenge, the Consistency Challenge, which tasked users with testing the reliability and reproducibility of mapping and variant calling tools. [14] [15] The Truth Challenge followed and asked participants to assess the accuracy of bioinformatics tools for identifying genetic variants. The Hidden Treasures – Warm Up challenge evaluated variant calling pipelines on a targeted set of in silico injected variants. The CFSAN Pathogen Detection Challenge evaluated bioinformatics pipelines for accurate and rapid detection of foodborne pathogens in metagenomics samples. The CDRH ID-NGS Diagnostics Biothreat Challenge addressed early detection during pathogen outbreaks by evaluating algorithms that identify and quantify emerging pathogens, such as the Ebola virus, from their genomic fingerprints.

Subsequent challenges expanded beyond genomics into multi-omics and other data types. The NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge addressed sample mislabeling, which contributes to irreproducible research results and invalid conclusions, by evaluating algorithms that use multi-omics data to detect and correct mislabeled samples, supporting rigor and reproducibility in biomedical research. [16] The Brain Cancer Predictive Modeling and Biomarker Discovery Challenge, run in collaboration with Georgetown University, asked participants to develop machine learning (ML) and artificial intelligence (AI) models to identify biomarkers and predict brain cancer patient outcomes using gene expression, DNA copy number, and clinical data. The Gaining New Insights by Detecting Adverse Event Anomalies Using FDA Open Data Challenge engaged data scientists to use unsupervised ML and AI techniques to identify anomalies in FDA data on adverse events, regulated product substances, and clinical trials, in support of the FDA's mission. The Truth Challenge V2 assessed variant calling pipeline performance in difficult-to-map regions, segmental duplications, and the Major Histocompatibility Complex (MHC), using Genome in a Bottle human genome benchmarks. The COVID-19 Risk Factor Modeling Challenge, run in collaboration with the Veterans Health Administration, called on the scientific and analytics community to develop and evaluate computational models that predict COVID-19-related health outcomes in Veterans.

In total, ten community challenges have been completed on precisionFDA, generating 562 responses from 240 participants. PrecisionFDA challenges have led to regulatory science advances, including published best practices for benchmarking germline small-variant calls in human genomes. [17] The challenges have also incentivized the development and benchmarking of novel computational pipelines, including one that uses deep neural networks to identify genetic variants. [18]

In addition to challenges, precisionFDA hosts in-person and virtual app-a-thon events, which promote the development and sharing of apps and tools. In August 2016, precisionFDA launched App-a-Thon in a Box, which aimed to encourage the creation and sharing of Next Generation Sequencing (NGS) apps and executable Linux command wrappers. The most recent app-a-thon, the BioCompute Object App-a-thon, sought to improve the reproducibility of bioinformatics pipelines. Participants were asked to create BioCompute Objects (BCOs), which follow a standardized schema for reporting computational scientific workflows, as well as apps that generate BCOs and check their conformance to the BioCompute specification.
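The kind of conformance check the BCO App-a-thon asked for can be illustrated with a minimal Python sketch. It only tests whether a BCO, serialized as JSON, contains a set of top-level domain fields. The field list is an assumption based on our reading of IEEE 2791-2020, the standard behind BioCompute, and the names `REQUIRED_BCO_FIELDS`, `missing_bco_fields` and `check_bco_json` are hypothetical. A real checker would validate the whole document against the official BioCompute JSON Schema rather than inspect key names alone.

```python
import json

# Top-level fields this sketch treats as mandatory. ASSUMPTION: the list is
# based on our reading of IEEE 2791-2020; the official BioCompute JSON Schema
# is authoritative and also constrains the contents of each domain.
REQUIRED_BCO_FIELDS = (
    "object_id",
    "spec_version",
    "etag",
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "io_domain",
)


def missing_bco_fields(bco: dict) -> list[str]:
    """Return the assumed-required top-level fields absent from a parsed BCO."""
    return [name for name in REQUIRED_BCO_FIELDS if name not in bco]


def check_bco_json(text: str) -> list[str]:
    """Parse a JSON-serialized BCO and list its missing top-level fields.

    An empty list means every assumed-required field is present; it does not
    mean the BCO fully conforms to the specification.
    """
    bco = json.loads(text)
    if not isinstance(bco, dict):
        raise ValueError("a BioCompute Object must be a JSON object")
    return missing_bco_fields(bco)
```

Checking key names first gives a fast, readable report of which domains are absent; full schema validation would additionally catch malformed content inside a domain, which this sketch accepts without complaint.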

In April 2016, precisionFDA won the top prize in the Informatics category at the Bio-IT World Best Practices Awards. [19] In 2018, the DNAnexus platform, on which precisionFDA is built, was granted an Authority to Operate (ATO) by the Department of Health and Human Services (HHS) at the FedRAMP Moderate level. [20] The precisionFDA team received an FDA Commissioner's Special Citation Award in 2019 for outstanding achievements and collaboration in developing the precisionFDA platform, which promotes innovative regulatory science research to modernize the regulation of NGS-based genomic tests. Also in 2019, precisionFDA received a FedHealthIT Innovation Award and transitioned from beta to a production release.

Functionality

PrecisionFDA is an open-source, cloud-based platform for collaborating on and testing bioinformatics pipelines and multi-omics data. [4] [5] PrecisionFDA is available to all innovators in the field of multi-omics, including members of the scientific community, diagnostic test providers, pharmaceutical and biotechnology companies, and other constituencies such as advocacy groups and patients. The platform allows researchers to upload and analyze data from both their own and other groups' studies. [8] [9] [21] It hosts files such as reference genomes and genomic data, comparisons (quantifications of the similarity between sets of genomic variants), and apps (bioinformatics pipelines) that scientists and researchers can upload and work with. [6] [22] The precisionFDA virtual lab environment gives each user a secure private area for their research, as well as configurable shared spaces where the FDA and external parties can share data and tools. For challenge sponsors, the platform provides a challenge development framework that supports presenting challenge assets, grading submissions, and publishing results. Prospective members can request access through the precision.fda.gov website.
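A comparison, in the sense described above, quantifies how well a set of test variant calls agrees with a reference (truth) set. The following minimal Python sketch assumes both sets arrive as VCF text lines and that two variants match only when chromosome, position, reference allele and alternate allele are identical; it counts true positives, false positives and false negatives and derives precision, recall and F1. It is not precisionFDA's comparator, and the names `parse_vcf_records`, `Comparison` and `compare_vcfs` are hypothetical. Dedicated benchmarking tools also reconcile variants written in different but equivalent representations (for example, an indel placed at different positions in a repeat), which exact matching misses.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass

# A variant as (chromosome, 1-based position, REF allele, ALT allele).
Variant = tuple[str, int, str, str]


def parse_vcf_records(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele on each VCF data line.

    Header lines (starting with '#') and blank lines are skipped. Only the
    CHROM, POS, REF and ALT columns are read; genotypes and FILTER are ignored.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            if alt != ".":  # '.' means no alternate allele at this site
                yield (chrom, int(pos), ref.upper(), alt.upper())


@dataclass(frozen=True)
class Comparison:
    true_positives: int   # present in both the test and truth sets
    false_positives: int  # called in the test set only
    false_negatives: int  # present in the truth set only

    @property
    def precision(self) -> float:
        called = self.true_positives + self.false_positives
        return self.true_positives / called if called else 0.0

    @property
    def recall(self) -> float:
        expected = self.true_positives + self.false_negatives
        return self.true_positives / expected if expected else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def compare_vcfs(test_lines: Iterable[str], truth_lines: Iterable[str]) -> Comparison:
    """Compare two VCFs by exact (CHROM, POS, REF, ALT) matching."""
    test = set(parse_vcf_records(test_lines))
    truth = set(parse_vcf_records(truth_lines))
    return Comparison(
        true_positives=len(test & truth),
        false_positives=len(test - truth),
        false_negatives=len(truth - test),
    )
```

Exact matching keeps the sketch short and easy to verify; the trade-off is that genotypes, filter status and representation differences are ignored, so the counts only approximate what a purpose-built benchmarking tool would report.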

References

  1. "PrecisionFDA". GitHub. 11 October 2021.
  2. Blair Wyckoff, Whitney (11 February 2016). "Five tech projects buried in the budget". FedScoop. Retrieved 15 April 2016.
  3. Kern, Christine (1 January 2016). "precisionFDA Launched By FDA". Health IT Outcomes. Retrieved 15 April 2016.
  4. Taylor, Nick Paul (16 November 2015). "FDA starts beta-testing 'the most advanced bioinformatics platform in the world'". FierceBiotechIT. Retrieved 15 April 2016.
  5. Taylor, Nick Paul (26 February 2016). "FDA challenges industry to improve reproducibility and accuracy of informatics pipelines". FierceBiotechIT. Retrieved 15 April 2016.
  6. Hall, Susan D. (16 November 2015). "PrecisionFDA cloud platform unveiled". FierceHealthIT. Retrieved 15 April 2016.
  7. Allen, Arthur (13 November 2015). "Our microscope on teladoc, King of telemedicine". Politico. Retrieved 15 April 2016.
  8. Downing Peck, Andrea (28 March 2016). "FDA's New Next-Generation DNA Sequencing Platform Intended to Increase Collaboration among Scientists, Pathologists, and Clinical Laboratory Experts". Dark Daily. Retrieved 15 April 2016.
  9. Blair Wyckoff, Whitney (13 November 2015). "FDA debuts 'closed beta' of precisionFDA platform". FedScoop. Retrieved 15 April 2016.
  10. Kass-Hout, Taha A.; Litwack, David (5 August 2015). "Advancing precision medicine by enabling a collaborative informatics community". FDA. Retrieved 25 April 2016.
  11. Kass-Hout, Taha A.; Johanson, Elaine (15 December 2015). "FDA Launches precisionFDA to Harness the Power of Scientific Collaboration". FDA. Retrieved 25 April 2016.
  12. Blair Wyckoff, Whitney (15 December 2015). "FDA unveils open beta of precisionFDA". FedScoop. Retrieved 15 April 2016.
  13. Hall, Susan D. (15 December 2015). "PrecisionFDA platform opens to the public". FierceHealthIT. Retrieved 15 April 2016.
  14. Krol, Aaron (4 March 2016). "PrecisionFDA Consistency Challenge Will Benchmark the Basic Software Tools of Genetic Research". Bio-IT World. Retrieved 15 April 2016.
  15. "Launch of the First precisionFDA Consistency Challenge on DNAnexus-Built Cloud-Based Community Platform". Business Wire. 3 March 2016. Retrieved 15 April 2016.
  16. Boja, Emily; Težak, Živana; Zhang, Bing; Wang, Pei; Johanson, Elaine; Hinton, Denise; Rodriguez, Henry (September 2018). "Right data for right patient-a precisionFDA NCI-CPTAC Multi-omics Mislabeling Challenge". Nature Medicine. 24 (9): 1301–1302. doi:10.1038/s41591-018-0180-x. ISSN 1546-170X. PMC 6892367. PMID 30194412.
  17. Krusche, Peter; Trigg, Len; Boutros, Paul C.; Mason, Christopher E.; De La Vega, Francisco M.; Moore, Benjamin L.; Gonzalez-Porta, Mar; Eberle, Michael A.; Tezak, Zivana; Lababidi, Samir; Truty, Rebecca (May 2019). "Best practices for benchmarking germline small-variant calls in human genomes". Nature Biotechnology. 37 (5): 555–560. doi:10.1038/s41587-019-0054-x. ISSN 1546-1696. PMC 6699627. PMID 30858580.
  18. Poplin, Ryan; Chang, Pi-Chuan; Alexander, David; Schwartz, Scott; Colthurst, Thomas; Ku, Alexander; Newburger, Dan; Dijamco, Jojo; Nguyen, Nam; Afshar, Pegah T.; Gross, Sam S. (November 2018). "A universal SNP and small-indel variant caller using deep neural networks". Nature Biotechnology. 36 (10): 983–987. doi:10.1038/nbt.4235. ISSN 1546-1696. PMID 30247488. S2CID 52346325.
  19. Krol, Aaron (15 April 2016). "Odds and Ends from the 2016 Bio IT World Conference and Expo". Bio-IT World. Retrieved 16 April 2016.
  20. "DNAnexus Granted Authority to Operate by Health and Human Services (HHS) for FedRAMP Moderate". Business Wire. 17 October 2018. Retrieved 28 July 2020.
  21. Golden, Hallie (25 November 2015). "FDA Crowdsourcing its Way to Precision Medicine. But What About Security?". Nextgov. Retrieved 16 April 2016.
  22. Anderson, Angela. "FDA Advancing Innovation Through Deep Collaboration". Retrieved 9 May 2016.