DREAM Challenges

Type: non-profit organization
Purpose: Crowd-sourced competitions
Website: http://www.dreamchallenges.org

DREAM Challenges (Dialogue for Reverse Engineering Assessment and Methods) is a non-profit initiative that advances biomedical and systems biology research through crowd-sourced competitions. [1] [2] Started in 2006, DREAM runs its challenges in collaboration with Sage Bionetworks, which hosts the competitions on its Synapse platform. More than 60 DREAM challenges have been conducted over more than 15 years. [3]


Overview

DREAM Challenges were founded in 2006 by Gustavo Stolovitzky of IBM Research [4] and Andrea Califano of Columbia University. The current chair of the DREAM organization is Paul Boutros of the University of California. The organization also includes emeritus chairs Justin Guinney and Gustavo Stolovitzky, as well as several DREAM directors. [5]

Each challenge tackles a specific biomedical research question, typically narrowed down to a specific disease. Oncology has been a prominent focus, with past challenges on breast cancer, acute myeloid leukemia, prostate cancer, and related diseases. [3] The data used in a challenge reflect its disease context. Cancer challenges typically involve data such as mutations in the human genome, gene expression and gene networks from transcriptomics, and large-scale proteomics, while newer challenges have shifted towards single-cell sequencing technologies and research questions related to the gut microbiome, reflecting trends in the wider research community. [6]

The motivation for DREAM Challenges is that opening data to a larger audience through competitions yields better models and insights than an analysis conducted by a single entity. [7] Results of past competitions have been published in venues such as the flagship journals of the Nature Portfolio and PLOS. [8] Challenge results are announced online, and the top-performing participants are invited to present their results at the annual RECOMB/ISCB Conference on Regulatory and Systems Genomics with DREAM Challenges (RSG/DREAM), organized by the ISCB. [9]

While DREAM Challenges emphasize open science and open data, some challenges have adopted a "model to data" approach to mitigate issues arising from highly sensitive data, such as genomic data from patient cohorts. [10] In such challenges, participants submit their models as containers, for example Docker or Singularity images, which the organizers then run on the confidential data, so the original data never leave the organizers' control. This differs from the more traditional open-data model, in which participants build models on the provided open data and submit their predictions directly.
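The model-to-data workflow can be illustrated with a minimal sketch. In practice the submitted model is a Docker or Singularity container that the organizers execute; here a plain Python callable stands in for the container, and the function name `run_model_to_data`, the accuracy metric, and the toy data are all hypothetical. The point of the design is that only an aggregate score, never the confidential data or per-sample outputs, is returned to the participant.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a participant's containerized model: it receives
# input features and returns one predicted class label per sample.
Model = Callable[[Sequence[Sequence[float]]], list]


def run_model_to_data(model: Model,
                      private_features: Sequence[Sequence[float]],
                      private_labels: Sequence[int]) -> float:
    """Run a submitted model on confidential data held by the organizers.

    Only an aggregate score (here, accuracy, an assumed metric) leaves this
    function; the participant never sees the features, the labels, or the
    per-sample predictions.
    """
    predictions = model(private_features)
    if len(predictions) != len(private_labels):
        raise ValueError("model must return one prediction per sample")
    correct = sum(p == y for p, y in zip(predictions, private_labels))
    return correct / len(private_labels)


# A toy participant model: predict 1 when the first feature exceeds 0.5.
def threshold_model(features):
    return [int(row[0] > 0.5) for row in features]


if __name__ == "__main__":
    features = [[0.9, 0.1], [0.2, 0.4], [0.7, 0.3], [0.1, 0.8]]
    labels = [1, 0, 0, 0]
    print(run_model_to_data(threshold_model, features, labels))  # 0.75
```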

Challenge organization

Each DREAM challenge is run by a core DREAM/Sage Bionetworks organizing group together with an extended group of scientific experts, who may have helped conceive and design the challenge or provided key data. [11] New DREAM challenges may also be proposed by the wider research community. [12] Pharmaceutical companies and other private entities may also be involved in DREAM challenges, for example by providing data.

Challenge structure

Timelines for key stages (such as introductory webinars, model submission deadlines, and the final deadline for participation) are announced in advance. After the winners are announced, the organizers collaborate with the top-performing participants on post hoc analyses for a publication describing the key findings of the competition. [7]

Challenges may be split into sub-challenges, each addressing a different subtopic within the research question. For example, a challenge on predicting the efficacy of a cancer treatment may have separate sub-challenges for predicting progression-free survival, overall survival, best overall response according to RECIST, or the exact time until an event (progression or death). [2]
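Because each sub-challenge asks for a different kind of prediction, each is scored separately. As an illustration only, the sketch below assumes that an overall-survival sub-challenge is scored with Harrell's concordance index (C-index), a common metric for ranking time-to-event predictions; the metric, the function name `concordance_index`, and the toy data are assumptions, not taken from any specific DREAM challenge.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored time-to-event data.

    A pair (i, j) is comparable when patient i had an observed event
    (progression or death) strictly before patient j's recorded time.
    The pair is concordant when the model assigns i the higher risk;
    tied risk scores count as half-concordant.
    """
    comparable = 0
    concordant = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 10, 15, 20]              # months until event or censoring
    events = [True, True, False, True]   # False = censored
    risks = [0.9, 0.6, 0.7, 0.1]         # higher = earlier predicted event
    print(concordance_index(times, events, risks))  # 0.8
```

A best-overall-response sub-challenge, by contrast, would be a classification task and would need a different metric, which is why sub-challenges are ranked independently.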

Participation

During DREAM challenges, participants typically build models on the provided data and submit predictions or models, which the organizers then validate on held-out data. Although DREAM challenges avoid leaking validation data to participants, mid-challenge leaderboards are typically available to help participants gauge their performance on a sub-sampled or scrambled dataset. [7]
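The leaderboard mechanism can be sketched as follows, under the assumption (not stated by the source) that the held-out data are randomly split into a leaderboard subset scored during the challenge and a disjoint subset reserved for final evaluation. The function names, the 30% leaderboard fraction, and the accuracy metric are hypothetical choices for illustration.

```python
import random
from typing import Hashable, Iterable, Mapping


def split_held_out(sample_ids: Iterable[Hashable],
                   leaderboard_fraction: float = 0.3,
                   seed: int = 0) -> tuple:
    """Randomly partition held-out sample IDs into a leaderboard subset
    (scored during the challenge) and a final-evaluation subset."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cutoff = round(len(ids) * leaderboard_fraction)
    return set(ids[:cutoff]), set(ids[cutoff:])


def score_subset(predictions: Mapping, truth: Mapping, subset: set) -> float:
    """Accuracy of the predictions restricted to the given sample IDs."""
    if not subset:
        raise ValueError("subset is empty")
    hits = sum(predictions[s] == truth[s] for s in subset)
    return hits / len(subset)


if __name__ == "__main__":
    truth = {f"s{i}": i % 2 for i in range(10)}
    submission = {f"s{i}": 1 for i in range(10)}  # always predicts class 1
    leaderboard_ids, final_ids = split_held_out(truth)
    print("leaderboard:", score_subset(submission, truth, leaderboard_ids))
    print("final:", score_subset(submission, truth, final_ids))
```

Keeping the final-evaluation subset disjoint from the leaderboard subset means that tuning a model repeatedly against leaderboard feedback does not reveal the final scores.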

DREAM challenges are free for participants. During the open phase, anyone can register via Synapse to participate either individually or as a team. A person may register only once and may not use aliases.

Some circumstances disqualify an individual from participating; these are set out in the official challenge rules. [13]


References

  1. Meyer, Pablo; Saez-Rodriguez, Julio (2021-06-16). "Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges". Cell Systems. 12 (6): 636–653. doi:10.1016/j.cels.2021.05.015. ISSN 2405-4712. PMID 34139170. S2CID 235472517.
  2. Vincent, Benjamin G.; Szustakowski, Joseph D.; Doshi, Parul; Mason, Michael; Guinney, Justin; Carbone, David P. (2021-11-01). "Pursuing Better Biomarkers for Immunotherapy Response in Cancer Through a Crowdsourced Data Challenge". JCO Precision Oncology. 5 (5): 51–54. doi:10.1200/PO.20.00371. PMC 9848594. PMID 34994587. S2CID 234209297.
  3. "Closed Challenges". DREAM Challenges. Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  4. "DREAM Challenges (IBM Research)". Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  5. "DREAM Directors & Support Team". DREAM Challenges. Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  6. "Open Challenges". DREAM Challenges. Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  7. Saez-Rodriguez, Julio; Costello, James C.; Friend, Stephen H.; Kellen, Michael R.; Mangravite, Lara; Meyer, Pablo; Norman, Thea; Stolovitzky, Gustavo (2016-07-15). "Crowdsourcing biomedical research: leveraging communities as innovation engines". Nature Reviews Genetics. 17 (8): 470–486. doi:10.1038/nrg.2016.69. ISSN 1471-0056. PMC 5918684. PMID 27418159.
  8. "Publications". DREAM Challenges. Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  9. "HOME - RSGDREAM 2022". www.iscb.org. Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  10. Ellrott, Kyle; Buchanan, Alex; Creason, Allison; Mason, Michael; Schaffter, Thomas; Hoff, Bruce; Eddy, James; Chilton, John M.; Yu, Thomas; Stuart, Joshua M.; Saez-Rodriguez, Julio; Stolovitzky, Gustavo; Boutros, Paul C.; Guinney, Justin (2019-09-10). "Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges". Genome Biology. 20 (1): 195. doi:10.1186/s13059-019-1794-0. ISSN 1474-760X. PMC 6737594. PMID 31506093.
  11. Boutros, Paul (2020-11-20). "Can crowd-sourcing help advance open science?". FEBS Network. Archived from the original on 2023-01-07. Retrieved 2022-11-13.
  12. Azencott, Chloé-Agathe; Aittokallio, Tero; Roy, Sushmita; DREAM Idea Challenge Consortium; Norman, Thea; Friend, Stephen; Stolovitzky, Gustavo; Goldenberg, Anna (2017-09-29). "The inconvenience of data of convenience: computational research beyond post-mortem analyses". Nature Methods. 14 (10): 937–938. doi:10.1038/nmeth.4457. ISSN 1548-7105. PMID 28960198. S2CID 34550460.
  13. "DREAM OFFICIAL CHALLENGE RULES - Effective May 11, 2022 (on Synapse)". www.synapse.org. Archived from the original on 2023-01-06. Retrieved 2022-11-13.