BioCreative

BioCreAtIvE (A critical assessment of text mining methods in molecular biology) is a community-wide effort to evaluate information extraction and text mining developments in the biological domain. [1]

It was preceded by the Knowledge Discovery and Data Mining (KDD) Challenge Cup for detection of gene mentions. [2]

Community Challenges

First edition (2004-2005)

Three main tasks were posed at the first BioCreAtIvE challenge: the entity extraction task, [3] the gene name normalization task, [4] [5] and the functional annotation of gene products task. [6] The data sets produced by this contest serve as gold-standard training and test sets for training and evaluating Bio-NER tools and annotation extraction tools.

Second edition (2006-2007)

The second BioCreAtIvE challenge (2006-2007) also had three tasks: detection of gene mentions, extraction of unique identifiers for genes, and extraction of information related to physical protein-protein interactions. [7] A total of 44 teams from 13 countries participated. [7]

Third edition (2011-2012)

The third edition of BioCreative was the first to include the InterActive Task (IAT), designed to evaluate the practical usability of text mining tools in real-world biocuration tasks. [8] [9]

Fifth edition (2016)

BioCreative V had five tracks, [10] including an interactive task (IAT) on the usability of text mining systems [11] and a track using the BioC format for curating information for BioGRID. [12]

References

  1. Hirschman, L.; Yeh, A.; Blaschke, C.; Valencia, A. (2005). "Overview of BioCreAtIvE: Critical assessment of information extraction for biology". BMC Bioinformatics. 6 (Suppl 1): S1. doi:10.1186/1471-2105-6-S1-S1. PMC 1869002. PMID 15960821.
  2. Yeh, A. S.; Hirschman, L.; Morgan, A. A. (2003-07-03). "Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup". Bioinformatics. 19 (Suppl 1): i331–i339. doi:10.1093/bioinformatics/btg1046. ISSN 1367-4803. PMID 12855478.
  3. Yeh, A.; Morgan, A.; Colosimo, M.; Hirschman, L. (2005). "BioCreAtIvE Task 1A: Gene mention finding evaluation". BMC Bioinformatics. 6 (Suppl 1): S2. doi:10.1186/1471-2105-6-S1-S2. PMC 1869012. PMID 15960832.
  4. Hirschman, L.; Colosimo, M.; Morgan, A.; Yeh, A. (2005). "Overview of BioCreAtIvE task 1B: Normalized gene lists". BMC Bioinformatics. 6 (Suppl 1): S11. doi:10.1186/1471-2105-6-S1-S11. PMC 1869004. PMID 15960823.
  5. Colosimo, M. E.; Morgan, A. A.; Yeh, A. S.; Colombe, J. B.; Hirschman, L. (2005). "Data preparation and interannotator agreement: BioCreAtIvE Task 1B". BMC Bioinformatics. 6 (Suppl 1): S12. doi:10.1186/1471-2105-6-S1-S12. PMC 1869005. PMID 15960824.
  6. Blaschke, C.; Leon, E.; Krallinger, M.; Valencia, A. (2005). "Evaluation of BioCreAtIvE assessment of task 2". BMC Bioinformatics. 6 (Suppl 1): S16. doi:10.1186/1471-2105-6-S1-S16. PMC 1869008. PMID 15960828.
  7. Krallinger, Martin; Morgan, Alexander; Smith, Larry; Leitner, Florian; Tanabe, Lorraine; Wilbur, John; Hirschman, Lynette; Valencia, Alfonso (2008). "Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge". Genome Biology. 9 (Suppl 2): S1. doi:10.1186/gb-2008-9-s2-s1. ISSN 1465-6906. PMC 2559980. PMID 18834487.
  8. Arighi, Cecilia; Roberts, Phoebe M.; Agarwal, Shashank; Bhattacharya, Sanmitra; Cesareni, Gianni; Chatr-Aryamontri, Andrew; Clematide, Simon; Gaudet, Pascale; Giglio, Michelle; Harrow, Ian; Huala, Eva (2011-10-03). "BioCreative III interactive task: an overview". BMC Bioinformatics. 12 (Suppl 8): S4. doi:10.1186/1471-2105-12-S8-S4. PMC 3269939. PMID 22151968.
  9. Carterette, Ben; Cohen, Kevin Bretonnel; Cooper, Laurel; Li, Donghui; Jimenez, Silvia; Roberts, Phoebe; Drabkin, Harold; Bello, Susan; Schaeffer, Mary L.; Park, Julie; Li, Yuling (2013-01-17). "An overview of the BioCreative 2012 Workshop Track III: interactive text mining task". Database. 2013: bas056. doi:10.1093/database/bas056. PMC 3625048. PMID 23327936.
  10. "BioCreative - Call for Participation". biocreative.bioinformatics.udel.edu. Retrieved 2021-04-21.
  11. Wang, Qinghua; Abdul, Shabbir S.; Almeida, Lara; Ananiadou, Sofia; Balderas-Martínez, Yalbi I.; Batista-Navarro, Riza; Campos, David; Chilton, Lucy; Chou, Hui-Jou; Contreras, Gabriela; Cooper, Laurel (2016-01-01). "Overview of the interactive task in BioCreative V". Database. 2016: baw119. doi:10.1093/database/baw119. PMC 5009325. PMID 27589961.
  12. Kim, Sun; Doğan, Rezarta Islamaj; Chatr-Aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sofia; Matos, Sérgio; Santos, André (2016-09-01). "BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID". Database. 2016: baw121. doi:10.1093/database/baw121. PMC 5009341. PMID 27589962.