DisGeNET

Content
Description: Gene Disease Database
Data types captured: Gene-disease associations
Organisms: Homo sapiens

Contact
Research center: GRIB
Laboratory: IBI Group
Authors: Laura I. Furlong and Ferran Sanz, Team Leaders
Primary citation: PMID 25877637
Release date: 2010

Access
Website: DisGeNET
Download URL: Downloads
SPARQL endpoint: DisGeNET-RDF

Miscellaneous
License: The DisGeNET data are made available under an Open Database License
Versioning: DisGeNET v5.0

DisGeNET is a discovery platform designed to address a variety of questions concerning the genetic underpinnings of human diseases. It is one of the largest and most comprehensive repositories of human gene-disease associations (GDAs) currently available. [1] It also offers a set of bioinformatic tools that facilitate the analysis of these data by different user profiles. It is maintained by the Integrative Biomedical Informatics (IBI) Group of the GRIB (IMIM/UPF), based at the Barcelona Biomedical Research Park (PRBB), Barcelona, Spain.

Scope and access

To gather different aspects of current knowledge on the genetic basis of human diseases, DisGeNET covers all disease areas (Mendelian, complex and environmental diseases). It integrates more than 400 000 genotype-phenotype relationships from different origins, each annotated with explicit provenance and evidence information, which makes it an evidence-based discovery resource for translational research. DisGeNET is an open access resource that provides a comprehensive knowledge base on disease genes together with tools for their exploitation and analysis. It is available through a Web interface, a Cytoscape plugin, [2] and as linked data for the Semantic Web, and it supports programmatic access to its data. This set of tools allows users to investigate the molecular mechanisms underlying diseases of genetic origin, [3] and is designed to support data exploitation from different perspectives and to meet the needs of different types of users, including bioinformaticians, biologists and healthcare practitioners.

Integrated data

The DisGeNET database integrates more than 400 000 associations between more than 17 000 genes and more than 14 000 diseases. These come from expert-curated databases covering both human and animal-model data, combined with text-mined GDAs extracted from MEDLINE using an NLP-based approach. [4] The highlights of DisGeNET are its data integration, standardisation and fine-grained tracking of provenance information. Integration is performed by mapping gene and disease vocabularies and by using the DisGeNET association type ontology. Furthermore, GDAs are organised according to their type and level of evidence as CURATED, PREDICTED and LITERATURE, and they are scored based on the supporting evidence so that they can be prioritised and explored more easily.
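
The grouping and scoring described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names, the score values and the `prioritise` helper are assumptions made for the example and are not taken from the DisGeNET data files or API.

```python
from dataclasses import dataclass


# Hypothetical record layout; real DisGeNET exports may use other field names.
@dataclass(frozen=True)
class GDA:
    gene: str
    disease: str
    source_type: str  # "CURATED", "PREDICTED" or "LITERATURE"
    score: float      # illustrative value; higher means more supporting evidence


def prioritise(gdas, source_types=("CURATED",), min_score=0.0):
    """Keep GDAs of the requested source types and rank them by descending score."""
    kept = [g for g in gdas if g.source_type in source_types and g.score >= min_score]
    return sorted(kept, key=lambda g: g.score, reverse=True)


# Made-up example rows, not real DisGeNET associations.
EXAMPLE_GDAS = [
    GDA("GENE_A", "Disease X", "CURATED", 0.70),
    GDA("GENE_B", "Disease X", "LITERATURE", 0.10),
    GDA("GENE_C", "Disease X", "PREDICTED", 0.35),
    GDA("GENE_D", "Disease X", "CURATED", 0.90),
]

top = prioritise(EXAMPLE_GDAS, source_types=("CURATED", "PREDICTED"), min_score=0.3)
print([g.gene for g in top])
```

Restricting `source_types` to `("CURATED",)` would keep only the expert-curated associations, which is one way a user might trade coverage for confidence.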

The DisGeNET Association Type Ontology

The DisGeNET association type ontology was developed to allow seamless integration of gene-disease association data. All association types found in the original source databases are represented as ontological classes and formally structured under a parent GeneDiseaseAssociation class, which applies whenever there is a relationship between the gene/protein and the disease. It is an OWL ontology integrated into the Semanticscience Integrated Ontology (SIO), which provides essential types and relations for the rich description of objects, processes and their attributes. [5] The gene-disease association classes can be browsed in SIO.
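
To make the idea of a class hierarchy under a single parent concrete, the following minimal sketch models it as a child-to-parent map. Only the root name GeneDiseaseAssociation comes from the text; the subclass names (`ExampleTypeA`, `ExampleTypeA1`, `ExampleTypeB`) are placeholders invented for the example and are not the real ontology classes.

```python
ROOT = "GeneDiseaseAssociation"

# Child -> parent map. All names except ROOT are hypothetical placeholders.
PARENT = {
    ROOT: None,
    "ExampleTypeA": ROOT,
    "ExampleTypeA1": "ExampleTypeA",
    "ExampleTypeB": ROOT,
}


def ancestors(cls):
    """Return the chain of superclasses from the direct parent up to the root."""
    chain = []
    parent = PARENT[cls]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(cls, ancestor):
    """True if cls is the given class or one of its subclasses."""
    return cls == ancestor or ancestor in ancestors(cls)


print(ancestors("ExampleTypeA1"))
print(is_a("ExampleTypeB", "ExampleTypeA"))
```

Because every class ultimately descends from the root, a query for the parent class also matches associations annotated with any more specific type.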

Cytoscape plugin

The DisGeNET Cytoscape plugin [2] offers a network representation of the gene-disease associations. It represents them as bipartite graphs and additionally provides gene-centric and disease-centric views of the data. A variety of built-in functions assist the user in interpreting and exploring human complex diseases with respect to their genetic origin. With the plugin, queries can be restricted to (i) the original data source, (ii) the association type, (iii) the disorder class of interest and (iv) specific diseases or genes.
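
One common way to obtain gene-centric and disease-centric views from a bipartite gene-disease graph is a one-mode projection, in which two diseases are linked when they share a gene (and two genes when they share a disease). The sketch below illustrates that technique in plain Python; whether the plugin computes its views in exactly this way is an assumption, and the gene and disease names are made up.

```python
from collections import defaultdict
from itertools import combinations

# Made-up gene-disease pairs (edges of the bipartite graph).
ASSOCIATIONS = [
    ("GENE_A", "Disease X"),
    ("GENE_A", "Disease Y"),
    ("GENE_B", "Disease Y"),
    ("GENE_C", "Disease Z"),
]


def project(associations, on="disease"):
    """Project the bipartite graph onto one node type.

    Returns a dict mapping a sorted node pair to the set of shared neighbours
    of the other type (shared genes for diseases, shared diseases for genes).
    """
    if on not in ("disease", "gene"):
        raise ValueError("on must be 'disease' or 'gene'")
    neighbours = defaultdict(set)
    for gene, disease in associations:
        if on == "disease":
            neighbours[gene].add(disease)   # diseases grouped by gene
        else:
            neighbours[disease].add(gene)   # genes grouped by disease
    edges = defaultdict(set)
    for shared, nodes in neighbours.items():
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)].add(shared)
    return dict(edges)


disease_view = project(ASSOCIATIONS, on="disease")
gene_view = project(ASSOCIATIONS, on="gene")
print(disease_view)
print(gene_view)
```

Keeping the shared neighbours on each projected edge preserves the evidence for the link, so the projection can still be traced back to the underlying associations.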

Linked Data

The information contained in DisGeNET can also be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Open Data cloud. DisGeNET is distributed as RDF and Nanopublication linked datasets. The DisGeNET-RDF linked dataset is an alternative way to access the DisGeNET data and provides new opportunities for querying the data and for integrating it with external RDF datasets. The RDF and Nanopublication distributions of DisGeNET were developed in the context of the Open PHACTS project to provide disease-relevant information to its knowledge base on pharmacological data.
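
The benefit of linked data is that two datasets can be joined wherever they use the same identifier for an entity. The sketch below shows that principle with plain Python tuples standing in for RDF triples. In practice a SPARQL engine would perform such a join; all URIs, predicates and the `compounds_for_disease` helper here are placeholders invented for the example, not the DisGeNET-RDF schema.

```python
# Triples are (subject, predicate, object); every URI below is a placeholder.
DISGENET_TRIPLES = [
    ("ex:gda1", "ex:refersToGene", "ex:gene/GENE_A"),
    ("ex:gda1", "ex:refersToDisease", "ex:disease/X"),
    ("ex:gda2", "ex:refersToGene", "ex:gene/GENE_B"),
    ("ex:gda2", "ex:refersToDisease", "ex:disease/Y"),
]

# A hypothetical external dataset that reuses the same gene URIs.
EXTERNAL_TRIPLES = [
    ("ex:compound/C1", "ex:targets", "ex:gene/GENE_A"),
    ("ex:compound/C2", "ex:targets", "ex:gene/GENE_Z"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def compounds_for_disease(disease):
    """Join both datasets on the shared gene URI: disease -> gene -> compound."""
    found = set()
    for gda, _, _ in match(DISGENET_TRIPLES, p="ex:refersToDisease", o=disease):
        for _, _, gene in match(DISGENET_TRIPLES, s=gda, p="ex:refersToGene"):
            for compound, _, _ in match(EXTERNAL_TRIPLES, p="ex:targets", o=gene):
                found.add(compound)
    return sorted(found)


print(compounds_for_disease("ex:disease/X"))
```

The join only succeeds because both datasets refer to the gene with an identical URI, which is why vocabulary mapping matters for this kind of integration.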

References

  1. Piñero, J.; Queralt-Rosinach, N.; Bravo, A.; Deu-Pons, J.; Bauer-Mehren, A.; Baron, M.; Sanz, F.; Furlong, L. I. (15 April 2015). "DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes". Database. 2015: bav028. doi:10.1093/database/bav028. PMC 4397996. PMID 25877637.
  2. Bauer-Mehren, A.; Rautschka, M.; Sanz, F.; Furlong, L. I. (15 November 2010). "DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene-disease networks". Bioinformatics. 26 (22): 2924–6. doi:10.1093/bioinformatics/btq538. PMID 20861032.
  3. Bauer-Mehren, A.; Bundschus, M.; Rautschka, M.; Mayer, M. A.; Sanz, F.; Furlong, L. I. (14 June 2011). "Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases". PLOS ONE. 6 (6): e20284. Bibcode:2011PLoSO...620284B. doi:10.1371/journal.pone.0020284. PMC 3114846. PMID 21695124.
  4. Bravo, À.; Piñero, J.; Queralt-Rosinach, N.; Rautschka, M.; Furlong, L. I. (21 February 2015). "Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research". BMC Bioinformatics. 16 (55): 55. doi:10.1186/s12859-015-0472-9. PMC 4466840. PMID 25886734.
  5. Dumontier, Michel; Baker, Christopher JO; Baran, Joachim; Callahan, Alison; Chepelev, Leonid; Cruz-Toledo, José; Del Rio, Nicholas R; Duck, Geraint; Furlong, Laura I; Keath, Nichealla; Klassen, Dana; McCusker, James P; Queralt-Rosinach, Núria; Samwald, Matthias; Villanueva-Rosales, Natalia; Wilkinson, Mark D; Hoehndorf, Robert (2014). "The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery". Journal of Biomedical Semantics. 5 (1): 14. doi:10.1186/2041-1480-5-14. PMC 4015691. PMID 24602174.