Reactome

Reactome is a free online database of biological pathways. [1] [2] [3] It is manually curated and authored by PhD-level biologists, in collaboration with Reactome editorial staff. The content is cross-referenced to many bioinformatics databases. The rationale behind Reactome is to visually represent biological pathways in full mechanistic detail, while making the source data available in a computationally accessible format.

Reactome is maintained by an international multidisciplinary team from OICR, OHSU, EMBL-EBI and NYULMC, with expertise in pathway curation and annotation, software development, and training and outreach. The team is dedicated to providing the research community with openly accessible biological pathway knowledge. The Reactome project is led by Lincoln Stein (OICR), Peter D'Eustachio (NYULMC), Henning Hermjakob (EMBL-EBI), and Guanming Wu (OHSU). The Reactome helpdesk can be reached via email.

Content
Description: Reactome: a database of reactions, pathways and biological processes.
Contact
Primary citation: PMID 37941124
Access
Data format: BioPAX, SBML
Website: https://reactome.org
Download URL: https://reactome.org/download-data
Web service URL: https://reactome.org/ContentService/
Miscellaneous
License: https://reactome.org/license
Data release frequency: https://reactome.org/about/release-calendar

The website can be used to browse pathways and submit data to a suite of data analysis tools. The underlying data is fully downloadable in a number of standard formats including PDF, SBML, Neo4j GraphDB, MySQL, PSI-MITAB, and BioPAX. Pathway diagrams use a Process Description (PD) Systems Biology Graphical Notation (SBGN)-based style.

The core unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes and small molecules) participating in reactions form a network of biological interactions and are grouped into pathways. Examples of biological pathways in Reactome include signal transduction, innate and acquired immune function, transcriptional regulation, programmed cell death and classical intermediary metabolism.

The pathways represented in Reactome are species-specific, with each pathway step supported by literature citations that contain an experimental verification of the process represented. If no experimental verification using human reagents exists, pathways may contain steps manually inferred from non-human experimental details, but only if an expert biologist, named as Author of the pathway, and a second biologist, named as Reviewer, agree that this is a valid inference to make. The human pathways are then used to computationally generate derived pathways in other organisms through an orthology-based process.
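The orthology-based step can be pictured with a minimal sketch. It assumes a deliberately strict rule: a human step is carried over only if every participant has an ortholog in the target species. This rule, the function names, and the identifiers are all hypothetical, chosen for illustration; they are not Reactome's documented inference criteria.

```python
def infer_reaction(human_participants, orthologs):
    """Map participants to the target species.

    Returns None if any participant lacks an ortholog
    (an assumed, deliberately strict rule).
    """
    mapped = []
    for protein in human_participants:
        target = orthologs.get(protein)
        if target is None:
            return None
        mapped.append(target)
    return mapped


def infer_pathway(human_pathway, orthologs):
    """Keep only the steps whose participants can all be mapped."""
    inferred = {}
    for step, participants in human_pathway.items():
        mapped = infer_reaction(participants, orthologs)
        if mapped is not None:
            inferred[step] = mapped
    return inferred


# Hypothetical identifiers, for illustration only.
human_pathway = {
    "step 1": ["HUMAN_A", "HUMAN_B"],
    "step 2": ["HUMAN_C"],
}
orthologs = {"HUMAN_A": "MOUSE_A", "HUMAN_B": "MOUSE_B"}

inferred = infer_pathway(human_pathway, orthologs)
print(inferred)
```

In this toy input, the second step is dropped because its only participant has no entry in the ortholog map.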

Database organization

Reactome database releases occur quarterly.

In Reactome, human biological processes are annotated by breaking them down into series of molecular events. Like classical chemistry reactions, each Reactome event has input physical entities (substrates) that interact, possibly facilitated by enzymes or other molecular catalysts, to generate output physical entities (products).

Reactions include the classical chemical interconversions of intermediary metabolism, binding events, complex formation, transport events that direct molecules between cellular compartments, and events such as the activation of a protein by cleavage of one or more of its peptide bonds. Individual events can be grouped together into pathways.

Physical entities can be small molecules like glucose or ATP, or large molecules like DNA, RNA, and proteins, encoded directly or indirectly in the human genome. Physical entities are cross-referenced to relevant external databases, such as UniProt for proteins and ChEBI for small molecules. Localization of molecules to subcellular compartments is a key feature of the regulation of human biological processes, so molecules in the Reactome database are associated with specific locations. Thus, in Reactome, instances of the same chemical entity in different locations (e.g., extracellular glucose and cytosolic glucose) are treated as distinct chemical entities.
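The reaction-centred model described above can be sketched in a few lines of Python. The class and field names here are hypothetical and do not reproduce Reactome's actual schema. The sketch only illustrates that a reaction links input entities to output entities, and that compartment is part of an entity's identity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalEntity:
    """A molecule in a specific subcellular compartment (hypothetical class)."""
    name: str
    compartment: str      # e.g. a Gene Ontology cellular component label
    reference_db: str     # external database used for the cross-reference


@dataclass
class Reaction:
    """An event that converts input entities into output entities."""
    name: str
    inputs: list
    outputs: list
    catalysts: list = field(default_factory=list)


# The same chemical in two locations is modelled as two distinct entities.
glucose_out = PhysicalEntity("glucose", "extracellular region", "ChEBI")
glucose_in = PhysicalEntity("glucose", "cytosol", "ChEBI")

# A transport event moves a molecule between compartments.
uptake = Reaction("glucose uptake", inputs=[glucose_out], outputs=[glucose_in])

print(glucose_out == glucose_in)  # the compartments differ, so the entities differ
```

Making the entity class frozen means entities can be compared and used as dictionary keys, which is convenient when building a network of reactions that share participants.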

The Gene Ontology controlled vocabularies are used to describe the subcellular locations of molecules and reactions, molecular functions, and the larger biological processes that a specific reaction is part of.

Database content

The database contains curated annotations that cover a diverse set of topics in molecular and cellular biology. Details of annotation topics can be found in the table of contents. Details of current and future annotation projects can be found in the calendar of annotation projects.

Reactome invites biological experts to act as reviewers for completed pathways that are ready for external review. Contributors are credited as authors or reviewers. Each pathway is associated with a DOI and can be cited as a publication. Reactome contributions can be easily claimed using the ORCID claiming feature.

Tools

The website provides tools for viewing an interactive pathway diagram, performing pathway mapping and pathway over-representation analysis, and overlaying expression data onto Reactome pathways. The pathway mapping and over-representation tools take a single column of protein or compound identifiers. UniProt and ChEBI accessions are preferred, but the interface accepts and interprets many other identifiers and symbols, and mixed identifier types can be submitted together. Over-representation results are presented as a list of statistically over-represented pathways.
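The text does not say which statistical test lies behind the over-representation results. One common choice for this kind of analysis is a one-sided hypergeometric test, sketched below as an assumption rather than as a description of the Reactome implementation. The function names, pathway names, and identifiers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe, pathway_size, sample_size, hits):
    """P(X >= hits) when sample_size identifiers are drawn from a universe
    in which pathway_size identifiers belong to the pathway."""
    total = comb(universe, sample_size)
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def over_representation(submitted, pathways, universe_size):
    """Rank pathways by p-value, skipping pathways with no submitted member."""
    submitted = set(submitted)
    results = []
    for name, members in pathways.items():
        hits = len(submitted & members)
        if hits == 0:
            continue
        p = hypergeom_pvalue(universe_size, len(members), len(submitted), hits)
        results.append((name, hits, p))
    return sorted(results, key=lambda r: r[2])


# Toy input: hypothetical pathway names and identifiers.
pathways = {
    "pathway A": {"P1", "P2", "P3"},
    "pathway B": {"P4", "P5", "P6", "P7"},
}
results = over_representation(["P1", "P2", "P8"], pathways, universe_size=20)
for name, hits, p in results:
    print(f"{name}\thits={hits}\tp={p:.4f}")
```

A real analysis would also correct the p-values for multiple testing across all pathways, which this sketch omits.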

Expression data is submitted in a multi-column format: the first column identifies the protein, and the additional columns are expected to hold numeric expression values. These can in fact be any numeric values, e.g. differential expression, quantitative proteomics, or GWAS scores. The data is represented by colouring the corresponding proteins in pathway diagrams using the colours of the visible spectrum, so that 'hot' red colours represent high values. If multiple columns of numeric data are submitted, the overlay tool can display them as separate 'experiments', e.g. timepoints or stages of disease progression.
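The multi-column input and the colour overlay can be illustrated with a minimal sketch. It assumes tab-separated text with an optional header line starting with `#`, and it replaces the full visible-spectrum scale with a simple blue-to-red gradient. Both are simplifying assumptions, and the identifiers and function names are placeholders.

```python
import csv
import io


def parse_expression(text):
    """Return {identifier: [numeric values]} from tab-separated text."""
    rows = {}
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and an assumed '#'-prefixed header
        rows[fields[0]] = [float(value) for value in fields[1:]]
    return rows


def column(rows, index):
    """Extract one numeric column, i.e. one 'experiment'."""
    return {ident: values[index] for ident, values in rows.items()}


def to_colour(value, low, high):
    """Map a value to an (r, g, b) tuple; high values are red, low are blue."""
    if high == low:
        frac = 0.5
    else:
        frac = (value - low) / (high - low)
    frac = max(0.0, min(1.0, frac))
    return (round(255 * frac), 0, round(255 * (1 - frac)))


# Placeholder identifiers and values, for illustration only.
sample = "#id\tt0\tt1\nP12345\t1.0\t3.0\nQ67890\t2.0\t0.5\n"
rows = parse_expression(sample)
second_timepoint = column(rows, 1)
colours = {
    ident: to_colour(value, 0.5, 3.0)
    for ident, value in second_timepoint.items()
}
print(colours)
```

Each numeric column is handled independently, which mirrors how the overlay can step through several 'experiments' for the same set of proteins.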

The database can be browsed and searched as an online textbook. [4] An online user's guide is available. Users can also download the current data set or individual pathways and reactions in a variety of formats, including PDF, BioPAX, and SBML. [5]

Reactome also provides ReactomeGSA [6], a tool integrated into the Reactome Analysis Tools that allows comparative pathway analyses of multi-omics datasets and is compatible with single-cell RNA-seq data. Public data from EBI Expression Atlas, Single Cell Expression Atlas, and NCBI GREIN GEO data can be integrated into the analysis. ReactomeGSA is also available as an R Bioconductor package.

Since 2023, Reactome has also offered the ReactomeIDG [7] web portal, which places dark proteins in the context of manually curated, highly reliable Reactome pathways in order to facilitate understanding of the functions, and prediction of the therapeutic potential, of dark or understudied proteins. Enhanced visualization features at the portal allow users to investigate the functional contexts of dark proteins based on tissue-specific gene or protein expression, drug-target interactions, or pairwise protein or gene relationships, either in Reactome's original Systems Biology Graphical Notation (SBGN) diagrams or in the new simplified functional interaction (FI) network view of pathways.

ReactomeFIViz is a Cytoscape app designed to find pathways and network patterns related to diseases. The app accesses Reactome pathways, performs pathway enrichment analysis for a set of genes, visualizes hit pathways, and supports investigation of functional relationships among genes in hit pathways. The app also accesses the Reactome Functional Interaction (FI) network [8].

See also

There are several Reactomes that concentrate on specific organisms; the largest of these, described on this page, is focused on human biology.

See Plant Reactome.

References

  1. Croft, D.; O'Kelly, G.; Wu, G.; Haw, R.; Gillespie, M.; Matthews, L.; Caudy, M.; Garapati, P.; Gopinath, G.; Jassal, B.; Jupe, S.; Kalatskaya, I.; Mahajan, S.; May, B.; Ndegwa, N.; Schmidt, E.; Shamovsky, V.; Yung, C.; Birney, E.; Hermjakob, H.; d'Eustachio, P.; Stein, L. (2010). "Reactome: A database of reactions, pathways and biological processes". Nucleic Acids Research. 39 (Database issue): D691–D697. doi:10.1093/nar/gkq1018. PMC 3013646. PMID 21067998.
  2. Joshi-Tope, G.; Gillespie, M.; Vastrik, I.; d'Eustachio, P.; Schmidt, E.; De Bono, B.; Jassal, B.; Gopinath, G.; Wu, G.; Matthews, L.; Lewis, S.; Birney, E.; Stein, L. (2004). "Reactome: A knowledgebase of biological pathways". Nucleic Acids Research. 33 (Database issue): D428–D432. doi:10.1093/nar/gki072. PMC 540026. PMID 15608231.
  3. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M, Garapati P, Gillespie M, Kamdar MR, Jassal B, Jupe S, Matthews L, May B, Palatnik S, Rothfels K, Shamovsky V, Song H, Williams M, Birney E, Hermjakob H, Stein L, D'Eustachio P (2014). "The Reactome pathway knowledgebase". Nucleic Acids Research. 42 (Database issue): D472–D477. doi:10.1093/nar/gkt1102. PMC 3965010. PMID 24243840.
  4. Haw, R; Stein, L (Jun 2012). "Using the reactome database". Current Protocols in Bioinformatics. Chapter 8: 8.7.1–8.7.23. doi:10.1002/0471250953.bi0807s38. PMC 3427849. PMID 22700314.
  5. Croft, D (2013). "Building Models Using Reactome Pathways as Templates". In Silico Systems Biology. Methods in Molecular Biology. Vol. 1021. pp. 273–83. doi:10.1007/978-1-62703-450-0_14. ISBN 978-1-62703-449-4. PMID 23715990.
  6. Griss, Johannes; Viteri, Guilherme; Sidiropoulos, Konstantinos; Nguyen, Vy; Fabregat, Antonio; Hermjakob, Henning (December 2020). "ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis". Molecular & Cellular Proteomics. 19 (12): 2115–2125. doi:10.1074/mcp.TIR120.002155. PMC 7710148. PMID 32907876.
  7. Beavers, Deidre; Brunson, Timothy; Sanati, Nasim; Matthews, Lisa; Haw, Robin; Shorser, Solomon; Sevilla, Cristoffer; Viteri, Guilherme; Conley, Patrick; Rothfels, Karen; Hermjakob, Henning; Stein, Lincoln; D'Eustachio, Peter; Wu, Guanming (July 2023). "Illuminate the Functions of Dark Proteins Using the Reactome-IDG Web Portal". Current Protocols. 3 (7): e845. doi:10.1002/cpz1.845. ISSN 2691-1299. PMC 10399304. PMID 37467006.
  8. Wu, Guanming; Feng, Xin; Stein, Lincoln (2010). "A human functional protein interaction network and its application to cancer data analysis". Genome Biology. 11 (5): R53. doi:10.1186/gb-2010-11-5-r53. ISSN 1474-760X. PMC 2898064. PMID 20482850.

Other molecular pathway databases

  1. Bohler, Anwesha; Wu, Guanming; Kutmon, Martina; Pradhana, Leontius Adhika; Coort, Susan L.; Hanspers, Kristina; Haw, Robin; Pico, Alexander R.; Evelo, Chris T.; Blackwell, Kim T. (20 May 2016). "Reactome from a WikiPathways Perspective". PLOS Computational Biology. 12 (5): e1004941. Bibcode:2016PLSCB..12E4941B. doi:10.1371/journal.pcbi.1004941. PMC 4874630. PMID 27203685.