Reactome

Reactome is a free online database of biological pathways. [1] [2] [3] It is manually curated and authored by PhD-level biologists, in collaboration with Reactome editorial staff. The content is cross-referenced to many bioinformatics databases. The rationale behind Reactome is to visually represent biological pathways in full mechanistic detail, while making the source data available in a computationally accessible format.

Reactome is maintained by an international multidisciplinary team from OICR, OHSU, EMBL-EBI and NYULMC, with expertise in pathway curation and annotation, software development, and training and outreach, dedicated to providing the research community with openly accessible biological pathway knowledge. The Reactome team is led by Lincoln Stein (OICR), Peter D'Eustachio (NYULMC), Henning Hermjakob (EMBL-EBI), and Guanming Wu (OHSU). The Reactome helpdesk can be reached via email.

Reactome: a database of reactions, pathways and biological processes
Content
Description: Reactome: a database of reactions, pathways and biological processes.
Contact
Primary citation: PMID 37941124
Access
Data format: BioPAX, SBML
Website: https://reactome.org
Download URL: https://reactome.org/download-data
Web service URL: https://reactome.org/ContentService/
Miscellaneous
License: https://reactome.org/license
Data release frequency: https://reactome.org/about/release-calendar

The website can be used to browse pathways and submit data to a suite of data analysis tools. The underlying data is fully downloadable in a number of standard formats including PDF, SBML, Neo4j GraphDB, MySQL, PSI-MITAB, and BioPAX. Pathway diagrams use a Process Description (PD) Systems Biology Graphical Notation (SBGN)-based style.

The core unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes and small molecules) participating in reactions form a network of biological interactions and are grouped into pathways. Examples of biological pathways in Reactome include signal transduction, innate and acquired immune function, transcriptional regulation, programmed cell death and classical intermediary metabolism.

The pathways represented in Reactome are species-specific, with each pathway step supported by literature citations that contain an experimental verification of the process represented. If no experimental verification using human reagents exists, pathways may contain steps manually inferred from non-human experimental details, but only if an expert biologist, named as Author of the pathway, and a second biologist, named as Reviewer, agree that this is a valid inference to make. The human pathways are then used to computationally generate, by an orthology-based process, derived pathways in other organisms.
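The idea of projecting a human reaction onto another organism through orthology can be illustrated with a toy sketch. The rule used here (a reaction is projected only if every participating protein has a known ortholog) is a simplifying assumption made for illustration, not a description of Reactome's actual inference criteria, and the function name and identifiers are hypothetical.

```python
def project_reaction(human_proteins, ortholog_map):
    """Return the orthologous participants of a reaction, or None.

    Assumption for illustration: the reaction is projected only when
    every human protein has an entry in ortholog_map.
    """
    projected = []
    for protein in human_proteins:
        ortholog = ortholog_map.get(protein)
        if ortholog is None:
            return None  # at least one participant has no known ortholog
        projected.append(ortholog)
    return projected


# Hypothetical identifiers, used only to exercise the function.
orthologs = {"HUMAN_A": "MOUSE_A", "HUMAN_B": "MOUSE_B"}

print(project_reaction(["HUMAN_A", "HUMAN_B"], orthologs))
print(project_reaction(["HUMAN_A", "HUMAN_C"], orthologs))
```

The first call yields a projected participant list, while the second returns None because one participant has no ortholog in the toy map.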

Database organization

Reactome database releases occur quarterly.

In Reactome, human biological processes are annotated by breaking them down into series of molecular events. Like classical chemistry reactions, each Reactome event has input physical entities (substrates) which interact, possibly facilitated by enzymes or other molecular catalysts, to generate output physical entities (products).

Reactions include the classical chemical interconversions of intermediary metabolism, binding events, complex formation, transport events that direct molecules between cellular compartments, and events such as the activation of a protein by cleavage of one or more of its peptide bonds. Individual events can be grouped together into pathways.

Physical entities can be small molecules like glucose or ATP, or large molecules like DNA, RNA, and proteins, encoded directly or indirectly in the human genome. Physical entities are cross-referenced to relevant external databases, such as UniProt for proteins and ChEBI for small molecules. Localization of molecules to subcellular compartments is a key feature of the regulation of human biological processes, so molecules in the Reactome database are associated with specific locations. Thus in Reactome instances of the same chemical entity in different locations (e.g., extracellular glucose and cytosolic glucose) are treated as distinct chemical entities.
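A minimal sketch of this data model, written as Python dataclasses, may help make the compartment rule concrete. The class and field names are assumptions chosen for illustration and do not reproduce Reactome's actual schema; the ChEBI accession is included only as an example cross-reference.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalEntity:
    """A molecule located in one specific subcellular compartment."""
    name: str
    compartment: str        # e.g. a Gene Ontology cellular component name
    external_ref: str = ""  # e.g. a ChEBI or UniProt accession


@dataclass
class Reaction:
    """An event that converts input entities into output entities."""
    name: str
    inputs: list
    outputs: list
    catalysts: list = field(default_factory=list)


# The same chemical in two compartments is modelled as two distinct entities.
glucose_out = PhysicalEntity("glucose", "extracellular region", "CHEBI:17234")
glucose_in = PhysicalEntity("glucose", "cytosol", "CHEBI:17234")

# A transport event moves the molecule from one compartment to another.
uptake = Reaction(
    name="glucose uptake (illustrative)",
    inputs=[glucose_out],
    outputs=[glucose_in],
)

print(glucose_out == glucose_in)  # False: same chemical, different location
```

Because the compartment is part of an entity's identity in this sketch, a transport event is an ordinary reaction whose input and output differ only in location.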

The Gene Ontology controlled vocabularies are used to describe the subcellular locations of molecules and reactions, molecular functions, and the larger biological processes that a specific reaction is part of.

Database content

The database contains curated annotations that cover a diverse set of topics in molecular and cellular biology. Details of annotation topics can be found in the table of contents. Details of current and future annotation projects can be found in the calendar of annotation projects.

Reactome invites biological experts as reviewers for completed pathways that are ready for external review. Reviewers will be credited with authorship or reviewership for contributions. Each pathway is associated with a DOI and can be cited as a publication. Reactome contributions can be easily claimed using the ORCID claiming feature.

The pathway content at Reactome is freely available for download in several data and image formats. Reactome is completely open access and open source. Usage of Reactome material is covered by two Creative Commons licenses. The terms of the Creative Commons Public Domain (CC0) License apply to all Reactome annotation files, e.g. identifier mapping data, specialized data files, and interaction data derived from Reactome. The terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License apply to all software and code, e.g. relating to the functionality of reactome.org, derived websites and web services, the Curator Tool, the Functional Interaction application, SQL and Graph Database data dumps, Pathway Illustrations (Enhanced High-Level Diagrams), the Icon Library, and Art and Branding Materials. Reactome can be cited using its major publications or by individual pathways or images.

Tools

The website provides tools for viewing interactive pathway diagrams, performing pathway mapping and pathway over-representation analysis, and overlaying expression data onto Reactome pathways. The pathway mapping and over-representation tools take a single column of protein or compound identifiers. UniProt and ChEBI accessions are preferred, but the interface will accept and interpret many other identifiers or symbols, and mixed identifiers can be used. Over-representation results are presented as a list of statistically over-represented pathways.
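As a rough illustration of what an over-representation analysis computes, the sketch below ranks pathways with a one-sided hypergeometric test, a common choice for this kind of analysis. It is not claimed to be the statistic Reactome's own tool applies, it omits multiple-testing correction, and the function names, pathway names and identifiers are hypothetical. It also assumes every submitted identifier belongs to the background universe.

```python
from math import comb


def hypergeom_pvalue(universe, pathway_size, sample_size, hits):
    """P(X >= hits) when sample_size items are drawn without replacement."""
    upper = min(pathway_size, sample_size)
    total = comb(universe, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def over_representation(submitted, pathways, universe_size):
    """Return (pathway, hits, p-value) tuples sorted by ascending p-value."""
    submitted = set(submitted)
    results = []
    for name, members in pathways.items():
        hits = len(submitted & set(members))
        if hits == 0:
            continue  # pathways with no matching identifier are not reported
        p = hypergeom_pvalue(universe_size, len(members), len(submitted), hits)
        results.append((name, hits, p))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 identifiers in the universe, two hypothetical pathways.
toy_pathways = {
    "pathway A": ["P1", "P2", "P3", "P4", "P5"],
    "pathway B": ["P6", "P7", "P8", "P9", "P10", "P11", "P12", "P13"],
}
ranked = over_representation(["P1", "P2", "P3", "P6"], toy_pathways, 20)
for name, hits, p in ranked:
    print(f"{name}: {hits} hits, p = {p:.4f}")
```

In the toy data, three of the four submitted identifiers fall into the five-member pathway A, so it ranks above pathway B, which receives a single hit.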

Expression data is submitted in a multi-column format. The first column identifies the protein, and additional columns are expected to hold numeric expression values, although any numeric value can be used, e.g. differential expression, quantitative proteomics, or GWAS scores. The expression data is represented as colouring of the corresponding proteins in pathway diagrams, using the colours of the visible spectrum so that 'hot' red colours represent high values. If multiple columns of numeric data are submitted, the overlay tool can display them as separate 'experiments', e.g. timepoints or stages of a disease progression.
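The multi-column input and the colour overlay can be sketched as follows. This is an illustrative approximation only: the header convention (a first line starting with "#"), the function names, and the simple blue-to-red scale standing in for a full spectrum palette are all assumptions, not the behaviour of the Reactome overlay tool.

```python
def parse_expression(text):
    """Return {identifier: [values, ...]} from whitespace-separated lines."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the assumed '#' header line
        table[fields[0]] = [float(v) for v in fields[1:]]
    return table


def value_to_rgb(value, lo, hi):
    """Map a value onto a blue (low) to red (high) scale."""
    t = 0.5 if hi == lo else (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    return (round(255 * t), 0, round(255 * (1 - t)))


def colour_experiments(table):
    """Return one {identifier: rgb} mapping per numeric column."""
    values = [v for row in table.values() for v in row]
    lo, hi = min(values), max(values)
    n_cols = max(len(row) for row in table.values())
    return [
        {ident: value_to_rgb(row[i], lo, hi)
         for ident, row in table.items() if i < len(row)}
        for i in range(n_cols)
    ]


# Two proteins (UniProt accessions) measured at two timepoints.
sample = "#id\tt0\tt1\nP04637\t1.0\t3.0\nP38398\t2.0\t0.0\n"
experiments = colour_experiments(parse_expression(sample))
for i, colours in enumerate(experiments):
    print(f"experiment {i}: {colours}")
```

Each numeric column becomes its own mapping of identifier to colour, mirroring the way separate columns are shown as separate 'experiments'; the highest value in the data set maps to pure red.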

The database can be browsed and searched as an online textbook. [4] An online user's guide is available. Users can also download the current data set or individual pathways and reactions in a variety of formats including PDF, BioPAX, and SBML. [5]

Reactome also provides the ReactomeGSA [6] tool, integrated into the Reactome Analysis Tools, which allows comparative pathway analyses of multi-omics datasets and is compatible with single-cell RNA-seq data. Public data from EBI Expression Atlas, Single Cell Expression Atlas, and NCBI GREIN GEO data can be integrated into the analysis. ReactomeGSA is also available as an R Bioconductor package.

Since 2023, Reactome has also offered the ReactomeIDG [7] web portal, which places dark proteins in the context of manually curated, highly reliable Reactome pathways to help users understand the functions, and predict the therapeutic potential, of dark or understudied proteins. Enhanced visualization features at the portal allow users to investigate the functional contexts of dark proteins based on tissue-specific gene or protein expression, drug-target interactions, or pairwise protein or gene relationships, either in Reactome's original Systems Biology Graphical Notation (SBGN) diagrams or in the new simplified functional interaction (FI) network view of pathways.

ReactomeFIViz is a Cytoscape app designed to find pathways and network patterns related to diseases. The app accesses Reactome pathways, performs pathway enrichment analysis for a set of genes, visualizes hit pathways, and investigates functional relationships among genes in hit pathways. The app also accesses the Reactome Functional Interaction (FI) network. [8]

See also

There are several Reactomes that concentrate on specific organisms. The largest of these is focused on human biology and is described on this page.

See Plant Reactome.

References

  1. Croft, D.; O'Kelly, G.; Wu, G.; Haw, R.; Gillespie, M.; Matthews, L.; Caudy, M.; Garapati, P.; Gopinath, G.; Jassal, B.; Jupe, S.; Kalatskaya, I.; Mahajan, S.; May, B.; Ndegwa, N.; Schmidt, E.; Shamovsky, V.; Yung, C.; Birney, E.; Hermjakob, H.; d'Eustachio, P.; Stein, L. (2010). "Reactome: A database of reactions, pathways and biological processes". Nucleic Acids Research. 39 (Database issue): D691–D697. doi:10.1093/nar/gkq1018. PMC 3013646. PMID 21067998.
  2. Joshi-Tope, G.; Gillespie, M.; Vastrik, I.; d'Eustachio, P.; Schmidt, E.; De Bono, B.; Jassal, B.; Gopinath, G.; Wu, G.; Matthews, L.; Lewis, S.; Birney, E.; Stein, L. (2004). "Reactome: A knowledgebase of biological pathways". Nucleic Acids Research. 33 (Database issue): D428–D432. doi:10.1093/nar/gki072. PMC 540026. PMID 15608231.
  3. Croft, D.; Mundo, A. F.; Haw, R.; Milacic, M.; Weiser, J.; Wu, G.; Caudy, M.; Garapati, P.; Gillespie, M.; Kamdar, M. R.; Jassal, B.; Jupe, S.; Matthews, L.; May, B.; Palatnik, S.; Rothfels, K.; Shamovsky, V.; Song, H.; Williams, M.; Birney, E.; Hermjakob, H.; Stein, L.; D'Eustachio, P. (2014). "The Reactome pathway knowledgebase". Nucleic Acids Research. 42 (Database issue): D472–7. doi:10.1093/nar/gkt1102. PMC 3965010. PMID 24243840.
  4. Haw, R.; Stein, L. (Jun 2012). "Using the reactome database". Current Protocols in Bioinformatics. Chapter 8: 8.7.1–8.7.23. doi:10.1002/0471250953.bi0807s38. PMC 3427849. PMID 22700314.
  5. Croft, D. (2013). "Building Models Using Reactome Pathways as Templates". In Silico Systems Biology. Methods in Molecular Biology. Vol. 1021. pp. 273–83. doi:10.1007/978-1-62703-450-0_14. ISBN 978-1-62703-449-4. PMID 23715990.
  6. Griss, Johannes; Viteri, Guilherme; Sidiropoulos, Konstantinos; Nguyen, Vy; Fabregat, Antonio; Hermjakob, Henning (December 2020). "ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis". Molecular & Cellular Proteomics. 19 (12): 2115–2125. doi:10.1074/mcp.TIR120.002155. PMC 7710148. PMID 32907876.
  7. Beavers, Deidre; Brunson, Timothy; Sanati, Nasim; Matthews, Lisa; Haw, Robin; Shorser, Solomon; Sevilla, Cristoffer; Viteri, Guilherme; Conley, Patrick; Rothfels, Karen; Hermjakob, Henning; Stein, Lincoln; D'Eustachio, Peter; Wu, Guanming (July 2023). "Illuminate the Functions of Dark Proteins Using the Reactome-IDG Web Portal". Current Protocols. 3 (7): e845. doi:10.1002/cpz1.845. ISSN 2691-1299. PMC 10399304. PMID 37467006.
  8. Wu, Guanming; Feng, Xin; Stein, Lincoln (2010). "A human functional protein interaction network and its application to cancer data analysis". Genome Biology. 11 (5): R53. doi:10.1186/gb-2010-11-5-r53. ISSN 1474-760X. PMC 2898064. PMID 20482850.

Other molecular pathway databases

  1. Bohler, Anwesha; Wu, Guanming; Kutmon, Martina; Pradhana, Leontius Adhika; Coort, Susan L.; Hanspers, Kristina; Haw, Robin; Pico, Alexander R.; Evelo, Chris T.; Blackwell, Kim T. (20 May 2016). "Reactome from a WikiPathways Perspective". PLOS Computational Biology. 12 (5): e1004941. Bibcode:2016PLSCB..12E4941B. doi:10.1371/journal.pcbi.1004941. PMC 4874630. PMID 27203685.