List of databases for oncogenomic research

Last updated January 28, 2025

Databases for oncogenomic research are biological databases dedicated to cancer data and oncogenomic research. A database may serve as a primary source of cancer data, provide data at some level of analysis (processed data), or offer online data mining.

List

The table below gives an overview of databases that specifically serve oncogenomic research. The list is not comprehensive and omits databases with a generic focus. Further databases containing cancer data can be found in the List of biological databases and among Microarray databases.

Database	Institute / Organization	Alteration Types	Primary Source [t 1]	Processed Data [t 2]	Organisms	Cell lines [t 3]	Public Data [t 4]	Restricted Data [t 5]
BioExpress® Oncology Suite (gene expression data from primary, metastatic, and benign tumor samples, and from normal samples, including matched adjacent controls)	Ocimum Bio Solutions, United States	Gene Expression	Yes	Yes	Human, Rat and Mouse	Yes	No	Yes
ClinicalTrials.gov (descriptions and some results from clinical trials, many of which are genomically targeted)	National Institutes of Health, United States	Various	Yes	Yes	Human	No	Yes	No
Project Data Sphere (allows researchers to share, integrate, and analyze de-identified patient-level, comparator-arm, phase III cancer data)	The CEO Life Sciences Consortium, United States	Various	No	Yes	Human	No	Yes	Yes
Catalogue Of Somatic Mutations In Cancer (COSMIC)	Wellcome Trust Sanger Institute, UK	Mutation	No	Yes	Human	Yes	Yes	Yes
cBio Cancer Genomics Portal	Memorial Sloan-Kettering Cancer Center, United States	Copy number, Mutation, Methylation, Gene Expression, miRNA Expression, Protein, Phosphorylation	No	Yes	Human	No	Yes	No
International Cancer Genome Consortium	Worldwide	Mutation	Yes	Yes	Human	No	Yes	Yes
Integrative Oncogenomics Cancer Browser (IntOGen)	Universitat Pompeu Fabra, Spain	Copy number, Mutation, Gene Expression	No	Yes	Human	No	Yes	No
Mouse Retrovirus Tagged Cancer Gene Database	Institute of Molecular and Cell Biology, Singapore	Mutation	Yes	Yes	Mouse	No	Yes	No
Mouse Tumor Biology Database [note 1]	The Jackson Laboratory, United States	Copy number, Mutation, Methylation, Gene Expression	No	No	Mouse	No	No	No
OncoDB.HCC	Academia Sinica, Taiwan	Copy number, Gene Expression, QTL	No	Yes	Human, Mouse, Rat	No	Yes	No
Genevestigator (data from numerous public repositories, including GEO, and from cancer research projects such as TCGA)	Nebion AG, Switzerland	Gene Expression	No	Yes	Human, Mouse, Rat, Monkey, Dog and others	Yes	Yes	Yes
OncoLand (data from large-scale genomic projects, including TCGA, ICGC and others)	Omicsoft Corporation, United States	Copy number, Mutation, Methylation, Gene Expression, miRNA Expression, Protein, Phosphorylation	Yes	Yes	Human, Rat and Mouse	Yes	Yes	Yes
Oncomine	Compendia Bioscience, Inc., United States	Gene Expression	No	Yes	Human	Yes	No	Yes
Oncoreveal	Boğaziçi University, Turkey	Gene Expression	No	Yes	Human	No	Yes	No
Progenetix	Universität Zürich, Switzerland	Copy number	No	Yes	Human	Yes	Yes	No
The Cancer Genome Atlas	National Cancer Institute, United States	Copy number, Mutation, Methylation, Gene Expression, miRNA Expression	Yes	Yes	Human	No	Yes	Yes
CancerResource	University Medicine Berlin, Germany
Roche Cancer Genome Database (RCGDB)	Roche Diagnostics, Penzberg, Germany
Network of Cancer Genes	King's College London, UK	Mutation	No	Yes	Human	No	Yes	No
MutaGene	NCBI, NIH, USA	Mutation	No	Yes	Human	No	Yes	No
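
Because the table is a plain grid of Yes/No flags, it lends itself to simple filtering. The following is a minimal sketch, not part of any database's tooling: it copies five rows from the table into a tab-separated string and selects databases by alteration type and public availability. The helper names `load_rows` and `select` are hypothetical, and the column headers are written without their footnote markers.

```python
import csv
import io

# Five rows copied from the table above, tab-separated, same column order.
# Header names omit the footnote markers (an assumption made for readability).
TABLE = "\n".join([
    "Database\tInstitute / Organization\tAlteration Types\tPrimary Source\t"
    "Processed Data\tOrganisms\tCell lines\tPublic Data\tRestricted Data",
    "Catalogue Of Somatic Mutations In Cancer (COSMIC)\t"
    "Wellcome Trust Sanger Institute, UK\tMutation\tNo\tYes\tHuman\tYes\tYes\tYes",
    "cBio Cancer Genomics Portal\t"
    "Memorial Sloan-Kettering Cancer Center, United States\t"
    "Copy number, Mutation, Methylation, Gene Expression, miRNA Expression, "
    "Protein, Phosphorylation\tNo\tYes\tHuman\tNo\tYes\tNo",
    "Oncomine\tCompendia Bioscience, Inc., United States\tGene Expression\t"
    "No\tYes\tHuman\tYes\tNo\tYes",
    "Progenetix\tUniversität Zürich, Switzerland\tCopy number\t"
    "No\tYes\tHuman\tYes\tYes\tNo",
    "The Cancer Genome Atlas\tNational Cancer Institute, United States\t"
    "Copy number, Mutation, Methylation, Gene Expression, miRNA Expression\t"
    "Yes\tYes\tHuman\tNo\tYes\tYes",
])


def load_rows(text):
    """Parse the tab-separated table into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def select(rows, alteration=None, public_only=False):
    """Return database names matching an alteration type and/or public data."""
    selected = []
    for row in rows:
        # Alteration types are comma-separated within a single cell.
        types = {t.strip().lower() for t in row["Alteration Types"].split(",")}
        if alteration and alteration.lower() not in types:
            continue
        if public_only and row["Public Data"] != "Yes":
            continue
        selected.append(row["Database"])
    return selected


rows = load_rows(TABLE)
print(select(rows, alteration="Copy number", public_only=True))
```

With the five sample rows, the call lists the cBio Cancer Genomics Portal, Progenetix, and The Cancer Genome Atlas, the three entries that report copy-number data and offer public data.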

Notes

[note 1] Only contains references to biological data

References

[t 1] The database is the publication site for (some of) its raw cancer data
[t 2] The database contains cancer data at a certain level of analysis (non-raw data)
[t 3] The database also contains cell line data
[t 4] The database contains cancer data that is available to everyone
[t 5] The database contains cancer data that is only available under some restriction

External links

National Cancer Institute's List of Datasets and Databases

