Databases for oncogenomic research are biological databases dedicated to cancer data and oncogenomic research. They can be a primary source of cancer data, offer a certain level of analysis (processed data) or even offer online data mining.
The table below gives an overview of databases that serve specifically for oncogenomic research. This is not a comprehensive list, and it does not contain databases that have a generic focus. Databases containing cancer data can also be found in the List of biological databases or Microarray databases.
Database | Institute / Organization | Alteration Types | Primary Source | Processed Data | Organisms | Cell lines | Public Data | Restricted Data | |
---|---|---|---|---|---|---|---|---|---|
The BioExpress® Oncology Suite from Ocimum Bio Solutions contains gene expression data from primary, metastatic, and benign tumor samples, and normal samples, including matched adjacent controls. (BioExpress Oncology Suite) → | Ocimum Bio Solutions, United States | Gene Expression | Yes | Yes | Human, Rat and Mouse | Yes | No | Yes | |
ClinicalTrials.gov contains descriptions and some results from clinical trials, many of which are genomically targeted. → | National Institutes of Health, United States | Various | Yes | Yes | Human | No | Yes | No | |
Project Data Sphere from The CEO Life Sciences Consortium allows researchers to share, integrate, and analyze de-identified patient-level, comparator arm, phase III cancer data. → | The CEO Life Sciences Consortium, United States | Various | No | Yes | Human | No | Yes | Yes | |
Catalogue Of Somatic Mutations In Cancer (COSMIC) → | Wellcome Trust Sanger Institute, UK | Mutation | No | Yes | Human | Yes | Yes | Yes | |
cBio Cancer Genomics Portal → | Memorial Sloan-Kettering Cancer Center, United States | Copy number, Mutation, Methylation, Gene Expression, miRNA Expression, Protein, Phosphorylation | No | Yes | Human | No | Yes | No | |
International Cancer Genome Consortium → | Worldwide | Mutation | Yes | Yes | Human | No | Yes | Yes | |
Integrative Oncogenomics Cancer Browser (IntOGen) → | Universitat Pompeu Fabra, Spain | Copy number, Mutation, Gene Expression | No | Yes | Human | No | Yes | No | |
Mouse Retrovirus Tagged Cancer Gene Database → | Institute of Molecular and Cell Biology, Singapore | Mutation | Yes | Yes | Mouse | No | Yes | No | |
Mouse Tumor Biology Database → [note 1] | The Jackson Laboratory, United States | Copy number, Mutation, Methylation, Gene Expression | No | No | Mouse | No | No | No | |
OncoDB.HCC → | Academia Sinica, Taiwan | Copy number, Gene Expression, QTL | No | Yes | Human, Mouse, Rat | No | Yes | No | |
Genevestigator → (contains data from numerous public repositories, including GEO, and from renowned cancer research projects such as TCGA) | Nebion AG, Switzerland | Gene Expression | No | Yes | Human, Mouse, Rat, Monkey, Dog and others | Yes | Yes | Yes | |
OncoLand from Omicsoft Corporation contains data from large-scale genomic projects, including TCGA, ICGC and others | Omicsoft Corporation, United States | Copy number, Mutation, Methylation, Gene Expression, miRNA Expression, Protein, Phosphorylation | Yes | Yes | Human, Rat and Mouse | Yes | Yes | Yes | |
Oncomine → | Compendia Bioscience, Inc., United States | Gene Expression | No | Yes | Human | Yes | No | Yes | |
Oncoreveal → | Boğaziçi University, Turkey | Gene Expression | No | Yes | Human | No | Yes | No | |
Progenetix → | Universität Zürich, Switzerland | Copy number | No | Yes | Human | Yes | Yes | No | |
The Cancer Genome Atlas → | National Cancer Institute, United States | Copy number, Mutation, Methylation, Gene Expression, miRNA expression | Yes | Yes | Human | No | Yes | Yes | |
CancerResource → | University Medicine Berlin, Germany | | | | | | | | |
Roche Cancer Genome Database (RCGDB) | Roche Diagnostics, Penzberg, Germany | | | | | | | | |
Network of Cancer Genes → | King's College London, UK | Mutation | No | Yes | Human | No | Yes | No | |
MutaGene | NCBI, NIH, USA | Mutation | No | Yes | Human | No | Yes | No | |
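A comparison table like the one above is easiest to use when it can be filtered by column. The following is a minimal sketch in Python, assuming a hand-transcribed subset of the rows; the names `DatabaseEntry`, `make_entry`, `ENTRIES` and `find_databases` are hypothetical and exist only for this illustration, and the short labels "COSMIC" and "IntOGen" abbreviate the full database names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class DatabaseEntry:
    """One table row (hypothetical structure, a subset of the columns)."""
    name: str
    alteration_types: FrozenSet[str]
    primary_source: bool
    processed_data: bool
    cell_lines: bool
    public_data: bool
    restricted_data: bool


def make_entry(name: str, alteration_types: Iterable[str], primary_source: bool,
               processed_data: bool, cell_lines: bool, public_data: bool,
               restricted_data: bool) -> DatabaseEntry:
    """Build an entry, storing the alteration types as a set for lookups."""
    return DatabaseEntry(name, frozenset(alteration_types), primary_source,
                         processed_data, cell_lines, public_data, restricted_data)


# Values transcribed by hand from a few rows of the table above.
ENTRIES = [
    make_entry("COSMIC", ["Mutation"], False, True, True, True, True),
    make_entry("cBio Cancer Genomics Portal",
               ["Copy number", "Mutation", "Methylation", "Gene Expression",
                "miRNA Expression", "Protein", "Phosphorylation"],
               False, True, False, True, False),
    make_entry("IntOGen", ["Copy number", "Mutation", "Gene Expression"],
               False, True, False, True, False),
    make_entry("Oncomine", ["Gene Expression"], False, True, True, False, True),
    make_entry("Progenetix", ["Copy number"], False, True, True, True, False),
    make_entry("The Cancer Genome Atlas",
               ["Copy number", "Mutation", "Methylation", "Gene Expression",
                "miRNA expression"],
               True, True, False, True, True),
]


def find_databases(entries: Iterable[DatabaseEntry], alteration_type: str,
                   public_only: bool = True,
                   cell_lines: Optional[bool] = None) -> List[str]:
    """Return names of entries covering `alteration_type`.

    `public_only` keeps only rows with Public Data = Yes; `cell_lines`,
    when not None, must equal the row's Cell lines value.
    """
    matches = []
    for entry in entries:
        if alteration_type not in entry.alteration_types:
            continue
        if public_only and not entry.public_data:
            continue
        if cell_lines is not None and entry.cell_lines != cell_lines:
            continue
        matches.append(entry.name)
    return matches


if __name__ == "__main__":
    print(find_databases(ENTRIES, "Copy number", cell_lines=True))
```

With this subset, asking for publicly available copy number data that includes cell lines returns only Progenetix. Rows with missing cells, such as CancerResource, are left out of the sketch rather than guessed.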
Biostatistics is a branch of statistics that applies statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experiments and the interpretation of the results.
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.
Proteomics is the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In addition, other kinds of proteins include antibodies that protect an organism from infection, and hormones that send important signals throughout the body.
Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer science and engineering which uses bioengineering to build computers.
A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as a probe. A probe can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm, and the first computerized image-based analysis was published in 1981. The DNA microarray was invented by Patrick O. Brown. Examples of its application include SNP arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis. It is also used for the identification of structural variations and the measurement of gene expression.
Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
Systems biology is the computational and mathematical analysis and modeling of complex biological systems. It is a biology-based interdisciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach to biological research.
The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.
Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.
In the field of molecular biology, gene expression profiling is the measurement of the activity of thousands of genes at once, to create a global picture of cellular function. These profiles can, for example, distinguish between cells that are actively dividing, or show how the cells react to a particular treatment. Many experiments of this sort measure an entire genome simultaneously, that is, every gene present in a particular cell.
Genetic analysis is the overall process of studying and researching in fields of science that involve genetics and molecular biology. There are a number of applications that are developed from this research, and these are also considered parts of the process. The base system of analysis revolves around general genetics. Basic studies include identification of genes and inherited disorders. This research has been conducted for centuries on both a large-scale physical observation basis and on a more microscopic scale. Genetic analysis can be used generally to describe methods both used in and resulting from the sciences of genetics and molecular biology, or to applications resulting from this research.
Genevestigator is an application consisting of a gene expression database and tools to analyse the data. It exists in two versions, biomedical and plant, depending on the species of the underlying microarray and RNAseq as well as single-cell RNA-sequencing data. It was started in January 2004 by scientists from ETH Zurich and is currently developed and commercialized by Nebion AG.
Oncogenomics is a sub-field of genomics that characterizes cancer-associated genes. It focuses on genomic, epigenomic and transcript alterations in cancer.
Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes – in many cases, an organism's entire genome – in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult – if not impossible – to analyze without the help of computer programs.
ChIP-on-chip is a technology that combines chromatin immunoprecipitation ("ChIP") with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins such as origin recognition complex protein (ORC), histones, their variants, and histone modifications.
A microarray database is a repository containing microarray gene expression data. The key uses of a microarray database are to store the measurement data, manage a searchable index, and make the data available to other applications for analysis and interpretation.
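The three uses named above (storing measurements, keeping a searchable index, and exposing the data to other programs) can be sketched with Python's built-in `sqlite3` module. This is an illustrative toy only, not the schema of any real microarray repository; the table name `measurement`, its columns, and the helper functions are assumptions made for this example.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_store() -> sqlite3.Connection:
    """Create an in-memory store with one table of intensity measurements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement ("
        " sample_id TEXT NOT NULL,"
        " probe_id TEXT NOT NULL,"
        " intensity REAL NOT NULL,"
        " PRIMARY KEY (sample_id, probe_id))"
    )
    # Searchable index: lets lookups by probe avoid a full table scan.
    conn.execute("CREATE INDEX idx_measurement_probe ON measurement (probe_id)")
    return conn


def add_measurements(conn: sqlite3.Connection,
                     rows: Iterable[Tuple[str, str, float]]) -> None:
    """Store (sample_id, probe_id, intensity) rows."""
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)
    conn.commit()


def intensities_for_probe(conn: sqlite3.Connection,
                          probe_id: str) -> List[Tuple[str, float]]:
    """Return (sample_id, intensity) pairs for one probe, ordered by sample."""
    cursor = conn.execute(
        "SELECT sample_id, intensity FROM measurement"
        " WHERE probe_id = ? ORDER BY sample_id",
        (probe_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    store = create_store()
    # Made-up sample and probe identifiers, for demonstration only.
    add_measurements(store, [
        ("sample_1", "probe_A", 10.5),
        ("sample_2", "probe_A", 7.25),
        ("sample_1", "probe_B", 3.0),
    ])
    print(intensities_for_probe(store, "probe_A"))
```

A query function such as `intensities_for_probe` is the part other applications would call for analysis; real repositories additionally store experiment annotations and platform descriptions, which this sketch omits.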
Personal genomics or consumer genetics is the branch of genomics concerned with the sequencing, analysis and interpretation of the genome of an individual. The genotyping stage employs different techniques, including single-nucleotide polymorphism (SNP) analysis chips, or partial or full genome sequencing. Once the genotypes are known, the individual's variations can be compared with the published literature to determine likelihood of trait expression, ancestry inference and disease risk.
Geniom RT Analyzer is an instrument used in molecular biology for diagnostic testing. The Geniom RT Analyzer utilizes the dynamic nature of tissue microRNA levels as a biomarker for disease progression. The Geniom analyzer incorporates microfluidic and biochip microarray technology in order to quantify microRNAs via a Microfluidic Primer Extension Assay (MPEA) technique.
BioMart is a community-driven project to provide a single point of access to distributed research data. The BioMart project contributes open source software and data services to the international scientific community. Although the BioMart software is primarily used by the biomedical research community, it is designed in such a way that any type of data can be incorporated into the BioMart framework. The BioMart project originated at the European Bioinformatics Institute as a data management solution for the Human Genome Project. Since then, BioMart has grown to become a multi-institute collaboration involving various database projects on five continents.