Gene Expression Omnibus

Content
Description: Gene expression profiling and RNA methylation database
Contact
Research center: National Center for Biotechnology Information
Primary citation: Edgar R et al. (2002) [1]
Access
Website: https://www.ncbi.nlm.nih.gov/gds/?term=

Gene Expression Omnibus (GEO) is a database for gene expression profiling and RNA methylation profiling managed by the National Center for Biotechnology Information (NCBI). [1] Its high-throughput genomics data come from microarray or RNA-Seq experiments. [2] Submitted data need to conform to the Minimum Information About a Microarray Experiment (MIAME) guidelines. [3]

Glossary

Abbreviation  Description
GDS           DataSet accession
GPL           Platform accession
GSE           Series accession
GSM           Sample accession
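
The prefixes in the glossary can be used to tell what kind of record an accession refers to. The following is a minimal sketch, not an official NCBI tool: the function name `classify_accession` is hypothetical, and the pattern assumes an accession is one of the four prefixes followed by one or more digits.

```python
import re

# Prefix-to-record-type mapping taken from the glossary above.
ACCESSION_TYPES = {
    "GDS": "DataSet",
    "GPL": "Platform",
    "GSE": "Series",
    "GSM": "Sample",
}

# Assumption: an accession is a known prefix followed by digits only.
_ACCESSION_RE = re.compile(r"^(GDS|GPL|GSE|GSM)(\d+)$")


def classify_accession(accession):
    """Return the GEO record type for an accession, or None if unrecognized."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return ACCESSION_TYPES[match.group(1)]
```

Input is upper-cased and stripped before matching, so `gse12345` and `GSE12345` are treated alike; anything that does not fit the assumed pattern yields `None` rather than a guess.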

Related Research Articles

DNA microarray

A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as a probe. A probe can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified through fluorophore-, silver-, or chemiluminescence-labeled targets to determine the relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm, and the first computerized image-based analysis was published in 1981. The DNA microarray was invented by Patrick O. Brown. Applications include SNP arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis, as well as identification of structural variations and measurement of gene expression.

Biological database

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases. EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.

GC-content

In molecular biology and genetics, GC-content is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure indicates the proportion of G and C bases out of an implied four total bases, also including adenine and thymine in DNA and adenine and uracil in RNA.
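
The definition above translates directly into a short calculation. This is a minimal sketch under stated assumptions: the function name `gc_content` is hypothetical, and characters other than A, C, G, T and U (for example the ambiguity code N) are ignored rather than counted toward the total.

```python
def gc_content(sequence):
    """Return the percentage of G and C bases in a DNA or RNA sequence."""
    # Assumption: only the standard bases count; other symbols are skipped.
    bases = [base for base in sequence.upper() if base in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)
```

For example, `gc_content("ATGC")` counts two of four bases as G or C. Skipping unrecognized symbols is one possible design choice; a stricter version could raise an error on them instead.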

Generic Model Organism Database

The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.

PHI-base

The Pathogen-Host Interactions database (PHI-base) is a biological database that contains curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained since 2005 by researchers at Rothamsted Research, together with external collaborators. Since April 2017, PHI-base has been part of ELIXIR, the European life-science infrastructure for biological information, via its ELIXIR-UK node.

Sib RNA

Sib RNA refers to a group of related non-coding RNAs. They were originally named QUAD RNA after they were discovered as four repeat elements in Escherichia coli intergenic regions. The family was later renamed Sib when it was discovered that the number of repeats is variable in other species and in other E. coli strains.

MicrobesOnline

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

6C RNA

6C RNA is a class of non-coding RNA present in actinomycetes. 6C RNA was originally discovered as a conserved RNA structure having two stem-loops each containing six or more cytosine (C) residues. Later work revealed that 6C RNAs in Streptomyces coelicolor and Streptomyces avermitilis have predicted rho-independent transcription terminators, and microarray and reverse-transcriptase PCR experiments indicate that the S. coelicolor version is transcribed as RNA. Transcription of the S. coelicolor RNA increases during sporulation, and three transcripts were detected that overlap the 6C motif, but have different apparent start and stop sites.

Xenbase is a Model Organism Database (MOD), providing informatics resources, as well as genomic and biological data on Xenopus frogs. Xenbase has been available since 1999, and covers both X. laevis and X. tropicalis Xenopus varieties. As of 2013 all of its services are running on virtual machines in a private cloud environment, making it one of the first MODs to do so. Other than hosting genomics data and tools, Xenbase supports the Xenopus research community through profiles for researchers and laboratories, and job and events postings.

The Epigenomics database at the National Center for Biotechnology Information was a database for whole-genome epigenetics data sets. It was retired on 1 June 2016.

KIAA1704

KIAA1704, also known as LSR7, is a protein that in humans is encoded by the GPALPP1 gene. The function of KIAA1704 is not yet well understood. KIAA1704 contains one domain of unknown function, DUF3752. The protein contains a conserved, uncharged, repeated motif GPALPP(GF) near the N terminus and an unusual, conserved, mixed charge throughout. It is predicted to be localized to the nucleus.

Cancer systems biology encompasses the application of systems biology approaches to cancer research, in order to study the disease as a complex adaptive system with emerging properties at multiple biological scales. Cancer systems biology represents the application of systems biology approaches to the analysis of how the intracellular networks of normal cells are perturbed during carcinogenesis to develop effective predictive models that can assist scientists and clinicians in the validations of new therapies and drugs. Tumours are characterized by genomic and epigenetic instability that alters the functions of many different molecules and networks in a single cell as well as altering the interactions with the local environment. Cancer systems biology approaches, therefore, are based on the use of computational and mathematical methods to decipher the complexity in tumorigenesis as well as cancer heterogeneity.

The Expression Atlas is a database maintained by the European Bioinformatics Institute that provides information on gene expression patterns from RNA-Seq and microarray studies, and protein expression from proteomics studies. The Expression Atlas allows searches by gene, splice variant, protein attribute, disease, treatment or organism part. Individual genes or gene sets can be searched for. All datasets in Expression Atlas have their metadata manually curated and their data analysed through standardised analysis pipelines. There are two components to the Expression Atlas, the Baseline Atlas and the Differential Atlas.

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize capture, display, and provision of data.

Donna R. Maglott is a staff scientist at the National Center for Biotechnology Information known for her research on large-scale genomics projects, including the mouse genome and development of databases required for genomics research.

Transmembrane Protein 217 is a protein encoded by the gene TMEM217. TMEM217 has been found to have expression correlated with the lymphatic system and endothelial tissues and has been predicted to have a function linked to the cytoskeleton.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology lies in understanding how the same genome can give rise to different cell types and how gene expression is regulated.

Minimum information standards are sets of guidelines and formats for reporting data derived by specific high-throughput methods. Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community. Ultimately, they facilitate the transfer of data from journal articles into databases in a form that enables data to be mined across multiple data sets. Minimum information standards are available for a wide variety of experiment types including microarray (MIAME), RNA-Seq (MINSEQE), metabolomics (MSI) and proteomics (MIAPE).

References

  1. Edgar, R; Domrachev, M; Lash, AE (1 January 2002). "Gene Expression Omnibus: NCBI gene expression and hybridization array data repository". Nucleic Acids Research. 30 (1): 207–10. doi:10.1093/nar/30.1.207. PMC 99122. PMID 11752295.
  2. Barrett, T; Wilhite, SE; Ledoux, P; Evangelista, C; Kim, IF; Tomashevsky, M; Marshall, KA; Phillippy, KH; Sherman, PM; Holko, M; Yefanov, A; Lee, H; Zhang, N; Robertson, CL; Serova, N; Davis, S; Soboleva, A (January 2013). "NCBI GEO: archive for functional genomics data sets--update". Nucleic Acids Research. 41 (Database issue): D991–5. doi:10.1093/nar/gks1193. PMC 3531084. PMID 23193258.
  3. Brazma, A; Hingamp, P; Quackenbush, J; Sherlock, G; Spellman, P; Stoeckert, C; Aach, J; Ansorge, W; Ball, CA; Causton, HC; Gaasterland, T; Glenisson, P; Holstege, FC; Kim, IF; Markowitz, V; Matese, JC; Parkinson, H; Robinson, A; Sarkans, U; Schulze-Kremer, S; Stewart, J; Taylor, R; Vilo, J; Vingron, M (December 2001). "Minimum information about a microarray experiment (MIAME)-toward standards for microarray data". Nature Genetics. 29 (4): 365–71. doi:10.1038/ng1201-365. PMID 11726920. S2CID 6994467.