Expression Atlas

Last updated
Expression Atlas
Content
DescriptionGene expression across species and conditions
Data types
captured
Microarray Data and other Gene Expression
Organisms Human, Mouse, Rat, Fruit-fly, Chicken, Roundworm, Wild boar, Zebrafish, Cow and others.
Contact
Research center EMBL-EBI
Primary citation PMID   31665515
Release dateMarch 2020
Access
Website www.ebi.ac.uk/gxa/home
Tools
Web Advanced search, bulk retrieval/download
Miscellaneous
Data release
frequency
3 to 4 times a year
Curation policyManual curation of every study.

The Expression Atlas is a database maintained by the European Bioinformatics Institute that provides information on gene expression patterns from RNA-Seq and Microarray studies, and protein expression from Proteomics studies. [1] The Expression Atlas allows searches by gene, splice variant, protein attribute, disease, treatment or organism part (cell types/tissues). Individual genes or gene sets can be searched for. All datasets in Expression Atlas have its metadata manually curated and its data analysed through standardised analysis pipelines. There are two components to the Expression Atlas, the Baseline Atlas and the Differential Atlas:

Contents

Baseline Atlas

The Baseline Atlas provides information about which gene products are present (and at what abundance) under "normal" conditions. This component of the Expression Atlas consists of RNA-seq experiments from ArrayExpress repositories. It aims to answer questions such as:

Differential Atlas

The Differential Atlas allows users to identify genes that are up- or down-regulated in different experimental conditions.

See also

Related Research Articles

<span class="mw-page-title-main">Bioinformatics</span> Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

<span class="mw-page-title-main">DNA microarray</span> Collection of microscopic DNA spots attached to a solid surface

A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as probes. These can be a short section of a gene or other DNA element that are used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm and the first computerized image based analysis was published in 1981. It was invented by Patrick O. Brown. An example of its application is in SNPs arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis. It is also used for the identification of structural variations and the measurement of gene expression.

In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases. EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.

<span class="mw-page-title-main">Functional genomics</span> Field of molecular biology

Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics make use of the vast data generated by genomic and transcriptomic projects. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "candidate-gene" approach.

<span class="mw-page-title-main">Gene expression profiling</span>

In the field of molecular biology, gene expression profiling is the measurement of the activity of thousands of genes at once, to create a global picture of cellular function. These profiles can, for example, distinguish between cells that are actively dividing, or show how the cells react to a particular treatment. Many experiments of this sort measure an entire genome simultaneously, that is, every gene present in a particular cell.

<span class="mw-page-title-main">MicrobesOnline</span>

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.

Gene Expression Omnibus (GEO) is a database for gene expression profiling and RNA methylation profiling managed by the National Center for Biotechnology Information (NCBI). These high-throughput screening genomics data are derived from microarray or RNA-Seq experimental data. These data need to conform to the minimum information about a microarray experiment (MIAME) format.

<span class="mw-page-title-main">RNA-Seq</span> Lab technique in cellular biology

RNA-Seq is a sequencing technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample, representing an aggregated snapshot of the cells' dynamic pool of RNAs, also known as transcriptome.

<span class="mw-page-title-main">Bgee</span>

Bgee is a database maintained by the Swiss Institute of Bioinformatics and the University of Lausanne for retrieval and comparison of gene expression patterns from RNA-Seq, scRNA-Seq, Microarray, In situ hybridization and EST studies, across multiple animal species. Bgee provides an intuitive answer to the question where is a gene expressed? and supports research in cancer and agriculture, as well as evolutionary biology.

Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes that have the potential to encode novel proteins. These mRNAs are different from those produced by conventional splicing as they are produced by two or more gene loci.

<span class="mw-page-title-main">Gene set enrichment analysis</span> Bioinformatics method

Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.

Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.

Transcription factors are proteins that bind genomic regulatory sites. Identification of genomic regulatory elements is essential for understanding the dynamics of developmental, physiological and pathological processes. Recent advances in chromatin immunoprecipitation followed by sequencing (ChIP-seq) have provided powerful ways to identify genome-wide profiling of DNA-binding proteins and histone modifications. The application of ChIP-seq methods has reliably discovered transcription factor binding sites and histone modification sites.

In molecular phylogenetics, relationships among individuals are determined using character traits, such as DNA, RNA or protein, which may be obtained using a variety of sequencing technologies. High-throughput next-generation sequencing has become a popular technique in transcriptomics, which represent a snapshot of gene expression. In eukaryotes, making phylogenetic inferences using RNA is complicated by alternative splicing, which produces multiple transcripts from a single gene. As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.

<span class="mw-page-title-main">C1orf198</span> Protein-coding gene in the species Homo sapiens

Chromosome 1 open reading frame 198 (C1orf198) is a protein that in humans is encoded by the C1orf198 gene. This particular gene does not have any paralogs in Homo sapiens, but many orthologs have been found throughout the Eukarya domain. C1orf198 has high levels of expression in all tissues throughout the human body, but is most highly expressed in lung, brain, and spinal cord tissues. Its function is most likely involved in lung development and hypoxia-associated events in the mitochondria, which are major consumers of oxygen in cells and are severely affected by decreases in available cellular oxygen.

<span class="mw-page-title-main">CCDC188</span> Protein found in humans

CCDC188 or coiled-coil domain containing protein is a protein that in humans is encoded by the CCDC188 gene.

References

  1. Papatheodorou I, Moreno P, Manning J, Fuentes AM, George N, Fexova S, et al. (Jan 2020). "Expression Atlas update: from tissues to single cells". Nucleic Acids Research. 48 (D1): D77–D83. doi: 10.1093/nar/gkz947 . PMC   7145605 . PMID   31665515.

Further reading