Genomatix

Genomatix GmbH
Type GmbH
Industry Bioinformatics
Founded 1997
Founder Dr. Thomas Werner
Headquarters Munich, Germany
Website http://www.genomatix.de

Genomatix GmbH is a computational biology company headquartered in Munich, Germany, with a seat of business in Ann Arbor, Michigan, United States.

History

Genomatix was founded in 1997 by Dr. Thomas Werner as a spin-off from the Helmholtz Zentrum München (formerly "GSF, National German Research Institute for Environment and Health"). The Helmholtz Zentrum München is part of the Helmholtz Association of German Research Centers.

Genomatix software tools

Genomatix offers technologies and databases for genome annotation and regulation analysis.

The product portfolio covers:

• Literature and pathway mining
• Transcription factor analysis
• Genome annotation integrating a wide variety of transcript sources with a special focus on regulatory regions
• Analysis technology for high throughput genomic technologies (microarrays and next generation sequencing)

Literature mining
LitInspector is a literature search tool that provides gene and signal transduction pathway mining within NCBI's PubMed database. [1] [2]

Pathway mining
GePS
BiblioSphere

Current research

Personalized medicine has developed into a major field for Genomatix. [3] Genomatix is involved in several projects and international conferences, including the fifth Santorini Conference, "Functional Genomics Towards Personalized Health Care." [4]

Since 2008, Genomatix has focused strongly on the analysis of next-generation sequencing data. Because of the large data volumes and the need for high-end computing power, Genomatix deploys its products as in-house installations (hardware and software bundles).

Two systems are available:

1. The Genomatix Mining Station (GMS) is based on a proprietary genomic pattern recognition paradigm, the GenomeThesaurus, which accepts raw sequence reads, along with optional quality files, from any deep sequencing hardware. It maps sequences of any length (starting from 8 bp) with no practical limit on the number of point mutations, insertions, or deletions that can be taken into account during mapping. Depending on the nature of the experiment, the GMS can provide SNP detection and genotyping, copy number analysis, and small RNA analysis. For ChIP-seq data, the GMS performs clustering, peak finding, and automated binding pattern identification. For RNA-seq experiments, normalized expression values are calculated at the exon and transcript levels. A special GenomeThesaurus is also provided for potential splice junctions, which allows for splice junction analysis and the identification of new transcriptional units.

For genomic re-sequencing and newly sequenced genomes, de novo assembly is provided.

2. The Genomatix Genome Analyzer (GGA) provides downstream software tools and databases for the deep biological analysis of data coming from the GMS. It allows for integration and visualization of the terabytes of background annotation in the ElDorado genome database. The GGA extensively annotates genomic coordinates, and their surrounding areas, derived by the GMS or by other mapping procedures. The analyses that can be carried out on the GGA include clustering and peak finding, analysis of phylogenetic conservation, large-scale correlation analysis with annotated genomic elements, meta-analysis of data correlation between different experiments, pathway mining for groups of identified genes, and transcription factor binding site (TFBS) analysis (identification, over-representation, binding partner analysis, framework identification, phylogenetic conservation, and regulatory SNP effects).
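The idea behind TFBS over-representation analysis, one of the analyses listed above, can be illustrated with a generic statistical sketch. The example below is not Genomatix's implementation, which is proprietary and not described in this article. It assumes a plain one-sided hypergeometric test, and the function name `hypergeom_sf` and all counts are hypothetical values chosen for illustration.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: total number of sequences in the background set
    successes:  background sequences that contain the binding site
    draws:      number of sequences in the selected set
    k:          selected sequences that contain the binding site
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts (assumptions, not real data):
background_total = 1000      # promoters in the background set
background_with_site = 100   # of these, promoters containing the TFBS
selected_total = 50          # promoters of the genes of interest
selected_with_site = 15      # of these, promoters containing the TFBS

p_value = hypergeom_sf(
    selected_with_site, background_total, background_with_site, selected_total
)

# Fold enrichment: site frequency in the selected set relative to background.
fold = (selected_with_site / selected_total) / (
    background_with_site / background_total
)

print(f"fold enrichment: {fold:.2f}, p-value: {p_value:.3g}")
```

A small p-value together with a fold enrichment above 1 would suggest that the binding site occurs more often in the selected sequences than expected by chance. In practice, such tests are usually corrected for multiple testing when many binding sites are evaluated at once.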

References

  1. Frisch, M; Klocke, B; Haltmeier, M; Frech, K (2009). "LitInspector: literature and signal transduction pathway mining in PubMed abstracts". Nucleic Acids Res. 37 (Web Server issue): W135-40. doi:10.1093/nar/gkp303. PMC 2703902. PMID 19417065.
  2. LitInspector start page. Archived 2019-05-11 at the Wayback Machine.
  3. Genomatix and personalized medicine. Archived 2010-07-25 at the Wayback Machine.
  4. Conference about personalized health care. Archived 2010-03-04 at the Wayback Machine.