TRANSFAC

Content
Description: Transcription Factor Database
Data types captured: Eukaryotic transcription factors, their binding sites and binding profiles
Organisms: eukaryotes
Contact
Research center: Helmholtz Centre for Infection Research; BIOBASE GmbH; geneXplain GmbH
Primary citation: Wingender (2008) [1]
Release date: 1988
Access
Website: TRANSFAC 7.0 Public 2005

TRANSFAC (TRANScription FACtor database) is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles. The contents of the database can be used to predict potential transcription factor binding sites.

Introduction

The database originated in an early data collection published in 1988. [2] The first version released under the name TRANSFAC was developed at the former German National Research Centre for Biotechnology (now the Helmholtz Centre for Infection Research) and was designed for local installation. [3] In one of the first publicly funded bioinformatics projects, launched in 1993, TRANSFAC developed into a resource that became available on the Internet. [4]

In 1997, TRANSFAC was transferred to a newly established company, BIOBASE, to secure long-term financing of the database. Since then, the most up-to-date version has had to be licensed, whereas older versions are free for non-commercial users. [5] [6] Since July 2016, TRANSFAC has been maintained and distributed by geneXplain GmbH, Wolfenbüttel, Germany. [7]

Content and features

The content of the database is organized around the interaction between transcription factors (TFs) and their DNA binding sites (TFBS). TFs are described in terms of their structural and functional features, as extracted from the original scientific literature. They are classified into families, classes and superclasses according to the features of their DNA-binding domains. [8] [9] [10] [11]

Binding of a TF to a genomic site is documented by specifying the location of the site, its sequence and the experimental method applied. All sites that refer to one TF, or to a group of closely related TFs, are aligned and used to construct a position-specific scoring matrix (PSSM), or count matrix. Many matrices in the TRANSFAC matrix library were constructed by a team of curators; others were taken from scientific publications.
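The step from aligned sites to a matrix can be illustrated with a minimal sketch. The site sequences, the pseudocount and the uniform background below are assumptions chosen for illustration; they are not TRANSFAC data, and the log-odds conversion shown is one common convention rather than the procedure used by the TRANSFAC curators.

```python
import math

BASES = "ACGT"

# Hypothetical aligned binding sites for a single TF (illustrative only).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA", "TGACTAA"]


def count_matrix(sites):
    """Return one {base: count} dict per column of the alignment."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to the same length")
    return [
        {base: sum(site[i] == base for site in sites) for base in BASES}
        for i in range(length)
    ]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background.

    The pseudocount and background are assumptions of this sketch.
    """
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pssm


counts = count_matrix(SITES)
pssm = log_odds_matrix(counts)
```

Each column of `counts` records how often each base occurs at that alignment position; `pssm` holds the corresponding scores, which are positive for bases that are over-represented relative to the assumed background.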

Applications

The TRANSFAC database can be used as an encyclopedia of eukaryotic transcription factors. The target sequences and the regulated genes can be listed for each TF, and these lists can serve as benchmarks for TFBS recognition tools or as training sets for new TFBS recognition algorithms. [12] The TF classification makes it possible to analyze such data sets with regard to the properties of the DNA-binding domains. [13] Another application is to retrieve all TFs that regulate a given gene or set of genes. In systems-biological studies, the TF-target gene relations documented in TRANSFAC have been used to construct and analyze transcription regulatory networks. [14] [15] By far the most frequent use of TRANSFAC is the computational prediction of potential TFBS. A number of algorithms exist that use either the individual binding sites or the matrix library for this purpose:

Comparison of matrices with the matrix library of TRANSFAC and other sources:

A number of servers provide genomic annotations computed with the aid of TRANSFAC. [37] [38] Others have used such analyses to infer target gene sets. [39] [40]
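Matrix-based prediction of potential TFBS, as mentioned above, can be sketched as a sliding-window scan. The count matrix, the example sequence and the threshold rule (a fraction of the maximum attainable score) are all assumptions of this sketch; it does not reproduce the scoring scheme of any particular tool, and it scans the forward strand only.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix, one dict per position (not taken from TRANSFAC).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 7, "T": 1},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 6, "G": 2, "T": 0},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pssm


def scan(sequence, pssm, min_fraction=0.8):
    """Return (start, window, score) for windows scoring at least
    min_fraction of the best score the matrix can produce."""
    width = len(pssm)
    best = sum(max(column.values()) for column in pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= min_fraction * best:
            hits.append((start, window, score))
    return hits


PSSM = to_log_odds(COUNTS)
hits = scan("AATGACGGTGAGCC", PSSM)
relaxed_hits = scan("AATGACGGTGAGCC", PSSM, min_fraction=0.75)
```

With these hypothetical counts, the default threshold reports only the `TGAC` window at position 2, while lowering `min_fraction` to 0.75 also reports the weaker `TGAG` window at position 8. The choice of threshold trades sensitivity against the number of false positives, which is why such hits are best read as potential binding sites.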


References

  1. Wingender E (July 2008). "The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation". Brief. Bioinformatics. 9 (4): 326–32. doi:10.1093/bib/bbn016. PMID 18436575.
  2. Wingender E (March 1988). "Compilation of transcription regulating proteins". Nucleic Acids Res. 16 (5): 1879–902. doi:10.1093/nar/16.5.1879. PMC 338188. PMID 3282223.
  3. Wingender E, Heinemeyer T, Lincoln D (1991). "Regulatory DNA sequences: predictability of their function". Genome Analysis - from Sequence to Function; BioTechForum - Advances in Molecular Genetics (J. Collins, A.J. Driesel, Eds.). 4: 95–108.
  4. Wingender E, Dietze P, Karas H, Knüppel R (January 1996). "TRANSFAC: a database on transcription factors and their DNA binding sites". Nucleic Acids Res. 24 (1): 238–41. doi:10.1093/nar/24.1.238. PMC 145586. PMID 8594589.
  5. TRANSFAC Public on the gene regulation portal of BIOBASE
  6. Access to TRANSFAC Public via TESS at the Computational Biology and Informatics Laboratory (CBIL) of the University of Pennsylvania (Penn). Archived 2012-07-24 at the Wayback Machine.
  7. TRANSFAC taken over by geneXplain
  8. Wingender E (1997). "[Classification of eukaryotic transcription factors]". Mol. Biol. (Mosk.) (in Russian). 31 (4): 584–600. PMID 9340487.
  9. Heinemeyer T, Chen X, Karas H, Kel AE, Kel OV, Liebich I, Meinhardt T, Reuter I, Schacherer F, Wingender E (January 1999). "Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms". Nucleic Acids Res. 27 (1): 318–22. doi:10.1093/nar/27.1.318. PMC 148171. PMID 9847216.
  10. Stegmaier P, Kel AE, Wingender E (2004). "Systematic DNA-binding domain classification of transcription factors". Genome Inform. 15 (2): 276–86. PMID 15706513.
  11. Wingender, E: The classification of transcription factors
  12. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Régnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z (January 2005). "Assessing computational tools for the discovery of transcription factor binding sites". Nat. Biotechnol. 23 (1): 137–44. doi:10.1038/nbt1053. PMID 15637633. S2CID 3234451.
  13. Narlikar L, Gordân R, Ohler U, Hartemink AJ (July 2006). "Informative priors based on transcription factor structural class improve de novo motif discovery". Bioinformatics. 22 (14): e384–92. doi:10.1093/bioinformatics/btl251. PMID 16873497.
  14. Goemann B, Wingender E, Potapov AP (2009). "An approach to evaluate the topological significance of motifs and other patterns in regulatory networks". BMC Syst Biol. 3: 53. doi:10.1186/1752-0509-3-53. PMC 2694767. PMID 19454001.
  15. Kozhenkov S, Dubinina Y, Sedova M, Gupta A, Ponomarenko J, Baitaluk M (2010). "BiologicalNetworks 2.0--an integrative view of genome biology data". BMC Bioinformatics. 11: 610. doi:10.1186/1471-2105-11-610. PMC 3019228. PMID 21190573.
  16. Patch on the free portal of BIOBASE
  17. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E (January 2006). "TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes". Nucleic Acids Res. 34 (Database issue): D108–10. doi:10.1093/nar/gkj143. PMC 1347505. PMID 16381825.
  18. SiteSeer of the University of Manchester. Archived 2011-06-25 at the Wayback Machine.
  19. Boardman PE, Oliver SG, Hubbard SJ (July 2003). "SiteSeer: Visualisation and analysis of transcription factor binding sites in nucleotide sequences". Nucleic Acids Res. 31 (13): 3572–5. doi:10.1093/nar/gkg511. PMC 168918. PMID 12824368.
  20. Match on the free portal of BIOBASE
  21. Kel AE, Gössling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E (July 2003). "MATCH: A tool for searching transcription factor binding sites in DNA sequences". Nucleic Acids Res. 31 (13): 3576–9. doi:10.1093/nar/gkg585. PMC 169193. PMID 12824369.
  22. TESS (Transcription Element Search System) at CBIL of the University of Pennsylvania
  23. Site Search at TESS. Archived 2012-07-24 at the Wayback Machine.
  24. AnGEL CRM Searches in the TESS system. Archived 2012-07-24 at the Wayback Machine.
  25. PROMO on the ALGGEN server of the Polytechnic University of Catalonia (UPC)
  26. Messeguer X, Escudero R, Farré D, Núñez O, Martínez J, Albà MM (February 2002). "PROMO: detection of known transcription regulatory elements using species-tailored searches". Bioinformatics. 18 (2): 333–4. doi:10.1093/bioinformatics/18.2.333. PMID 11847087.
  27. TFM Explorer on the bioinformatics software server of the SEQUOIA group. Archived 2011-09-19 at the Wayback Machine.
  28. Tonon L, Touzet H, Varré JS (July 2010). "TFM-Explorer: mining cis-regulatory regions in genomes". Nucleic Acids Res. 38 (Web Server issue): W286–92. doi:10.1093/nar/gkq473. PMC 2896114. PMID 20522509.
  29. MotifMogul of the Institute for Systems Biology in Seattle
  30. ConTra of the Ghent University
  31. Hooghe B, Hulpiau P, van Roy F, De Bleser P (July 2008). "ConTra: a promoter alignment analysis tool for identification of transcription factor binding sites across species". Nucleic Acids Res. 36 (Web Server issue): W128–32. doi:10.1093/nar/gkn195. PMC 2447729. PMID 18453628.
  32. PMS, developed at Nanjing University. Archived 2012-07-10 at archive.today.
  33. Su G, Mao B, Wang J (2006). "A web server for transcription factor binding site prediction". Bioinformation. 1 (5): 156–7. doi:10.6026/97320630001156. PMC 1891680. PMID 17597879.
  34. T-Reg Comparator on the server of the Max Planck Institute for Molecular Genetics. Archived 2012-07-18 at archive.today.
  35. MACO, developed at Nanjing University. Archived 2012-07-10 at archive.today.
  36. Su G, Mao B, Wang J (2006). "MACO: a gapped-alignment scoring tool for comparing transcription factor binding sites". In Silico Biol. (Gedrukt). 6 (4): 307–10. PMID 16922693.
  37. PReMOD: Human and mouse genome of the years 2004 & 2005; IRCM / McGill University, Montreal. Archived 2008-12-28 at the Wayback Machine.
  38. PRIMA: Human genome of 2004; Tel-Aviv University
  39. MSigDB: Mammalian transcription factor target gene sets; GSEA wiki server of Broad Institute of MIT and Harvard, Cambridge, MA
  40. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M (March 2005). "Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals". Nature. 434 (7031): 338–45. Bibcode:2005Natur.434..338X. doi:10.1038/nature03441. PMC 2923337. PMID 15735639.