Transcription factor binding site databases

Transcription factors are proteins that bind genomic regulatory sites. Identifying genomic regulatory elements is essential for understanding the dynamics of developmental, physiological and pathological processes. Advances in chromatin immunoprecipitation followed by sequencing (ChIP-seq) have provided powerful ways to profile DNA-binding proteins and histone modifications genome-wide. [1] [2] ChIP-seq methods have been widely applied to identify transcription factor binding sites and histone modification sites.

Transcription factor binding site databases

The following is a list of transcription factor binding site (TFBS) databases, many of them built on ChIP-seq data:

| Name | Description | Type | Link | References |
|---|---|---|---|---|
| ChIPBase | A database of transcription factor binding sites and motifs (~1,290 transcription factors) for decoding the transcriptional regulation of lncRNAs, miRNAs and protein-coding genes from ~10,200 curated peak datasets derived from ChIP-seq methods in 10 species. | database | website | [3] |
| ChEA | Transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. | database | website | [4] |
| CIS-BP | A collection of transcription factor binding site models inferred by binding domains. | database | website | [5] |
| CistromeMap | A knowledgebase and web server for ChIP-Seq and DNase-Seq studies in mouse and human. | database | website | [6] |
| CTCFBSDB | A database for CTCF binding sites and genome organization. | database | website | [7] |
| Factorbook | A Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. | database | website | [8] |
| hmChIP | A database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data. | database | website | [9] |
| HOCOMOCO | A comprehensive collection of human and mouse transcription factor binding site models. | database | website | [10] |
| JASPAR | The JASPAR CORE database contains a curated, non-redundant set of profiles, derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. | database | website | [11] [12] |
| MethMotif | An integrative cell-specific database of transcription factor binding motifs coupled with DNA methylation profiles. | database | website | [13] |
| SwissRegulon | A database of genome-wide annotations of regulatory sites. | database | website | [14] |
| TFLink | The TFLink gateway provides comprehensive and highly accurate information on transcription factor-target gene interactions, nucleotide sequences and genomic locations of transcription factor binding sites for human and six model organisms. | database | website | [15] |
| TRANSFAC | A long-standing curated database of regulatory sites, enhancers, binding site predictions, PSSMs and related analytical software. | database | website | [16] |
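
Several of the databases listed above (for example JASPAR, HOCOMOCO and TRANSFAC) describe binding sites as matrix-style profiles or PSSMs. The following is a minimal sketch of how such a profile could be turned into log-odds scores and used to scan a sequence. The count matrix, the pseudocount, the uniform background and the function names are all assumptions made for illustration; they are not taken from any of the listed databases or their software.

```python
import math

# Hypothetical position frequency matrix (counts per base) for a 4-bp motif.
# The numbers are invented for illustration, not taken from any database.
PFM = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    pwm = []
    for counts in pfm:
        # Add the pseudocount to every base so that zero counts stay finite.
        total = sum(counts.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window on one strand.

    Assumes an uppercase sequence containing only A, C, G and T.
    """
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


pwm = pfm_to_pwm(PFM)
start, score = best_hit(pwm, "TTACGTAA")
print(f"best window starts at index {start} with log-odds score {score:.2f}")
```

The sketch scans only the forward strand and applies no score threshold; practical tools typically also scan the reverse complement and calibrate a cutoff.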

References

  1. Park PJ (October 2009). "ChIP-seq: advantages and challenges of a maturing technology". Nature Reviews. Genetics. 10 (10): 669–680. doi:10.1038/nrg2641. PMC 3191340. PMID 19736561.
  2. Farnham PJ (September 2009). "Insights from genomic profiling of transcription factors". Nature Reviews. Genetics. 10 (9): 605–616. doi:10.1038/nrg2636. PMC 2846386. PMID 19668247.
  3. Yang JH, Li JH, Jiang S, Zhou H, Qu LH (January 2013). "ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data". Nucleic Acids Research. 41 (Database issue): D177–D187. doi:10.1093/nar/gks1060. PMC 3531181. PMID 23161675.
  4. Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma'ayan A (October 2010). "ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments". Bioinformatics. 26 (19): 2438–2444. doi:10.1093/bioinformatics/btq466. PMC 2944209. PMID 20709693.
  5. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, et al. (September 2014). "Determination and inference of eukaryotic transcription factor sequence specificity". Cell. 158 (6): 1431–1443. doi:10.1016/j.cell.2014.08.009. PMC 4163041. PMID 25215497.
  6. Qin B, Zhou M, Ge Y, Taing L, Liu T, Wang Q, et al. (May 2012). "CistromeMap: a knowledgebase and web server for ChIP-Seq and DNase-Seq studies in mouse and human". Bioinformatics. 28 (10): 1411–1412. doi:10.1093/bioinformatics/bts157. PMC 3348563. PMID 22495751.
  7. Ziebarth JD, Bhattacharya A, Cui Y (January 2013). "CTCFBSDB 2.0: a database for CTCF-binding sites and genome organization". Nucleic Acids Research. 41 (Database issue): D188–D194. doi:10.1093/nar/gks1165. PMC 3531215. PMID 23193294.
  8. Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, et al. (January 2013). "Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium". Nucleic Acids Research. 41 (Database issue): D171–D176. doi:10.1093/nar/gks1221. PMC 3531197. PMID 23203885.
  9. Chen L, Wu G, Ji H (May 2011). "hmChIP: a database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data". Bioinformatics. 27 (10): 1447–1448. doi:10.1093/bioinformatics/btr156. PMC 3087956. PMID 21450710.
  10. Kulakovskiy IV, Vorontsov IE, Yevshin IS, Soboleva AV, Kasianov AS, Ashoor H, et al. (January 2016). "HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models". Nucleic Acids Research. 44 (D1): D116–D125. doi:10.1093/nar/gkv1249. PMC 4702883. PMID 26586801.
  11. Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B (January 2004). "JASPAR: an open-access database for eukaryotic transcription factor binding profiles". Nucleic Acids Research. 32 (Database issue): D91–D94. doi:10.1093/nar/gkh012. PMC 308747. PMID 14681366.
  12. Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, et al. (January 2020). "JASPAR 2020: update of the open-access database of transcription factor binding profiles". Nucleic Acids Research. 48 (D1): D87–D92. doi:10.1093/nar/gkz1001. PMC 7145627. PMID 31701148.
  13. Xuan Lin QX, Sian S, An O, Thieffry D, Jha S, Benoukraf T (January 2019). "MethMotif: an integrative cell specific database of transcription factor binding motifs coupled with DNA methylation profiles". Nucleic Acids Research. 47 (D1): D145–D154. doi:10.1093/nar/gky1005. PMC 6323897. PMID 30380113.
  14. Pachkov M, Balwierz PJ, Arnold P, Ozonov E, van Nimwegen E (January 2013). "SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates". Nucleic Acids Research. 41 (Database issue): D214–D220. doi:10.1093/nar/gks1145. PMC 3531101. PMID 23180783.
  15. Liska O, Bohár B, Hidas A, Korcsmáros T, Papp B, Fazekas D, Ari E (September 2022). "TFLink: an integrated gateway to access transcription factor-target gene interactions for multiple species". Database. 2022. doi:10.1093/database/baac083. PMC 9480832. PMID 36124642.
  16. Kel A, Voss N, Jauregui R, Kel-Margoulis O, Wingender E (September 2006). "Beyond microarrays: find key transcription factors controlling signal transduction pathways". BMC Bioinformatics. 7 (Suppl 2): S2–S13. doi:10.1186/1471-2105-7-S2-S13. PMC 1683568. PMID 17118134.