Transcription factor binding site databases

Last updated January 23, 2024

Transcription factors are proteins that bind genomic regulatory sites. Identifying genomic regulatory elements is essential for understanding the dynamics of developmental, physiological and pathological processes. Advances in chromatin immunoprecipitation followed by sequencing (ChIP-seq) have provided powerful ways to profile DNA-binding proteins and histone modifications genome-wide.^[1]^[2] ChIP-seq methods have been used to reliably identify transcription factor binding sites and histone modification sites.

Transcription factor binding site databases

The following is a list of transcription factor binding site (TFBS) databases based on ChIP-seq data:

Name	Description	Type	Link	References
ChIPBase	A database of transcription factor binding sites and motifs (~1,290 transcription factors) for decoding the transcriptional regulation of lncRNAs, miRNAs and protein-coding genes, built from ~10,200 curated peak datasets derived from ChIP-seq methods in 10 species.	database	website	^[3]
ChEA	Transcription factor regulation inferred from integrating genome-wide ChIP-X experiments.	database	website	^[4]
CIS-BP	A collection of transcription factor binding site models inferred from binding domains.	database	website	^[5]
CistromeMap	A knowledgebase and web server for ChIP-Seq and DNase-Seq studies in mouse and human.	database	website	^[6]
CTCFBSDB	A database for CTCF binding sites and genome organization.	database	website	^[7]
Factorbook	A Wiki-based database for transcription factor-binding data generated by the ENCODE consortium.	database	website	^[8]
hmChIP	A database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data.	database	website	^[9]
HOCOMOCO	A comprehensive collection of human and mouse transcription factor binding site models.	database	website	^[10]
JASPAR	The JASPAR CORE database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes.	database	website	^[11]^[12]
MethMotif	An integrative cell-specific database of transcription factor binding motifs coupled with DNA methylation profiles.	database	website	^[13]
SwissRegulon	A database of genome-wide annotations of regulatory sites.	database	website	^[14]
TFLink	A gateway providing information on transcription factor-target gene interactions, nucleotide sequences and genomic locations of transcription factor binding sites for human and six model organisms.	database	website	^[15]
TRANSFAC	A long-standing curated database of regulatory sites, enhancers, binding site predictions, PSSMs and related analytical software.	database	website	^[16]
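
Several of the resources above distribute binding profiles or position-specific scoring matrices (PSSMs). As a rough illustration of how such a profile can be used to look for candidate binding sites, the sketch below converts a small count matrix into log-odds scores and scans a sequence for the best-scoring window. The matrix values, the function names `pfm_to_pwm` and `best_hit`, the pseudocount and the uniform background are all assumptions made for illustration; none of them is taken from any of the listed databases.

```python
import math

# Hypothetical 4-position count matrix (one row per base).
# The numbers are made up for illustration only.
COUNTS = {
    "A": [8, 0, 1, 0],
    "C": [0, 0, 7, 0],
    "G": [0, 8, 0, 0],
    "T": [0, 0, 0, 8],
}


def pfm_to_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    Assumes a uniform background frequency and adds a pseudocount
    so that unseen bases do not produce log(0).
    """
    width = len(next(iter(counts.values())))
    pwm = {base: [] for base in counts}
    for i in range(width):
        total = sum(counts[b][i] for b in counts) + pseudocount * len(counts)
        for b in counts:
            freq = (counts[b][i] + pseudocount) / total
            pwm[b].append(math.log2(freq / background))
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the best-scoring window, or None.

    Windows containing a character that is not in the matrix
    (for example 'N') are skipped.
    """
    width = len(next(iter(pwm.values())))
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in pwm for base in window):
            continue
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    pwm = pfm_to_pwm(COUNTS)
    print(best_hit(pwm, "TTAGCTAA"))
```

Real profiles are wider and are usually scored against both strands with a score threshold; this sketch scans one strand only and reports a single best window.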

References

[1] Park PJ (October 2009). "ChIP-seq: advantages and challenges of a maturing technology". Nature Reviews. Genetics. 10 (10): 669–680. doi:10.1038/nrg2641. PMC 3191340. PMID 19736561.
[2] Farnham PJ (September 2009). "Insights from genomic profiling of transcription factors". Nature Reviews. Genetics. 10 (9): 605–616. doi:10.1038/nrg2636. PMC 2846386. PMID 19668247.
[3] Yang JH, Li JH, Jiang S, Zhou H, Qu LH (January 2013). "ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data". Nucleic Acids Research. 41 (Database issue): D177–D187. doi:10.1093/nar/gks1060. PMC 3531181. PMID 23161675.
[4] Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma'ayan A (October 2010). "ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments". Bioinformatics. 26 (19): 2438–2444. doi:10.1093/bioinformatics/btq466. PMC 2944209. PMID 20709693.
[5] Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, et al. (September 2014). "Determination and inference of eukaryotic transcription factor sequence specificity". Cell. 158 (6): 1431–1443. doi:10.1016/j.cell.2014.08.009. PMC 4163041. PMID 25215497.
[6] Qin B, Zhou M, Ge Y, Taing L, Liu T, Wang Q, et al. (May 2012). "CistromeMap: a knowledgebase and web server for ChIP-Seq and DNase-Seq studies in mouse and human". Bioinformatics. 28 (10): 1411–1412. doi:10.1093/bioinformatics/bts157. PMC 3348563. PMID 22495751.
[7] Ziebarth JD, Bhattacharya A, Cui Y (January 2013). "CTCFBSDB 2.0: a database for CTCF-binding sites and genome organization". Nucleic Acids Research. 41 (Database issue): D188–D194. doi:10.1093/nar/gks1165. PMC 3531215. PMID 23193294.
[8] Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, et al. (January 2013). "Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium". Nucleic Acids Research. 41 (Database issue): D171–D176. doi:10.1093/nar/gks1221. PMC 3531197. PMID 23203885.
[9] Chen L, Wu G, Ji H (May 2011). "hmChIP: a database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data". Bioinformatics. 27 (10): 1447–1448. doi:10.1093/bioinformatics/btr156. PMC 3087956. PMID 21450710.
[10] Kulakovskiy IV, Vorontsov IE, Yevshin IS, Soboleva AV, Kasianov AS, Ashoor H, et al. (January 2016). "HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models". Nucleic Acids Research. 44 (D1): D116–D125. doi:10.1093/nar/gkv1249. PMC 4702883. PMID 26586801.
[11] Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B (January 2004). "JASPAR: an open-access database for eukaryotic transcription factor binding profiles". Nucleic Acids Research. 32 (Database issue): D91–D94. doi:10.1093/nar/gkh012. PMC 308747. PMID 14681366.
[12] Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, et al. (January 2020). "JASPAR 2020: update of the open-access database of transcription factor binding profiles". Nucleic Acids Research. 48 (D1): D87–D92. doi:10.1093/nar/gkz1001. PMC 7145627. PMID 31701148.
[13] Xuan Lin QX, Sian S, An O, Thieffry D, Jha S, Benoukraf T (January 2019). "MethMotif: an integrative cell specific database of transcription factor binding motifs coupled with DNA methylation profiles". Nucleic Acids Research. 47 (D1): D145–D154. doi:10.1093/nar/gky1005. PMC 6323897. PMID 30380113.
[14] Pachkov M, Balwierz PJ, Arnold P, Ozonov E, van Nimwegen E (January 2013). "SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates". Nucleic Acids Research. 41 (Database issue): D214–D220. doi:10.1093/nar/gks1145. PMC 3531101. PMID 23180783.
[15] Liska O, Bohár B, Hidas A, Korcsmáros T, Papp B, Fazekas D, Ari E (September 2022). "TFLink: an integrated gateway to access transcription factor-target gene interactions for multiple species". Database. 2022. doi:10.1093/database/baac083. PMC 9480832. PMID 36124642.
[16] Kel A, Voss N, Jauregui R, Kel-Margoulis O, Wingender E (September 2006). "Beyond microarrays: find key transcription factors controlling signal transduction pathways". BMC Bioinformatics. 7 (Suppl 2): S2–S13. doi:10.1186/1471-2105-7-S2-S13. PMC 1683568. PMID 17118134.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
