Transcription factors are proteins that bind genomic regulatory sites. Identification of genomic regulatory elements is essential for understanding the dynamics of developmental, physiological and pathological processes. Recent advances in chromatin immunoprecipitation followed by sequencing (ChIP-seq) have provided powerful ways to identify genome-wide profiling of DNA-binding proteins and histone modifications. [1] [2] The application of ChIP-seq methods has reliably discovered transcription factor binding sites and histone modification sites.
Comprehensive List of transcription factor binding sites (TFBSs) databases based on ChIP-seq data as follows:
Name | Description | type | Link | References | |||||
---|---|---|---|---|---|---|---|---|---|
ChIPBase | ChIPBase a database for Transcription factor-binding sites, motifs (~1290 transcription factors) and decoding the transcriptional regulation of LncRNAs, miRNAs and protein-coding genes from ~10,200 curated peak datasets derived from ChIP-seq methods in 10 species | database | website | [3] | |||||
ChEA | transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. | database | website | [4] | |||||
CIS-BP | collection of transcription factor binding sites models inferred by binding domains. | database | website | [5] | |||||
CistromeMap | a knowledgebase and web server for ChIP-Seq and DNase-Seq studies in mouse and human. | database | website | [6] | |||||
CTCFBSDB | a database for CTCF binding sites and genome organization | database | website | [7] | |||||
Factorbook | a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. | database | website | [8] | |||||
hmChIP | a database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data. | database | website | [9] | |||||
HOCOMOCO | a comprehensive collection of human and mouse transcription factor binding sites models. | database | website | [10] | |||||
JASPAR | The JASPAR CORE database contains a curated, non-redundant set of profiles, derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. | database | website | [11] [12] | |||||
MethMotif | An integrative cell-specific database of transcription factor binding motifs coupled with DNA methylation profiles. | database | website | [13] | |||||
SwissRegulon | a database of genome-wide annotations of regulatory sites. | database | website | [14] | |||||
TFLink | TFLink gateway provides comprehensive and highly accurate information on transcription factor - target gene interactions, nucleotide sequences and genomic locations of transcription factor binding sites for human and six model organisms. | database | website | [15] | |||||
TRANSFAC | A long-standing curated database of regulatory sites, enhancers, binding site predictions, PSSMs and related analytical software. | database | website | [16] |
Cis-regulatory elements (CREs) or Cis-regulatory modules (CRMs) are regions of non-coding DNA which regulate the transcription of neighboring genes. CREs are vital components of genetic regulatory networks, which in turn control morphogenesis, the development of anatomy, and other aspects of embryonic development, studied in evolutionary developmental biology.
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
Anders Krogh is a bioinformatician at the University of Copenhagen, where he leads the university's bioinformatics center. He is known for his pioneering work on the use of hidden Markov models in bioinformatics, and is co-author of a widely used textbook in bioinformatics. In addition, he also co-authored one of the early textbooks on neural networks. His current research interests include promoter analysis, non-coding RNA, gene prediction and protein structure prediction.
BIOBASE is an international bioinformatics company headquartered in Wolfenbüttel, Germany. The company focuses on the generation, maintenance, and licensing of databases in the field of molecular biology, and their related software platforms.
DNA binding sites are a type of binding site found in DNA where other molecules may bind. DNA binding sites are distinct from other binding sites in that (1) they are part of a DNA sequence and (2) they are bound by DNA-binding proteins. DNA binding sites are often associated with specialized proteins known as transcription factors, and are thus linked to transcriptional regulation. The sum of DNA binding sites of a specific transcription factor is referred to as its cistrome. DNA binding sites also encompasses the targets of other proteins, like restriction enzymes, site-specific recombinases and methyltransferases.
This microRNA database and microRNA targets databases is a compilation of databases and web portals and servers used for microRNAs and their targets. MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (ncRNAs) that regulate gene expression by targeting messenger RNAs.
Peak calling is a computational method used to identify areas in a genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing or MeDIP-seq experiment. These areas are those where a protein interacts with DNA. When the protein is a transcription factor, the enriched area is its transcription factor binding site (TFBS). Popular software programs include MACS. Wilbanks and colleagues is a survey of the ChIP-seq peak callers, and Bailey et al. is a description of practical guidelines for peak calling in ChIP-seq data.
The Mammalian Promoter Database (MPromDb) is a curated database of gene promoters identified from ChIP-seq. The proximal promoter region contains the cis-regulatory elements of most of the transcription factors (TFs).
The Sequence Read Archive is a bioinformatics database that provides a public repository for DNA sequencing data, especially the "short reads" generated by high-throughput sequencing, which are typically less than 1,000 base pairs in length. The archive is part of the International Nucleotide Sequence Database Collaboration (INSDC), and run as a collaboration between the NCBI, the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ).
TRANSFAC is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles. The contents of the database can be used to predict potential transcription factor binding sites.
CollecTF is a database of transcription factor binding sites in the Bacteria domain.
EPD is a biological database and web resource of eukaryotic RNA polymerase II promoters with experimentally defined transcription start sites. Originally, EPD was a manually curated resource relying on transcript mapping experiments targeted at individual genes and published in academic journals. More recently, automatically generated promoter collections derived from electronically distributed high-throughput data produced with the CAGE or TSS-Seq protocols were added as part of a special subsection named EPDnew. The EPD web server offers additional services, including an entry viewer which enables users to explore the genomic context of a promoter in a UCSC Genome Browser window, and direct links for uploading EPD-derived promoter subsets to associated web-based promoter analysis tools of the Signal Search Analysis (SSA) and ChIP-Seq servers. EPD also features a collection of position weight matrices (PWMs) for common promoter sequence motifs.
Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.
Donna R. Maglott is a staff scientist at the National Center for Biotechnology Information known for her research on large-scale genomics projects, including the mouse genome and development of databases required for genomics research.
JASPAR is an open access and widely used database of manually curated, non-redundant transcription factor (TF) binding profiles stored as position frequency matrices (PFM) and transcription factor flexible models (TFFM) for TFs from species in six taxonomic groups. From the supplied PFMs, users may generate position-specific weight matrices (PWM). The JASPAR database was introduced in 2004. There were seven major updates and new releases in 2006, 2008, 2010, 2014, 2016, 2018, 2020 and 2022, which is the latest release of JASPAR.
CCDC188 or coiled-coil domain containing protein is a protein that in humans is encoded by the CCDC188 gene.
HOCOMOCO is an open-access database providing curated and benchmarked binding motifs of human and mouse transcription factors. It captures the following data types: Homo sapiens (human) and Mus musculus (mouse) transcription factors, their DNA binding site motifs, and motif subtypes.