HOCOMOCO

Content
Description: Curated collection of binding models for human and mouse transcription factors
Data types captured: Transcription factor binding profiles
Organisms: Homo sapiens, Mus musculus

Contact
Laboratory: autosome.org
Authors: Vorontsov, Makeev, Kulakovskiy
Primary citation: Vorontsov et al. [1]

Access
Website: HOCOMOCO

HOCOMOCO [1] [2] [3] [4] is an open-access database providing curated and benchmarked binding motifs of human and mouse transcription factors. It covers Homo sapiens (human) and Mus musculus (mouse) transcription factors, their DNA binding site motifs, and motif subtypes.

Introduction

Transcription factors (TFs) are proteins that bind DNA and thereby regulate transcription. The binding is sequence-specific. A sequence motif [5] is a model that describes the common pattern of the DNA binding sites [6] that a particular TF prefers to bind. One possible representation of this model is the position weight matrix (PWM) [7].
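As a rough illustration of how a PWM can represent a motif, the following Python sketch converts a position count matrix into log-odds weights and scans a sequence for the best-scoring window. The counts, the uniform background of 0.25, the pseudocount of 1 and all function names are assumptions made for this example; it shows a generic log-odds PWM and is not necessarily the exact formula or file format used by HOCOMOCO.

```python
import math

BASES = "ACGT"  # column order assumed for every matrix row


def pcm_to_pwm(pcm, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    Generic formulation with a flat pseudocount and a uniform background;
    both are illustrative assumptions.
    """
    pwm = []
    for row in pcm:
        total = sum(row) + pseudocount * len(BASES)
        pwm.append(
            [math.log2(((count + pseudocount) / total) / background) for count in row]
        )
    return pwm


def score(pwm, site):
    """Sum the weights of the bases observed at each motif position."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on one strand."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max((score(pwm, sequence[i:i + width]), i) for i in windows)


# Hypothetical counts for a 4-bp motif with consensus TGAC (not real data).
toy_pcm = [
    [0, 0, 0, 10],  # position 1: T
    [0, 0, 10, 0],  # position 2: G
    [10, 0, 0, 0],  # position 3: A
    [0, 10, 0, 0],  # position 4: C
]
toy_pwm = pcm_to_pwm(toy_pcm)
hit_score, hit_start = best_hit(toy_pwm, "AATGACGG")
print(f"best window starts at index {hit_start} with score {hit_score:.2f}")
```

Positive scores indicate windows that resemble the motif more than the background, and negative scores the opposite. A practical scan would also score the reverse-complement strand and compare scores against a threshold.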

Organisms

Recognition

According to the Web of Science, the 2018 HOCOMOCO publication [2] had been cited 396 times as of January 2024. The earlier publications [3] [4] had been cited 144 and 151 times, respectively.

See also

References

  1. Vorontsov, Ilya E; Eliseeva, Irina A; Zinkevich, Arsenii; Nikonov, Mikhail; Abramov, Sergey; Boytsov, Alexandr; Kamenets, Vasily; Kasianova, Alexandra; Kolmykov, Semyon; Yevshin, Ivan S; Favorov, Alexander; Medvedeva, Yulia A; Jolma, Arttu; Kolpakov, Fedor; Makeev, Vsevolod J; Kulakovskiy, Ivan V (16 November 2023). "HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors". Nucleic Acids Research. 52 (D1): D154–D163. doi:10.1093/nar/gkad1077. ISSN 0305-1048. PMC 10767914. PMID 37971293.
  2. Kulakovskiy, Ivan V.; Vorontsov, Ilya E.; Yevshin, Ivan S.; Sharipov, Ruslan N.; Fedorova, Alla D.; Rumynskiy, Eugene I.; Medvedeva, Yulia A.; Magana-Mora, Arturo; Bajic, Vladimir B.; Papatsenko, Dmitry A.; Kolpakov, Fedor A.; Makeev, Vsevolod J. (4 January 2018). "HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis". Nucleic Acids Research. 46 (D1): D252–D259. doi:10.1093/nar/gkx1106. ISSN 1362-4962. PMC 5753240. PMID 29140464.
  3. Kulakovskiy IV, Vorontsov IE, Yevshin IS, Soboleva AV, Kasianov AS, Ashoor H, et al. (January 2016). "HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models". Nucleic Acids Research. 44 (D1): D116–D125. doi:10.1093/nar/gkv1249. PMC 4702883. PMID 26586801.
  4. Kulakovskiy, Ivan V.; Medvedeva, Yulia A.; Schaefer, Ulf; Kasianov, Artem S.; Vorontsov, Ilya E.; Bajic, Vladimir B.; Makeev, Vsevolod J. (1 January 2013). "HOCOMOCO: a comprehensive collection of human transcription factor binding sites models". Nucleic Acids Research. 41 (Database issue): D195–D202. doi:10.1093/nar/gks1089. ISSN 1362-4962. PMC 3531053. PMID 23175603.
  5. Inukai, Sachi; Kock, Kian Hong; Bulyk, Martha L (1 April 2017). "Transcription factor–DNA binding: beyond binding site motifs". Current Opinion in Genetics & Development. Genome architecture and expression. 43: 110–119. doi:10.1016/j.gde.2017.02.007. ISSN 0959-437X. PMC 5447501. PMID 28359978.
  6. Wasserman, Wyeth W.; Sandelin, Albin (April 2004). "Applied bioinformatics for the identification of regulatory elements". Nature Reviews Genetics. 5 (4): 276–287. doi:10.1038/nrg1315. ISSN 1471-0064. PMID 15131651. S2CID 16599073.
  7. Stormo, G. D. (January 2000). "DNA binding sites: representation and discovery". Bioinformatics (Oxford, England). 16 (1): 16–23. doi:10.1093/bioinformatics/16.1.16. ISSN 1367-4803. PMID 10812473.