Histone Database

Content
Description: curated database of histone proteins and their variants
Contact
Research center: National Center for Biotechnology Information
Primary citation: Draizen EJ, Shaytan AK, Marino-Ramirez L, Talbert PB, Landsman D, Panchenko AR (2016) [1]
Access
Website: https://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0

The Histone Database is a comprehensive database of histone protein sequences, including histone variants, classified by histone type and variant and maintained by the National Center for Biotechnology Information. Its creation was stimulated by the X-ray analysis of the structure of the nucleosomal core histone octamer [2], followed by the application of a novel motif-searching method to a group of proteins containing the histone fold motif in the early to mid-1990s. [3] The first version of the Histone Database was released in 1995 [4], and several updates have been released since then. [5] [6] [7] [8] [9] [10] [11]

The current version of the Histone Database, HistoneDB 2.0 with variants, includes sequence and structural annotations for all five histone types (H3, H4, H2A, H2B, H1) and the major histone variants within each type. It provides interactive tools to explore and compare sequences of different histone variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by a set of histone sequences automatically extracted from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences; and browsing of the taxonomic diversity for every histone variant.
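
The two-level organization described above (histone type, then variant, with a curated and an automatically extracted set) can be illustrated with a small data-structure sketch. This is not the database's own schema or code: the `HistoneRecord` type, the `group_by_variant` helper, and the `EX…` accessions are hypothetical names invented for the example, and only a few well-known variant labels are used.

```python
from collections import defaultdict
from typing import NamedTuple

# The five histone types named in the text.
HISTONE_TYPES = ("H3", "H4", "H2A", "H2B", "H1")


class HistoneRecord(NamedTuple):
    accession: str     # hypothetical identifier, not a real database accession
    histone_type: str  # one of HISTONE_TYPES
    variant: str       # variant label within the histone type
    curated: bool      # True for the curated set, False for the extracted set


def group_by_variant(records):
    """Group records first by histone type, then by variant."""
    groups = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if rec.histone_type not in HISTONE_TYPES:
            raise ValueError(f"unknown histone type: {rec.histone_type}")
        groups[rec.histone_type][rec.variant].append(rec)
    # Return plain dicts so later lookups do not silently create empty keys.
    return {htype: dict(variants) for htype, variants in groups.items()}


# Illustrative records only; the accessions are made up.
records = [
    HistoneRecord("EX0001", "H2A", "H2A.Z", True),
    HistoneRecord("EX0002", "H2A", "H2A.X", True),
    HistoneRecord("EX0003", "H3", "H3.3", True),
    HistoneRecord("EX0004", "H2A", "H2A.Z", False),
]

groups = group_by_variant(records)

# Number of curated sequences in each (type, variant) subset.
curated_counts = {
    (htype, variant): sum(rec.curated for rec in recs)
    for htype, variants in groups.items()
    for variant, recs in variants.items()
}
```

Keeping the `curated` flag on each record, rather than holding two separate collections, lets one grouping serve both datasets, which loosely mirrors how the web site is described as searching both.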

Related Research Articles

Histone octamer

A histone octamer is the eight-protein complex found at the center of a nucleosome core particle. It consists of two copies of each of the four core histone proteins. The octamer assembles when a tetramer, containing two copies of both H3 and H4, complexes with two H2A/H2B dimers. Each histone has both an N-terminal tail and a C-terminal histone-fold. Both of these key components interact with DNA in their own way through a series of weak interactions, including hydrogen bonds and salt bridges. These interactions keep the DNA and histone octamer loosely associated and ultimately allow the two to re-position or separate entirely.

A histone fold is a structurally conserved motif, found near the C-terminus of every core histone sequence in a histone octamer, that is responsible for the binding of histones into heterodimers.

The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource; however, there are also many areas in which the detailed classification differs greatly.

UniProt Database of protein sequences and functional information

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, United States.

The Protein Information Resource (PIR), located at Georgetown University Medical Center, is an integrated public bioinformatics resource to support genomic and proteomic research, and scientific studies. It contains protein sequences databases

Pfam

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The most recent version, Pfam 34.0, was released in March 2021 and contains 19,179 families.

Amos Bairoch

Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB) combining bioinformatics, curation, and experimental efforts to functionally characterize human proteins.

InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.

Histone H2A One of the five main histone proteins

Histone H2A is one of the five main histone proteins involved in the structure of chromatin in eukaryotic cells.

PROSITE

PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them. These are manually curated by a team of the Swiss Institute of Bioinformatics and tightly integrated into Swiss-Prot protein annotation. PROSITE was created in 1988 by Amos Bairoch, who directed the group for more than 20 years. Since July 2018, the director of PROSITE and Swiss-Prot is Alan Bridge.

Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open access database originally developed at the Wellcome Trust Sanger Institute in collaboration with Janelia Farm, and currently hosted at the European Bioinformatics Institute. Rfam is designed to be similar to the Pfam database for annotating protein families.

MicrobesOnline

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

HMGN2

Non-histone chromosomal protein HMG-17 is a protein that in humans is encoded by the HMGN2 gene.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank. The original version of the database was developed around 1995 by Roman Laskowski and collaborators at University College London. As of 2014, PDBsum is maintained by Laskowski and collaborators in the laboratory of Janet Thornton at the European Bioinformatics Institute (EBI).

TIGRFAMs is a database of protein families designed to support manual and automated genome annotation. Each entry includes a multiple sequence alignment and hidden Markov model (HMM) built from the alignment. Sequences that score above the defined cutoffs of a given TIGRFAMs HMM are assigned to that protein family and may be assigned the corresponding annotations. Most models describe protein families found in Bacteria and Archaea.
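
The cutoff rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not TIGRFAMs code: `assign_families` is a hypothetical helper, the family identifiers and scores are made up, and treating a score equal to the cutoff as passing is an assumption.

```python
def assign_families(scores, cutoffs):
    """Return the families whose HMM score for one sequence meets the cutoff.

    scores:  mapping of family id -> score of the sequence against that HMM
    cutoffs: mapping of family id -> per-family score cutoff
    A score equal to the cutoff is treated as passing (an assumption).
    """
    return sorted(
        family
        for family, score in scores.items()
        if family in cutoffs and score >= cutoffs[family]
    )


# Made-up family ids and scores, for illustration only.
scores = {"FAM_A": 250.0, "FAM_B": 40.0, "FAM_C": 90.0}
cutoffs = {"FAM_A": 200.0, "FAM_B": 100.0, "FAM_C": 90.0}

assigned = assign_families(scores, cutoffs)
```

Because each family carries its own cutoff, a sequence may be assigned to more than one family or to none.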

In bioinformatics, the PANTHER classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products. PANTHER is part of the Gene Ontology Reference Genome Project designed to classify proteins and their genes for high-throughput analysis.

Histone variants are proteins that substitute for the core canonical histones in nucleosomes in eukaryotes and often confer specific structural and functional features. The term might also include a set of linker histone (H1) variants, which lack a distinct canonical isoform. The differences between the core canonical histones and their variants can be summarized as follows: (1) canonical histones are replication-dependent and are expressed during the S-phase of the cell cycle, whereas histone variants are replication-independent and are expressed throughout the cell cycle; (2) in animals, the genes encoding canonical histones are typically clustered along the chromosome, are present in multiple copies and are among the most conserved proteins known, whereas histone variants are often single-copy genes and show a high degree of variation among species; (3) canonical histone genes lack introns and use a stem-loop structure at the 3' end of their mRNA, whereas histone variant genes may have introns and their mRNA tail is usually polyadenylated. Complex multicellular organisms typically have a large number of histone variants providing a variety of different functions. Data are accumulating on the roles of diverse histone variants, highlighting the functional links between variants and the delicate regulation of organism development.

Biocuration

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

References

  1. Draizen EJ, Shaytan AK, Marino-Ramirez L, Talbert PB, Landsman D, Panchenko AR (2016). "HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants". Database: The Journal of Biological Databases and Curation. 2016: baw014. doi:10.1093/database/baw014. PMC 4795928. PMID 26989147.
  2. Arents G, Burlingame RW, Wang BC, Love WE, Moudrianakis EN (1991). "The nucleosomal core histone octamer at 3.1 A resolution: a tripartite protein assembly and a left-handed superhelix". Proc Natl Acad Sci U S A. 88 (22): 10148–52. Bibcode:1991PNAS...8810148A. doi:10.1073/pnas.88.22.10148. PMC 52885. PMID 1946434.
  3. Baxevanis AD, Arents G, Moudrianakis EN, Landsman D (1995). "A variety of DNA-binding and multimeric proteins contain the histone fold motif". Nucleic Acids Res. 23 (14): 2685–91. doi:10.1093/nar/23.14.2685. PMC 307093. PMID 7651829.
  4. Baxevanis AD, Landsman D (1996). "Histone Sequence Database: a compilation of highly-conserved nucleoprotein sequences". Nucleic Acids Res. 24 (1): 245–7. doi:10.1093/nar/24.1.245. PMC 145601. PMID 8594591.
  5. Marino-Ramirez L, Levine KM, Morales M, Zhang S, Moreland RT, Baxevanis AD, Landsman D (2011). "The Histone Database: an integrated resource for histones and histone fold-containing proteins". Database: The Journal of Biological Databases and Curation. 2011: bar048. doi:10.1093/database/bar048. PMC 3199919. PMID 22025671.
  6. Marino-Ramirez L, Hsu B, Baxevanis AD, Landsman D (2006). "The Histone Database: a comprehensive resource for histones and histone fold-containing proteins". Proteins. 62 (4): 838–42. doi:10.1002/prot.20814. PMC 1800941. PMID 16345076.
  7. Sullivan S, Sink DW, Trout KL, Makalowska I, Taylor PM, Baxevanis AD, Landsman D (2002). "The Histone Database". Nucleic Acids Res. 30 (1): 341–2. doi:10.1093/nar/30.1.341. PMC 99096. PMID 11752331.
  8. Sullivan SA, Aravind L, Makalowska I, Baxevanis AD, Landsman D (2000). "The histone database: a comprehensive WWW resource for histones and histone fold-containing proteins". Nucleic Acids Res. 28 (1): 320–2. doi:10.1093/nar/28.1.320. PMC 102427. PMID 10592260.
  9. Makalowska I, Ferlanti ES, Baxevanis AD, Landsman D (1999). "Histone Sequence Database: sequences, structures, post-translational modifications and genetic loci". Nucleic Acids Res. 27 (1): 323–4. doi:10.1093/nar/27.1.323. PMC 148172. PMID 9847217.
  10. Baxevanis AD, Landsman D (1998). "Histone Sequence Database: new histone fold family members". Nucleic Acids Res. 26 (1): 372–5. doi:10.1093/nar/26.1.372. PMC 147196. PMID 9399877.
  11. Baxevanis AD, Landsman D (1997). "Histone and histone fold sequences and structures: a database". Nucleic Acids Res. 25 (1): 272–3. doi:10.1093/nar/25.1.272. PMC 146383. PMID 9016552.