Content | |
---|---|
Description | curated database of histone proteins and their variants |
Contact | |
Research center | National Center for Biotechnology Information |
Primary citation | Draizen EJ, Shaytan AK, Marino-Ramirez L, Talbert PB, Landsman D, Panchenko AR (2016) [1] |
Access | |
Website | https://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0 |
The Histone Database is a comprehensive database of histone protein sequences including histone variants, classified by histone types and variants, maintained by National Center for Biotechnology Information. The creation of the Histone Database was stimulated by the X-ray analysis of the structure of the nucleosomal core histone octamer [2] followed by the application of a novel motif searching method to a group of proteins containing the histone fold motif in the early-mid-1990. [3] The first version of the Histone Database was released in 1995 [4] and several updates have been released since then. [5] [6] [7] [8] [9] [10] [11]
Current version of the Histone Database - HistoneDB 2.0 - with variants - includes sequence and structural annotations for all five histone types (H3, H4, H2A, H2B, H1) and major histone variants within each histone type. It has many interactive tools to explore and compare sequences of different histone variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant.
A histone octamer is the eight protein complex found at the center of a nucleosome core particle. It consists of two copies of each of the four core histone proteins. The octamer assembles when a tetramer, containing two copies of both H3 and H4, complexes with two H2A/H2B dimers. Each histone has both an N-terminal tail and a C-terminal histone-fold. Both of these key components interact with DNA in their own way through a series of weak interactions, including hydrogen bonds and salt bridges. These interactions keep the DNA and histone octamer loosely associated and ultimately allow the two to re-position or separate entirely.
A histone fold is a structurally conserved motif found near the C-terminus in every core histone sequence in a histone octamer responsible for the binding of histones into heterodimers.
The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, however there are also many areas in which the detailed classification differs greatly.
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, United States.
The Protein Information Resource (PIR), located at Georgetown University Medical Center, is an integrated public bioinformatics resource to support genomic and proteomic research, and scientific studies. It contains protein sequences databases
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The most recent version, Pfam 34.0, was released in March 2021 and contains 19,179 families.
Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB) combining bioinformatics, curation, and experimental efforts to functionally characterize human proteins.
InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.
Histone H2A is one of the five main histone proteins involved in the structure of chromatin in eukaryotic cells.
PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them. These are manually curated by a team of the Swiss Institute of Bioinformatics and tightly integrated into Swiss-Prot protein annotation. PROSITE was created in 1988 by Amos Bairoch, who directed the group for more than 20 years. Since July 2018, the director of PROSITE and Swiss-Prot is Alan Bridge.
Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open access database originally developed at the Wellcome Trust Sanger Institute in collaboration with Janelia Farm, and currently hosted at the European Bioinformatics Institute. Rfam is designed to be similar to the Pfam database for annotating protein families.
MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
Non-histone chromosomal protein HMG-17 is a protein that in humans is encoded by the HMGN2 gene.
SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common Ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.
PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank. The original version of the database was developed around 1995 by Roman Laskowski and collaborators at University College London. As of 2014, PDBsum is maintained by Laskowski and collaborators in the laboratory of Janet Thornton at the European Bioinformatics Institute (EBI).
TIGRFAMs is a database of protein families designed to support manual and automated genome annotation. Each entry includes a multiple sequence alignment and hidden Markov model (HMM) built from the alignment. Sequences that score above the defined cutoffs of a given TIGRFAMs HMM are assigned to that protein family and may be assigned the corresponding annotations. Most models describe protein families found in Bacteria and Archaea.
In bioinformatics, the PANTHER classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products. PANTHER is part of the Gene Ontology Reference Genome Project designed to classify proteins and their genes for high-throughput analysis.
Histone variants are proteins that substitute for the core canonical histones in nucleosomes in eukaryotes and often confer specific structural and functional features. The term might also include a set of linker histone (H1) variants, which lack a distinct canonical isoform. The differences between the core canonical histones and their variants can be summarized as follows: (1) canonical histones are replication-dependent and are expressed during the S-phase of cell cycle whereas histone variants are replication-independent and are expressed during the whole cell cycle; (2) in animals, the genes encoding canonical histones are typically clustered along the chromosome, are present in multiple copies and are among the most conserved proteins known, whereas histone variants are often single-copy genes and show high degree of variation among species; (3) canonical histone genes lack introns and use a stem loop structure at the 3’ end of their mRNA, whereas histone variant genes may have introns and their mRNA tail is usually polyadenylated. Complex multicellular organisms typically have a large number of histone variants providing a variety of different functions. Recent data are accumulating about the roles of diverse histone variants highlighting the functional links between variants and the delicate regulation of organism development.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.