Histone Database

Content
Description: curated database of histone proteins and their variants
Contact
Research center: National Center for Biotechnology Information
Primary citation: Draizen EJ, Shaytan AK, Marino-Ramirez L, Talbert PB, Landsman D, Panchenko AR (2016) [1]
Access
Website: https://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0

The Histone Database is a comprehensive database of histone protein sequences, including histone variants, classified by histone type and variant and maintained by the National Center for Biotechnology Information. Its creation was stimulated by the X-ray analysis of the structure of the nucleosomal core histone octamer [2], followed by the application of a novel motif-searching method to a group of proteins containing the histone fold motif in the early to mid-1990s. [3] The first version of the Histone Database was released in 1995 [4], and several updates have been released since then. [5] [6] [7] [8] [9] [10] [11]

The current version of the Histone Database, HistoneDB 2.0 with variants, includes sequence and structural annotations for all five histone types (H3, H4, H2A, H2B, H1) and the major histone variants within each type. It provides interactive tools to explore and compare sequences of different histone variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 variant subsets with variant-specific annotations. The curated set is supplemented by a set of histone sequences extracted automatically from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports several search strategies in both datasets: browsing of phylogenetic trees, on-demand generation of multiple sequence alignments with feature annotations, classification of histone-like sequences, and browsing of the taxonomic diversity of every histone variant.
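The two-level organization described above (histone type, then variant, with curated and automatically extracted entries kept apart) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions and does not reflect how HistoneDB 2.0 is implemented: the `HistoneRecord` class, its field names, the helper functions, and the `EXAMPLE_n` accessions are all hypothetical, and the variant labels are used purely as examples.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five histone types named in the text.
HISTONE_TYPES = ("H3", "H4", "H2A", "H2B", "H1")


@dataclass(frozen=True)
class HistoneRecord:
    """One sequence entry; all field names here are hypothetical."""
    accession: str      # placeholder identifier, not a real accession
    histone_type: str   # one of HISTONE_TYPES
    variant: str        # variant label within the type
    curated: bool       # True for the curated set, False for the automatic set


def group_by_type_and_variant(records):
    """Return {histone_type: {variant: [records]}}, rejecting unknown types."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if rec.histone_type not in HISTONE_TYPES:
            raise ValueError(f"unknown histone type: {rec.histone_type!r}")
        grouped[rec.histone_type][rec.variant].append(rec)
    # Convert to plain dicts so a missing key raises instead of being created.
    return {t: dict(v) for t, v in grouped.items()}


def select(records, histone_type=None, variant=None, curated=None):
    """Filter records; a None argument means "do not filter on this field"."""
    return [
        r for r in records
        if (histone_type is None or r.histone_type == histone_type)
        and (variant is None or r.variant == variant)
        and (curated is None or r.curated == curated)
    ]


# Illustrative entries only; the accessions are made-up placeholders.
EXAMPLE_RECORDS = [
    HistoneRecord("EXAMPLE_1", "H2A", "H2A.Z", True),
    HistoneRecord("EXAMPLE_2", "H2A", "H2A.X", True),
    HistoneRecord("EXAMPLE_3", "H2A", "H2A.Z", False),
    HistoneRecord("EXAMPLE_4", "H3", "H3.3", True),
]
```

Calling `group_by_type_and_variant(EXAMPLE_RECORDS)` yields a nested dictionary keyed first by type and then by variant, while `select(EXAMPLE_RECORDS, histone_type="H2A", curated=True)` restricts a search to curated H2A entries.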

Related Research Articles

Histone octamer: 8-protein complex forming the core of nucleosomes

In molecular biology, a histone octamer is the eight-protein complex found at the center of a nucleosome core particle. It consists of two copies of each of the four core histone proteins. The octamer assembles when a tetramer, containing two copies of H3 and two of H4, complexes with two H2A/H2B dimers. Each histone has both an N-terminal tail and a C-terminal histone-fold. Each of these key components interacts with DNA in its own way through a series of weak interactions, including hydrogen bonds and salt bridges. These interactions keep the DNA and the histone octamer loosely associated, and ultimately allow the two to re-position or to separate entirely.

A histone fold is a structurally conserved motif found near the C-terminus of every core histone sequence in a histone octamer; it is responsible for the binding of histones into heterodimers.

Structural Classification of Proteins database: Biological database of proteins

The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

UniProt: Database of protein sequences and functional information

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, USA.

The Protein Information Resource (PIR), located at Georgetown University Medical Center, is an integrated public bioinformatics resource to support genomic and proteomic research and scientific studies. It contains protein sequence databases.

Ensembl genome database project: Scientific project at the European Bioinformatics Institute

The Ensembl genome database project is a scientific project at the European Bioinformatics Institute that provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well-known genome browsers for the retrieval of genomic information.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff.

Pfam: Database of protein families

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Pfam version 36.0 was released in September 2023 and contains 20,795 families. Pfam is currently provided through the InterPro database.

Amos Bairoch: Swiss bioinformatician

Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva, where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB), combining bioinformatics, curation, and experimental efforts to functionally characterize human proteins.

InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.

Histone H2A: One of the five main histone proteins

Histone H2A is one of the five main histone proteins involved in the structure of chromatin in eukaryotic cells.

PHI-base: Biological database

The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.

HMGN2: Protein-coding gene in the species Homo sapiens

Non-histone chromosomal protein HMG-17 is a protein that in humans is encoded by the HMGN2 gene.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

OMPdb is a dedicated database that contains beta barrel (β-barrel) outer membrane proteins from Gram-negative bacteria. Such proteins are responsible for a broad range of important functions, like passive nutrient uptake, active transport of large molecules, protein secretion, as well as adhesion to host cells, through which bacteria expose their virulence activity.

In bioinformatics, the PANTHER classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products. PANTHER is part of the Gene Ontology Reference Genome Project designed to classify proteins and their genes for high-throughput analysis.

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well-studied species. Where possible, MODs share common approaches to collecting and representing biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize the capture, display, and provision of data.

Histone variants are proteins that substitute for the core canonical histones in nucleosomes in eukaryotes and often confer specific structural and functional features. The term might also include a set of linker histone (H1) variants, which lack a distinct canonical isoform. The differences between the core canonical histones and their variants can be summarized as follows: (1) canonical histones are replication-dependent and are expressed during the S-phase of the cell cycle, whereas histone variants are replication-independent and are expressed throughout the cell cycle; (2) in animals, the genes encoding canonical histones are typically clustered along the chromosome, are present in multiple copies and encode some of the most conserved proteins known, whereas histone variants are often encoded by single-copy genes and show a high degree of variation among species; (3) canonical histone genes lack introns and use a stem-loop structure at the 3’ end of their mRNA, whereas histone variant genes may have introns and their mRNA tail is usually polyadenylated. Complex multicellular organisms typically have a large number of histone variants providing a variety of different functions. Data are accumulating on the roles of diverse histone variants, highlighting the functional links between variants and the delicate regulation of organism development.

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

References

  1. Draizen EJ, Shaytan AK, Marino-Ramirez L, Talbert PB, Landsman D, Panchenko AR (2016). "HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants". Database: The Journal of Biological Databases and Curation. 2016: baw014. doi:10.1093/database/baw014. PMC 4795928. PMID 26989147.
  2. Arents G, Burlingame RW, Wang BC, Love WE, Moudrianakis EN (1991). "The nucleosomal core histone octamer at 3.1 A resolution: a tripartite protein assembly and a left-handed superhelix". Proc Natl Acad Sci U S A. 88 (22): 10148–52. Bibcode:1991PNAS...8810148A. doi:10.1073/pnas.88.22.10148. PMC 52885. PMID 1946434.
  3. Baxevanis AD, Arents G, Moudrianakis EN, Landsman D (1995). "A variety of DNA-binding and multimeric proteins contain the histone fold motif". Nucleic Acids Res. 23 (14): 2685–91. doi:10.1093/nar/23.14.2685. PMC 307093. PMID 7651829.
  4. Baxevanis AD, Landsman D (1996). "Histone Sequence Database: a compilation of highly-conserved nucleoprotein sequences". Nucleic Acids Res. 24 (1): 245–7. doi:10.1093/nar/24.1.245. PMC 145601. PMID 8594591.
  5. Marino-Ramirez L, Levine KM, Morales M, Zhang S, Moreland RT, Baxevanis AD, Landsman D (2011). "The Histone Database: an integrated resource for histones and histone fold-containing proteins". Database: The Journal of Biological Databases and Curation. 2011: bar048. doi:10.1093/database/bar048. PMC 3199919. PMID 22025671.
  6. Marino-Ramirez L, Hsu B, Baxevanis AD, Landsman D (2006). "The Histone Database: a comprehensive resource for histones and histone fold-containing proteins". Proteins. 62 (4): 838–42. doi:10.1002/prot.20814. PMC 1800941. PMID 16345076.
  7. Sullivan S, Sink DW, Trout KL, Makalowska I, Taylor PM, Baxevanis AD, Landsman D (2002). "The Histone Database". Nucleic Acids Res. 30 (1): 341–2. doi:10.1093/nar/30.1.341. PMC 99096. PMID 11752331.
  8. Sullivan SA, Aravind L, Makalowska I, Baxevanis AD, Landsman D (2000). "The histone database: a comprehensive WWW resource for histones and histone fold-containing proteins". Nucleic Acids Res. 28 (1): 320–2. doi:10.1093/nar/28.1.320. PMC 102427. PMID 10592260.
  9. Makalowska I, Ferlanti ES, Baxevanis AD, Landsman D (1999). "Histone Sequence Database: sequences, structures, post-translational modifications and genetic loci". Nucleic Acids Res. 27 (1): 323–4. doi:10.1093/nar/27.1.323. PMC 148172. PMID 9847217.
  10. Baxevanis AD, Landsman D (1998). "Histone Sequence Database: new histone fold family members". Nucleic Acids Res. 26 (1): 372–5. doi:10.1093/nar/26.1.372. PMC 147196. PMID 9399877.
  11. Baxevanis AD, Landsman D (1997). "Histone and histone fold sequences and structures: a database". Nucleic Acids Res. 25 (1): 272–3. doi:10.1093/nar/25.1.272. PMC 146383. PMID 9016552.