HUGO Gene Nomenclature Committee

HGNC
Content
Description: HGNC is responsible for approving unique symbols and names for human loci, including protein coding genes, RNA genes and pseudogenes, to allow unambiguous scientific communication.
Data types captured: Gene nomenclature
Organisms: Human
Contact
Research center: EMBL-EBI, UK
Primary citation: Braschi et al. (2019) [1]
Access
Website: www.genenames.org; www.genenames.org/news
Download URL: Statistics & Downloads; Custom Downloads; HGNC Biomart
Web service URL: rest.genenames.org
Tools
Web: HGNC Comparison of Orthology Predictions, [2] [3] Search
Miscellaneous
Curation policy: Yes

The HUGO Gene Nomenclature Committee (HGNC) is a committee of the Human Genome Organisation (HUGO) that sets the standards for human gene nomenclature. The HGNC approves a unique and meaningful name for every known human gene, [4] [5] after consulting experts. In addition to the name, which is usually 1 to 10 words long, the HGNC assigns a symbol (a short group of characters) to every gene. Like an SI symbol, a gene symbol resembles an abbreviation but is more than that: it is a second unique name that can stand on its own as well as substitute for the longer name. It does not necessarily "stand for" the initials of the name, although many gene symbols do reflect that origin.

Purpose

Full gene names, and especially gene abbreviations and symbols, are often not specific to a single gene. A marked example is CAP, which can refer to any of six different genes (BRD4, CAP1, HACD1, LNPEP, SERPINB6, and SORBS1).

Unlike previously used or published symbols, each HGNC short gene name, or gene symbol, is assigned to one gene only. This can result in a less common abbreviation being selected, but it reduces confusion about which gene is meant.
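The difference between an ambiguous alias and an approved symbol can be sketched in a few lines of Python. This is only an illustration: the table holds nothing beyond the CAP example given above, and the names `ALIASES`, `APPROVED` and `resolve` are hypothetical, not part of any HGNC tool.

```python
# Alias table built only from the CAP example in the text (illustrative data,
# not a complete or authoritative list of aliases).
ALIASES = {
    "CAP": ["BRD4", "CAP1", "HACD1", "LNPEP", "SERPINB6", "SORBS1"],
}

# Approved symbols known to this sketch: every symbol an alias points to.
APPROVED = {symbol for targets in ALIASES.values() for symbol in targets}


def resolve(term):
    """Return the approved symbols that a search term could refer to.

    Hypothetical helper: an approved symbol resolves to itself only,
    while an alias may resolve to several genes.
    """
    if term in APPROVED:
        return [term]
    return list(ALIASES.get(term, []))


if __name__ == "__main__":
    print(resolve("CAP"))   # an alias shared by six genes
    print(resolve("CAP1"))  # an approved symbol, one gene only
```

Looking up the alias returns six candidates, whereas looking up an approved symbol returns exactly one, which is the property the approved symbols are meant to guarantee.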

Naming guidelines

The HGNC published its latest human gene naming guidelines in 2020. [5] These may be summarized as: [6]

  1. gene symbols must be unique
  2. symbols should only contain Latin letters and Arabic numerals
  3. symbols should not contain punctuation or "G" for gene
  4. symbols should not contain any reference to the species they are encoded in, e.g. "H/h" for human
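As a rough illustration, the mechanically checkable parts of this summary (uniqueness, and the restriction to Latin letters and Arabic numerals, which also excludes punctuation) can be written as a small Python check. This is a simplified sketch, not the HGNC's own procedure: the function name `check_symbol` and the set of already approved symbols are assumptions made for the example, and the rules about "G" for gene and "H/h" for human are left out because they depend on what a letter is meant to stand for, which a string test cannot decide.

```python
import re

# Assumption: "Latin letters and Arabic numerals" is read here as ASCII
# letters and digits. The full guidelines may allow exceptions that this
# simplified pattern does not model.
_ALLOWED = re.compile(r"[A-Za-z0-9]+")


def check_symbol(symbol, approved):
    """Return a list of problems found in a proposed symbol.

    Hypothetical helper: `approved` is the set of symbols already in use.
    An empty list means no problem was detected by these two checks.
    """
    problems = []
    if symbol in approved:
        problems.append("not unique")
    if not _ALLOWED.fullmatch(symbol):
        problems.append("characters other than Latin letters and Arabic numerals")
    return problems


if __name__ == "__main__":
    in_use = {"BRD4", "CAP1"}
    for candidate in ("SORBS1", "CAP1", "CAP-1"):
        print(candidate, check_symbol(candidate, in_use))
```

A check like this can only flag candidates. Whether a symbol is meaningful and acceptable still depends on the curation and consultation described in the following sections.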

The HGNC states that "gene nomenclature should evolve with new technology rather than be restrictive, as sometimes occurs when historical and single gene nomenclature systems are applied." [7] The HGNC has also issued guides to specific locus types such as endogenous retroviral loci, [8] structural variants [9] and non-coding RNAs. [10] [11] [12]

Naming procedure

When assigning new gene nomenclature, the HGNC tries to contact by email the authors who have published on the human gene in question and asks for their responses to the proposed nomenclature. The HGNC also coordinates with the related Mouse and Rat Genomic Nomenclature Committees, other database curators, and experts on specific gene families or sets of genes.

Revision

The gene name revision procedure is similar to the naming procedure. However, changing a standardized gene name after a consensus has been established can create confusion, so the merit of such changes is controversial. For this reason the HGNC aims to change a gene name only if a majority of researchers working on that gene agree to the change.

References

  1. Braschi B, Denny P, Gray K, Jones T, Seal R, Tweedie S, et al. (January 2019). "Genenames.org: the HGNC and VGNC resources in 2019". Nucleic Acids Research. 47 (D1): D786–D792. doi:10.1093/nar/gky930. PMC 6324057. PMID 30304474.
  2. Wright MW, Eyre TA, Lush MJ, Povey S, Bruford EA (November 2005). "HCOP: the HGNC comparison of orthology predictions search tool". Mammalian Genome. 16 (11): 827–8. doi:10.1007/s00335-005-0103-2. PMID 16284797. S2CID 1091618.
  3. Eyre TA, Wright MW, Lush MJ, Bruford EA (January 2007). "HCOP: a searchable database of human orthology predictions". Briefings in Bioinformatics. 8 (1): 2–5. doi:10.1093/bib/bbl030. PMID 16951416.
  4. "About the HGNC | HUGO Gene Nomenclature Committee". Archived from the original on 2023-03-26. Retrieved 2018-03-23.
  5. Bruford EA, Braschi B, Denny P, Jones TE, Seal RL, Tweedie S (August 2020). "Guidelines for human gene nomenclature". Nature Genetics. 52 (8): 754–758. doi:10.1038/s41588-020-0669-3. PMC 7494048. PMID 32747822.
  6. "HGNC Guidelines | HUGO Gene Nomenclature Committee". www.genenames.org. Retrieved 26 April 2021.
  7. Shows TB, McAlpine PJ, Boucheix C, Collins FS, Conneally PM, Frézal J, et al. (1987). "Guidelines for human gene nomenclature. An international system for human gene nomenclature (ISGN, 1987)". Cytogenetics and Cell Genetics. 46 (1–4): 11–28. doi:10.1159/000132471. PMID 3507270.
  8. Mayer J, Blomberg J, Seal RL (May 2011). "A revised nomenclature for transcribed human endogenous retroviral loci". Mobile DNA. 2 (1): 7. doi:10.1186/1759-8753-2-7. PMC 3113919. PMID 21542922.
  9. Seal RL, Wright MW, Gray KA, Bruford EA (May 2013). "Vive la différence: naming structural variants in the human reference genome". Human Genomics. 7 (1): 12. doi:10.1186/1479-7364-7-12. PMC 3648363. PMID 23634723.
  10. Wright MW, Bruford EA (January 2011). "Naming 'junk': human non-protein coding RNA (ncRNA) gene nomenclature". Human Genomics. 5 (2): 90–8. doi:10.1186/1479-7364-5-2-90. PMC 3051107. PMID 21296742.
  11. Wright MW (April 2014). "A short guide to long non-coding RNA gene nomenclature". Human Genomics. 8 (1): 7. doi:10.1186/1479-7364-8-7. PMC 4021045. PMID 24716852.
  12. Seal R, Chen L, Griffiths-Jones S, Lowe TM, Mathews MB, O'Reilly D, Pierce AJ, Stadler PF, Ulitsky I, Wolin SL, Bruford EA (Feb 2020). "A guide to naming human non-coding RNA genes". EMBO J. 39 (6): e103777. doi:10.15252/embj.2019103777. PMC 7073466. PMID 32090359.