GeneRIF

A GeneRIF, or Gene Reference Into Function, is a short (255 characters or fewer) statement about the function of a gene. GeneRIFs provide a simple mechanism for scientists to add to the functional annotation of genes described in the Entrez Gene database. In practice, function is construed quite broadly. For example, there are GeneRIFs that discuss the role of a gene in a disease, GeneRIFs that point the viewer towards a review article about the gene, and GeneRIFs that discuss the structure of a gene. However, the stated intent is for GeneRIFs to be about gene function. Currently over half a million GeneRIFs have been created for genes from almost 1000 different species. [1]

GeneRIFs are always associated with specific entries in the Entrez Gene database. Each GeneRIF has a pointer to the PubMed ID (a type of document identifier) of a scientific publication that provides evidence for the statement made by the GeneRIF. GeneRIFs are often extracted directly from the document that is identified by the PubMed ID, very frequently from its title or from its final sentence.
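
The association described above can be pictured as a small record that ties a statement to one Gene entry and one PubMed citation. The following is only an illustrative sketch, not NCBI's own data model: the class name `GeneRIF`, its field names, and the example values (including the placeholder PubMed ID) are assumptions made for this example. The two URL patterns are the public web addresses for Gene and PubMed records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRIF:
    """Hypothetical record linking a statement to a gene and a publication."""

    gene_id: int    # Entrez Gene identifier, e.g. 7157 for human TP53
    pubmed_id: int  # PubMed ID of the paper that supports the statement
    text: str       # the short functional statement itself

    @property
    def gene_url(self) -> str:
        # Web page of the associated Entrez Gene record.
        return f"https://www.ncbi.nlm.nih.gov/gene/{self.gene_id}"

    @property
    def pubmed_url(self) -> str:
        # Web page of the supporting PubMed citation.
        return f"https://pubmed.ncbi.nlm.nih.gov/{self.pubmed_id}/"


# Placeholder values: the PubMed ID and the text are invented for illustration.
example = GeneRIF(
    gene_id=7157,
    pubmed_id=12345678,
    text="hypothetical statement about the function of this gene",
)
print(example.gene_url)
print(example.pubmed_url)
```

Keeping the record immutable (`frozen=True`) reflects the idea that a GeneRIF is a fixed statement backed by a specific citation, though that is a design choice of this sketch rather than anything the database requires.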

GeneRIFs are usually produced by NCBI indexers, but anyone may submit a GeneRIF. To be processed, a valid Gene ID must exist for the specific gene, or the Gene staff must have assigned an overall Gene ID to the species. The latter case is implemented via records in Gene with the symbol NEWENTRY. Once the Gene ID is identified, only three types of information are required to complete a submission:

  1. a concise phrase describing a function or functions (less than 255 characters in length, preferably more than a restatement of the title of the paper);
  2. a published paper describing that function, implemented by supplying the PubMed ID of a citation in PubMed;
  3. a valid e-mail address (which will remain confidential).
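
As a rough illustration of the submission requirements listed above, the sketch below checks the fields locally before anything would be sent. It is not NCBI's submission interface: the function name `check_submission`, the crude e-mail pattern, and the choice to treat 255 characters as an inclusive upper bound are all assumptions of this example. It cannot confirm that a Gene ID or PubMed ID actually exists, only that the values have a plausible shape.

```python
import re

# Assumption: 255 is treated as the inclusive upper bound on statement length.
MAX_LENGTH = 255

# Deliberately loose pattern: something@something.something, no whitespace.
_EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_submission(gene_id, text, pubmed_id, email):
    """Return a list of problems; an empty list means the fields look plausible."""
    problems = []

    if not (isinstance(gene_id, int) and gene_id > 0):
        problems.append("gene_id must be a positive integer")

    statement = text.strip()
    if not statement:
        problems.append("text must not be empty")
    elif len(statement) > MAX_LENGTH:
        problems.append(
            f"text is {len(statement)} characters; limit is {MAX_LENGTH}"
        )

    if not (isinstance(pubmed_id, int) and pubmed_id > 0):
        problems.append("pubmed_id must be a positive integer")

    if not _EMAIL_PATTERN.match(email):
        problems.append("email does not look like an address")

    return problems


# Placeholder values, invented for illustration only.
print(check_submission(7157, "hypothetical statement", 12345678, "user@example.org"))
print(check_submission(0, "x" * 300, -1, "not-an-address"))
```

Returning a list of problems instead of raising on the first failure lets a caller report every issue with a draft submission at once.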

Example

Here are some GeneRIFs taken from Entrez Gene for GeneID 7157, the human gene TP53. The PubMed document identifiers have been omitted from the examples. Note the wide variability with respect to the presence or absence of punctuation and of sentence-initial capital letters.

GeneRIFs are an unusual textual genre, and they have recently been the subject of a number of articles from the natural language processing community.

References