HIKESHI

HIKESHI
Available structures
PDB Ortholog search: PDBe RCSB
Identifiers
Aliases: HIKESHI, HSPC179, Hikeshi, L7RN6, OPI10, HSPC138, C11orf73, HLD13, chromosome 11 open reading frame 73, heat shock protein nuclear import factor, heat shock protein nuclear import factor hikeshi
External IDs: OMIM: 614908; MGI: 96738; HomoloGene: 6908; GeneCards: HIKESHI
Orthologs
Species: Human, Mouse
Entrez
Ensembl
UniProt
RefSeq (mRNA)

NM_016401
NM_001322404
NM_001322407
NM_001322409

NM_001291286
NM_001291287
NM_001291288
NM_001291289
NM_026304

RefSeq (protein)

NP_001309333
NP_001309336
NP_001309338
NP_057485

NP_001278215
NP_001278216
NP_001278217
NP_001278218
NP_080580

Location (UCSC): Human Chr 11: 86.3 – 86.35 Mb; Mouse Chr 7: 89.57 – 89.59 Mb
PubMed search [3] [4]

HIKESHI is a protein important in lung and multicellular organismal development [5] that, in humans, is encoded by the HIKESHI gene. [6] The gene is found on chromosome 11 in humans and chromosome 7 in mice. Similar sequences (orthologs) are found in most animal and fungal species. The mouse homolog, lethal gene on chromosome 7 Rinchik 6 protein, is encoded by the l7Rn6 gene. [7]

Gene

HIKESHI is a protein-coding gene in Homo sapiens. Alternate names for the gene are FLJ43020, HSPC138, HSPC179, and L7RN6. Located on the long arm of chromosome 11 at band q14.2, the entire gene, including introns and exons, spans 42,698 base pairs on the plus strand. The mRNA of HIKESHI variant 1 includes exons 1, 3, 4, 5, and 7, amounting to 1,183 base pairs, with base pairs 239 to 832 representing the coding region.

Alternative Splicing

Variant 1 is the longest and most common protein-coding variant. The three other main variants use an alternate exon sequence that shifts the reading frame and causes premature termination, so these transcripts are expected to be degraded rather than translated into protein. The table below shows the different variants and their exon usage.

| Variant | Exons used (of exons 1–7) | Protein coding |
|---|---|---|
| 1 | 5 (exons 1, 3, 4, 5, and 7) | Yes |
| 2 | 6 | No |
| 3 | 5 | No |
| 4 | 4 | No |

The four variants shown in the table above are the most common isoforms found in human cells. In total there are 13 alternatively spliced sequences and three unspliced forms, which use two alternative promoters. The mRNA variants differ in their combination of 8 different exons, in their use of alternate, overlapping exons, and in the retention of introns. Besides alternative splicing, the mRNAs differ by truncation at the 3′ end. Variant 1 is one of ten mRNAs that have been shown to code for a protein, while the rest seem bound for nonsense-mediated mRNA decay.

AceView [8] representation of C11orf73 isoforms.

Promoter

The promoter region, GXP 47146, was identified using the ElDorado [9] tool from Genomatix. The 840 bp sequence lies upstream of the HIKESHI gene, at genomic positions 86012753 to 86013592. The promoter is conserved in 12 of 12 orthologs and is associated with 6 relevant transcripts.

Conserved transcription factor binding sites from Genomatix ElDorado tool:

| Detailed Family Information | From | To | Anchor | Orientation | Conserved in Mus musculus | Matrix Sim | Sequence | Occurrence |
|---|---|---|---|---|---|---|---|---|
| Cell cycle regulators: Cell cycle homology element | 137 | 149 | 143 | + strand | conserved | 0.943 | ggacTTGAattca | 1 |
| GATA binding factors | 172 | 184 | 178 | + strand | conserved | 0.946 | taaAGATttgagg | 1 |
| Vertebrate TATA binding protein factor | 193 | 209 | 201 | + strand | conserved | 0.983 | tcctaTAAAatttggat | 1 |
| Heat shock factors | 291 | 315 | 303 | + strand | conserved | 0.992 | cacagaaacgttAGAAgcatctctt | 4 |
| Human and murine ETS1 factors | 512 | 532 | 522 | + strand | conserved | 0.984 | taagccccGGAAgtacttgtt | 3 |
| Zinc finger transcription factor RU49, Zipro1 | 522 | 528 | 525 | + strand | conserved | 0.989 | aAGTAct | 2 |
| Krueppel like transcription factors | 618 | 634 | 626 | + strand | conserved | 0.925 | tggaGGGGcagacaccc | 1 |
| SOX/SRY-sex/testis determining and HMG box factors | 636 | 658 | 647 | + strand | conserved | 0.925 | cccgcaAATTctggaaggttctt | 1 |
Predicted promoter region of C11orf73.
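The coordinates reported for the promoter and its binding sites can be cross-checked with a few lines of Python. The sketch below is only illustrative: the `Site` tuple and `check_site` helper are hypothetical names, not part of Genomatix ElDorado, and the rule that the anchor is the midpoint of the site is inferred from the table rather than taken from the tool's documentation.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """One row of the binding-site table (positions are 1-based, inclusive)."""
    family: str
    start: int
    end: int
    anchor: int
    sequence: str


# Rows transcribed from the table of conserved binding sites.
SITES = [
    Site("Cell cycle homology element", 137, 149, 143, "ggacTTGAattca"),
    Site("GATA binding factors", 172, 184, 178, "taaAGATttgagg"),
    Site("Vertebrate TATA binding protein factor", 193, 209, 201, "tcctaTAAAatttggat"),
    Site("Heat shock factors", 291, 315, 303, "cacagaaacgttAGAAgcatctctt"),
    Site("Human and murine ETS1 factors", 512, 532, 522, "taagccccGGAAgtacttgtt"),
    Site("Zinc finger transcription factor RU49, Zipro1", 522, 528, 525, "aAGTAct"),
    Site("Krueppel like transcription factors", 618, 634, 626, "tggaGGGGcagacaccc"),
    Site("SOX/SRY-sex/testis determining and HMG box factors", 636, 658, 647,
         "cccgcaAATTctggaaggttctt"),
]


def check_site(site: Site) -> bool:
    """True if the span matches the sequence length and the anchor is the midpoint."""
    span = site.end - site.start + 1
    midpoint = (site.start + site.end) // 2  # assumed definition of "anchor"
    return span == len(site.sequence) and midpoint == site.anchor


# Length of the promoter from its inclusive genomic coordinates.
promoter_length = 86013592 - 86012753 + 1

all_consistent = all(check_site(site) for site in SITES)
print(promoter_length, all_consistent)
```

Under these assumptions every row is internally consistent, and the coordinate arithmetic reproduces the stated 840 bp promoter length.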

Termination

Termination of the mRNA product is encoded within the cDNA of the gene. The 3′ end of an mRNA generally has three main features: the poly A signal, the poly A tail, and a stretch of sequence that can form a stem-loop structure. The poly A signal is a highly conserved, six-nucleotide sequence. In eukaryotes the sequence is AATAAA, and it directs polyadenylation of the mRNA product at the poly A site, about 10–30 nucleotides downstream of the signal. The poly A site for C11orf73 is GTA.
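The relationship between the poly A signal and the poly A site can be sketched as a simple sequence scan. The Python below is a minimal illustration, not the method used to annotate C11orf73: the function name `find_polya_signals` and the toy sequence are hypothetical, and the 10–30 nucleotide window is measured here from the end of the AATAAA hexamer, which is an assumption.

```python
import re

POLYA_SIGNAL = "AATAAA"


def find_polya_signals(cdna: str, min_gap: int = 10, max_gap: int = 30):
    """Locate AATAAA signals and the window in which a poly A site is expected.

    Returns a list of (signal_start, window_start, window_end) tuples using
    0-based indices, with window_end exclusive and clipped to the sequence.
    """
    seq = cdna.upper()
    hits = []
    # A lookahead pattern lets overlapping occurrences be reported as well.
    for match in re.finditer(f"(?={POLYA_SIGNAL})", seq):
        signal_end = match.start() + len(POLYA_SIGNAL)
        window_start = min(signal_end + min_gap, len(seq))
        window_end = min(signal_end + max_gap, len(seq))
        hits.append((match.start(), window_start, window_end))
    return hits


# Toy 3' end (hypothetical, not the real HIKESHI cDNA): a signal followed
# 15 nucleotides later by a GTA trinucleotide.
toy_cdna = "CCCC" + "AATAAA" + "T" * 15 + "GTA" + "CC"

hits = find_polya_signals(toy_cdna)
signal_start, window_start, window_end = hits[0]
site_index = toy_cdna.find("GTA", window_start, window_end)
print(hits, site_index)
```

Applied to a real cDNA, the same scan would list every candidate signal; deciding which one is actually used requires experimental evidence such as ESTs that end at the site.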

Gene expression

HIKESHI is expressed ubiquitously and at a high level, 2.3 times above the average. C11orf73 is expressed in a large number of human tissues. [10] [11] Between the Expression Profiles and the EST Profile on UniGene, only 11 tissues showed no expression of C11orf73, most likely because of small sample sizes for those tissues.

Protein

The human HIKESHI gene encodes a protein called uncharacterized protein C11orf73. [6] The homologous mouse L7rn6 gene encodes a protein called lethal gene on chromosome 7 Rinchik 6. [7]

  1 mfgclvagrl vqtaaqqvae dkfvfdlpdy esinhvvvfm lgtipfpegm ggsvyfsypd
 61 sngmpvwqll gfvtngkpsa ifkisglksg egsqhpfgam nivrtpsvaq igisvellds
121 maqqtpvgna avssvdsftq ftqkmldnfy nfassfavsq aqmtpspsem fipanvvlkw
181 yenfqrrlaq nplfwkt

The encoded human protein is 197 amino acids long and has a molecular mass of 21,628 daltons. By analogy to the mouse protein, the human HIKESHI protein is hypothesized to take part in the organization and function of the secretory apparatus in lung cells. [5]
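The stated protein length can be checked against both the sequence listing and the coding region given for variant 1 (base pairs 239 to 832). The following Python sketch does this; the helper name `parse_flatfile_sequence` is hypothetical, and the check assumes that the annotated coding region includes the stop codon, as is conventional in GenBank-style records.

```python
# Sequence listing as given above, in flat-file layout
# (position numbers followed by blocks of ten residues).
LISTING = """
  1 mfgclvagrl vqtaaqqvae dkfvfdlpdy esinhvvvfm lgtipfpegm ggsvyfsypd
 61 sngmpvwqll gfvtngkpsa ifkisglksg egsqhpfgam nivrtpsvaq igisvellds
121 maqqtpvgna avssvdsftq ftqkmldnfy nfassfavsq aqmtpspsem fipanvvlkw
181 yenfqrrlaq nplfwkt
"""


def parse_flatfile_sequence(text: str) -> str:
    """Drop position numbers and whitespace, returning an upper-case sequence."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


protein = parse_flatfile_sequence(LISTING)

# Coding region of mRNA variant 1 (1-based, inclusive coordinates).
cds_start, cds_end = 239, 832
cds_length = cds_end - cds_start + 1
codons = cds_length // 3

# Assumption: the last codon of the coding region is the stop codon.
residues_from_cds = codons - 1

print(len(protein), cds_length, residues_from_cds)
```

Both routes give the same protein length, so the sequence listing, the coding coordinates, and the stated 197 residues agree with one another.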

Protein of unknown function (DUF775)
Symbol: DUF775
Pfam: PF05603
InterPro: IPR008493

The protein domain known as DUF775 (Domain of Unknown Function 775) is found in both the human HIKESHI and mouse L7rn6 proteins. The DUF775 domain is 197 amino acids long, the same length as the protein. By definition, the other proteins that make up the DUF775 superfamily include all the orthologs of C11orf73.

Hydropathy analysis shows no extensive hydrophobic regions in the protein, so HIKESHI is predicted to be a cytoplasmic rather than a membrane protein. The isoelectric point of C11orf73 is 5.108, which indicates that the protein is acidic and carries a net negative charge at neutral pH.

Hydropathy plot for C11orf73. [12]
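A hydropathy profile of this kind can be approximated with a sliding-window average. The sketch below uses the Kyte-Doolittle scale and a 19-residue window, which are common choices for spotting membrane-spanning segments; both are assumptions here, since the scale and window used for the original plot are not stated. The function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Human HIKESHI protein sequence (197 residues) from the listing above.
HIKESHI = (
    "MFGCLVAGRLVQTAAQQVAEDKFVFDLPDYESINHVVVFMLGTIPFPEGMGGSVYFSYPD"
    "SNGMPVWQLLGFVTNGKPSAIFKISGLKSGEGSQHPFGAMNIVRTPSVAQIGISVELLDS"
    "MAQQTPVGNAAVSSVDSFTQFTQKMLDNFYNFASSFAVSQAQMTPSPSEMFIPANVVLKW"
    "YENFQRRLAQNPLFWKT"
)


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle score for every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


profile = hydropathy_profile(HIKESHI)

# Windows averaging above roughly 1.6 are often read as candidate
# transmembrane segments; the threshold is a rule of thumb, not a proof.
candidate_windows = [i + 1 for i, value in enumerate(profile) if value > 1.6]
print(len(profile), round(max(profile), 2), candidate_windows)
```

A profile that stays below the threshold would be in line with the absence of extensive hydrophobic regions described above, although a window average alone cannot establish subcellular localization.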

SNP

The only SNP (single-nucleotide polymorphism) [13] for the C11orf73 sequence results in an amino acid change within the protein. The lack of other SNPs is most likely due to the high level of conservation of HIKESHI and the lethal effect that a mutation in the protein has on the organism. The phenotype for the SNP is unknown.

| Function | dbSNP Allele | Protein Residue | Codon Position | Amino Acid Position |
|---|---|---|---|---|
| Reference | C | Proline [P] | 1 | 47 |
| Missense | G | Alanine [A] | 1 | 47 |
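The SNP entry can be checked against the standard genetic code: a C to G change at the first codon position turns any proline codon (CCN) into an alanine codon (GCN), and residue 47 of the protein sequence is indeed a proline. The Python sketch below illustrates this; the helper name `substitute` is hypothetical, and because the actual codon used at position 47 is not given here, all four proline codons are tested.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# First 50 residues of the human protein, enough to reach position 47.
PROTEIN_START = "MFGCLVAGRLVQTAAQQVAEDKFVFDLPDYESINHVVVFMLGTIPFPEGM"


def substitute(codon: str, position: int, base: str) -> str:
    """Replace the base at a 1-based codon position."""
    return codon[:position - 1] + base + codon[position:]


reference_residue = PROTEIN_START[47 - 1]

proline_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "P")
# Apply the C -> G change at codon position 1 to every proline codon.
variant_residues = {
    codon: CODON_TABLE[substitute(codon, 1, "G")] for codon in proline_codons
}
print(reference_residue, variant_residues)
```

Whichever proline codon is present, the substitution yields alanine, matching the missense row of the table.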

Gene Neighborhood

The genes surrounding HIKESHI are CCDC81, ME3, and EED. Examining the functions of neighboring genes can give a better understanding of the possible function of the gene itself.

Gene neighborhood on chromosome 11q14.2. [14]

The CCDC81 gene codes for an uncharacterized protein product and is oriented on the plus strand. CCDC81 stands for coiled-coil domain containing 81 isoform 1.

The ME3 gene stands for mitochondrial malic enzyme 3 precursor. Malic enzyme catalyzes the oxidative decarboxylation of malate to pyruvate using either NAD+ or NADP+ as a cofactor. Mammalian tissues contain 3 distinct isoforms of malic enzyme: a cytosolic NADP(+)-dependent isoform, a mitochondrial NADP(+)-dependent isoform, and a mitochondrial NAD(+)-dependent isoform. This gene encodes a mitochondrial NADP(+)-dependent isoform. Multiple alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. [15]

The EED gene stands for embryonic ectoderm development isoform b and is a member of the Polycomb-group (PcG) family. PcG family members form multimeric protein complexes, which are involved in maintaining the transcriptional repressive state of genes over successive cell generations. This protein interacts with enhancer of zeste 2, the cytoplasmic tail of integrin beta7, human immunodeficiency virus type 1 (HIV-1) MA protein, and histone deacetylase proteins. This protein mediates repression of gene activity through histone deacetylation, and may act as a specific regulator of integrin function. Two transcript variants encoding distinct isoforms have been identified for this gene. [16]

Interactions

The programs STRING [17] and Sigma-Aldrich's Favorite Gene [18] suggested possible protein interactions with C11orf73. ARGUL1, CRHBP, and EED were identified by text mining, and HNF4A came from Sigma-Aldrich.

| Protein | Description | Method | Score |
|---|---|---|---|
| ARGUL1 | Unknown | Textmining | 0.712 |
| CRHBP | Corticotropin releasing hormone binding protein | Textmining | 0.653 |
| EED | Embryonic ectoderm development | Textmining | 0.420 |
| HNF4A | Transcription regulator | Sigma-Aldrich | N/A |

ARGUL1 is an uncharacterized protein with no known function. CRHBP is a corticotropin releasing hormone binding protein, which could play a role in a signal cascade that involves or activates HIKESHI. EED, encoded by a gene neighboring C11orf73, is an embryonic ectoderm development protein and a member of the Polycomb-group (PcG) family. PcG family members form multimeric protein complexes, which are involved in maintaining the transcriptional repressive state of genes over successive cell generations. HNF4A is a transcription regulator, and it is unknown whether HNF4A regulates the expression of C11orf73 or simply interacts with it. [12]

Evolutionary History

Phylogenetic tree of C11orf73.

The evolutionary history of organisms can be inferred by using the sequences of orthologs as time references to build a phylogenetic tree. CLUSTALW [19] aligns multiple sequences and can also be used to create such a phylogenetic tree from the orthologs of C11orf73. The figure shows the generated phylogenetic tree with a timeline based on time of divergence. The tree made from the HIKESHI orthologs is identical to the literature phylogenetic tree, even grouping together similar organisms such as fish, birds, and fungi.

Orthologs

Homologous sequences are orthologous if they were separated by a speciation event: when a species diverges into two separate species, the divergent copies of a single gene in the resulting species are said to be orthologous. Orthologs, or orthologous genes, are genes in different species that are similar to each other because they originated from a common ancestor. Orthologous sequences provide useful information in taxonomic classification and phylogenetic studies of organisms. The pattern of genetic divergence can be used to trace the relatedness of organisms. Two organisms that are very closely related are likely to display very similar DNA sequences between two orthologs. Conversely, an organism that is further removed evolutionarily from another organism is likely to display a greater divergence in the sequence of the orthologs being studied.

Table of Chromosome 11 open reading frame 73 Orthologs

| Species | Common Name | Protein Name | Accession Number | NT Length | NT Identity | AA Length | AA Identity | E-Value |
|---|---|---|---|---|---|---|---|---|
| Homo sapiens | Human | C11orf73 | NM_016401 | 1187 bp | 100% | 197 aa | 100% | 0 |
| Bos taurus | Cow | LOC504867 | NP_001029398 | 996 bp | 73.60% | 197 aa | 98% | 5.30E-84 |
| Mus musculus | Mouse | l7Rn6 | NP_080580 | 1045 bp | 72.90% | 197 aa | 97% | 4.80E-83 |
| Gallus gallus | Chicken | LOC427034 | N/A | 851 bp | 56.20% | 197 aa | 88.3% | 5.60E-76 |
| Taeniopygia guttata | Zebra Finch | LOC100190155 | ACH44077 | 997 bp | 61.60% | 997 aa | 87.80% | 1.20E-75 |
| Xenopus laevis | Frog | MGC80709 | NP_001087012 | 2037 bp | 36.50% | 197 aa | 86.80% | 1.70E-75 |
| Oncorhynchus mykiss | Rainbow Trout | CK073 | NP_001158574 | 940 bp | 52.20% | 197 aa | 75.10% | 2.70E-66 |
| Tetraodon nigroviridis | Tetraodon | unnamed protein product | CAF89643 | N/A | N/A | 197 aa | 70.90% | 1.40E-61 |
| Trichoplax adhaerens | Trichoplax adhaerens | TRIADDRAFT_19969 | XP_002108733 | 600 bp | 33.10% | 199 aa | 52.30% | 2.00E-47 |
| Culex quinquefasciatus | Mosquito | conserved hypothetical protein | XP_001843282 | 594 bp | 30.70% | 197 aa | 49.30% | 2.50E-41 |
| Drosophila melanogaster | Fly | CG13926 | NP_647633 | 594 bp | 31.50% | 197 aa | 48.50% | 4.50E-39 |
| Laccaria bicolor | Mushroom | predicted protein | XP_001878996 | 696 bp | 36.40% | 202 aa | 35.20% | 8.30E-24 |
| Candida albicans | Fungi | CaO19.13758 | XP_716157 | 666 bp | 36.10% | 221 aa | 24% | 5.70E-11 |

The table shows the 13 sequences (12 orthologs, 1 original sequence) along with protein name, accession numbers, nucleotide identity, protein identity, and E-values. The accession numbers are the identification numbers from the NCBI Protein database. The nucleotide sequence can be accessed from the protein's sequence page from DBSOURCE, which gives the accession number and is a link to the nucleotide's sequence page. The length of both the nucleotide and protein sequence for each ortholog and its respective organism are listed in the table as well. Next to the sequence lengths are the identities of the ortholog to the original HIKESHI gene. The identities and E-values were acquired using the global alignment program, ALIGN, from the SDSC Biology Workbench and BLAST from NCBI.
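For readers unfamiliar with how an identity value is derived from an alignment, the Python sketch below computes percent identity for two sequences that have already been aligned. It is an illustration only: the function name `percent_identity` is hypothetical, the toy alignments are not real ortholog data, and the convention used here (matches divided by the number of alignment columns, counting gaps) is one of several in use and is not necessarily the one applied by ALIGN or BLAST.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences share a residue.

    Columns that are gaps in both sequences are ignored; columns with a gap
    in only one sequence count as mismatches (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignments (hypothetical): one substitution in ten columns,
# and one single-residue gap in six columns.
substitution_case = percent_identity("MFGCLVAGRL", "MFGCLVSGRL")
gap_case = percent_identity("MFG-CL", "MFGACL")
print(substitution_case, round(gap_case, 2))
```

Because the denominator convention changes the result, identities produced by different programs are best compared only when the same definition was used throughout.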

The graph plots the percent identity of each ortholog against the divergence time of the organism, producing a mostly linear curve. The two main bends in the curve suggest times of gene duplication, around 450 million and 1,150 million years ago respectively. The paralogs from these gene duplications are probably so dissimilar from the highly conserved orthologs of HIKESHI that they were not found using the Blink or BLAST tools.

CLUSTALW of related orthologs.
Graph of the percent identity of C11orf73 orthologs against the divergence time of the organism.

The value m (the total number of amino acid changes that have occurred in a 100 amino acid segment) is the corrected value of n (the number of amino acid differences from the template sequence in such a segment). It is used to calculate λ (the average number of amino acid changes per year, usually reported as λ × 10^9), where T is the time since the two sequences diverged:

$$\frac{m}{100} = -\ln\left(1 - \frac{n}{100}\right)$$

$$\lambda = \frac{m/100}{2T}$$
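The two formulas can be applied directly. The Python sketch below works through one example using the 97% amino acid identity listed for the mouse ortholog, so n = 3. The divergence time of 90 million years is an illustrative assumption, not a value taken from this article, and the function names are hypothetical.

```python
import math


def corrected_changes(n: float) -> float:
    """Return m, the corrected number of changes per 100 residues.

    Implements m/100 = -ln(1 - n/100), where n is the observed number of
    differences per 100 residues.
    """
    if not 0 <= n < 100:
        raise ValueError("n must satisfy 0 <= n < 100")
    return -100.0 * math.log(1.0 - n / 100.0)


def substitution_rate(n: float, divergence_years: float) -> float:
    """Return lambda = (m/100) / (2T), with T given in years."""
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    return (corrected_changes(n) / 100.0) / (2.0 * divergence_years)


# Mouse ortholog: 97% amino acid identity, so n = 3 differences per 100.
n_mouse = 100.0 - 97.0
# Assumed human-mouse divergence time, for illustration only.
assumed_divergence_years = 90e6

m_mouse = corrected_changes(n_mouse)
lambda_mouse = substitution_rate(n_mouse, assumed_divergence_years)
print(round(m_mouse, 3), round(lambda_mouse * 1e9, 3))
```

The correction matters little for closely related sequences, where m is only slightly larger than n, and grows as n increases because multiple changes at the same site become more likely.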
CLUSTALW of distant orthologs.
Graph of number of amino acid changes vs. evolutionary divergence time.

References

  1. GRCh38: Ensembl release 89: ENSG00000149196 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000062797 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. Fernández-Valdivia R, Zhang Y, Pai S, Metzker ML, Schumacher A (January 2006). "l7Rn6 Encodes a Novel Protein Required for Clara Cell Function in Mouse Lung Development". Genetics. 172 (1): 389–99. doi:10.1534/genetics.105.048736. PMC 1456166. PMID 16157679.
  6. Zhang QH, Ye M, Wu XY, Ren SX, Zhao M, Zhao CJ, Fu G, Shen Y, Fan HY, Lu G, Zhong M, Xu XR, Han ZG, Zhang JW, Tao J, Huang QH, Zhou J, Hu GX, Gu J, Chen SJ, Chen Z (October 2000). "Cloning and Functional Analysis of cDNAs with Open Reading Frames for 300 Previously Undefined Genes Expressed in CD34+ Hematopoietic Stem/Progenitor Cells". Genome Res. 10 (10): 1546–60. doi:10.1101/gr.140200. PMC 310934. PMID 11042152.
  7. Rinchik EM, Carpenter DA (1993). "N-ethyl-N-nitrosourea-induced prenatally lethal mutations define at least two complementation groups within the embryonic ectoderm development (eed) locus in mouse chromosome 7". Mamm. Genome. 4 (7): 349–53. doi:10.1007/BF00360583. PMID 8358168. S2CID 24689449.
  8. AceView, NCBI Gene Information. Archived November 28, 2005, at the Wayback Machine.
  9. Genomatix ElDorado tool for promoter analysis, ElDorado Product Page. Archived October 6, 2008, at the Wayback Machine.
  10. "Expression profile for C11orf73". GeneNote version 2.4. Weizmann Institute of Science. September 2009. Archived from the original on 2012-03-05. Retrieved 2010-03-27.
  11. "EST Profile - Hs.283322". National Center for Biotechnology Information, United States National Library of Medicine.
  12. Saier Lab Bioinformatics Group. http://www.tcdb.org/progs/hydro.php
  13. NCBI SNP Database. https://www.ncbi.nlm.nih.gov/snp/ Archived February 4, 2006, at the Wayback Machine.
  14. NCBI Entrez. https://www.ncbi.nlm.nih.gov/nuccore/NC_000011.9?from=86011067&to=86059171&report=graph
  15. RefSeq NCBI Database. https://www.ncbi.nlm.nih.gov/RefSeq/ Archived April 11, 2006, at the Wayback Machine.
  16. "RefSeq: NCBI Reference Sequence Database".
  17. STRING (Search Tool for the Retrieval of Interacting Genes/Proteins). http://string-db.org/ Archived July 26, 2010, at the Wayback Machine.
  18. Sigma-Aldrich's Favorite Gene. http://www.sigmaaldrich.com/life-science/your-favorite-gene-search.html
  19. CLUSTALW Program. Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson. http://workbench.sdsc.edu/ Archived April 8, 2006, at the Wayback Machine.