C1orf167


Chromosome 1 open reading frame 167 (C1orf167) is a protein that in humans is encoded by the C1orf167 gene. [1] The NCBI accession number is NP_001010881. The protein is 1468 amino acids in length with a molecular weight of 162.42 kDa. The mRNA sequence is 4689 base pairs in length. [2] [3]


Gene

Locus

The gene is located on chromosome 1 at position 1p36.22 on the plus strand and spans positions 11,824,457 to 11,849,503. [2] [4]

Aliases

C1orf167 has one known alias, Chromosome 1 Open Reading Frame 167. [5]

Number of Exons

There are 26 exons associated with the protein. [1]

mRNA

Alternative Splicing

A splice region that is conserved in primate orthologs of the C1orf167 mRNA was located between exon 1 and exon 2. [6]

Known mRNA Isoforms

The mRNA sequence has 8 known splice isoforms as determined by the conserved domains. [7] The isoforms span the regions 426-863, 981-1418, 954-1391, 999-1329, 999-1400, 999-1436, 999-1404, and 999-1463 of the mRNA sequence. [8]

Protein

Conceptual Translation of C1orf167 showcasing the conserved Domain of Unknown Function that begins at the break between exon 13 and exon 14.

Known Protein Isoforms

Alternative splicing produces two known isoforms of the human protein. They are XP_006711141.1 which is 1489aa in length and XP_003307860.2 which is 713aa in length. [9] [10]

Composition

The protein has an isoelectric point (pI) of 11. The predicted molecular weight (MW) is 160 kDa for the human protein, but ranges from 140 to 180 kDa for more distant orthologs. [11] Compositional analysis revealed the most abundant amino acid to be alanine (A) at 12.4% of the total protein. The analysis also revealed the C1orf167 protein to be rich in tryptophan (W) and deficient in tyrosine (Y) and isoleucine (I). [12]
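The kind of compositional analysis described above can be sketched in a few lines of Python. This is a minimal illustration only: the `composition` helper and the toy sequence are hypothetical, the sequence is not the C1orf167 sequence, and the sketch does not reproduce the statistical tests a dedicated sequence-statistics tool would apply to call a residue "rich" or "deficient".

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Return the percentage of each residue in a protein sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical toy sequence for illustration; NOT the C1orf167 sequence.
toy = "MAAAWKRAALSAWQEAARLA"
comp = composition(toy)

# The most abundant residue is the key with the largest percentage.
most_common = max(comp, key=comp.get)

# Print residues from most to least abundant.
for aa, pct in sorted(comp.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {pct:.1f}%")
```

Applied to the full protein sequence, the same calculation would give the per-residue percentages from which a statement such as "alanine at 12.4%" is derived.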

Subcellular Localization

C1orf167 is predicted to be localized to the cell nucleus. [13]

Post-Translational Modifications

C1orf167 is predicted to undergo phosphorylation, O-Glycosylation, SUMOylation, glycation, and cleavage by staphylococcal peptidase I (Q105, Q321) and Glutamyl endopeptidase (Q1101). [14] [15] [16] [17] [18]

| Species | H. sapiens | T. manatus latirostris | U. parryii | D. novaehollandiae | P. vitticeps | C. milii |
| --- | --- | --- | --- | --- | --- | --- |
| SUMOylation | K22 | IVTLE 447-451, K604, K605, VRVVP 684-688 | VAVVD 502-506 | K434 | K57, K128, K578, K993, K1388 | ISILH 121-125, K264, K477, K497, K522, IVSIC 621-625, LCLVY 703-707, VVVLR 975-979, VLQLR 1027-1031, K1199, K1208 |
| O-GlcNAcylation | Many* | Similar distribution (but more sites) | Similar distribution (but fewer sites) | Similar distribution (but fewer sites) | Similar distribution | Similar distribution (but fewer sites) |
| Glycation of ε amino groups of lysines | K: 22, 114, 323, 399, 433, 505, 701, 710, 720, 832, 975, 1138, 1279, 1306, 1394, 1418 | K: 335, 516, 534, 605, 747, 757, 1080, 1125, 1189, 1382 | K: 114, 123, 333, 462, 651, 660, 661, 938, 1111, 1149 | K: 72, 103, 128, 133, 183, 240, 241, 248, 290, 398, 437, 466, 483, 494, 505, 552, 589, 718, 767, 772, 820, 974, 1106 | K: 14, 57, 60, 89, 96, 128, 133, 157, 275, 423, 488, 578, 619, 647, 890, 900, 952, 983, 993, 1208, 1279, 1288 | K: 4, 56, 106, 131, 163, 169, 177, 235, 291, 480, 566, 660, 666, 717, 780, 814, 827, 853, 857, 936, 954, 964, 974, 986, 1015, 1079, 1208 |
| Nuclear Export Signal | L84 | L808 | L84 | L589 | V869, L874 | L186, L188, L1117 |
| Phosphorylation | Many* | Similar distribution | Similar distribution | Similar distribution | Similar distribution | Similar distribution |
| Proteinase Cleavage Sites | Q105, Q321, Q1101 | Q441, Q1030 | Q72 | Q60 | Q90, Q155, Q498 | Q520, Q809, Q908 |

Table 1. Post-Translational Modifications determined for C1orf167.

Schematic illustration of predicted post-translational modifications for C1orf167, made using DOG 2.0. The DUF at positions 954-1418 is labeled.

*GPS and NetPhos results indicated hyper-phosphorylation of C1orf167 in H. sapiens and five of its orthologs.

Domain and Motifs by Homology

C1orf167 contains one domain of unknown function (DUF), which spans residues 954-1418 and is 465 amino acids in length.

Secondary Structure

C1orf167 was determined to be rich in alpha helices. No notable regions of beta pleated sheets or coils were predicted. [20] In particular, high confidence was indicated for 42 alpha helices with the longest alpha helix region spanning from residues 450aa to 1182aa. This long alpha helix region includes a significant portion of the conserved DUF which spans 954aa-1418aa. [21] [22] [23] [24] [25]
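How much of the DUF falls inside the long alpha-helical region can be worked out from the two residue ranges quoted above. The following is a small sketch, assuming both ranges are 1-based and inclusive; the helper name `overlap_length` is hypothetical.

```python
def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of residues shared by two inclusive, 1-based intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


helix = (450, 1182)  # longest predicted alpha-helical region
duf = (954, 1418)    # conserved domain of unknown function

duf_length = duf[1] - duf[0] + 1       # inclusive length of the DUF
shared = overlap_length(*helix, *duf)  # residues 954-1182
fraction = shared / duf_length

print(f"DUF length: {duf_length} aa")
print(f"Overlap with helix region: {shared} aa ({fraction:.0%} of the DUF)")
```

Under these assumptions the overlap covers residues 954-1182, roughly half of the DUF.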

Tertiary Structure

The best-aligned structural analog of C1orf167, generated by I-TASSER, had a confidence score (C-score) of -0.68 on a range of [-5, 2], with higher values indicating higher confidence. [25] Per SWISS-MODEL, two monomers are predicted to form an alpha helix. [26] Both of the helices are aligned facing outwards, with residues such as glutamic acid (E) on the interior and asparagine (N), serine (S), and lysine (K) on the exterior. Asparagine residues may serve as an important oligosaccharide binding site. [27]

Expression

C1orf167 has high expression in the larynx, blood, placenta, testis and prostate, with the highest expression found in the testis. [28] The promoter GXP_5109290 spans 1507 base pairs on chromosome 1. [29] GXP_5109290 was found to be conserved in the bonobo (Pan paniscus), gorilla (Gorilla gorilla gorilla), mouse (Mus musculus), chimpanzee (Pan troglodytes), and rhesus monkey (Macaca mulatta). [30] [31]

Protein Interactions

There were 10 interactions identified by STRING. [32]

Homology

Paralogs

No known paralogs or paralogous domains were identified for C1orf167.

Orthologs

Orthologs of C1orf167 were identified using NCBI BLAST. No orthologs were found in single-celled organisms or in fungi whose genomes have been sequenced. Among multicellular organisms, orthologs were found in mammals, birds, reptiles, and cartilaginous fishes. The table below shows a representative sample of 20 of the orthologs for C1orf167. The table is organized by the time of divergence from humans in millions of years ago (MYA) and then by sequence similarity.

| Genus and Species | Common Name | Taxonomic Group | Date of Divergence (MYA) | Accession # | Sequence Length | Sequence Identity | Sequence Similarity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | Human | Mammalia | 0 | NP_001010881.1 | 1449 aa | 100% | 100% |
| Pan troglodytes | Chimpanzee | Mammalia (primate) | 6.6 | XP_024212133.1 | 1442 aa | 97% | 97% |
| Piliocolobus tephrosceles | Ugandan Red Colobus | Mammalia (primate) | 29 | XP_026303745.1 | 1453 aa | 87% | 90% |
| Macaca fascicularis | Crab-eating Macaque | Mammalia (primate) | 29.4 | XP_015298104.1 | 1444 aa | 87% | 90% |
| Trichechus manatus latirostris | American Manatee | Mammalia (sirenia) | 76 | XP_023587965.1 | 1631 aa | 49% | 56% |
| Marmota flaviventris | Yellow-bellied Marmot | Mammalia (rodentia) | 90 | XP_027803235.1 | 1284 aa | 49.16% | 57% |
| Galeopterus variegatus | Sunda Flying Lemur | Mammalia (dermoptera) | 90 | XP_008588133.1 | 1439 aa | 54% | 60% |
| Camelus ferus | Bactrian Camel | Mammalia (artiodactyla) | 90 | XP_014421294.1 | 1442 aa | 53% | 62% |
| Miniopterus natalensis | Natal Clinging Bat | Mammalia (chiroptera) | 96 | XP_016061116.1 | 1644 aa | 48.64% | 56% |
| Desmodus rotundus | Common Vampire Bat | Mammalia (chiroptera) | 96 | XP_024410696.1 | 1548 aa | 47.97% | 56% |
| Ictidomys tridecemlineatus | Thirteen-lined Ground Squirrel | Mammalia (rodentia) | 96 | XP_021576066.1 | 1349 aa | 47.59% | 56% |
| Urocitellus parryii | Arctic Ground Squirrel | Mammalia (rodentia) | 96 | XP_026253666.1 | 1299 aa | 46.47% | 55% |
| Myotis brandtii | Brandt's Bat | Mammalia (chiroptera) | 105 | XP_014400940.1 | 1390 aa | 50.19% | 59% |
| Dromaius novaehollandiae | Emu | Aves | 312 | XP_025951247.1 | 1154 aa | 31.56% | 47% |
| Pseudopodoces humilis | Ground Tit | Aves | 312 | XP_014112713.1 | 1415 aa | 30.34% | 47% |
| Columba livia | Rock Dove | Aves | 312 | XP_021137589.1 | 1430 aa | 30.45% | 46% |
| Anser cygnoides domesticus | Swan Goose | Aves | 312 | XP_013043263.1 | 1126 aa | 27% | 40% |
| Alligator sinensis | Chinese Alligator | Reptilia | 312 | XP_025067177.1 | 1626 aa | 34% | 45% |
| Pogona vitticeps | Central Bearded Dragon | Reptilia | 312 | XP_020637641.1 | 1388 aa | 27.76% | 38% |
| Callorhinchus milii | Australian Ghostshark | Chondrichthyes | 473 | XP_007896104.1 | 1210 aa | 29% | 43% |

Table 2. Divergence timeline of the C1orf167 orthologs. The table is sorted by date of divergence, then by taxonomic group, and then by sequence similarity.
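The ordering used for Table 2 can be illustrated with a short sorting sketch. It uses five rows taken from the table and sorts first by divergence date and then by sequence similarity. Breaking ties by descending similarity is an assumption made for this example; the table itself does not order every tie that way, since it also groups rows by taxon.

```python
# (species, divergence from humans in MYA, sequence similarity in %)
# A subset of rows from Table 2, deliberately listed out of order.
orthologs = [
    ("Callorhinchus milii", 473, 43),
    ("Marmota flaviventris", 90, 57),
    ("Homo sapiens", 0, 100),
    ("Camelus ferus", 90, 62),
    ("Pan troglodytes", 6.6, 97),
]

# Sort by divergence date (ascending); ties are broken by similarity
# (descending), which is an assumption made for this sketch.
ranked = sorted(orthologs, key=lambda row: (row[1], -row[2]))

for species, mya, similarity in ranked:
    print(f"{species:<22} {mya:>6} MYA  {similarity:>3}%")

ranked_names = [row[0] for row in ranked]
```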

Multiple Sequence Alignment of Strict Orthologs for C1orf167. Beginning of the conserved DUF region at the break between exon 13 and 14 is shown.

Function

At this time the function of C1orf167 is uncharacterized.

Clinical Significance

Pathology

According to the EST profile breakdown by health state, expression of C1orf167 was higher than in healthy cells in leukemia and in head and neck and lung cancers. [28] Based on results from NCBI GEO Profiles, C1orf167 showed increased expression in dendritic cells of patients with Chlamydia pneumoniae infections. Increased expression of C1orf167 was also indicated in human pulmonary tuberculosis tissue containing caseous tuberculosis granulomas, compared with normal lung tissue. [34]

References

  1. NCBI. "C1orf167 chromosome 1 open reading frame 167 [Homo sapiens (human)]". NCBI. Retrieved 9 February 2019.
  2. "C1orf167 Gene". www.genecards.org. Retrieved 9 February 2019.
  3. "Homo sapiens chromosome 1 open reading frame 167 (C1orf167), mRNA". NCBI. 30 June 2018. Retrieved 8 February 2019.
  4. "RCSB PDB - Gene View - C1orf167 - chromosome 1 open reading frame 167". www.rcsb.org. Archived from the original on 2019-05-05. Retrieved 2019-03-04.
  5. "C1orf167 chromosome 1 open reading frame 167 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-04-22.
  6. "Genome Browser FAQ". genome.ucsc.edu. Retrieved 2019-04-22.
  7. "C1orf167 GeneCards".
  8. "C1orf167 chromosome 1 open reading frame 167 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-04-27.
  9. "C1orf167 (human)". www.phosphosite.org. Retrieved 2019-03-04.
  10. "HomoloGene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-03-04.
  11. "ExPASy: SIB Bioinformatics Resource Portal - Categories". www.expasy.org. Retrieved 2019-04-27.
  12. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-04-27.
  13. "PSORT II Tool". PSORT II.
  14. "SUMOplot analysis program". SUMOplot. Archived from the original on 2005-01-03. Retrieved 2019-05-05.
  15. "GPS 3.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.org. Retrieved 2019-04-22.
  16. "YinOYang O-GlcNAc sites". YinOYang.
  17. "NetOGlyc 4.0 Server". www.cbs.dtu.dk. Retrieved 2019-04-22.
  18. "C1orf167 NetCorona entry".
  19. "DOG 2.0 - Protein Domain Structure Visualization". dog.biocuckoo.org. Retrieved 2019-05-02.
  20. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2019-04-22.
  21. "CFSSP: Chou & Fasman Secondary Structure Prediction Server". www.biogem.org. Retrieved 2019-04-22.
  22. "Phyre2 Database". Phyre2.
  23. "SOPMA secondary prediction".
  24. "GOR protein prediction".
  25. "I-TASSER results". zhanglab.ccmb.med.umich.edu. Archived from the original on 2019-05-05. Retrieved 2019-05-05.
  26. "SWISS-MODEL Interactive Workspace". swissmodel.expasy.org. Retrieved 2019-05-05.
  27. Kornfeld, R.; Kornfeld, S. (1985). "Assembly of asparagine-linked oligosaccharides" (PDF). Annual Review of Biochemistry. 54: 631–664. doi:10.1146/annurev.bi.54.070185.003215. PMID 3896128.
  28. "EST Profile - Hs.585415". www.ncbi.nlm.nih.gov. Retrieved 2019-04-22.
  29. "ElDorado Introduction". www.genomatix.de. Archived from the original on 2016-06-02. Retrieved 2019-04-22.
  30. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2019-04-22.
  31. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-04-22.
  32. "C1orf167 protein (human) - STRING interaction network". string-db.org. Retrieved 2019-04-19.
  33. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-05-01.
  34. "c1orf167 - GEO Profiles - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-01.