C20orf202

Chromosome 20
Identifiers
Symbol C20orf202
NCBI gene 400831
HGNC 37254
RefSeq NM_001009612.3
UniProt A1L168

C20orf202 (chromosome 20 open reading frame 202) is a protein that in humans is encoded by the C20orf202 gene. The encoded nuclear protein is primarily expressed in the lung and placenta. [1]

Gene

C20orf202 is located on the plus strand of chromosome 20 at 20p13. [2] The gene is 4,826 base pairs long, spans chr20:1,184,098-1,188,918, and contains 2 exons. [3]
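As a quick check on the coordinates above, the length of a genomic interval follows directly from its endpoints. The sketch below assumes the quoted coordinates are 1-based and inclusive, which the source does not state; the function name `span_length` is illustrative only.

```python
def span_length(start: int, end: int) -> int:
    """Length in base pairs of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates quoted in the text for C20orf202 on chromosome 20.
start, end = 1_184_098, 1_188_918
length = span_length(start, end)
print(f"chr20:{start:,}-{end:,} spans {length:,} bp")
```

Under that assumption the interval covers 4,821 bp, slightly less than the 4,826 bp stated above. The difference may reflect a different annotation release or coordinate convention.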

Transcript

C20orf202 has one transcript. The mRNA sequence is 1,609 nucleotides long. [4]

Protein

The protein encoded by C20orf202 is 122 amino acids long, with a predicted molecular mass of 13,591 Da and a predicted isoelectric point (pI) of 9.13. [5] [6] C20orf202 contains the Pfam domain DUF3461 at amino acids 3-67. [7] DUF denotes a domain of unknown function. The predicted secondary structure of C20orf202 consists of 46.72% random coil, 42.62% alpha helix, and 10.66% extended strand. [8]
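The secondary-structure percentages above can be converted back into residue counts, which is a useful sanity check that they describe a 122-residue protein. This is a minimal sketch using only the figures quoted in the text; the helper name `residue_counts` is hypothetical.

```python
PROTEIN_LENGTH = 122  # amino acids, as stated in the text

# Secondary-structure composition quoted in the text (percent of residues).
percentages = {
    "random coil": 46.72,
    "alpha helix": 42.62,
    "extended strand": 10.66,
}


def residue_counts(length, pct_by_class):
    """Convert per-class percentages into whole-residue counts."""
    return {name: round(length * pct / 100) for name, pct in pct_by_class.items()}


counts = residue_counts(PROTEIN_LENGTH, percentages)
for name, n in counts.items():
    print(f"{name}: {n} residues")
print("total:", sum(counts.values()))
```

The counts come out to 57, 52, and 13 residues, which sum to the full 122 amino acids.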

[Figure: C20orf202 quaternary structure]

Regulation

Gene level regulation

[Figure: C20orf202 transcription factor binding sites in the promoter]

The C20orf202 promoter has many transcription factor binding sites, concentrated near the beginning and end of the promoter. These sites are shown in the accompanying figure and are listed below with their respective functions. [10]

[Figure: C20orf202 transcription factor binding sites]

Transcript level regulation

MicroRNA binding sites are found only in the 3' UTR of C20orf202. Most lie near the beginning or end of the 3' UTR, and many are in close proximity to one another. [11] These sites are shown in the accompanying figure.

[Figure: C20orf202 3' UTR microRNA binding sites]

Protein level regulation

C20orf202 has many predicted phosphorylation and glycosylation sites throughout the protein. [12] [13] A few of the phosphorylation sites are located in highly conserved regions of the protein.

Expression

In humans, C20orf202 has moderate mRNA abundance across cell types. Expression is higher than average in the kidney and heart, though not significantly so. [14] Additionally, C20orf202 expression increases during fetal development of kidney, lung, and intestinal tissues. [15]

[Figure: C20orf202 expression profile from NCBI GEO GDS3113. The graph shows expression levels of C20orf202 in various normal human tissues. Expression is highest in the kidney and heart, though not significantly so. Red bars indicate absolute expression levels, and blue dots indicate the percentile rank of gene expression within the sample.]

Homology

Paralogs

C20orf202 has three known paralogs: FAM167A, FAM167B, and AARD. [17]

[Figure: C20orf202 paralog phylogeny]

Orthologs

C20orf202 orthologs are found in mammals, reptiles, and amphibians, and more distantly in fish.

This table lists several orthologs for C20orf202 including genus and species, common name, taxonomic group, evolutionary date of divergence, accession number, sequence length, sequence identity, and sequence similarity.

C20orf202 orthologs

| Genus and species | Common name | Taxonomic group | Estimated date of divergence (MYA) | Accession number | Length (amino acids) | Sequence identity | Sequence similarity |
|---|---|---|---|---|---|---|---|
| Homo sapiens | Human | Mammalia (Primata) | 0 | NP_001009612.1 | 122 | 100% | 100% |
| Pan troglodytes | Chimpanzee | Mammalia (Primata) | 6.7 | XP_001167776.1 | 122 | 99% | 99% |
| Dipodomys ordii | Ord's kangaroo rat | Mammalia (Rodentia) | 90 | XP_012866108.1 | 151 | 74% | 87% |
| Peromyscus maniculatus | Deer mouse | Mammalia (Rodentia) | 90 | XP_006985608.1 | 192 | 68% | 81% |
| Fukomys damarensis | Damaraland mole-rat | Mammalia (Rodentia) | 90 | XP_010611217.1 | 131 | 67% | 77% |
| Grammomys surdaster | African woodland thicket rat | Mammalia (Rodentia) | 90 | XP_028630806.1 | 190 | 60% | 71% |
| Sus scrofa | Wild boar | Mammalia (Artiodactyla) | 96 | XP_013840663.1 | 123 | 84% | 90% |
| Monodon monoceros | Narwhal | Mammalia (Artiodactyla) | 96 | XP_029077716.1 | 153 | 79% | 90% |
| Delphinapterus leucas | Beluga whale | Mammalia (Artiodactyla) | 96 | XP_022448507.1 | 153 | 79% | 90% |
| Orcinus orca | Orca | Mammalia (Artiodactyla) | 96 | XP_004273009.1 | 136 | 78% | 90% |
| Eptesicus fuscus | Big brown bat | Mammalia (Chiroptera) | 96 | XP_028000906.1 | 179 | 79% | 88% |
| Vulpes vulpes | Red fox | Mammalia (Carnivora) | 96 | XP_025841919.1 | 163 | 79% | 88% |
| Canis lupus familiaris | Dog | Mammalia (Carnivora) | 96 | XP_022265199.1 | 125 | 77% | 87% |
| Lacerta agilis | Sand lizard | Reptilia (Squamata) | 312 | XP_033006940.1 | 95 | 47% | 64% |
| Podarcis muralis | Common wall lizard | Reptilia (Squamata) | 312 | XP_028591365.1 | 95 | 45% | 58% |
| Chelonoidis abingdonii | Pinta Island tortoise | Reptilia (Testudines) | 312 | XP_032622241.1 | 98 | 46% | 61% |
| Alligator sinensis | Chinese alligator | Reptilia (Crocodilia) | 312 | XP_006030613.1 | 98 | 42% | 59% |
| Nanorana parkeri | High Himalaya frog | Amphibia (Anura) | 351.8 | XP_018411922.1 | 114 | 44% | 58% |
| Cyprinodon variegatus | Sheepshead minnow | Actinopterygii (Cyprinodontiformes) | 435 | XP_015232405.1 | 146 | 54% | 72% |
| Cyprinus carpio | Common carp | Actinopterygii (Cypriniformes) | 435 | KTF83963.1 | 190 | 47% | 68% |
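To summarize how conservation varies across the groups in the table, sequence identity to the human protein can be averaged per taxonomic class. The sketch below copies the identity values from the table and omits the human row; the names `ORTHOLOGS` and `mean_identity_by_class` are illustrative, and a per-class mean is only one of several reasonable summaries.

```python
from collections import defaultdict

# (species, taxonomic class, percent identity to the human protein),
# copied from the ortholog table; the human reference row is omitted.
ORTHOLOGS = [
    ("Pan troglodytes", "Mammalia", 99),
    ("Dipodomys ordii", "Mammalia", 74),
    ("Peromyscus maniculatus", "Mammalia", 68),
    ("Fukomys damarensis", "Mammalia", 67),
    ("Grammomys surdaster", "Mammalia", 60),
    ("Sus scrofa", "Mammalia", 84),
    ("Monodon monoceros", "Mammalia", 79),
    ("Delphinapterus leucas", "Mammalia", 79),
    ("Orcinus orca", "Mammalia", 78),
    ("Eptesicus fuscus", "Mammalia", 79),
    ("Vulpes vulpes", "Mammalia", 79),
    ("Canis lupus familiaris", "Mammalia", 77),
    ("Lacerta agilis", "Reptilia", 47),
    ("Podarcis muralis", "Reptilia", 45),
    ("Chelonoidis abingdonii", "Reptilia", 46),
    ("Alligator sinensis", "Reptilia", 42),
    ("Nanorana parkeri", "Amphibia", 44),
    ("Cyprinodon variegatus", "Actinopterygii", 54),
    ("Cyprinus carpio", "Actinopterygii", 47),
]


def mean_identity_by_class(rows):
    """Average percent identity for each taxonomic class."""
    groups = defaultdict(list)
    for _species, tax_class, identity in rows:
        groups[tax_class].append(identity)
    return {tax_class: sum(vals) / len(vals) for tax_class, vals in groups.items()}


means = mean_identity_by_class(ORTHOLOGS)
for tax_class, value in means.items():
    print(f"{tax_class}: {value:.1f}% mean identity")
```

With these values, the non-human mammals average about 76.9% identity, the reptiles 45.0%, the single amphibian 44.0%, and the two fish 50.5%. The fish average sits slightly above the reptile average despite the older divergence date, so identity in this small sample does not fall strictly with divergence time.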

Interacting proteins

Two-hybrid prey pooling followed by a two-hybrid array predicted that C20orf202 interacts with two other proteins: SNAPAP and HNRNPCL1. [19] SNAPAP (SNARE-associated protein) has a role in the SNARE binding complex and is also associated with Hermansky–Pudlak syndrome and tricuspid valve stenosis. [20] HNRNPCL1 (heterogeneous nuclear ribonucleoprotein C-like 1) has a role in RNA binding and nucleosome assembly. [21]

Clinical significance

C20orf202 has been associated with multiple sclerosis in genome-wide association studies (GWAS). [22] Additionally, in silico analysis identified C20orf202 as being involved in chromosome 20p inv dup del syndrome, a syndrome similar to trisomy 20p. [23]

References

  1. "C20orf202 chromosome 20 open reading frame 202 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov.
  2. "uncharacterized protein C20orf202 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov.
  3. "C20orf202 chromosome 20 open reading frame 202 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov.
  4. "Homo sapiens chromosome 20 open reading frame 202 (C20orf202), mRNA". 26 October 2019.
  5. "ExPASy - Compute pI/Mw tool". web.expasy.org. Retrieved 27 April 2020.
  6. "C20orf202 Gene - GeneCards | CT202 Protein | CT202 Antibody". www.genecards.org. Retrieved 27 April 2020.
  7. "MOTIF: Searching Protein Sequence Motifs". www.genome.jp. Retrieved 1 May 2020.
  8. "NPS@ : GOR4 secondary structure prediction". npsa-prabi.ibcp.fr. Retrieved 1 May 2020.
  9. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 3 May 2020.
  10. "Genomatix: Genome Annotation and Browser: Query Input". www.genomatix.de. Retrieved 1 May 2020.
  11. "miRDB - MicroRNA Target Prediction Database". mirdb.org. Retrieved 1 May 2020.
  12. "NetPhos 3.1 Server". www.cbs.dtu.dk. Retrieved 1 May 2020.
  13. "NetGlycate 1.0 Server". www.cbs.dtu.dk. Retrieved 1 May 2020.
  14. "GDS3113 / 228292". www.ncbi.nlm.nih.gov. Retrieved 1 May 2020.
  15. "C20orf202 chromosome 20 open reading frame 202 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 1 May 2020.
  16. "GDS3113 / 228292". www.ncbi.nlm.nih.gov. Retrieved 1 May 2020.
  17. "C20orf202 Gene - GeneCards | CT202 Protein | CT202 Antibody". www.genecards.org. Retrieved 27 April 2020.
  18. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 3 May 2020.
  19. "C20orf202 protein (human) - STRING interaction network". string-db.org. Retrieved 1 May 2020.
  20. "SNAPIN Gene - GeneCards | SNAPN Protein | SNAPN Antibody". www.genecards.org. Retrieved 1 May 2020.
  21. "HNRNPCL1 Gene - GeneCards | HNRC1 Protein | HNRC1 Antibody". www.genecards.org. Retrieved 1 May 2020.
  22. "GWAS Catalog". www.ebi.ac.uk. Retrieved 3 May 2020.
  23. Trachoo O, Assanatham M, Jinawath N, Nongnuch A (June 2013). "Chromosome 20p inverted duplication deletion identified in a Thai female adult with mental retardation, obesity, chronic kidney disease and characteristic facial features". European Journal of Medical Genetics. 56 (6): 319–24. doi:10.1016/j.ejmg.2013.03.011. PMID 23542666.