| C20orf202 | |
|---|---|
| Chromosome 20 | |
| **Identifiers** | |
| Symbol | C20orf202 |
| NCBI gene | 400831 |
| HGNC | 37254 |
| RefSeq | NM_001009612.3 |
| UniProt | A1L168 |
C20orf202 (chromosome 20 open reading frame 202) is a protein that in humans is encoded by the C20orf202 gene. The gene encodes a nuclear protein that is primarily expressed in the lung and placenta. [1]
C20orf202 is located on the plus strand of chromosome 20 at 20p13. [2] The gene is 4,826 base pairs long, spans chr20:1,184,098-1,188,918, and contains 2 exons. [3]
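As a rough illustration of how a gene length relates to a coordinate range, the sketch below parses the region string quoted above and computes its span. The helper names `parse_region` and `span_length` are hypothetical, and the inclusive (1-based) coordinate convention is an assumption, since the text does not state which convention or genome assembly the figures come from. Under that assumption the range covers 4,821 positions, close to but not identical to the 4,826 bp reported for the gene.

```python
def parse_region(region):
    """Split a 'chrN:start-end' string into (chromosome, start, end).

    Thousands separators (commas) in the coordinates are removed.
    """
    chromosome, coordinates = region.split(":")
    start_text, end_text = coordinates.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return chromosome, start, end


def span_length(start, end):
    """Number of positions in a range, assuming inclusive 1-based coordinates."""
    if end < start:
        raise ValueError("end coordinate must not be smaller than start")
    return end - start + 1


# Region string as quoted in the article text.
chromosome, start, end = parse_region("chr20:1,184,098-1,188,918")
length_bp = span_length(start, end)
print(chromosome, start, end, length_bp)
```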
There is one transcript of C20orf202. The mRNA sequence is 1,609 base pairs long. [4]
The protein encoded by C20orf202 is 122 amino acids long, with a predicted molecular mass of 13,591 Da and a predicted isoelectric point (pI) of 9.13. [5] [6] C20orf202 contains the Pfam domain DUF3461 at amino acids 3-67. [7] DUF3461 is a domain of unknown function. The secondary structure of C20orf202 consists of 46.72% random coils, 42.62% alpha helices, and 10.66% extended strands. [8]
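The percentages above can be read as residue counts. The following is a minimal sketch, assuming each percentage is simply a fraction of the 122-residue sequence rounded to two decimal places; the function name `residue_counts` is hypothetical and is not taken from any prediction tool. It also divides the predicted mass by the sequence length as a quick plausibility check.

```python
# Figures quoted in the text.
PROTEIN_LENGTH = 122          # amino acids
PREDICTED_MASS_DA = 13591     # daltons
STRUCTURE_PERCENT = {
    "random coil": 46.72,
    "alpha helix": 42.62,
    "extended strand": 10.66,
}


def residue_counts(length, percentages):
    """Convert per-class percentages into whole-residue counts.

    Assumes each percentage is a fraction of the full sequence length.
    """
    return {name: round(length * pct / 100) for name, pct in percentages.items()}


counts = residue_counts(PROTEIN_LENGTH, STRUCTURE_PERCENT)
total_residues = sum(counts.values())
mean_residue_mass = PREDICTED_MASS_DA / PROTEIN_LENGTH

print(counts)                       # residues per structural class
print(total_residues)               # should match the sequence length
print(round(mean_residue_mass, 1))  # average mass per residue in Da
```

Under this reading the three classes correspond to 57, 52, and 13 residues, which sum to the full 122, and the average mass per residue is about 111.4 Da.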
The C20orf202 promoter has many transcription factor binding sites, most notably at the beginning and end of the promoter. These sites are shown in the figure to the right and are listed below with their respective functions. [10]
MicroRNA binding sites are found only in the 3' UTR of C20orf202. Most of these sites lie near the beginning or end of the 3' UTR, with many located in close proximity to each other. [11] These 3' UTR microRNA binding sites can be seen in the figure to the right.
C20orf202 has many phosphorylation and glycosylation sites throughout the protein. [12] [13] A few of the phosphorylation sites are located in highly conserved regions of the protein.
In humans, C20orf202 has moderate mRNA abundance across cell types. Expression is higher than average in the kidney and heart, though not significantly so. [14] Additionally, C20orf202 expression increases during fetal development of kidney, lung, and intestinal tissues. [15]
C20orf202 has three known paralogs: FAM167A, FAM167B, and AARD. [17]
C20orf202 orthologs can be found in major groups such as mammals, reptiles, amphibians, and, more distantly, fish.
This table lists several orthologs for C20orf202 including genus and species, common name, taxonomic group, evolutionary date of divergence, accession number, sequence length, sequence identity, and sequence similarity.
Two-hybrid prey pooling followed by a two-hybrid array identified two predicted interaction partners of C20orf202: SNAPAP and HNRNPCL1. [19] SNAPAP (SNARE-associated protein) has a role in the SNARE binding complex and is also associated with Hermansky–Pudlak syndrome and tricuspid valve stenosis. [20] HNRNPCL1 (heterogeneous nuclear ribonucleoprotein C-like 1) has a role in RNA binding and nucleosome assembly. [21]
C20orf202 has been associated with multiple sclerosis by GWAS. [22] Additionally, in silico analysis identified C20orf202 as being involved in chromosome 20p inv dup del syndrome, a syndrome similar to trisomy 20p. [23]
PRR29 is a protein that in humans is encoded by the PRR29 gene, located on chromosome 17.
Uncharacterized protein Chromosome 16 Open Reading Frame 71 is a protein in humans, encoded by the C16orf71 gene. The gene is expressed in epithelial tissue of the respiratory system, adipose tissue, and the testes. Predicted associated biological processes of the gene include regulation of the cell cycle, cell proliferation, apoptosis, and cell differentiation in those tissue types. 1357 bp of the gene are antisense to spliced genes ZNF500 and ANKS3, indicating the possibility of regulated alternate expression.
FAM231B, or family with sequence similarity 231B, is a protein found in humans and is encoded by the FAM231B gene. Orthologs of FAM231B are found only in primates.
Cardiac-enriched FHL2-interacting protein (CEFIP) is a protein encoded by the C10orf71 (chromosome 10 open reading frame 71) gene. The gene is moderately expressed in muscle tissue and cardiac tissue.
Uncharacterized protein C12orf60 is a protein that in humans is encoded by the C12orf60 gene. The gene is also known as LOC144608 or MGC47869. The protein lacks transmembrane domains and transmembrane helices, but it is rich in alpha-helices. It is predicted to localize in the nucleus.
Chromosome 6 open reading frame 62 (C6orf62), also known as X-trans-activated protein 12 (XTP12), is a gene that encodes a protein of the same name. The encoded protein is predicted to have a subcellular location within the cytosol.
C17orf53 is a gene in humans that encodes the uncharacterized protein C17orf53. The protein has been shown to target the nucleus, with minor localization in the cytoplasm. Based on current findings, C17orf53 is predicted to perform transport functions; however, further research into the protein could provide more specific evidence regarding its function.
Chromosome 21 Open Reading Frame 58 (C21orf58) is a protein that in humans is encoded by the C21orf58 gene.
Chromosome 16 open reading frame 46 is a protein of as yet undetermined function in Homo sapiens. It is encoded by the C16orf46 gene, with NCBI accession number NM_001100873. It is a protein-coding gene with an overlapping locus.
C15orf39 is a protein that in humans is encoded by the chromosome 15 open reading frame 39 (C15orf39) gene.
Chromosome 9 open reading frame 25 (C9orf25) is a gene also known as FAM219A. The terms FAM219A and C9orf25 are aliases and can be used interchangeably. The function of this gene is not yet completely understood.
Chromosome 19 open reading frame 44 is a protein that in humans is encoded by the C19orf44 gene. C19orf44 is an uncharacterized protein with an unknown function in humans. C19orf44 is not limited to humans; the protein also exists in other species. The protein contains one domain of unknown function (DUF) that is highly conserved throughout its orthologs. This protein is most highly expressed in the testis and ovary, but also has significant expression in the thyroid and parathyroid. Another name for this protein is LOC84167.
Uncharacterized protein C16orf86 is a protein in humans that is encoded by the C16orf86 gene. It is mostly made of alpha helices and is expressed in the testes, but also in other tissues such as the kidney, colon, brain, fat, spleen, and liver. The function of C16orf86 is not well understood; however, DNA microarray data suggest it could be a transcription factor in the nucleus that regulates G0/G1 in the cell cycle for tissues such as the kidney, brain, and skeletal muscles.
Chromosome 9 open reading frame 50 is a protein that in humans is encoded by the C9orf50 gene. C9orf50 has one other known alias, FLJ35803. In humans the gene coding sequence is 10,051 base pairs long, transcribing an mRNA of 1,624 bases that encodes a 431 amino acid protein.
Chromosome 1 open reading frame 167 (C1orf167) is a protein which in humans is encoded by the C1orf167 gene. The NCBI accession number is NP_001010881. The protein is 1468 amino acids in length with a molecular weight of 162.42 kDa. The mRNA sequence was found to be 4689 base pairs in length.
Chromosome 1 open reading frame 94 (C1orf94) is a protein that in humans is encoded by the C1orf94 gene. The function of this protein is still poorly understood.
TMEM275 is a protein that in humans is encoded by the TMEM275 gene. TMEM275 has two highly conserved helical transmembrane regions. It is predicted to reside within the plasma membrane or the endoplasmic reticulum membrane.
C3orf56 is a protein-coding gene found on chromosome 3. Although the structure and function of the protein are not well understood, it is known that the C3orf56 protein is exclusively expressed in metaphase II of oocytes and degrades as the oocyte develops towards the blastocyst stage. Degradation of the C3orf56 protein suggests that this gene plays a role in the progression from maternal to embryonic genome and in embryonic genome activation.
FAM214B, also known as family with sequence similarity 214, member B, is a protein that in humans is encoded by the FAM214B gene, located on chromosome 9. The protein has 538 amino acids, and the gene contains 9 exons. Studies have reported low expression of this gene in patients with major depressive disorder. In most organisms, such as mammals, amphibians, reptiles, and birds, the gene is highly expressed in the bone marrow and blood. During human fetal development, FAM214B is mostly expressed in the brain and bone marrow.
C6orf136 is a protein in humans encoded by the C6orf136 gene. The gene is conserved in mammals, mollusks, as well as some Porifera. While the function of the gene is currently unknown, C6orf136 has been shown to be hypermethylated in response to FOXM1 expression in head and neck squamous cell carcinoma (HNSCC) tissue cells. Additionally, elevated expression of C6orf136 has been associated with improved survival rates in patients with bladder cancer. C6orf136 has three known isoforms.