C16orf90 (chromosome 16 open reading frame 90) is a human gene that encodes the uncharacterized protein C16orf90 in Homo sapiens. [1] The C16orf90 protein has four predicted alpha-helix domains. [2] [3] [4] [5] The gene is moderately expressed in the testes [6] [7] and expressed at low levels throughout the rest of the body. [8] Although the function of C16orf90 is not yet well understood, expression data from microarrays [9] and post-translational modification data [10] [11] suggest that it may be involved in the biological stress response and apoptosis.
C16orf90 has no aliases [12] and spans 3,169 nucleotides, from position 3,493,484 to 3,496,652, on the short arm of chromosome 16. [7] It is located at 16p13.3 on the reverse strand. [1] The gene has 3 exons, and its mRNA is 972 base pairs long. The C16orf90 protein is 182 amino acids in length. [13]
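Genomic coordinates such as these are inclusive at both ends, so the length of the span is the end position minus the start position plus one:

$$
3{,}496{,}652 - 3{,}493{,}484 + 1 = 3{,}169 \text{ nucleotides}
$$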
C16orf90 contains 3 exons separated by 2 introns. In the protein, the exon boundaries fall between amino acids 30 and 31 and between amino acids 147 and 148. [14] The first exon is poorly conserved, but exons 2 and 3 are highly conserved. [15]
The C16orf90 protein has a molecular weight of 21 kDa and a basic (alkaline) isoelectric point of 9.2. [16] It is predicted to be a soluble protein. [17]
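Values like these are normally computed from the amino acid sequence. The following is a minimal Python sketch of that calculation, not the tool cited above: it assumes standard average residue masses and an approximate textbook set of pKa values (the pKa set is an assumption; tools such as ExPASy ProtParam use their own scales, so their results differ slightly), and it ignores post-translational modifications.

```python
# Average residue masses in daltons (standard average values, not monoisotopic).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # added once for the free N- and C-termini

# Approximate pKa values: an assumed set for illustration; tools differ.
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER_MASS


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    # Basic groups contribute their protonated fraction (+1 each).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    for aa in "KRH":
        charge += seq.count(aa) / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
    # Acidic groups contribute their deprotonated fraction (-1 each).
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in "DECY":
        charge -= seq.count(aa) / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14].

    Net charge falls monotonically as pH rises, so bisection converges.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `molecular_weight("G")` returns about 75.07 Da, the mass of free glycine, and a lysine-rich peptide such as `"KKKK"` gives a pI above 9. A pI of 9.2, as reported for C16orf90, means the protein carries a net positive charge at physiological pH.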
C16orf90 has 3 isoforms. [1] Uncharacterized protein C16orf90 isoform a (197 aa) is encoded by all 3 exons, isoform b (175 aa) by exons 2 and 3, and isoform c (95 aa) corresponds to the last 95 amino acids of C16orf90. [1]
C16orf90 has relatively high expression in the testes [6] [7] and very low expression (0.213) in all other tissues [8] in healthy humans. [9] Expression profiles in NCBI GEO suggest that C16orf90 is upregulated under stressful conditions. [9]
The C16orf90 protein is most likely localized to the nucleus [18] and is not predicted to be a transmembrane protein. [19] These predictions were consistent with those for the homologous mouse and dolphin C16orf90 proteins.
The mRNA secondary structure predicted by RNAfold, which contains stem-loops and hairpin turns, showed medium to high base-pairing probabilities across most of the molecule. Only two regions had a low probability for the predicted structure. [20]
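RNAfold, part of the ViennaRNA package, reports its predicted structure in dot-bracket notation: matching parentheses mark paired bases and dots mark unpaired ones. The minimal Python sketch below parses such a string and lists hairpin loops (a base pair that closes a run of only unpaired bases, i.e. the loop end of a stem-loop). The example structure is a made-up toy string, not the predicted C16orf90 structure, and the functions are illustrative helpers, not part of the ViennaRNA API.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted 1-based (i, j) base pairs from dot-bracket notation."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return closing pairs (i, j) whose enclosed bases are all unpaired."""
    loops = []
    for i, j in parse_dot_bracket(structure):
        # 1-based positions i+1..j-1 are 0-based indices i..j-2.
        inner = structure[i:j - 1]
        if inner and set(inner) <= {"."}:
            loops.append((i, j))
    return loops


toy = "((((...))))..((...))"  # hypothetical structure with two stem-loops
print(parse_dot_bracket(toy))
print(hairpin_loops(toy))  # [(4, 8), (15, 19)]
```

Each hairpin reported this way is the tip of one stem-loop; the stems are the runs of nested pairs leading up to it.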
The C16orf90 protein is predicted to contain 4 alpha helices [4] and no beta sheets, with coiled coils likely connecting the helices. [2] [3] The helices are spaced approximately evenly along the protein. [5] A nuclear localization signal [21] was also identified, along with four alpha-helix domains [22] that help determine C16orf90's secondary structure.
C16orf90's predicted tertiary structure includes linear [5] alpha helices separated by disordered or coiled-coil regions. [24]
The Genomatix [25] tool Gene2Promoter identified 4 possible promoter sequences for C16orf90. Promoter set 3 (GXP_644807) was chosen as the promoter for the reverse-strand gene because it contained the most CAGE tags, aligned with the 5' end of the gene, and carried the correct GeneID.
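The three selection criteria above can be read as a filter-then-rank rule: discard candidates with the wrong GeneID or that do not sit at the gene's 5' end, then keep the one with the most CAGE tags. The Python sketch below shows that reading. The record type, its field names, and all IDs and tag counts in the example are hypothetical; this is not the Genomatix Gene2Promoter output format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromoterCandidate:
    # Hypothetical record layout for illustration only.
    promoter_id: str
    gene_id: str
    cage_tags: int
    aligned_to_5prime_end: bool


def choose_promoter(candidates: list[PromoterCandidate],
                    expected_gene_id: str) -> Optional[PromoterCandidate]:
    """Keep candidates with the expected GeneID at the gene's 5' end,
    then prefer the one supported by the most CAGE tags."""
    eligible = [c for c in candidates
                if c.gene_id == expected_gene_id and c.aligned_to_5prime_end]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.cage_tags)


# Toy inputs: promoter IDs, GeneIDs and tag counts are all made up.
candidates = [
    PromoterCandidate("promoter_1", "gene_A", 12, False),
    PromoterCandidate("promoter_2", "gene_A", 40, True),
    PromoterCandidate("promoter_3", "gene_A", 25, True),
    PromoterCandidate("promoter_4", "gene_B", 90, True),
]
print(choose_promoter(candidates, "gene_A").promoter_id)  # promoter_2
```

Treating the GeneID and 5'-end checks as hard filters and the CAGE count as the ranking key is one interpretation; the original analysis may have weighed the criteria differently.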
A nuclear localization signal (NLS) at the C-terminus of the protein, spanning residues 173 to 197, supports the nuclear localization prediction. [26] [18]
C16orf90 is predicted to be phosphorylated at many residues. [10] The red markers on the protein schematic indicate likely phosphorylation sites. NetPhos, a phosphorylation site predictor, returned many sites, including amino acids 16, 34, 56, 63, 67, 86, 130, 144, 147, 148, 150, 151, 152, 153, 165, 167, 174, 177, 189, and 191. [10]
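Combining the NetPhos positions listed above with the exon boundaries (between residues 30/31 and 147/148) and the NLS (residues 173 to 197) shows where the predicted sites cluster. The Python sketch below does that tally; it assumes all positions use the numbering of the 197-residue isoform a, and that exon 3 ends at residue 197.

```python
# NetPhos positions listed in the text (1-based; isoform a numbering assumed).
NETPHOS_SITES = [16, 34, 56, 63, 67, 86, 130, 144, 147, 148,
                 150, 151, 152, 153, 165, 167, 174, 177, 189, 191]

# Exon ranges in protein coordinates from the stated boundaries;
# the end of exon 3 assumes the 197-residue isoform a.
EXONS = {"exon 1": (1, 30), "exon 2": (31, 147), "exon 3": (148, 197)}
NLS = (173, 197)


def sites_in(sites, start, end):
    """Sites falling inside the inclusive range [start, end]."""
    return [s for s in sites if start <= s <= end]


per_exon = {name: len(sites_in(NETPHOS_SITES, *bounds))
            for name, bounds in EXONS.items()}
nls_sites = sites_in(NETPHOS_SITES, *NLS)
print(per_exon)   # {'exon 1': 1, 'exon 2': 8, 'exon 3': 11}
print(nls_sites)  # [174, 177, 189, 191]
```

On these numbers, more than half of the predicted sites fall in exon 3, and four lie within the C-terminal NLS.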
CTCF (CCCTC-binding factor) is an 11-zinc-finger transcription factor that can act as a transcriptional repressor. [12] One CTCF binding site is predicted in the C16orf90 gene sequence, [27] and CTCF binding there could contribute to C16orf90's low expression levels.
O-GlcNAc modification can block phosphorylation of the same residue. C16orf90 has two serine residues, at positions 34 and 144, that are potential O-GlcNAc sites; [11] both are also among the NetPhos-predicted phosphorylation sites listed above. Because O-GlcNAcylation and phosphorylation compete for the same residues, O-GlcNAc modification might keep C16orf90 inactive until the protein is needed, for example under severe stress, when it could then be activated.
NetGlycate [28] (a glycation prediction tool) predicted glycation at 2 lysine residues: position 70 (score 0.709) and position 158 (score 0.595). Glycation is the non-enzymatic, post-translational attachment of sugars to lysine residues and can affect protein folding or stability. [29]
A cleavage site is predicted between Arg172 and Lys173 of the C16orf90 protein. [21] [30] This is also where the nuclear localization signal begins, suggesting that the NLS may be cleaved off, possibly to release the protein from the nucleus or when the protein is targeted for degradation.
C16orf90 orthologs show a relatively high rate of sequence change, as seen in the graph to the right comparing C16orf90 with fibrinopeptides, hemoglobin, and cytochrome c. [31]
The orthologs below are sorted by increasing date of divergence. C16orf90 is limited to mammals but is found in monotremes and marsupials, suggesting that the gene arose around 180 million years ago, before the monotreme lineage diverged from other mammals. [32]
Genus | Species | Common name | Taxonomic group | Date of divergence (MYA) | Accession number | Sequence length (aa) | Sequence identity to human | Sequence similarity to human |
---|---|---|---|---|---|---|---|---|
Homo | sapiens | Human | Primates | 0 | XP_024306160.1 | 197 | 100.00% | 100.00% |
Gorilla | gorilla | Gorilla | Primates | 8.6 | XP_004057139 | 185 | 82.00% | 82.50% |
Mus | musculus | Mouse | Rodentia | 89 | NP_082760.2 | 171 | 63.50% | 66.50% |
Bison | bison | Bison | Artiodactyla | 94 | XP_010838682 | 186 | 65.50% | 69.00% |
Zalophus | californianus | California sea lion | Carnivora | 94 | XP_027973424.1 | 185 | 65.20% | 69.10% |
Canis | lupus familiaris | Dog | Carnivora | 94 | XP_003434913.2 | 214 | 64.50% | 69.60% |
Equus | caballus | Horse | Perissodactyla | 94 | XP_001502184.1 | 183 | 63.70% | 67.60% |
Sorex | araneus | Common shrew | Soricomorpha | 94 | XP_004600963.1 | 320 | 63.87% | 70.00% |
Acinonyx | jubatus | Cheetah | Carnivora | 94 | XP_026899211 | 225 | 61.90% | 67.30% |
Pteropus | vampyrus | Large flying fox | Chiroptera | 94 | XP_023376984.1 | 224 | 61.80% | 64.50% |
Lagenorhynchus | obliquidens | Pacific white-sided dolphin | Artiodactyla | 94 | XP_026974160 | 192 | 54.30% | 58.00% |
Dasypus | novemcinctus | Nine-banded armadillo | Cingulata | 102 | XP_004474400.1 | 185 | 61.30% | 67.20% |
Orycteropus | afer | Aardvark | Tubulidentata | 102 | XP_007937762.1 | 185 | 59.80% | 65.70% |
Monodelphis | domestica | Gray short-tailed opossum | Marsupial | 160 | XP_001363889.1 | 187 | 53.80% | 60.50% |
Phascolarctos | cinereus | Koala | Marsupial | 160 | XP_020851162.1 | 187 | 53.10% | 60.20% |
Ornithorhynchus | anatinus | Platypus | Monotreme | 180 | XP_016082126.2 | 216 | 33.90% | 40.40% |
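The identities in the ortholog table can be turned into a rough evolutionary rate. The Python sketch below assumes the Poisson correction d = -ln(identity) for multiple substitutions at the same site and treats the table's identity values as the proportion of identical aligned residues, ignoring alignment gaps; the source does not state which method underlies its rate graph, so this is one standard approach rather than the author's.

```python
import math

# (common name, divergence time in MYA, identity to human) from the table above.
ORTHOLOGS = [
    ("Mouse", 89, 0.635),
    ("Dog", 94, 0.645),
    ("Nine-banded armadillo", 102, 0.613),
    ("Gray short-tailed opossum", 160, 0.538),
    ("Platypus", 180, 0.339),
]


def poisson_distance(identity: float) -> float:
    """Substitutions per site corrected for multiple hits: d = -ln(identity)."""
    return -math.log(identity)


def rate_per_site_per_year(identity: float, divergence_mya: float) -> float:
    """Both lineages have evolved since the split, hence the factor of 2."""
    return poisson_distance(identity) / (2 * divergence_mya * 1e6)


for name, mya, identity in ORTHOLOGS:
    d = poisson_distance(identity)
    r = rate_per_site_per_year(identity, mya)
    print(f"{name:<27} d = {d:.3f}  rate = {r:.2e} per site per year")

# The most distant lineage that still has an ortholog gives a minimum age
# for the gene (here the split from the platypus lineage).
print("minimum gene age (MYA):", max(mya for _, mya, _ in ORTHOLOGS))
```

Plotting d against divergence time for C16orf90 and for reference proteins gives the kind of comparison described above; a steeper slope means faster evolution.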
In one study, C16orf90 was identified as carrying a possibly pathogenic recessive variant (K53N), among 31 other candidate variants, for various intellectual disabilities. [33] The protein is suspected to act as an adaptor or cofactor that binds other molecules. K53N is a non-conservative substitution, replacing positively charged lysine with uncharged, polar asparagine, so it could alter these interactions and potentially cause intellectual disability, inguinal hernia, frontal upsweep of hair, macrotia, high palate, hypertonia, hyperreflexia, abnormality of the cerebrum, or vitamin D deficiency. [33]
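A missense variant written in one-letter protein notation, such as K53N, can be checked for whether it changes the side-chain class and which exon it falls in. The Python sketch below does this; the residue classes are a simplified grouping assumed for illustration (other groupings exist), and the exon ranges come from the boundaries stated earlier, with exon 3 assumed to end at residue 197 (isoform a).

```python
import re

# Simplified side-chain classes (an assumed grouping; others exist).
RESIDUE_CLASS = {
    **dict.fromkeys("KRH", "positively charged"),
    **dict.fromkeys("DE", "negatively charged"),
    **dict.fromkeys("STNQ", "polar uncharged"),
    **dict.fromkeys("AVLIMFWYCGP", "other/hydrophobic"),
}

# Exon ranges in protein coordinates (exon 3 end assumed to be residue 197).
EXON_RANGES = {1: (1, 30), 2: (31, 147), 3: (148, 197)}


def exon_of(position):
    """Exon containing a residue position, or None if out of range."""
    for exon, (start, end) in EXON_RANGES.items():
        if start <= position <= end:
            return exon
    return None


def describe_missense(variant):
    """Summarize a variant such as 'K53N' (reference, position, alternate)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if m is None:
        raise ValueError(f"not a one-letter missense variant: {variant!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    return {
        "position": pos,
        "exon": exon_of(pos),
        "from": RESIDUE_CLASS[ref],
        "to": RESIDUE_CLASS[alt],
        "conservative": RESIDUE_CLASS[ref] == RESIDUE_CLASS[alt],
    }


print(describe_missense("K53N"))
# {'position': 53, 'exon': 2, 'from': 'positively charged', 'to': 'polar uncharged', 'conservative': False}
```

By the exon boundaries given earlier, residue 53 falls in exon 2, which is described as highly conserved.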