Identifiers | C2orf16
---|---
Aliases | C2orf16, chromosome 2 open reading frame 16
External IDs | HomoloGene: 82476; GeneCards: C2orf16
C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein (NCBI ID: CAH18189.1, [4] henceforth referred to as C2orf16) is 1,984 amino acids long. [5] Isoform 2 is encoded in a single exon, and the gene is located at 2p23.3. [6] Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence. [7]
68 orthologs are known for this gene, including in mice and sheep, but no paralogs have been found. [8]
C2orf16 isoform 2 is encoded by a 6.2 kb, single-exon transcript at locus 2p23.3. The protein contains P-S-E-R-S-H-H-S repeats toward its C-terminus, from amino acid 1,559 to 1,903. These repeats appear to have arisen from a transposable element. Primate orthologs contain more P-S-E-R-S-H-H-S repeats than other mammalian orthologs do. [6]
C2orf16 is highly expressed in the testes [9] and in a retinoic acid- and mitogen-treated human embryonic stem cell line, [10] but is not known to be differentially expressed with age or in disease phenotypes. [11] It is also highly expressed in the pre-implantation embryo from the 4-cell stage to the blastocyst stage. [12]
C2orf16 expression does not appear to be rapamycin-sensitive. [13] Its expression increases significantly in c-MYC knockdown breast cancer cells. [14]
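As a rough illustration of how tandem copies of the P-S-E-R-S-H-H-S motif could be located in a protein sequence, the sketch below scans for runs of exact motif copies. The sequence `toy` is a made-up example, not the C2orf16 sequence, and the function name `find_repeat_runs` is hypothetical. Real repeats are usually degenerate, so exact matching is a simplifying assumption and would undercount copies in an actual ortholog.

```python
import re

MOTIF = "PSERSHHS"  # the repeat unit named in the text


def find_repeat_runs(seq, motif=MOTIF, min_copies=2):
    """Return (start, end, copies) for each tandem run of exact motif copies.

    Positions are 1-based and inclusive, matching the residue numbering
    style used in the article. Only exact copies are detected.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    runs = []
    for match in pattern.finditer(seq):
        copies = (match.end() - match.start()) // len(motif)
        runs.append((match.start() + 1, match.end(), copies))
    return runs


# Toy sequence (assumption): three copies, a lone copy, then two copies.
toy = "MKT" + MOTIF * 3 + "GAL" + MOTIF + "QQ" + MOTIF * 2
runs = find_repeat_runs(toy)
print(runs)
```

The lone copy in the middle of the toy sequence is skipped because `min_copies` defaults to 2. Lowering it to 1 would report isolated copies as well.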
Two isoforms of C2orf16 exist. Isoform 1 is 5,388 amino acids long and is encoded in 5 exons over 16,401 base pairs. Isoform 2 uses an alternate transcription start site and is considerably shorter, at 1,984 amino acids, encoded in 1 exon over 6,200 base pairs. [8]
One miRNA (accession number MI0005564) is predicted to bind to the 3' UTR of C2orf16. [15] [16]
C2orf16 has a predicted molecular weight of 224 kDa and a predicted isoelectric point of 10.08, [17] values that are relatively constant between orthologs. The protein has a higher-than-average content of serine, histidine, and arginine and a lower-than-average content of alanine. [18]
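The isoform lengths quoted above can be sanity-checked with simple arithmetic: a protein of N amino acids needs 3 × (N + 1) nucleotides of coding sequence once the stop codon is counted. The sketch below assumes the quoted base-pair figures are transcript lengths, which the text does not state explicitly. The function name `coding_length_bp` is hypothetical.

```python
def coding_length_bp(protein_length_aa, include_stop=True):
    """Nucleotides needed to encode a protein of the given length."""
    codons = protein_length_aa + (1 if include_stop else 0)
    return codons * 3


# (protein length in aa, quoted length in bp) as given in the text.
isoforms = {
    "isoform 1": (5388, 16401),
    "isoform 2": (1984, 6200),
}

non_coding = {}
for name, (length_aa, quoted_bp) in isoforms.items():
    cds_bp = coding_length_bp(length_aa)
    # Whatever is left over would be untranslated sequence,
    # under the assumption stated above.
    non_coding[name] = quoted_bp - cds_bp
    print(f"{name}: coding {cds_bp} bp, remaining {non_coding[name]} bp")
```

In both cases the coding sequence fits within the quoted length with a few hundred base pairs to spare, so the figures are at least internally consistent.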
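A composition analysis of this kind amounts to counting residues and dividing by sequence length. The sketch below shows the idea on a made-up toy sequence built from the repeat motif. It is not the C2orf16 sequence, and the function name `composition` is hypothetical. The cited tool [18] additionally compares each fraction against a reference proteome, which is not reproduced here.

```python
from collections import Counter


def composition(seq):
    """Fraction of each amino acid in seq, most frequent first."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: n / total for aa, n in counts.most_common()}


# Toy sequence (assumption): four motif copies plus a short unrelated tail.
toy = "PSERSHHS" * 4 + "MKTAYLG"
comp = composition(toy)
for aa, fraction in comp.items():
    print(f"{aa}: {fraction:.3f}")
```

Because the toy sequence is dominated by the motif, serine and histidine come out on top, which mirrors in miniature the enrichment described for the real protein.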
A positive charge cluster is found from amino acid residues 1,274 to 1,302. [18]
An arginine rich region is found from amino acids 1,545 to 1,933, a serine rich region is found from amino acids 1,568 to 1,934, and a histidine rich region is found from amino acids 1,630 to 1,853. [18]
A dot matrix analysis [19] reveals a heavily repeated region from approximately residue 1,500 to 1,984, corresponding to the P-S-E-R-S-H-H-S repeat. A small band of dots at approximately amino acid 1,200 denotes a half repeat of the P-S-E-R-S-H-H-S sequence.
C2orf16 isoform 2 has no transmembrane domains, [20] and is predicted to be localized to the nucleus after translation due to two nuclear localization sequences predicted at residues 1,233 and 1,281. [21] No nuclear export sequence is conserved among orthologs, [22] suggesting that C2orf16 remains in the nucleus after import. No N- or C-terminal modifications were predicted. [23] [24] [25] [26]
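A self-comparison dot matrix of this kind places a dot wherever a short word in the sequence matches a word elsewhere in the same sequence. Tandem repeats then show up as lines parallel to the main diagonal, offset by multiples of the repeat length. The sketch below is a minimal exact-match version run on a made-up toy sequence, not the C2orf16 sequence. The function names are hypothetical, and the cited tool [19] may use scoring matrices and thresholds rather than exact word matches.

```python
from collections import Counter


def dot_matrix(seq_a, seq_b, window=4):
    """Set of 0-based (i, j) where the words of length `window` starting
    at i in seq_a and at j in seq_b are identical."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        word = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            if seq_b[j:j + window] == word:
                dots.add((i, j))
    return dots


def off_diagonal_offsets(dots):
    """Count dots by their distance above the main diagonal (j - i > 0)."""
    return Counter(j - i for i, j in dots if j > i)


# Toy sequence (assumption): a unique leader followed by three motif copies.
toy = "MKTAYLGQ" + "PSERSHHS" * 3
offsets = off_diagonal_offsets(dot_matrix(toy, toy))
print(sorted(offsets.items()))
```

In the toy example every off-diagonal dot sits at an offset of 8 or 16, that is, one or two repeat lengths away from the main diagonal. This is the pattern a dot plot shows for a tandemly repeated region.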
C2orf16 is predicted to be localized to the nucleus after transcription. [8]
The 3D structure of C2orf16 is predicted to have three major domains. Domain 1 spans amino acids 1 to 662, domain 2 spans amino acids 674 to 1,487, and domain 3 spans amino acids 1,488 to 1,984. [27] Domains 1 and 2 are predicted to be connected by a stretch of 12 amino acids that is not otherwise organized into a secondary structure, allowing flexibility between the two domains. Domain 2 is predicted to have protein-interacting domains for transcription factors. [27] Domain 3 is predicted to follow a "balls on a string" structure [27] and has many possible phosphorylation sites. [28]
C2orf16 has been shown to have a physical interaction with proto-oncogene Myc by tandem affinity purification. [29]
68 orthologs are known for C2orf16. [8] The protein seems to have appeared in mammalian evolutionary history 320 million years ago, around the divergence of mammals from reptiles. This history would explain why orthologs do not exist in amphibians, reptiles, birds, or other more distantly related species. [30]
Any orthologs from species more distantly related to humans than other mammals are likely not related in function. However, the P-S-E-R-S-H-H-S repeat is present in bony fishes, crustaceans, stramenopiles including potato blight, plants, and prokaryotes. [30]
The transposon repeat may have been reintroduced to mammals by a viral vector.
The P-S-E-R-S-H-H-S repeat sequence is conserved in orthologs of C2orf16, and is also conserved in organisms as distantly related as oomycete slime mold [31] and plants, including the chloroplasts of Ashby's Wattle. [32] The S-P-S-E-R portion of the repeat appears to be the most strongly conserved, as seen by alignment with these orthologs and by creation of a sequence logo. [33]
The conservation analysis of the repeat shows that the initial S-P-S is highly conserved, possibly for phosphorylation (S) and structure (P), and that the R is almost completely conserved, mutating to a lysine in some orthologs, [32] implying that the positive charge is necessary for the function of the repeat.
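Conservation of this kind is commonly quantified per alignment column as information content in bits, which is also what the letter-stack heights in a sequence logo represent. The sketch below computes log2(20) minus the Shannon entropy of each column, without the small-sample correction that logo tools typically apply. The four aligned repeat copies are hypothetical, chosen only to include an R-to-K substitution like the one described above. They are not taken from real orthologs, and the function name `column_information` is hypothetical.

```python
import math
from collections import Counter


def column_information(alignment):
    """Information content (bits) of each column of an ungapped alignment.

    Computed as log2(20) minus the column's Shannon entropy, with no
    small-sample correction.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for column in zip(*alignment):
        n = len(column)
        entropy = -sum(
            (count / n) * math.log2(count / n)
            for count in Counter(column).values()
        )
        info.append(math.log2(20) - entropy)
    return info


# Hypothetical repeat copies (assumption): one has K in place of R,
# another has Q in place of the first H.
toy_alignment = ["PSERSHHS", "PSERSHHS", "PSEKSHHS", "PSERSQHS"]
info = column_information(toy_alignment)
for position, bits in enumerate(info, start=1):
    print(position, round(bits, 2))
```

Fully conserved columns reach the maximum of log2(20), about 4.32 bits, while the two variable columns score lower. Note that this measure treats R and K as unrelated symbols, so it does not by itself capture conservation of positive charge.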
The 3D shape of the repeat sequence is unclear as it has been predicted to be either balls-on-a-string [34] or an antiparallel beta-sheet [6] structure.
C2orf16 isoform 2 is predicted to have a possible function in mitosis regulation through its nuclear localization, [8] [21] predicted transcription factor binding site, [27] physical association with Myc, [29] and increased expression in c-MYC knockdown breast cancer cells. [14]
There are four patents on record for C2orf16, one each involving: cancerous PPP2R1A and ARID1A mutations, [35] Alzheimer's predisposition, [36] viral vaccine diversity, [37] and the relation of copy number variation to common variable immunodeficiency. [38] C2orf16 also shows increased expression in some breast cancer lines, [14] and it interacts with Myc, [29] a common oncogene, making C2orf16 a possible target in cancer treatments.