C20orf96 (Chromosome 20 open reading frame 96) is a protein-coding gene in humans. [1] It encodes uncharacterized protein C20orf96, which is predicted to be a nuclear protein. The function of the gene and the biological processes it takes part in are not yet well understood.
C20orf96 is also known by the alias DJ1103G7.2. Orthologs found in other organisms are known as C20orf96 homolog isoform X [Species].
C20orf96 is located on the short arm of chromosome 20 at 20p13, base pairs 270,863 to 290,778 on the complementary strand. It is a member of the DUF4618 superfamily, with the DUF position being from amino acid 104 to 363. [1] Neighboring genes are DEFB129, DEFB132, ZCCHC3, and NRSN2-AS1.
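The coordinates above imply a gene span and a DUF4618 length that can be checked with simple arithmetic. The following is a minimal sketch, assuming both ranges are inclusive at each end; the variable names are chosen here for illustration.

```python
# Genomic coordinates and DUF4618 position as stated in the text.
# Assumption: both ranges are inclusive at each end.
gene_start_bp, gene_end_bp = 270_863, 290_778
duf_start_aa, duf_end_aa = 104, 363
protein_length_aa = 363

gene_span_bp = gene_end_bp - gene_start_bp + 1
duf_length_aa = duf_end_aa - duf_start_aa + 1
duf_fraction = round(100 * duf_length_aa / protein_length_aa, 1)

print(f"gene span: {gene_span_bp} bp")
print(f"DUF4618: {duf_length_aa} aa ({duf_fraction}% of the protein)")
```

Under that assumption the gene spans about 19.9 kb, and the DUF covers roughly 72% of the protein.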
AceView reports that the gene is expressed at 2.2 times the level of an average gene, and that its sequence has been observed in the brain, testis, uterus, kidney, thymus, breast, kidney tumor, and 84 other tissues. [2] Binding sites for the transcription factors AREB6, GATA-1, GATA-2, GATA-3, ATF6, c-Myc, Max, CHOP-10, AMLa, and C/EBPalpha have been identified for C20orf96. [3] Two proteins, TYW5 and ALOXE3, interact with C20orf96. TYW5 is encoded by a protein-coding gene and is involved in tRNA modification in the nucleus. [4] ALOXE3 is also encoded by a protein-coding gene and is associated with prostaglandin 2 biosynthesis. [5]
C20orf96 has five transcript variants, all of which code for similar proteins. This article describes Variant 1, whose exon sizes are listed below.
Exon | Exon Size (bp) |
---|---|
1 | 18 |
2 | 48 |
3 | 117 |
4 | 117 |
5 | 159 |
6 | 99 |
7 | 156 |
8 | 102 |
9 | 171 |
10 | 117 |
11 | 60 |
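The exon sizes in the table can be totaled with a short script. This is a minimal sketch; the list name `exon_sizes_bp` is chosen here for illustration, and the values are copied from the table above.

```python
# Exon sizes (bp) for Variant 1, copied from the table above.
exon_sizes_bp = [18, 48, 117, 117, 159, 99, 156, 102, 171, 117, 60]

total_bp = sum(exon_sizes_bp)
# Exons are numbered from 1, so add 1 to the list index.
largest_exon = exon_sizes_bp.index(max(exon_sizes_bp)) + 1

print(f"{len(exon_sizes_bp)} exons, {total_bp} bp in total")
print(f"largest exon: exon {largest_exon} ({max(exon_sizes_bp)} bp)")
```

The listed exons sum to 1,164 bp, and exon 9 is the largest at 171 bp.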
The protein encoded by C20orf96 is 363 amino acids long. Its predicted molecular weight is 42.9 kDa, and its isoelectric point is 8.99. [6] [7]
The amino acid composition is broadly similar to that of other proteins. The exceptions are alanine, glutamic acid, glycine, and glutamine: glutamic acid and glutamine are slightly more abundant than normal, alanine is slightly below normal, and glycine is low compared to other proteins.
Amino acid: count (%) | | | | |
---|---|---|---|---|
A- : 13 (3.6%) | C : 4 (1.1%) | D : 15 (4.1%) | E+ : 38 (10.5%) | F : 10 (2.8%) |
G-- : 4 (1.1%) | H : 9 (2.5%) | I : 17 (4.7%) | K : 34 (9.4%) | L : 40 (11.0%) |
M : 13 (3.6%) | N : 10 (2.8%) | P : 19 (5.2%) | Q+ : 35 (9.6%) | R : 24 (6.6%) |
S : 25 (6.9%) | T : 21 (5.8%) | V : 22 (6.1%) | W : 4 (1.1%) | Y : 6 (1.7%) |
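The percentages in the table follow directly from the residue counts and the protein length. The sketch below recomputes them; the dictionary name `counts` is chosen here for illustration, and the counts are copied from the table above.

```python
# Residue counts copied from the composition table above.
counts = {
    "A": 13, "C": 4, "D": 15, "E": 38, "F": 10,
    "G": 4, "H": 9, "I": 17, "K": 34, "L": 40,
    "M": 13, "N": 10, "P": 19, "Q": 35, "R": 24,
    "S": 25, "T": 21, "V": 22, "W": 4, "Y": 6,
}

# The counts should add up to the full protein length.
length = sum(counts.values())

# Percentage of each residue, rounded to one decimal place as in the table.
percent = {aa: round(100 * n / length, 1) for aa, n in counts.items()}

# List residues from most to least abundant.
for aa, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{aa}: {n:3d} ({percent[aa]:4.1f}%)")
```

The counts sum to 363, matching the stated protein length, and leucine, glutamic acid, glutamine, and lysine are the four most abundant residues.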
The C20orf96 protein has a neutral charge, with no positive, negative, or mixed charge clusters.
NetPhos predicts 10 serine, 4 threonine, and 3 tyrosine phosphorylation sites across the 363 amino acids. [8] No signal peptides or transmembrane sequences were detected.
Most of the predicted secondary structure consists of alpha-helices, with short coils at both ends. Near the end of the sequence there is also a short beta-strand of four amino acids.
There are no known crystal structures for this protein. The tertiary structure is predicted to consist mainly of coiled-coil regions.
C20orf96 is found in many eukaryotes. Orthologs have been found in most organisms in the kingdom Animalia, with the lineage going back to the phylum Chordata. No paralogs of C20orf96 have been found.
The most prevalent mutations of C20orf96 in humans are M1V, Q6K, T13I, T13S, M94I, S117G, S117T, S117N, L142V, D159N, M215T, D238E, R251C, R251G, Q263E, R279C, E304G, I305F, M313I, R328W, R342I, and L353V. Each mutation is written as the reference amino acid, followed by the residue position, followed by the substituted amino acid.
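The mutation notation described above can be split into its three parts programmatically. This is a minimal sketch using a regular expression; the function name `parse_mutation` and the pattern are illustrative choices, not part of any particular tool.

```python
import re

# One-letter reference residue, position, one-letter substituted residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(label):
    """Split a label such as 'R251C' into (reference, position, substitute)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

# Mutations listed in the text.
mutations = [
    "M1V", "Q6K", "T13I", "T13S", "M94I", "S117G", "S117T", "S117N",
    "L142V", "D159N", "M215T", "D238E", "R251C", "R251G", "Q263E",
    "R279C", "E304G", "I305F", "M313I", "R328W", "R342I", "L353V",
]

parsed = [parse_mutation(m) for m in mutations]
# Number of distinct residue positions affected.
positions = sorted({pos for _, pos, _ in parsed})

print(parse_mutation("R251C"))
print(f"{len(parsed)} mutations at {len(positions)} distinct positions")
```

For example, `R251C` means that the arginine at position 251 is replaced by cysteine.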
C20orf96 has been linked to Moyamoya disease, with an associated 4.5-fold increase in the odds of developing the disease. [9] It is also associated with lymph node metastasis in colorectal cancer. [10] One study identified C20orf96 as a hub gene showing positive selection, where a hub gene is one ranked in the top 20 for connectivity. [11]