C5orf22 | |
---|---|
Aliases | C5orf22, chromosome 5 open reading frame 22 |
External IDs | MGI: 1925127; HomoloGene: 10149; GeneCards: C5orf22 |
Chromosome 5 open reading frame 22 (C5orf22) is a protein-coding gene of poorly characterized function in Homo sapiens. [5] Its primary alias is uncharacterized protein family 0489 (UPF0489). [5]
C5orf22 is located on the positive strand of chromosome 5 at 5p13.3, spanning 22,779 nucleotides from base pair 31,532,275 to 31,555,053. [6] The gene contains 9 exons and produces 7 isoforms. [5] The isoforms differ in their exon configuration and untranslated regions. Transcript variant 1 is the canonical isoform, encoding 442 amino acids across 9 exons. [7]
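The quoted span can be checked directly from the two coordinates. This is a minimal sketch, assuming the positions are 1-based and inclusive at both ends (an assumption, though it is the convention under which the figures in the text agree); the function and variable names are illustrative only.

```python
# Genomic coordinates quoted in the text for C5orf22 on chromosome 5.
# Assumption: 1-based positions, inclusive at both ends.
GENE_START = 31_532_275
GENE_END = 31_555_053


def inclusive_span(start: int, end: int) -> int:
    """Number of nucleotides from start to end, counting both endpoints."""
    return end - start + 1


span = inclusive_span(GENE_START, GENE_END)
print(f"{span:,} nucleotides")  # 22,779 nucleotides
```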
C5orf22 displays ubiquitous RNA expression across tissue types from all 3 germ layers and all phases of development in humans, mice, chickens, and zebrafish. [5] RNA expression differs significantly between select tissues, with skeletal muscle showing the greatest abundance (7.8 RPKM). [5] [9]
C5orf22 has 1 predicted promoter directly upstream of the gene (GXP_55076). [8] This promoter is 1,081 base pairs long and partially overlaps the 5′ untranslated region. [8] GXP_55076 is assigned to all transcript variants. [8] Its transcription factor binding sites include those for TATA-box binding factors, SMAD transcription factors, MAF/AP1 binding factors, and several others. [8]
The closest neighboring gene to C5orf22 is Drosha, which is encoded on the minus strand proximal to C5orf22. [5] [10] Drosha is a double-stranded RNA endoribonuclease that assists with the first step of microRNA biogenesis. [11]
The C5orf22 protein contains 2 globular domains and 3 small disordered regions. [12] Its molecular weight is approximately 50 kDa and its isoelectric point is 4.7. [13] Its amino acid composition is close to that of most proteins, with no individual amino acid significantly over- or under-represented. [14] C5orf22 contains several predicted post-translational modification sites and motifs, including phosphorylation sites, ubiquitination sites, glycosylation sites, an SH2 domain, and a myristoylation site. [12]
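As a rough plausibility check on the molecular weight, a protein's mass can be estimated from its length. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from the cited sources, and uses the 442-residue length of the canonical isoform. An exact figure would require the actual sequence.

```python
# Rule-of-thumb average mass of one amino acid residue (assumption, not from the source).
AVERAGE_RESIDUE_MASS_DA = 110.0
# Length of the canonical C5orf22 isoform, as stated in the text.
CANONICAL_LENGTH_AA = 442


def estimate_mass_kda(length_aa: int, residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate protein mass in kDa from residue count alone."""
    return length_aa * residue_mass_da / 1000.0


estimated_kda = estimate_mass_kda(CANONICAL_LENGTH_AA)
print(f"{estimated_kda:.1f} kDa")  # 48.6 kDa
```

The estimate lands close to the reported value of approximately 50 kDa.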
C5orf22 most likely exists as a soluble protein located within the cytoplasm and nucleus. [15] Both amino acid sequence predictions and immunohistochemical staining support this localization. [9] [16] Furthermore, amino acid sequence analysis predicts a partial nuclear localization signal (NLS) spanning amino acids 175-185. [17]
The precise function of C5orf22 is still unknown; however, it is hypothesized to be a component of an RNA splicing complex. [18] Proteomic research implicated the protein product as a novel component of the WBP11/PQBP1 splicing complex, which regulates expression of genes involved in a spectrum of processes ranging from DNA repair to immunomodulation. [18] C5orf22 knockdown was associated with downregulation of alternative splicing events, leading to aberrant expression of select genes and ultimately to cell cycle dysfunction. [18] The cell localization evidence and the presence of an NLS further support this hypothesized function.
Experimental evidence has indicated over 20 interactors of C5orf22. [19] [20] [21] These interactors are localized to both the nucleus and the cytoplasm. [22] The most likely interactors are WBP11, OSM, SURF2, ELOF1, and DDIT4L. [20]
C5orf22 initially appeared in invertebrates approximately 797 million years ago. [23] It is the only member of its gene family. Human UPF0489 C5orf22 is conserved through invertebrates. [23] C5orf22 orthologs showed conservation of both globular domains through bony fish and conservation of 1 globular domain within arthropods. [12] The isoelectric points and molecular weights of C5orf22 orthologs were within ±0.15 and ±3 kDa of the human values through bony fish. [12] There are no paralogs of C5orf22 in humans. [23]
UPF0489 C5orf22 is a slow-evolving protein, based on comparisons of the percent corrected divergence of orthologous proteins. [24]
Taxonomic Class | Common Name | Genus species | Date of Divergence (MYA) | Sequence Identity (%) | Sequence Similarity (%) | Sequence Length (AA) | Query Coverage (%) | Accession Number |
---|---|---|---|---|---|---|---|---|
Mammal | Human | Homo sapiens | N/A | 100 | 100 | 442 | 100 | NP_060826.2 |
Mammal | Mouse | Mus musculus | 90 | 78 | 86 | 442 | 100 | NP_084274.1 |
Mammal | Whale | Balaenoptera musculus | 96 | 89 | 94 | 467 | 100 | XP_036705025.1 |
Aves | Chicken | Gallus gallus | 312 | 68 | 79 | 446 | 98 | XP_418996.3 |
Reptile | Tiger rattlesnake | Crotalus tigris | 312 | 65 | 75 | 476 | 98 | XP_039212189.1 |
Amphibian | African clawed frog | Xenopus laevis | 352 | 67 | 78 | 459 | 95 | XP_018121838.1 |
Fish | Zebrafish | Danio rerio | 435 | 57 | 71 | 439 | 95 | NP_956625.1 |
Fish | Sea lamprey | Petromyzon marinus | 615 | 51 | 69 | 589 | 89 | XP_032827184.1 |
Invertebrate | Fruit fly | Drosophila suzukii | 797 | 33 | 50 | 481 | 95 | XP_036671373.1 |
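Corrected divergence adjusts the observed sequence difference for multiple substitutions at the same site. The source does not state which correction was applied. The sketch below assumes the Poisson correction, m = -100 · ln(identity / 100), and applies it to the identities and divergence dates copied from the ortholog table. The function and variable names are illustrative.

```python
import math

# (common name, divergence date in MYA, sequence identity in %) from the ortholog table.
ORTHOLOGS = [
    ("Mouse", 90, 78),
    ("Whale", 96, 89),
    ("Chicken", 312, 68),
    ("Tiger rattlesnake", 312, 65),
    ("African clawed frog", 352, 67),
    ("Zebrafish", 435, 57),
    ("Sea lamprey", 615, 51),
    ("Fruit fly", 797, 33),
]


def corrected_divergence(identity_percent: float) -> float:
    """Poisson-corrected divergence in percent (the correction method is an assumption)."""
    return -100.0 * math.log(identity_percent / 100.0)


for name, mya, identity in ORTHOLOGS:
    m = corrected_divergence(identity)
    print(f"{name:<20} {mya:>4} MYA  corrected divergence {m:6.1f}%")
```

Plotting these values against divergence date, alongside reference proteins known to evolve quickly or slowly, is one way such a rate comparison could be made.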
Recent studies on the role of miRNA in breast cancer pathogenesis have correlated upregulation of C5orf22 with reduced survival of breast cancer patients. [26]
Patients with tibial muscular dystrophy exhibit decreased expression of C5orf22. [27] Patients with non-ischemic cardiomyopathy exhibit increased expression of C5orf22.