Chromosome 19 open reading frame 18 (c19orf18) is a protein that in humans is encoded by the c19orf18 gene. The gene is exclusive to mammals, and the protein is predicted to have a transmembrane domain and a coiled-coil stretch. [1] The function of the protein is not yet fully understood.
Aliases of this gene include MGC41906 and LOC147685. [1] The gene is located on chromosome 19 at 19q13.43. [2] The gene spans from 58,485,905 bp to 58,469,805 bp on the minus strand and contains 6 exons and 5 introns. [1] Transcription of this gene produces one spliced mRNA which codes for the protein c19orf18.
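As a quick check on the coordinates above, the genomic span of the gene can be computed from its two endpoints. This is a minimal sketch that assumes the reported positions are 1-based and inclusive, a convention the source does not state; the function name `span_length` is hypothetical.

```python
def span_length(start_bp: int, end_bp: int) -> int:
    """Return the number of bases covered by an inclusive coordinate pair.

    The order of the endpoints does not matter, so the function works for
    genes reported on the minus strand (start > end) as well.
    """
    return abs(start_bp - end_bp) + 1


# Coordinates as given in the text (minus strand, so start > end).
gene_start = 58_485_905
gene_end = 58_469_805

gene_length = span_length(gene_start, gene_end)
print(f"c19orf18 spans {gene_length:,} bp")
```

Under the inclusive-coordinate assumption the span is 16,101 bp; with half-open coordinates it would be 16,100 bp.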
C19orf18 is ubiquitously expressed at moderate levels. [1] In humans, there is higher expression in the testis, prostate, lung, liver, pancreas, uterus, heart, and other connective tissues. [3] [4]
There are no known paralogs of this gene in the human genome. [5]
The gene is exclusive to mammals. [1] The transmembrane domain is the most conserved region among close orthologs and distant homologs. The following table presents some of the orthologs found using BLAST searches. [6] The list is not exhaustive; it is meant to show the diversity of species in which orthologs are found. Entries are sorted by date of divergence and then by protein similarity.
Species | Date of Divergence (MYA) | Accession Number | Sequence length (aa) | Identity | Similarity |
---|---|---|---|---|---|
Homo sapiens (Humans) | 0 | NP_689687.1 | 215 | 100% | 100% |
Pongo abelii (Orangutan) | 15.2 | XP_002829939.1 | 216 | 92% | 94% |
Rhinopithecus roxellana (Golden snub-nosed Monkey) | 28.1 | XP_010385277.1 | 216 | 84% | 90% |
Carlito syrichta (Philippine tarsier) | 66.7 | XP_008066887.1 | 217 | 70% | 81% |
Otolemur garnettii (Galago) | 73 | XP_012663984.1 | 183 | 50% | 62% |
Mus musculus (Mouse) | 88 | XP_017167821.1 | 183 | 46% | 63% |
Oryctolagus cuniculus (European rabbit) | 88 | XP_008247222.1 | 242 | 49% | 62% |
Rhinolophus sinicus (Horseshoe bat) | 94 | XP_019567114.1 | 284 | 70% | 82% |
Vicugna pacos (Alpaca) | 94 | XP_015107013.1 | 214 | 65% | 80% |
Canis lupus familiaris (Dog) | 94 | XP_005616108.1 | 223 | 49% | 61% |
Bos taurus (Cow) | 94 | XP_015313970.1 | 250 | 44% | 53% |
Ornithorhynchus anatinus (Platypus) | 169 | XP_007664656.1 | 308 | 34% | 57% |
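The ordering described for the table (date of divergence first, then protein similarity) can be verified with a short script. This is an illustrative sketch, not the method used to build the table: the values are copied from the rows above, and the choice of ascending divergence with descending similarity is inferred from the table itself.

```python
# (species, divergence in MYA, similarity in %) copied from the ortholog table.
orthologs = [
    ("Homo sapiens", 0, 100),
    ("Pongo abelii", 15.2, 94),
    ("Rhinopithecus roxellana", 28.1, 90),
    ("Carlito syrichta", 66.7, 81),
    ("Otolemur garnettii", 73, 62),
    ("Mus musculus", 88, 63),
    ("Oryctolagus cuniculus", 88, 62),
    ("Rhinolophus sinicus", 94, 82),
    ("Vicugna pacos", 94, 80),
    ("Canis lupus familiaris", 94, 61),
    ("Bos taurus", 94, 53),
    ("Ornithorhynchus anatinus", 169, 57),
]


def sort_key(row):
    """Ascending divergence date, then descending similarity."""
    _, divergence_mya, similarity_pct = row
    return (divergence_mya, -similarity_pct)


# True when the table rows are already in the stated order.
table_is_sorted = orthologs == sorted(orthologs, key=sort_key)
print("Table follows the stated ordering:", table_is_sorted)
```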
The protein contains 215 amino acids. The molecular weight of c19orf18 is 24.151 kDa and the isoelectric point of the unphosphorylated state is 9.06. [7] The protein sequence is rich in leucine and deficient in tryptophan, cysteine, and tyrosine. There is a negative charge cluster from amino acid 149 to 172. [8]
There is a cross-program consensus between GOR4, CFSSP, and PHYRE2 that the protein structure contains mostly coiled regions and alpha helices. [9] [10] [11]
The protein sequence is predicted to contain a signal peptide (amino acids 1 to 24), an extracellular domain (amino acids 25 to 100), a transmembrane domain (amino acids 101 to 121), and a cytoplasmic domain (amino acids 122 to 215). [12]
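The predicted domain boundaries can be checked for consistency with the 215-residue protein length. The sketch below assumes the positions are 1-based and inclusive; the helper names `domain_lengths` and `is_contiguous` are hypothetical and only serve this illustration.

```python
# (domain name, first residue, last residue) as given in the text.
domains = [
    ("signal peptide", 1, 24),
    ("extracellular domain", 25, 100),
    ("transmembrane domain", 101, 121),
    ("cytoplasmic domain", 122, 215),
]


def domain_lengths(regions):
    """Map each domain name to its length in residues (inclusive bounds)."""
    return {name: last - first + 1 for name, first, last in regions}


def is_contiguous(regions):
    """True when each domain starts right after the previous one ends."""
    return all(
        nxt[1] == prev[2] + 1 for prev, nxt in zip(regions, regions[1:])
    )


lengths = domain_lengths(domains)
total_residues = sum(lengths.values())

for name, length in lengths.items():
    print(f"{name}: {length} aa")
print("total:", total_residues, "aa; contiguous:", is_contiguous(domains))
```

The four segments tile the sequence without gaps or overlaps, and their lengths sum to the full protein length.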
PSORTII and CELLO predicted that the human protein would localize to the plasma membrane and part of it would be in the extracellular region. [13] [14] Immunofluorescent staining of human cell line U-2 OS shows localization to the Golgi apparatus. [15]
The c19orf18 protein is predicted to interact with the proteins listed in the table below. The interactions were identified by affinity capture-MS. [16]
Predicted interacting protein name | Score | Experimental verification |
---|---|---|
Nedd4 family interacting protein 1 | 0.9165 | Affinity capture-MS |
Activin A receptor, type IIA | 0.7829 | Affinity capture-MS |
Syntaxin 6 | 0.9679 | Affinity capture-MS |
Bone morphogenetic protein receptor type 1A | 0.8914 | Affinity capture-MS |
Fibroblast growth factor receptor 2 | 0.8789 | Affinity capture-MS |
Microfibrillar-associated protein 3 | 0.8756 | Affinity capture-MS |
The c19orf18 protein interacts with Nedd4 family interacting protein 1 (NDFIP1), which promotes pancreatic beta cell death and reduces insulin secretion. [17] Activin A receptor type 2A (ACVR2A) is a transmembrane receptor that is involved in ligand binding and mediates the functions of activins. [18] Syntaxin 6 functions in trans-Golgi network vesicle trafficking, perhaps targeting to endosomes in mammalian cells. [19] Bone morphogenetic protein receptor type 1A (BMPR1A) is expressed almost exclusively in skeletal muscle and is a transcriptional regulator. [20] Fibroblast growth factor receptor 2 (FGFR2) plays an essential role in the regulation of osteoblast differentiation, proliferation, and apoptosis, and is required for normal skeleton development. [21] The function of microfibrillar-associated protein 3 (MFAP3) is not fully understood, but it may be involved in nuclear signaling and may play a role in metastasis. [22]
The c19orf18 protein is down-regulated in pancreatic cancer, [23] and the gene contains CpG sites whose association with epithelial ovarian cancer risk has been replicated. [24] Expression of the gene decreases in teratozoospermia [25] and increases in polycystic ovary syndrome. [26] The gene may also be involved in prostate cancer and various tumors. [3]