C14orf119 identifiers | Value |
---|---|
Aliases | C14orf119, chromosome 14 open reading frame 119 |
External IDs | MGI: 1920893, HomoloGene: 9921, GeneCards: C14orf119 |
C14orf119 is a protein that in humans is encoded by the c14orf119 gene. The c14orf119 protein is predicted to be localized in the nucleus. [5] Additionally, c14orf119 expression is decreased in individuals with systemic lupus erythematosus (SLE) and increased in individuals with various types of lymphoma, in each case when compared with healthy individuals. [6] [7]
The common aliases of c14orf119 are chromosome 14 open reading frame 119 and My028. [8] The gene is located on chromosome 14, at 14q11.2. [9] It contains two exons and covers 5.76 kb, from 23563900 to 23569660 on the forward strand. [10] The span of the c14orf119 gene, from the start of transcription to the polyA site, is 4951 base pairs in length. [11]
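The 5.76 kb figure follows from the quoted coordinates. A minimal sketch of that arithmetic is shown below; the variable names are chosen here for illustration only.

```python
# Genomic coordinates quoted above for c14orf119 (chromosome 14, forward strand).
start = 23563900
end = 23569660

# Simple difference between the two coordinates, in base pairs.
span_bp = end - start
span_kb = span_bp / 1000

print(f"{span_bp} bp = {span_kb:.2f} kb")
```

Counting both endpoints inclusively would add one base pair, which rounds to the same value in kilobases.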
The c14orf119 mRNA is 2914 nucleotides long. [9] C14orf119 has two transcript isoforms, shown in the table below.
Name | Accession Number [13] | Transcript ID | Length |
---|---|---|---|
C14orf119-201 | NM_017924.4 | ENST00000319074.6 | 2914 nt |
C14orf119-202 | XM_017021390.2 | ENST00000554203.1 | 725 nt |
The c14orf119 protein is composed of 140 amino acids. [14] The molecular weight of the c14orf119 protein is approximately 16 kDa and the basal isoelectric point is 4.86. [15] There is a long section of hydrophobic amino acids at the start of the protein. [16] There are no additional significant compositional features of the c14orf119 protein, including charge clusters, charge runs, patterns, repetitive structures, or multiplets. [17] The primary sequence of the c14orf119 protein is as follows:
MPLESSSSMP LSFPSLLPSV PHNTNPSPPL MSYITSQEMK CILHWFANWS GPQRERFLED LVAKAVPEKL
QPLLDSLEQL SVSGADRPPS IFECQLHLWD QWFRGWAEQE RNEFVRQLEF SEPDFVAKFY QAVAATAGKD [18]
There are two known c14orf119 protein isoforms, as shown in the table below.
Name | Accession Number | Size | Domain Inclusion |
---|---|---|---|
Uncharacterized c14orf119 protein | NP_060394.1 | 140 aa | DUF4508 |
Uncharacterized c14orf119 protein isoform X1 | XP_016876879.1 | 140 aa | DUF4508 |
There is a domain of unknown function (DUF) found in the c14orf119 protein: DUF4508 (with an E-value of 6.3e-36). [20] This DUF is a part of a family of proteins that is found in eukaryotes and is typically between 117 and 253 amino acids in length. [21] Additionally, there are three predicted CK2 phosphorylation sites (at positions 36, 83, and 121) within the c14orf119 protein. [22]
The predicted secondary structure of the c14orf119 protein is largely alpha helical in content. The specific makeup of the secondary structure is as follows: alpha helices make up 38.57% of the protein (54 amino acids), extended strands make up 23.57% of the protein (33 amino acids), and random coils make up 37.86% of the protein (53 amino acids). [24] Phyre2, a program for protein modeling, prediction, and analysis, was used to determine and model the predicted structure of the c14orf119 protein. [25] As shown in Figure 1, Phyre2 created a model for the predicted structure of 106 (out of a total of 140) residues of the c14orf119 protein, with 79.7% confidence and 76% coverage. [25]
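The percentages above can be re-derived from the residue counts. The sketch below only repeats that arithmetic; it does not run any structure prediction, and the dictionary and variable names are illustrative.

```python
# Residue counts per predicted secondary-structure class, as quoted in the text.
counts = {"alpha helix": 54, "extended strand": 33, "random coil": 53}

# The three classes together should account for the full 140-residue protein.
total = sum(counts.values())
percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Phyre2 is reported to have modelled 106 of the 140 residues.
coverage = round(100 * 106 / total)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"Phyre2 coverage: {coverage}%")
```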
With only two cysteines, 52 amino acids apart, found in the c14orf119 protein sequence, there were no predicted disulfide bonds in the c14orf119 protein. [26] [17] There are no predicted transmembrane regions or signal peptides in the c14orf119 protein. [27] [28] [29]
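The cysteine spacing can be checked directly against the primary sequence given earlier. This is a minimal sketch: the sequence is copied from the article, and the names `RAW`, `SEQ`, and `cys_positions` are illustrative.

```python
# Primary sequence as printed in the article, in blocks of ten residues.
RAW = """
MPLESSSSMP LSFPSLLPSV PHNTNPSPPL MSYITSQEMK CILHWFANWS GPQRERFLED LVAKAVPEKL
QPLLDSLEQL SVSGADRPPS IFECQLHLWD QWFRGWAEQE RNEFVRQLEF SEPDFVAKFY QAVAATAGKD
"""
SEQ = "".join(RAW.split())  # drop spaces and line breaks

# 1-based positions of every cysteine in the sequence.
cys_positions = [i for i, aa in enumerate(SEQ, start=1) if aa == "C"]

# Number of residues lying strictly between the two cysteines.
residues_between = cys_positions[1] - cys_positions[0] - 1

print(len(SEQ), cys_positions, residues_between)
```

The count confirms only the spacing. Whether a disulfide bond can form depends on the folded structure, which this check does not address.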
The predicted promoter sequence associated with c14orf119 is 3332 bases in length. [30] This promoter sequence has one CpG island associated with it, with a CpG count of 78. [30] Additionally, there are a number of transcription factor binding sites associated with this promoter sequence, such as those for RB1, HNF4A, ETS1, and RBL2. [31]
C14orf119 is expressed in 203 organs. [32] The c14orf119 gene is expressed in a number of tissues and has its highest expression in cultured fibroblast cells, with a transcripts per million (TPM) value of 75.63. [33] There is notably decreased expression of c14orf119 in the following tissues: pancreas, bone marrow, brain, salivary glands, and liver. [19] [34] Additionally, there is notably increased expression of c14orf119 in the adrenal gland, kidney, lung, prostate, thymus, white blood cells, lymph node, and thyroid. [19] Finally, expression levels of c14orf119 decrease with the development of the kidney and increase with the development of the stomach. [19]
There were no predicted enhancers associated with c14orf119. [31] There were a number of stem loop formation predictions in both the 5' UTR and 3' UTR of c14orf119. [35]
The miRNA binding sites found in the 3' UTR of c14orf119 include miR-489, miR-1872, and miR-4778-3p; however, there were no miRNA binding sites found in the 5' UTR of c14orf119. [36]
The c14orf119 protein is predicted to be located in the nucleus, with a reliability score of 55.5. [5] However, the protein has a 7.9% basic residue content and a nuclear localization signal (NLS) score of -0.47. [37] Additionally, there was a predicted ER retention motif at positions 136-139 of the protein. [37] Finally, there were no N-terminal signal peptides, no cleavage sites for mitochondria, no actinin-type actin-binding motifs, and no N-myristoylation pattern. [5]
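The 7.9% basic residue content can be reproduced from the primary sequence under one assumption made here, not stated in the source: that "basic residues" means lysine (K) and arginine (R) only. The names below are illustrative.

```python
# Primary sequence as printed in the article, in blocks of ten residues.
RAW = """
MPLESSSSMP LSFPSLLPSV PHNTNPSPPL MSYITSQEMK CILHWFANWS GPQRERFLED LVAKAVPEKL
QPLLDSLEQL SVSGADRPPS IFECQLHLWD QWFRGWAEQE RNEFVRQLEF SEPDFVAKFY QAVAATAGKD
"""
SEQ = "".join(RAW.split())  # drop spaces and line breaks

# Assumption: only lysine and arginine are counted as basic residues.
basic_count = sum(SEQ.count(aa) for aa in "KR")
basic_pct = round(100 * basic_count / len(SEQ), 1)

print(f"{basic_count} of {len(SEQ)} residues are K or R ({basic_pct}%)")
```

Under this assumption the count is 11 of 140 residues. Also counting histidine would give a higher figure, so the K+R definition is the one consistent with the reported value.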
There are a number of post-translational modifications of the c14orf119 protein, all of which are shown on the conceptual translation of c14orf119 in Figure 2.
There are predicted ubiquitination sites at lysine residues at positions 128 and 139. [44]
There are predicted kinase-specific phosphorylation sites at serines at the following positions in the c14orf119 protein sequence: 15, 19, 27, 32, 36, 81, 83, 90, and 121. [43] [45] Protein phosphorylation at serine residues can play critical roles in the regulation of protein function and the transmission of signals throughout the cell. [46]
There are two N-glycosylation sites at positions 25-27 and 48-50. [42] This type of post-translational modification plays important roles in both the structure and function of some eukaryotic proteins.
Additionally, there is predicted glycation of the epsilon amino groups of lysines at the following positions: 40, 64, 69, and 139. [41] Glycation is a process in which proteins react with reducing sugar molecules, which ultimately impairs the function and changes the characteristics of the protein. [47]
There are also predicted mammalian mucin-type GalNAc O-glycosylation sites at the following positions: 5, 6, 7, 12, 15, 19, and 24. [40] GalNAc-type O-glycosylation is the attachment of a sugar molecule to the oxygen atom of serine or threonine residues in a protein. [48] O-glycans, the sugars added to the serine or threonine, have various functions, including allowing recognition of foreign material, providing cartilage and tendon flexibility, controlling cell metabolism, and trafficking cells in the immune system. [49]
There is a predicted SUMOylation site at the lysine at position 139. [39] SUMOylation is involved in transcriptional regulation, protein stability, apoptosis, nuclear-cytosolic transport, progression through the cell cycle, and response to stress. [50]
Finally, there are predicted O-GlcNAc sites at serines at the following positions in the c14orf119 protein: 5, 6, 7, 8, and 83. [38] This post-translational modification can play various critical roles, such as in progression through the cell cycle, response to cellular stress, protein turnover, and protein stability. [51]
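As a consistency check, the reported modification sites can be compared with the primary sequence to confirm that each position holds a residue the modification can target. This is a sketch only: it re-checks residue identity and does not reproduce any of the prediction tools cited. The names `SITES` and `mismatches` are hypothetical, and each N-glycosylation site is represented by its first position.

```python
# Primary sequence as printed in the article, in blocks of ten residues.
RAW = """
MPLESSSSMP LSFPSLLPSV PHNTNPSPPL MSYITSQEMK CILHWFANWS GPQRERFLED LVAKAVPEKL
QPLLDSLEQL SVSGADRPPS IFECQLHLWD QWFRGWAEQE RNEFVRQLEF SEPDFVAKFY QAVAATAGKD
"""
SEQ = "".join(RAW.split())  # drop spaces and line breaks

# Modification name -> (allowed residue letters, reported 1-based positions).
SITES = {
    "ubiquitination": ("K", [128, 139]),
    "phosphorylation": ("S", [15, 19, 27, 32, 36, 81, 83, 90, 121]),
    "N-glycosylation": ("N", [25, 48]),
    "glycation": ("K", [40, 64, 69, 139]),
    "GalNAc O-glycosylation": ("ST", [5, 6, 7, 12, 15, 19, 24]),
    "SUMOylation": ("K", [139]),
    "O-GlcNAc": ("S", [5, 6, 7, 8, 83]),
}

def mismatches(seq, sites):
    """Return (modification, position, residue) for sites with an unexpected residue."""
    bad = []
    for name, (allowed, positions) in sites.items():
        for pos in positions:
            residue = seq[pos - 1]  # convert 1-based position to 0-based index
            if residue not in allowed:
                bad.append((name, pos, residue))
    return bad

print(mismatches(SEQ, SITES))
```

An empty list means every reported position carries a compatible residue. It says nothing about whether the modification actually occurs in vivo.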
There are varying levels of H3K27ac, H3K4me1, and H3K4me3 throughout the c14orf119 gene. [31] H3K4me1 has variation in signal strength among different cell lines, which may reflect differences of epigenetic landscapes in these cell lines. [31] Additionally, there is a strong signal of H3K27ac across the majority of cell lines along the predicted promoter region. [31] Finally, there is also a strong signal of H3K4me3 across the majority of the cell types along the predicted promoter region, with no signal variation across cell types. [31]
C14orf119 is conserved in both vertebrates and invertebrates; however, it is not conserved in bacteria, archaea, Trichoplax, plants, or fungi. [52] The c14orf119 gene is highly conserved among its mammalian orthologs; within the non-mammalian orthologs, however, there are various insertions, especially at the beginning and end of the gene. [52] This gene does not have any paralogs or paralogous domains. [52]
As shown in Figure 3, the c14orf119 gene has evolved moderately quickly when compared to cytochrome c, fibrinogen alpha chain, and hemoglobin. It has evolved faster than both hemoglobin and cytochrome c, but slower than fibrinogen alpha chain.
The table below lists orthologs of the c14orf119 protein. For each ortholog, it gives the date of divergence (DoD) from humans in millions of years ago (MYA), the accession number, the sequence length, and the percent identity and similarity to the human protein.
Genus and Species | Common Name | Taxonomy - Class | Taxonomy - Order | DoD (MYA) | Accession Number | Sequence Length (aa) | Percent Identity | Percent Similarity |
---|---|---|---|---|---|---|---|---|
Homo sapiens | Human | Mammalia | Primates | 0 | NP_060394.1 | 140 | 100 | 100 |
Mus musculus | Mouse | Mammalia | Rodentia | 89 | NP_067412.1 | 142 | 83.1 | 90.1 |
Myotis brandtii | Brandt's Bat | Mammalia | Chiroptera | 94 | XP_005852873.1 | 141 | 86.5 | 90.8 |
Callorhinus ursinus | Northern Fur Seal | Mammalia | Carnivora | 94 | XP_025726115.1 | 142 | 88 | 91.5 |
Bos taurus | Cattle | Mammalia | Artiodactyla | 94 | XP_002690553.1 | 142 | 88 | 92.3 |
Orycteropus afer afer | Aardvark | Mammalia | Tubulidentata | 102 | XP_007949377.1 | 140 | 85.1 | 89.4 |
Python bivittatus | Burmese Python | Reptilia | Squamata | 318 | XP_007441564.1 | 156 | 47.8 | 60.2 |
Podarcis muralis | Common Wall Lizard | Reptilia | Squamata | 318 | XP_028559108.1 | 115 | 51.8 | 63.8 |
Nanorana parkeri | High Himalaya Frog | Amphibia | Anura | 351.7 | XP_018411628.1 | 115 | 45.7 | 60.7 |
Larimichthys crocea | Large Yellow Croaker | Actinopterygii | Perciformes | 433 | XP_010740478.3 | 201 | 34.5 | 44.3 |
Aethina tumida | Small Hive Beetle | Insecta | Coleoptera | 736 | XP_019869014.1 | 124 | 18.6 | 39.7 |
Bombus terrestris | Buff-Tailed Bumblebee | Insecta | Hymenoptera | 736 | XP_020718687.1 | 125 | 19 | 36.1 |
Photinus pyralis | Common Eastern Firefly | Insecta | Coleoptera | 736 | XP_031358233.1 | 128 | 19.9 | 40.4 |
Pieris rapae | Cabbage White Butterfly | Insecta | Lepidoptera | 736 | XP_022116245.1 | 180 | 20 | 38.4 |
Nasonia vitripennis | Small Parasitoid Wasp | Insecta | Hymenoptera | 736 | XP_031785555.1 | 121 | 22.4 | 42.1 |
Biomphalaria glabrata | Freshwater Snail | Gastropoda | Basommatophora | 736 | XP_013090201.1 | 113 | 31.7 | 46.2 |
Aplysia californica | California Sea Hare | Gastropoda | Anaspidea | 736 | XP_005112416.1 | 112 | 32.6 | 47.9 |
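One simple way to summarize the table is to average the percent identity within each taxonomic class. The sketch below does so with values copied from the table and the human reference row left out. It is an illustrative summary, not the analysis behind Figure 3, and the names `ROWS` and `mean_identity` are hypothetical.

```python
from collections import defaultdict

# (taxonomic class, percent identity to the human protein), from the table above.
ROWS = [
    ("Mammalia", 83.1), ("Mammalia", 86.5), ("Mammalia", 88.0),
    ("Mammalia", 88.0), ("Mammalia", 85.1),
    ("Reptilia", 47.8), ("Reptilia", 51.8),
    ("Amphibia", 45.7),
    ("Actinopterygii", 34.5),
    ("Insecta", 18.6), ("Insecta", 19.0), ("Insecta", 19.9),
    ("Insecta", 20.0), ("Insecta", 22.4),
    ("Gastropoda", 31.7), ("Gastropoda", 32.6),
]

# Group the identity values by class, then average each group.
by_class = defaultdict(list)
for cls, identity in ROWS:
    by_class[cls].append(identity)

mean_identity = {cls: sum(vals) / len(vals) for cls, vals in by_class.items()}

for cls, value in mean_identity.items():
    print(f"{cls}: {value:.2f}%")
```

The mammalian mean is well above every other class, in line with the statement that the gene is highly conserved among mammalian orthologs. The two gastropod sequences score higher than the insects even though the table lists the same divergence date for both groups.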
The function of the c14orf119 protein is not yet well understood by the scientific community.
There are a number of predicted interacting proteins found in yeast two-hybrid (Y2H) screens, such as exportin 1 (XPO1), ras homolog family member U (RHOU), deoxyhypusine hydroxylase/monooxygenase (DOHH), hepatocyte nuclear factor 4 alpha (HNF4A), leukocyte receptor cluster member 1 (LENG1), and ubiquitin C (UBC). [53] [54] [55]
Expression of c14orf119 is decreased in individuals with systemic lupus erythematosus (SLE) when compared with healthy individuals. [6] Furthermore, expression of c14orf119 is increased in individuals with various types of lymphomas when compared to healthy individuals. [7]