C13orf42 | |
---|---|
Aliases | C13orf42, LINC00372, LINC00371, long intergenic non-protein coding RNA 371, chromosome 13 open reading frame 42 |
External IDs | GeneCards: C13orf42 |

C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data shows low expression of the C13orf42 gene in a variety of tissues. [3] The C13orf42 protein is predicted to be localized in the mitochondria, nucleus, and cytosol. [4] Tertiary structure predictions for C13orf42 indicate multiple alpha helices. [5]
C13orf42 is a protein-coding gene containing 4 exons. It is also known by the aliases LINC00371 and LINC00372. [3] RNA sequencing shows that the gene is expressed at low levels in various tissues. [3] [6]
C13orf42 is located on the minus strand of chromosome 13 at 13q14.3 in humans. [3] [7] It extends from 51.08 Mb to 51.20 Mb on chromosome 13 and spans 118 kilobases. [8]
The genomic neighborhood of C13orf42 consists of several pseudogenes along with ribonuclease H2 subunit B (RNASEH2B), uncharacterized LOC107984554, and family with sequence similarity 124 member A (FAM124A). [3]
RNA sequencing of C13orf42 shows expression in a variety of tissues including the spleen, kidney, heart, brain, testis, skin, esophagus, colon, small intestine, stomach, lung, placenta, salivary gland, thymus, and adipose. [3] RNA sequencing of human fetal tissue shows C13orf42 expression starting at 20 weeks in the intestine, 16 weeks in the kidney, and 10 weeks in the lung; in the stomach, expression is seen at 16 weeks but not at 10, 18, or 20 weeks. [3] Recorded RNA expression is very low, with all results below 0.5 reads per kilobase of transcript per million reads mapped (RPKM). Microarray data from NCBI GEO (GDS425) show expression in additional tissues including bone marrow, liver, skeletal muscle, spinal cord, and pancreas. [6]
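To make the RPKM unit concrete, the following is a minimal sketch of the standard RPKM normalization. The function name and the read counts are hypothetical and chosen only for illustration; they are not taken from the cited expression data. The transcript length of 3075 nucleotides is that of transcript variant 3.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 10^9 / (transcript length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 30 reads on a 3075 nt transcript in a library of
# 40 million mapped reads. These counts are illustrative assumptions.
example = rpkm(read_count=30, transcript_length_bp=3075, total_mapped_reads=40_000_000)
print(f"{example:.2f} RPKM")  # about 0.24, i.e. below the 0.5 RPKM level
```

Under these assumed counts, even a few dozen reads on a transcript of this length correspond to a value well below 0.5 RPKM, which is the range reported for C13orf42.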
C13orf42 produces four known transcript variants: variants 1, 2, 3, and X1. Transcript variant 3 (accession number: NM_001351589.3) is the longest high-quality mRNA at 3075 nucleotides. [10] Transcript variant 3 contains 4 exons and encodes a 325 amino acid protein.
Transcript variants 1, 2, and X1 all lack the first exon but align with exons 2, 3, and 4 of transcript variant 3. Variants 1 and 2 are not protein coding, while variants 3 and X1 are. Variant X1 is 2717 nucleotides long and encodes a 189 amino acid protein, which aligns with the last 187 amino acids of the longer protein encoded by transcript variant 3 and differs in its first two amino acids. [11] [12]
Two known proteins are encoded by the transcript variants of C13orf42. [13] Transcript variant 3 encodes the longer protein, at 325 amino acids. [13] Transcript variant X1 encodes a 189 amino acid protein. [13] This shorter protein aligns with the portion of the 325 amino acid protein encoded by exons 2, 3, and 4, but lacks the portion encoded by exon 1. [12]
C13orf42 has a predicted isoelectric point of 9.3 and a predicted molecular weight of 37.4 kDa. [14] Human C13orf42 is rich in serine and in the positively charged amino acids lysine and arginine. [15] This composition is partially conserved in orthologs.
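A compositional claim of this kind can be checked by counting residues in the protein sequence. The sketch below is illustrative only: the function name is hypothetical, and the toy sequence is invented for the example and is not the C13orf42 sequence.

```python
from collections import Counter


def residue_fractions(sequence: str, residues: str = "SKR") -> dict[str, float]:
    """Return the fraction of the sequence made up by each residue of interest."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {residue: counts[residue] / len(seq) for residue in residues}


# Toy 10-residue sequence (an assumption for illustration, not real data).
toy_sequence = "MSKRSSKAGR"
for residue, fraction in residue_fractions(toy_sequence).items():
    print(f"{residue}: {fraction:.0%}")
```

Applied to a real sequence retrieved under the protein accession, the same count can be compared against typical background frequencies to judge whether serine, lysine, and arginine are enriched.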
The highest-confidence tertiary structure predicted for C13orf42 by I-TASSER contains many alpha helices. [5] In the structure below, the residues that are abundant in C13orf42 (serine, lysine, and arginine) are annotated. [16] A space-filling model and a charge model are also shown for C13orf42.
Human C13orf42 is predicted to be localized to the mitochondria, nucleus, cytosol, and endoplasmic reticulum (ER), with the ER predicted at a low percentage (<5%). [4] Orthologs show similar predicted subcellular localization, with mitochondria, nucleus, and cytosol being the top predicted locations; however, the predicted percentages vary. [4]
C13orf42 antibody B-4 (catalog number: sc-376095) shows cytoplasmic and nuclear staining in seminiferous ducts and Leydig cells of testis tissue. [18] C13orf42 antibody E-3 (catalog number: sc-374567) shows cytoplasmic staining in seminiferous ducts and Leydig cells of testis tissue, and cytoplasmic and nucleolar localization in HeLa cells. [19]
C13orf42 is predicted to have 10 highly conserved phosphorylation sites (present in over 70% of the analyzed orthologs listed in the table below). [20] These comprise one CK2 phosphorylation site, one tyrosine phosphorylation site, two cAMP phosphorylation sites, and six PKC phosphorylation sites. There are three predicted O-β-GlcNAc sites and two predicted yin-yang sites in C13orf42, all of which are fully conserved in orthologs. [21] A yin-yang site is a residue for which both O-β-GlcNAc modification and phosphorylation are predicted. C13orf42 is not predicted to have myristoylation sites, as it does not contain an N-terminal glycine. [22]
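The two rules described above can be sketched in a few lines. The site positions and sequences below are hypothetical placeholders, not the predicted C13orf42 sites, and the function names are invented for illustration. The glycine check assumes the usual convention that N-myristoylation requires glycine immediately after the initiator methionine.

```python
def yin_yang_sites(o_glcnac_sites, phosphorylation_sites):
    """Positions predicted for both O-GlcNAc modification and phosphorylation."""
    return sorted(set(o_glcnac_sites) & set(phosphorylation_sites))


def has_n_terminal_glycine(sequence: str) -> bool:
    """True if glycine is exposed at the N-terminus.

    Assumption: a leading initiator methionine is removed first, so the
    residue checked is position 2 of the translated sequence.
    """
    seq = sequence.upper()
    if seq.startswith("M"):
        seq = seq[1:]
    return seq.startswith("G")


# Hypothetical predicted positions (illustrative only).
o_glcnac = [12, 45, 101]
phospho = [45, 60, 101, 150]
print(yin_yang_sites(o_glcnac, phospho))  # [45, 101]

# Hypothetical sequences: only the first has glycine after the initiator Met.
print(has_n_terminal_glycine("MGSKR"))  # True
print(has_n_terminal_glycine("MSKRG"))  # False
```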
C13orf42 has no identified domains with high confidence or conservation in orthologs.
C13orf42 has orthologs in mammals, birds, reptiles, amphibians, bony fish, and cartilaginous fish, as shown in the ortholog table below. [23] No orthologs were found in jawless fish, invertebrates, plants, fungi, viruses, or bacteria. [23] All mammalian orthologs contain the same 4 exons as human C13orf42, whereas non-mammalian orthologs are missing exon 4. Mammalian orthologs have a high percent identity to human C13orf42, each having over 62% identity. The most distant orthologs (cartilaginous fish) have sequence identities around 33%. Human C13orf42 does not have paralogs. [8]
Genus and Species | Common Name | Taxonomic Group | Date of Divergence (MYA) [24] | Accession Number | Length (amino acids) | Percent Identity to Homo sapiens | Percent Similarity to Homo sapiens |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Primates | 0 | NP_001338518.1 | 325 | 100 | 100 |
Mus musculus | Mouse | Rodentia | 87 | XP_030104110.1 | 318 | 76.9 | 84 |
Tursiops truncatus | Common bottlenose dolphin | Cetacea | 94 | XP_033699576.1 | 319 | 82.8 | 87.7 |
Equus caballus | Horse | Perissodactyla | 94 | XP_023477317.1 | 326 | 82.6 | 90.2 |
Mustela putorius | European polecat | Carnivora | 94 | XP_004775284.1 | 326 | 79.4 | 85 |
Pipistrellus kuhlii | Kuhl's pipistrelle (bat) | Chiroptera | 94 | XP_036312978.1 | 325 | 79.1 | 86.8 |
Ursus maritimus | Polar bear | Ursidae | 94 | XP_040478472.1 | 328 | 77.2 | 82.7 |
Elephas maximus indicus | Indian elephant | Proboscidea | 99 | XP_049709344.1 | 326 | 79.4 | 86.2 |
Dasypus novemcinctus | Nine-banded armadillo | Cingulata | 99 | XP_023446856.1 | 328 | 72.8 | 81.4 |
Dromiciops gliroides | Monito del monte | Microbiotheria | 160 | XP_043849658.1 | 326 | 67.9 | 79.5 |
Trichosurus vulpecula | Common brushtail possum | Diprotodontia | 160 | XP_036599801.1 | 326 | 66.7 | 79.5 |
Sarcophilus harrisii | Tasmanian devil | Dasyuromorphia | 160 | XP_023355407.1 | 326 | 66.4 | 80.1 |
Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_028904285.1 | 330 | 62.3 | 73 |
Alligator sinensis | Chinese alligator | Crocodilia | 319 | XP_025062978.1 | 266 | 48.6 | 61.7 |
Dromaius novaehollandiae | Emu | Casuariiformes | 319 | XP_025964173.1 | 268 | 48.2 | 59.8 |
Gallus gallus | Chicken | Galliformes | 319 | XP_004938779.1 | 268 | 47.6 | 61.3 |
Camarhynchus parvulus | Small tree finch | Thraupidae | 319 | XP_030802909.1 | 264 | 47.1 | 60.9 |
Pelodiscus sinensis | Chinese softshell turtle | Testudines | 319 | XP_014429996.1 | 267 | 47 | 60.6 |
Phasianus colchicus | Common pheasant | Galliformes | 319 | XP_031471701.1 | 272 | 46.4 | 59.9 |
Varanus komodoensis | Komodo dragon | Squamata | 319 | XP_044300138.1 | 269 | 45.9 | 60.1 |
Python bivittatus | Burmese python | Squamata | 319 | XP_025026122.1 | 269 | 44.1 | 58 |
Pantherophis guttatus | Corn snake | Squamata | 319 | XP_034287884.1 | 267 | 42.7 | 56.4 |
Rhinatrema bivittatum | Two-lined caecilian | Rhinatrematidae | 353 | XP_029459031.1 | 273 | 45.4 | 59.5 |
Xenopus tropicalis | Western clawed frog | Anura | 353 | XP_031752228.1 | 260 | 43.5 | 57.4 |
Bufo gargarizans | Asiatic toad | Anura | 353 | XP_044141363.1 | 262 | 40.7 | 56.8 |
Bufo bufo | Common toad | Anura | 353 | XP_040278366.1 | 391 | 30.9 | 42 |
Protopterus annectens | West African lungfish | Lepidosireniformes | 408 | XP_043927617.1 | 269 | 37.8 | 53.5 |
Polyodon spathula | Paddlefish | Acipenseriformes | 431 | XP_041127510.1 | 275 | 33.8 | 52.4 |
Danio rerio | Zebrafish | Cypriniformes | 431 | XP_021329868.1 | 287 | 30.2 | 44.7 |
Clupea harengus | Atlantic herring | Clupeiformes | 431 | XP_042559186.1 | 306 | 29 | 41.7 |
Callorhinchus milii | Australian ghostshark | Chimaeriformes | 464 | XP_042189981.1 | 267 | 35.5 | 53.3 |
Amblyraja radiata | Thorny skate | Rajiformes | 464 | XP_032889397.1 | 267 | 33 | 47.5 |
Scyliorhinus canicula | Small-spotted catshark | Carcharhiniformes | 464 | XP_038658253.1 | 264 | 32.5 | 50.4 |
A phylogenetic tree shows that human C13orf42 is most closely related to its mammalian orthologs and most distantly related to its cartilaginous fish orthologs. [25]
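Percent identity values such as those in the ortholog table are computed from a pairwise alignment of each ortholog against the human protein. The sketch below shows one common convention, identical columns divided by alignment length; the alignment tool and denominator used for the table are not stated here, so this convention is an assumption, and the aligned strings are toy data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: the denominator is the number of alignment columns, counting
    columns with a gap in one sequence but skipping columns gapped in both.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in alignment")
    return 100.0 * matches / compared


# Toy alignment (illustrative only): 5 identical columns out of 7.
print(f"{percent_identity('MSKR-SA', 'MSRRGSA'):.1f}%")  # 71.4%
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, which is one reason identity values for the same pair can differ between tools.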
Kanagal-Shamanna et al. identified an ATM fusion with C13orf42 in a patient with chronic lymphocytic leukemia, which led to ATM inactivation. [26]
Xiong et al. reported that SNP rs7325564 is significantly associated with nasion and pronasale face shape in humans. [27]