C5orf24 | |
---|---|
Identifiers | |
Aliases | C5orf24, chromosome 5 open reading frame 24 |
External IDs | MGI: 1925771 HomoloGene: 17572 GeneCards: C5orf24 |
C5orf24 (chromosome 5 open reading frame 24) is a protein encoded by the C5orf24 gene (5q31.1) in humans. [5] [6] C5orf24 is primarily localized to the nucleus and is highly conserved with orthologs in mammals, birds, reptiles, amphibians, and fish. [7] [8] [9]
Human C5orf24 is a protein-coding gene located at locus 5q31.1 on the plus strand. It spans 26,133 base pairs (chr5:134,833,603-134,859,735) and is composed of two exons and one intron. [5] [10] [11] [12] Alternate names for the gene are FLJ37562 and LOC134553. [10] [13] [14] Genes neighboring C5orf24 include DDX46, RPL34P13, and TXNDC15. [5] Transcription factors predicted to bind conserved sites in the promoter region (GXP_7545710) include NRF1, E2F, ZF5, and AHR. [15]
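The stated gene length follows directly from the genomic coordinates. The short sketch below reproduces that arithmetic, assuming the coordinates are 1-based and inclusive (the usual convention for ranges written this way, but an assumption here). The helper names `parse_region` and `span_length` are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a 'chrom:start-end' string (commas allowed) into its parts."""
    chrom, coords = region.split(":")
    start_text, end_text = coords.replace(",", "").split("-")
    return chrom, int(start_text), int(end_text)


def span_length(start: int, end: int) -> int:
    """Length in base pairs of an interval, assuming 1-based inclusive ends."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates quoted in the text for human C5orf24.
chrom, start, end = parse_region("chr5:134,833,603-134,859,735")
print(chrom, span_length(start, end))  # chr5 26133
```

Under a 0-based, half-open convention the `+ 1` would be dropped, so the agreement with 26,133 bp is consistent with the inclusive reading.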
Transcript Variant | Length (nt) | Protein Isoform | Length (aa) |
1 (NM_001135586.1) | 5083 | 1 (NP_001129058.1) | 188 |
2 (NM_152409.3) | 4896 | 1 (NP_689622.2) | 188 |
3 (NM_001300894.2) | 3054 | 2 (NP_001287823.1) | 155 |
The human C5orf24 gene has three mRNA transcript variants. [5] [11] Transcript variants 1 and 2 both encode protein isoform 1, which is 188 amino acids long. [16] [17] Transcript variant 1 is the longest and highest-quality transcript (5083 nucleotides); transcript variant 2 (4896 nucleotides) has a shorter 5' UTR. [16] [17] Transcript variant 3 lacks an internal segment, which results in an alternate translational stop codon. It is the shortest variant (3054 nucleotides) and encodes the smaller protein isoform 2, which is 155 amino acids long. [18]
Isoform 1 of the UPF0461 protein C5orf24 is 188 amino acids long and is encoded by exon 2. [6] It contains two disordered regions, at amino acid positions 1-20 and 79-142. [6] The second disordered region contains a series of internal repeats. [19] [20] The human precursor protein is predicted to have a molecular weight of 20.1 kDa and an isoelectric point of approximately 10. [21] Immunoblotting showed an experimental molecular weight of about 25 kDa. [22] Three experimentally supported phosphorylation sites have been reported at Ser37, [23] Ser121, [24] and Ser180, [24] along with evidence for a ubiquitination site at Lys146. [25] [26] [6] [27] A conserved nuclear localization signal at amino acid positions 79-83 (KKKK) was corroborated by immunofluorescence experiments with anti-C5orf24 antibodies, which showed localization to the nucleoplasm. [7] [8] [9] Affinity chromatography and anti-tag coimmunoprecipitation experiments suggest that C5orf24 interacts with several other proteins, including STK11, CAB39, LYK5, PKNOX1, and PBX1. [28] [29]
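As a rough plausibility check on the predicted mass quoted above, the sketch below multiplies the isoform lengths by an average residue mass of about 110 Da. That figure is a common rule of thumb and an assumption here; it is not the method used by the cited prediction, which works from the actual sequence. The function name `estimate_mass_kda` is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from its length alone."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Isoform lengths taken from the transcript table in the text.
for name, length in [("isoform 1", 188), ("isoform 2", 155)]:
    print(f"{name}: ~{estimate_mass_kda(length):.1f} kDa")
```

For the 188-residue isoform this gives roughly 20.7 kDa, the same order as the 20.1 kDa sequence-based prediction. A length-only estimate cannot account for composition or post-translational modification.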
The C5orf24 protein is not present in plants or fungi, but orthologs have been found in mammals, birds, reptiles, amphibians, bony fish (Osteichthyes), and cartilaginous fish (Chondrichthyes). [7] There is evidence for an orthologous domain in jawless fishes (Agnatha) and invertebrates. [7] Comparison of m values (corrected rate of divergence) between C5orf24 (NP_001129058.1), Cytochrome c (NP_061820.1), which evolves slowly, [30] and Fibrinogen alpha (NP_000499.1), which evolves quickly, [31] indicated that this protein evolved at a fairly slow rate, especially when fish sequences are excluded. [6] [7] [32] [33] [34]
C5orf24 | Scientific Name | Common Name | Taxonomic Group | Median Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Query Cover | Sequence Identity |
Mammals | Homo sapiens | Human | Primates | 0 | NP_001129058.1 | 188 | 100% | 100% |
Mammals | Cavia porcellus | Guinea Pig | Rodentia | 89 | XP_005005246.1 | 188 | 100% | 98.4% |
Mammals | Ursus maritimus | Polar Bear | Carnivora | 94 | XP_008689817.1 | 188 | 100% | 97.9% |
Mammals | Trichechus manatus latirostris | Florida Manatee | Sirenia | 102 | XP_004384765.1 | 188 | 100% | 95.7% |
Mammals | Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_007669207.1 | 188 | 100% | 82.4% |
Birds | Calypte anna | Anna's Hummingbird | Apodiformes | 318 | XP_030314921.1 | 188 | 100% | 86.2% |
Birds | Strigops habroptila | Kākāpō | Psittaciformes | 318 | XP_030360294.1 | 188 | 100% | 85.1% |
Reptiles | Pelodiscus sinensis | Chinese Softshell Turtle | Testudines | 318 | XP_006116108.1 | 188 | 100% | 85.1% |
Reptiles | Python bivittatus | Burmese Python | Squamata | 318 | XP_007421938.1 | 188 | 100% | 78.7% |
Amphibians | Rhinatrema bivittatum | Two-Lined Caecilian | Gymnophiona | 352 | XP_029439506.1 | 188 | 100% | 75.5% |
Amphibians | Xenopus tropicalis | Tropical Clawed Frog | Anura | 352 | NP_001072358.1 | 186 | 100% | 70.7% |
Fishes | Esox lucius | Northern Pike | Osteichthyes | 433 | XP_019903474.2 | 204 | 100% | 56.5% |
Fishes | Scyliorhinus canicula | Small-Spotted Catshark | Chondrichthyes | 465 | XP_038651786.1 | 193 | 96% | 53.8% |
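The m values mentioned in the text can be illustrated from the sequence identities in the table above. The sketch below assumes the commonly used correction m = -100 · ln(1 - n/100), where n is the observed percentage of differing residues (taken here as 100 minus percent identity). The source does not state the exact formula or inputs it used, so both the formula and the function name `corrected_divergence` are assumptions for illustration.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Corrected divergence m = -100 * ln(1 - n/100), with n = 100 - identity.

    The correction accounts for multiple substitutions at the same site,
    so m grows faster than the raw percentage of differences.
    """
    if not 0.0 < percent_identity <= 100.0:
        raise ValueError("percent_identity must be in (0, 100]")
    n = 100.0 - percent_identity  # observed differences per 100 residues
    return -100.0 * math.log(1.0 - n / 100.0)


# (species, median divergence in MYA, percent identity) taken from the table.
orthologs = [
    ("Cavia porcellus", 89, 98.4),
    ("Ornithorhynchus anatinus", 180, 82.4),
    ("Xenopus tropicalis", 352, 70.7),
    ("Scyliorhinus canicula", 465, 53.8),
]

for species, mya, identity in orthologs:
    m = corrected_divergence(identity)
    print(f"{species}: {mya} MYA, m = {m:.1f}")
```

Plotting m against divergence date for C5orf24 alongside the same calculation for a slowly and a rapidly evolving reference protein is one way such a rate comparison could be made.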
Multiple sequence alignments revealed that the C5orf24 protein has been highly conserved and likely originated in cartilaginous fishes nearly 465 million years ago. [7] [32] [35] [36] A series of internal repeats in the second disordered region was also identified in proteins from jawless fishes and invertebrates, suggesting that an orthologous domain arose even earlier in evolutionary history. [7]
C5orf24 is ubiquitously expressed with limited tissue variability. [5] [10] [37] Microarray-assessed tissue expression patterns show C5orf24 levels decreasing in pro-inflammatory environments such as in patients with tibial muscular dystrophy [38] and children with obesity. [39]
Although this gene is not yet well understood, some genotype-phenotype correlations have been reported. These include upregulation of C5orf24 in individuals with PTSD and downregulation in those with improved symptoms, [40] a linear correlation between methylation levels of C5orf24 GC sites and negative affect scores in drug addicts, [41] and GWAS findings associating SNPs in C5orf24 with Parkinson's disease in the Chinese Han population [42] and with Crohn's disease. [43]