C5orf34 (chromosome 5 open reading frame 34) is a protein that in humans is encoded by the C5orf34 gene (5p12). [1] [2]
C5orf34 is conserved in mammals, birds and reptiles, with the most distant ortholog found in the Burmese python, Python bivittatus. The C5orf34 protein contains two domains of unknown function that are conserved in mammals: DUF 4520 and DUF 4524. The protein is also predicted to contain a polo-box domain (PBD) of polo-like kinase 4 (plk4), which is predicted to be conserved in distant orthologs from the clade Aves. [3] [4]
C5orf34 is located on the negative DNA strand of the short arm of chromosome 5, at position 12 (5p12). The gene is 28,744 base pairs long, spanning base pair 43,486,701 to base pair 43,515,445. It produces a single transcript 2,540 base pairs long, which encodes a protein of 638 amino acids. [1] [2] [6]
The gene PAIP1, a member of the polyadenylate-binding family, is found on the negative strand just upstream of C5orf34 and extends from base pair 43,526,267 to 43,557,419. [7] CCL28 is found downstream on the negative strand and extends from base pair 43,378,052 to 43,413,837. [8]
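The coordinate arithmetic behind these statements can be checked with a minimal sketch. It assumes the usual convention that a gene on the negative strand is transcribed toward lower coordinates, so neighbours at higher coordinates count as upstream; the `Interval` class and `relative_position` helper are illustrative names, not part of any cited tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval given by its lower and higher coordinates."""
    name: str
    start: int  # lower genomic coordinate
    end: int    # higher genomic coordinate

    def span(self) -> int:
        # Distance between the two endpoints, as quoted in the text.
        return self.end - self.start


def relative_position(gene: Interval, other: Interval, strand: str) -> str:
    """Classify `other` as upstream or downstream of `gene` on `strand`."""
    if other.end < gene.start:
        at_lower_coordinates = True
    elif other.start > gene.end:
        at_lower_coordinates = False
    else:
        return "overlapping"
    if strand == "-":
        # Assumption: negative-strand transcription runs toward lower coordinates.
        return "downstream" if at_lower_coordinates else "upstream"
    return "upstream" if at_lower_coordinates else "downstream"


c5orf34 = Interval("C5orf34", 43_486_701, 43_515_445)
paip1 = Interval("PAIP1", 43_526_267, 43_557_419)
ccl28 = Interval("CCL28", 43_378_052, 43_413_837)

print(c5orf34.span())                           # 28744
print(relative_position(c5orf34, paip1, "-"))   # upstream
print(relative_position(c5orf34, ccl28, "-"))   # downstream
```

By coordinates alone, the two neighbours fall on opposite sides of C5orf34.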
Multiple sources suggest that, in humans, the C5orf34 protein is expressed non-ubiquitously, in select tissues at low to moderate levels, with the most abundant expression in the stomach, small intestine, testis, skeletal muscle and heart muscle. [9] [10] A study of the effect of a Rho kinase inhibitor on primary cell lines also showed that C5orf34 is expressed in dermal fibroblasts from normal human tissue samples. [11]
The promoter region of C5orf34 is predicted to lie between base pairs 43,515,079 and 43,515,773, spanning 695 base pairs. [12]
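As a rough check of how the predicted promoter sits relative to the gene, the sketch below counts inclusive positions and measures the overlap with the gene coordinates quoted earlier (43,486,701 to 43,515,445). Treating the gene's higher coordinate as its transcription start is an assumption that follows from its negative-strand orientation; the variable names are illustrative.

```python
# Coordinates quoted in the article.
PROMOTER_START, PROMOTER_END = 43_515_079, 43_515_773
GENE_START, GENE_END = 43_486_701, 43_515_445

# Inclusive count of positions in the promoter region.
promoter_length = PROMOTER_END - PROMOTER_START + 1

# Positions shared by the promoter and the gene body.
overlap_start = max(PROMOTER_START, GENE_START)
overlap_end = min(PROMOTER_END, GENE_END)
overlap = max(0, overlap_end - overlap_start + 1)

# Positions lying beyond the gene's higher coordinate, i.e. upstream
# of the gene if it is transcribed from the negative strand.
beyond_gene = promoter_length - overlap

print(promoter_length)  # 695
print(overlap)          # 367
print(beyond_gene)      # 328
```

Under these assumptions, the predicted promoter straddles the 5' end of the gene instead of lying entirely outside it.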
In humans, C5orf34 consists of 638 amino acids and has a molecular weight of 72.7 kDa and an isoelectric point of 7.77. [1] [13] [14]
Although the precise function of C5orf34 in humans remains unknown, its predicted structure suggests that it is involved in kinase-related cellular functions. [15] In addition, C5orf34 is predicted to be nuclear, so it may be involved in gene regulation and cell proliferation, two major signal transduction pathways that involve nuclear kinase proteins. [16] [17]
In humans, C5orf34 contains two domains of unknown function, DUF 4520 (pfam 15016) and DUF 4524 (pfam 15025), found at residues 6–153 and 444–539, respectively. The protein is rich in serine and threonine. Its charge is evenly distributed: there are no positive or negative charge clusters within the protein. [13]
The secondary structure of the human protein was predicted with multiple bioinformatic tools. All of the programs predicted a structure consisting of alpha helices, extended strands, random coils and beta turns. The Phyre2 server produced a model of the human protein that matched the polo-box domain of the serine/threonine-protein kinase plk4, with 96.8% confidence over 20% of the protein (130 residues). The modelled region included residues of the conserved polo-box domain and the two DUF domains. The protein is predicted to be predominantly soluble, with an average hydrophobicity of -0.478. [15] [18] [19]
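An average hydrophobicity of this kind is commonly computed as a GRAVY score, the mean Kyte-Doolittle hydropathy over all residues, where negative values indicate a hydrophilic protein. The source does not name the scale used, so the choice of Kyte-Doolittle here is an assumption, and the toy peptide is hypothetical, not the C5orf34 sequence. The sketch also reproduces the coverage arithmetic (130 of 638 residues).

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy of a protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def coverage_percent(modelled_residues: int, total_residues: int) -> float:
    """Share of the protein covered by a structural model, in percent."""
    return 100 * modelled_residues / total_residues


toy_peptide = "MSTKELLS"  # hypothetical peptide, for illustration only
print(gravy(toy_peptide) < 0)                 # True: slightly hydrophilic
print(round(coverage_percent(130, 638), 1))   # 20.4
```

A score of -0.478 for the full protein would, on this scale, be consistent with the prediction that it is predominantly soluble.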
C5orf34 is predicted to be extensively phosphorylated, with 32 phosphoserines and 7 phosphothreonines conserved in orthologs of the human protein. This analysis suggests that C5orf34 is a phosphoprotein and supports structural predictions that it is a kinase-related protein. The protein contains only one predicted nuclear export signal (NES) residue, the leucine at position 481; however, its NES score was low, at 0.515. Analysis of the protein predicted that it is localized to the nucleus with 87% probability. [17] [20] [21]
Databases of protein interactions (MINT, STRING, IntAct, and BioGRID) have not identified any interactions with C5orf34.
C5orf34 is highly conserved in primates and other mammals and moderately conserved in reptiles. The most distant ortholog is found in Python bivittatus, the Burmese python. Below is a selected list of orthologs that illustrates the homology of this gene relative to the reference sequence in Homo sapiens.
Orthologs of C5orf34 have been predicted in 151 organisms. [2] The most distant ortholog is in the Burmese python, which diverged from humans 296 million years ago, indicating that C5orf34 arose before the mammalian lineage split from that of reptiles and birds. [3] [22]
Scientific Name | Common Name | Date of Divergence from Humans (MYA) [23] | NCBI Protein Accession # | Protein Length (amino acids) | Sequence Similarity (%) |
---|---|---|---|---|---|
Homo sapiens | Human | 0 | NP_001076895.1 | 638 | 100 |
Gorilla gorilla | Gorilla | 8.8 | XP_004058945.1 | 636 | 92 |
Camelus ferus | Bactrian Camel | 97.4 | XP_006191979.1 | 640 | 84 |
Panthera tigris altaica | Siberian Tiger | 97.4 | XP_007095478.1 | 638 | 83 |
Sus scrofa | Wild Boar | 97.4 | XP_003133971.3 | 441 | 80 |
Bos taurus | Cattle | 97.4 | NP_001076895.1 | 638 | 80 |
Erinaceus europaeus | European Hedgehog | 97.4 | XP_007517686.1 | 632 | 69 |
Mus musculus | House Mouse | 91 | BAE28742.1 | 382 | 75 |
Monodelphis domestica | Gray Short-tailed Opossum | 176.1 | XP_007487459.1 | 512 | 62 |
Chelonia mydas | Green Turtle | 324.5 | XP_007052886.1 | 638 | 51 |
Aptenodytes forsteri | Emperor Penguin | 324.5 | XP_009272830.1 | 647 | 48 |
Gallus gallus | Chicken | 324.5 | XP_424782.3 | 669 | 48 |
Python bivittatus | Burmese python | 324.5 | XP_007430528.1 | 649 | 46 |
There are no predicted paralogs of C5orf34 in either humans or mice. [3]
Multiple sequence alignments of an array of orthologs showed amino acid conservation throughout the C5orf34 protein, with the most highly conserved regions at the N-terminus and C-terminus, where the DUFs are located. DUF 4520 (pfam 15016) is conserved at the N-terminus and DUF 4524 (pfam 15025) at the C-terminus, consistent with their positions at residues 6–153 and 444–539 of the human protein. The polo-box domain of plk4 was also conserved at the C-terminus in alignments of both strict and distant orthologs. [22]
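One simple way to quantify conservation in a multiple sequence alignment is to score each column by the fraction of sequences that share its most common residue. The sketch below illustrates that idea only; it is not the method used in the cited analysis, and the toy alignment is hypothetical, not taken from C5orf34 orthologs.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ("-") never count as a residue, but they do count toward the
    number of sequences, so gapped columns score lower.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical three-sequence alignment, for illustration only.
toy_alignment = ["MKTLS", "MKSLS", "MR-LS"]
for position, score in enumerate(column_conservation(toy_alignment), start=1):
    print(position, round(score, 2))
```

Averaging such scores over the residue ranges of each domain would give a rough comparison of conservation between the N-terminal and C-terminal regions.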