C5orf34


C5orf34 (chromosome 5 open reading frame 34) is a protein that in humans is encoded by the C5orf34 gene (5p12). [1] [2]


C5orf34 is conserved in mammals, birds and reptiles; its most distant known ortholog is in the Burmese python, Python bivittatus. The C5orf34 protein contains two domains of unknown function that are conserved in mammals: DUF 4520 and DUF 4524. The protein is also predicted to contain a polo-box domain (PBD) of polo-like kinase 4 (plk4), which is predicted to be conserved in distant orthologs from the clade Aves. [3] [4]

Gene

Human chromosomal position of C5orf34 gene on the short arm of chromosome 5.

C5orf34 is located on the negative DNA strand of the short arm of chromosome 5 at locus 12 (5p12). The gene is 28,744 base pairs long and spans from base pair 43,486,701 to base pair 43,515,445. It produces a single transcript, 2,540 base pairs long, that encodes a protein of 638 amino acids. [1] [2] [6]
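A minimal Python sketch of the coordinate arithmetic above. The quoted length of 28,744 bp equals the end coordinate minus the start coordinate. If both endpoints are counted (a 1-based, closed-interval convention), the count is one base larger. Because the gene is on the negative strand, its 5' end is the higher coordinate. The constant and function names are illustrative only.

```python
# Coordinates quoted above for C5orf34 (chromosome 5, negative strand).
GENE_START = 43_486_701
GENE_END = 43_515_445


def coordinate_difference(start: int, end: int) -> int:
    """End minus start: the figure quoted as the gene length."""
    return end - start


def inclusive_length(start: int, end: int) -> int:
    """Number of bases when both endpoints are counted (1-based, closed interval)."""
    return end - start + 1


def five_prime_end(start: int, end: int, strand: str) -> int:
    """Transcription runs toward lower coordinates on the '-' strand,
    so a minus-strand gene's 5' end is its higher coordinate."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return end if strand == "-" else start


print(coordinate_difference(GENE_START, GENE_END))  # 28744
print(inclusive_length(GENE_START, GENE_END))       # 28745
print(five_prime_end(GENE_START, GENE_END, "-"))    # 43515445
```

The two length conventions differ by exactly one base, so either figure may appear depending on the tool used.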

Gene neighborhood

The gene PAIP1, which encodes a poly(A)-binding protein-interacting protein, lies on the negative strand just upstream of C5orf34 and extends from base pair 43,526,267 to 43,557,419. [7] CCL28 lies downstream on the negative strand and extends from base pair 43,378,052 to 43,413,837. [8]
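Whether a neighbouring gene is "upstream" or "downstream" depends on the strand of the reference gene. The sketch below assumes that the terms are oriented along the C5orf34 transcript (5' to 3'). On the negative strand, that direction runs toward lower coordinates. The `Interval` type and `relative_position` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    start: int  # lower genomic coordinate
    end: int    # higher genomic coordinate


def relative_position(reference: Interval, strand: str, other: Interval) -> str:
    """Classify `other` relative to `reference`, orienting 5' -> 3' by the
    reference gene's strand ('+' or '-')."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if other.end < reference.start:
        on_lower_side = True
    elif other.start > reference.end:
        on_lower_side = False
    else:
        return "overlapping"
    # '+' strand: lower coordinates are upstream; '-' strand: the reverse.
    if strand == "+":
        return "upstream" if on_lower_side else "downstream"
    return "downstream" if on_lower_side else "upstream"


c5orf34 = Interval("C5orf34", 43_486_701, 43_515_445)
paip1 = Interval("PAIP1", 43_526_267, 43_557_419)
ccl28 = Interval("CCL28", 43_378_052, 43_413_837)

for neighbour in (paip1, ccl28):
    print(neighbour.name, relative_position(c5orf34, "-", neighbour))
# PAIP1 upstream
# CCL28 downstream
```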

Gene expression

Multiple sources indicate that, in humans, C5orf34 is expressed non-ubiquitously, at low to moderate levels in select tissues, with the highest expression in the stomach, small intestine, testis, skeletal muscle and heart muscle. [9] [10] A study of the effects of a Rho kinase inhibitor on primary human cell lines also detected C5orf34 expression in normal human dermal fibroblasts. [11]

Promoter

The promoter region for C5orf34 is predicted to lie between base pairs 43,515,079 and 43,515,773, spanning 695 base pairs. [12]
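The 695-bp figure counts both endpoints of the promoter window. Because C5orf34 is on the negative strand, its 5' end is the higher gene coordinate given in the Gene section (43,515,445). The sketch below checks that this position falls inside the predicted promoter window. That is consistent with a promoter spanning the presumed transcription start site, although the check is only illustrative.

```python
# Predicted promoter window and gene coordinates quoted in the text.
PROMOTER_START, PROMOTER_END = 43_515_079, 43_515_773
GENE_START, GENE_END = 43_486_701, 43_515_445


def inclusive_length(start: int, end: int) -> int:
    """Bases in a closed interval, counting both endpoints."""
    return end - start + 1


def contains(start: int, end: int, position: int) -> bool:
    """True if `position` lies within the closed interval [start, end]."""
    return start <= position <= end


# Negative strand: the gene's 5' end is its higher coordinate.
gene_5prime = GENE_END

print(inclusive_length(PROMOTER_START, PROMOTER_END))            # 695
print(contains(PROMOTER_START, PROMOTER_END, gene_5prime))       # True
# Bases of the window on the lower-coordinate and higher-coordinate sides of the 5' end.
print(gene_5prime - PROMOTER_START, PROMOTER_END - gene_5prime)  # 366 328
```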

Protein

In humans, C5orf34 consists of 638 amino acids, with a predicted molecular weight of 72.7 kDa and a predicted isoelectric point of 7.77. [1] [13] [14]
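Molecular weight and isoelectric point are both computed from the amino acid sequence. The Python sketch below shows the idea. Molecular weight is the sum of average residue masses plus one water molecule. The isoelectric point is found by bisection on the pH at which the net charge is zero. The pKa set used here (EMBOSS-style values) is an assumption. The ExPASy tool cited above uses its own pKa values, so this sketch may give slightly different numbers for the same sequence.

```python
# Average residue masses (Da) of the 20 standard amino acids (water removed).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water molecule restores the free termini

# Assumed pKa set (EMBOSS-style); other tools use different values.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(molecular_weight("GG"), 2))  # 132.12
```

Run on the full 638-residue sequence, `molecular_weight` and `isoelectric_point` would give figures comparable to, but not necessarily identical with, those quoted above.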

Function

Although the precise function of C5orf34 in humans remains unknown, structural predictions suggest that it may be involved in kinase-related cellular functions. [15] C5orf34 is also predicted to localize to the nucleus. This points to a possible role in gene regulation and cell proliferation, two processes whose signal transduction pathways involve nuclear protein kinases. [16] [17]

A schematic representation of conserved domains and phosphorylated amino acid residues in human C5orf34. The red diamond projections are conserved phosphoserine sites and the grey diamond projections are conserved phosphothreonine sites.

Structure

In humans, C5orf34 contains two domains of unknown function, DUF 4520 (pfam 15016) and DUF 4524 (pfam 150125), found at residues 6–153 and 444–539, respectively. The protein is rich in serine and threonine. Its charge is evenly distributed, with no clusters of positive or negative charge within the protein. [13]
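SAPS reports residue composition and searches for charge clusters. The Python sketch below is a simplified stand-in, not the SAPS algorithm. It computes residue percentages and scans fixed-width windows for a large net charge, counting K and R as +1 and D and E as -1. The window size of 30 and the threshold of 10 are hypothetical parameters chosen for illustration.

```python
from __future__ import annotations

from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Percentage of each residue in the sequence."""
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.items()}


def window_net_charges(seq: str, window: int = 30) -> list[int]:
    """Net charge (K + R minus D + E) of every window of `window` residues.
    Histidine is ignored for simplicity."""
    charge = [1 if aa in "KR" else -1 if aa in "DE" else 0 for aa in seq]
    return [sum(charge[i:i + window]) for i in range(len(seq) - window + 1)]


def has_charge_cluster(seq: str, window: int = 30, threshold: int = 10) -> bool:
    """Flag a cluster if any window's absolute net charge reaches `threshold`
    (hypothetical cut-off, not the one SAPS uses)."""
    return any(abs(c) >= threshold for c in window_net_charges(seq, window))
```

On a sequence described as above, one would expect high values of `composition(seq)["S"]` and `composition(seq)["T"]`, and `has_charge_cluster(seq)` to return `False`.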

The secondary structure of the human protein was predicted with multiple bioinformatic tools, all of which predicted a mixture of alpha helices, extended strands, random coils and beta turns. The Phyre2 server produced a predicted structure for the human protein that matched the polo-box domain of the serine/threonine-protein kinase plk4. This prediction had 96.8% confidence and covered 20% of the sequence (130 residues), including residues of the conserved polo-box domain and of the two DUF domains. The protein is predicted to be predominantly soluble, with an average hydrophobicity of -0.478. [15] [18] [19]
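The text does not state which hydrophobicity scale produced the average of -0.478. One common summary is the GRAVY score: the mean Kyte–Doolittle hydropathy over all residues, where negative values indicate an overall hydrophilic protein. The Python sketch below computes that score, on the assumption that a scale of this kind was used.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy("ST"))  # -0.75 (serine and threonine are mildly hydrophilic)
```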

Post-translational modifications

C5orf34 is predicted to be extensively phosphorylated: 32 predicted phosphoserines and 7 predicted phosphothreonines are conserved in orthologs of the human protein. These predictions suggest that C5orf34 is a phosphoprotein and are consistent with the structural predictions linking it to kinase function. The protein contains a single predicted nuclear export signal residue, leucine 481 (L481), but its NES score was low (0.515). Subcellular localization analysis predicted the protein to be nuclear with 87% probability. [17] [20] [21]
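Counting conserved phosphosites combines two inputs: sites predicted in the human sequence (for example by NetPhos) and a multiple sequence alignment of orthologs. The Python sketch below walks the alignment and maps human residue numbers to alignment columns. It keeps a predicted S or T site only if every ortholog has the identical residue in that column. The toy alignment and predicted positions are hypothetical, and requiring an identical residue is one of several possible definitions of "conserved".

```python
from __future__ import annotations


def conserved_sites(alignment: dict[str, str], reference: str,
                    predicted: set[int], residues: str) -> list[int]:
    """Return 1-based reference positions from `predicted` whose residue is in
    `residues` and is identical in the same column of every aligned sequence."""
    ref_seq = alignment[reference]
    result = []
    ref_pos = 0
    for col, aa in enumerate(ref_seq):
        if aa == "-":
            continue  # gap in the reference: no residue number advances
        ref_pos += 1
        if ref_pos in predicted and aa in residues:
            if all(seq[col] == aa for seq in alignment.values()):
                result.append(ref_pos)
    return result


# Hypothetical toy alignment and predicted sites (human numbering).
alignment = {
    "human":  "MSA-TSK",
    "ortho1": "MSAGTSK",
    "ortho2": "MTA-TSK",
}
predicted = {2, 4, 5}

print(conserved_sites(alignment, "human", predicted, "S"))  # [5]
print(conserved_sites(alignment, "human", predicted, "T"))  # [4]
```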

Interacting proteins

Databases of protein interactions (MINT, STRING, IntAct, and BioGRID) have not identified any interactions with C5orf34.

Homology and evolution

C5orf34 is highly conserved in primates and other mammals and moderately conserved in birds and reptiles. The most distant conserved ortholog is in Python bivittatus, the Burmese python. Below is a selected list of orthologs that illustrates the homology of this gene relative to the reference sequence in Homo sapiens.

Orthologous space

Orthologs of C5orf34 have been predicted in 151 organisms. [2] The most distant ortholog is in the Burmese python, whose lineage diverged from that of humans 296 million years ago. This indicates that the gene has been present at least since the last common ancestor of mammals and reptiles. [3] [22]

Table of C5orf34 orthologs

Scientific name | Common name | Divergence from humans (MYA) [23] | NCBI protein accession | Protein length (amino acids) | Sequence similarity (%)
Homo sapiens | Human | 0 | NP_001076895.1 | 638 | 100
Gorilla gorilla | Gorilla | 8.8 | XP_004058945.1 | 636 | 92
Camelus ferus | Bactrian camel | 97.4 | XP_006191979.1 | 640 | 84
Panthera tigris altaica | Siberian tiger | 97.4 | XP_007095478.1 | 638 | 83
Sus scrofa | Wild boar | 97.4 | XP_003133971.3 | 441 | 80
Bos taurus | Cattle | 97.4 | NP_001076895.1 | 638 | 80
Erinaceus europaeus | European hedgehog | 97.4 | XP_007517686.1 | 632 | 69
Mus musculus | House mouse | 91 | BAE28742.1 | 382 | 75
Monodelphis domestica | Gray short-tailed opossum | 176.1 | XP_007487459.1 | 512 | 62
Chelonia mydas | Green turtle | 324.5 | XP_007052886.1 | 638 | 51
Aptenodytes forsteri | Emperor penguin | 324.5 | XP_009272830.1 | 647 | 48
Gallus gallus | Chicken | 324.5 | XP_424782.3 | 669 | 48
Python bivittatus | Burmese python | 324.5 | XP_007430528.1 | 649 | 46

[3]
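The table does not say exactly how "sequence similarity" was computed. BLAST reports both identities (exact matches) and positives (matches plus conservative substitutions). The Python sketch below computes the simpler of the two, percent identity over alignment columns where neither sequence has a gap. It is one of several definitions and may not match the figures above exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions as a percentage of gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100 * identical / len(pairs)


print(percent_identity("MSAT", "MSGT"))  # 75.0
print(percent_identity("MS-T", "MSAT"))  # 100.0 (the gapped column is skipped)
```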

Paralogous space

No paralogs of C5orf34 are predicted in either humans or mice. [3]

Conserved regions

Multiple sequence alignments showed amino acid conservation throughout the C5orf34 protein across a range of orthologs, with the most highly conserved regions at the N-terminus and C-terminus, where the DUFs are located. DUF 4520 (pfam 15016) was conserved in the N-terminal region and DUF 4524 (pfam 150125) in the C-terminal region. The polo-box domain of plk4 was also conserved in the C-terminal region in multiple sequence alignments of both close and distant orthologs. [22]
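Conserved regions are usually read off an existing alignment, such as ClustalW output, by scoring each column. The Python sketch below does not reproduce ClustalW. It scores each column of an already-aligned set of sequences by the fraction of sequences sharing the most common residue. It then reports runs of columns at or above a threshold. The threshold and minimum run length are hypothetical parameters.

```python
from __future__ import annotations

from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common non-gap residue per column."""
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        if not residues:
            scores.append(0.0)
            continue
        _, count = Counter(residues).most_common(1)[0]
        scores.append(count / n)
    return scores


def conserved_regions(scores: list[float], threshold: float = 0.8,
                      min_length: int = 3) -> list[tuple[int, int]]:
    """1-based inclusive column ranges where every score >= threshold (> 0)."""
    regions, start = [], None
    for i, s in enumerate(scores + [0.0]):  # sentinel closes a trailing run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions
```

For example, `conserved_regions([1, 1, 1, 0.2, 1, 1], min_length=2)` returns `[(1, 3), (5, 6)]`.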

References

  1. "NCBI Protein". www.ncbi.nlm.nih.gov. Retrieved 2015-05-09.
  2. "NCBI Gene". www.ncbi.nlm.nih.gov. Retrieved 2015-05-09.
  3. "NCBI Blast". www.ncbi.nlm.nih.gov. Retrieved 2015-05-09.
  4. Sillibourne, James E.; Bornens, Michel (2010-09-29). "Polo-like kinase 4: the odd one out of the family". Cell Division. 5 (1): 25. doi:10.1186/1747-1028-5-25. ISSN 1747-1028. PMC 2955731. PMID 20920249.
  5. Castro, Edouard. "PROSITE". prosite.expasy.org. Retrieved 2015-05-10.
  6. "Ensembl Genome Browser". www.ensembl.org. Retrieved 2015-05-09.
  7. "NCBI Gene". www.ncbi.nlm.nih.gov. Retrieved 2015-05-09.
  8. "NCBI Gene". www.ncbi.nlm.nih.gov. Retrieved 2015-05-09.
  9. "Tissue expression of C5orf34 - Summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2015-05-09.
  10. "NCBI GeoProfile". www.ncbi.nlm.nih.gov. Retrieved 2015-05-09.
  11. Boerma, Marjan; Fu, Qiang; Wang, Junru; Loose, David S.; Bartolozzi, Alessandra; Ellis, James L.; McGonigle, Sharon; Paradise, Elsa; Sweetnam, Paul; Fink, Louis M.; Vozenin-Brotons, Marie-Catherine; Hauer-Jensen, Martin (2008). "Comparative gene expression profiling in three primary human cell lines after treatment with a novel inhibitor of Rho kinase or atorvastatin". Blood Coagulation & Fibrinolysis. 19 (7): 709–718. doi:10.1097/MBC.0b013e32830b2891. PMC 2713681. PMID 18832915.
  12. "Genomatix: Annotation & Analysis". www.genomatix.de. Retrieved 2015-05-09.
  13. "Statistical Analysis of PS (SAPS)". Biology Workbench. Subramaniam, Shankar. Retrieved 2015-05-05. [permanent dead link]
  14. "ExPASy - Compute pI/Mw tool". web.expasy.org. Retrieved 2015-05-09.
  15. "Phyre Investigator output for C5orf34__ with c1umwB_". www.sbg.bio.ic.ac.uk. Retrieved 2015-05-09. [permanent dead link]
  16. Matthews, Harry R.; Huebner, Verena D. (1984-03-01). "Nuclear protein kinases". Molecular and Cellular Biochemistry. 59 (1–2): 81–99. doi:10.1007/BF00231306. ISSN 0300-8177. PMID 6323962. S2CID 25765323.
  17. "PSORT II server". www.genscript.com. Archived from the original on 2021-07-09. Retrieved 2015-05-09.
  18. UCBL, Institut. "NPS@ : SOPMA secondary structure prediction". npsa-prabi.ibcp.fr. Retrieved 2015-05-09.
  19. Sobhani, Armin. "PELE - Protein Energy Landscape Exploration - Web Server". pele.bsc.es. Retrieved 2015-05-09.
  20. "NetPhos 2.0 Server". www.cbs.dtu.dk. Retrieved 2015-05-09.
  21. "NetNES 1.1 Server". www.cbs.dtu.dk. Retrieved 2015-05-09.
  22. "CLUSTALW". SDSC. Subramaniam, Shankar. 2015-05-05. [permanent dead link]
  23. "TimeTree :: The Timescale of Life". www.timetree.org. Retrieved 2015-05-10.