C4orf51

Identifiers
Aliases C4orf51, chromosome 4 open reading frame 51
External IDs MGI: 1914937 HomoloGene: 78034 GeneCards: C4orf51
Orthologs
Species: Human, Mouse
Ensembl: ENSG00000237136 (human), ENSMUSG00000031682 (mouse)
RefSeq (mRNA): NM_001080531 (human), NM_026315 (mouse)
RefSeq (protein): NP_001074000 (human), NP_080591 (mouse)
Location (UCSC): Chr 4: 145.68 – 145.77 Mb (human), Chr 8: 79.94 – 79.98 Mb (mouse)
PubMed search [3] [4]

Chromosome 4 open reading frame 51 (C4orf51) is a protein that in humans is encoded by the C4orf51 gene. [5]

Gene

Genomic neighborhood of C4orf51, labeled in red.

The C4orf51 gene is located at 4q31.21 on the plus strand of chromosome 4. [6] The gene spans 120,289 base pairs and contains 6 exons. [7] The genomic neighborhood of C4orf51 includes LOC285422, LINC02491, NCOA4P3, and MMAA, all located upstream of C4orf51. [5] ZNF827 and LOC105377468 are located downstream of C4orf51.

mRNA

There are three known transcript variants of C4orf51, which encode isoforms X1, X2, and X3. [5] Although the variants differ in length, all contain exons 1 and 2. Occasionally, transcription produces a single mRNA that corresponds to both C4orf51 and the neighboring gene.

Protein

Schematic illustration of predicted post-translational modifications for C4orf51, made using DOG 2.0. DUF4722 shown.

C4orf51 encodes a protein of 202 amino acids with a molecular weight of 23 kDa. [6] The theoretical isoelectric point of C4orf51 is 8.6. [9] Relative to other human proteins, C4orf51 has more serine residues and fewer valine residues. [10]

Domains and motifs

In humans, the C4orf51 protein contains one domain of unknown function, DUF4722. [11] DUF4722 spans the first 168 amino acids of C4orf51 and has a predicted molecular weight of 19.3 kDa. [9] In a compositional analysis of this domain, no extremes were identified. [10] The DUF is highly conserved in orthologous proteins, particularly near the N-terminus. [12]
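
The reported masses of the full-length protein (23 kDa for 202 residues) and of DUF4722 (19.3 kDa for 168 residues) can be sanity-checked with a common rule of thumb of roughly 110 Da per amino acid residue. The sketch below is illustrative only: the 110 Da figure and the function name are assumptions introduced here, not the method used by the cited tools, which compute mass from the actual sequence.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(length_aa: int) -> float:
    """Approximate a protein's mass in kDa from its length in residues."""
    return length_aa * AVERAGE_RESIDUE_MASS_DA / 1000.0


c4orf51_kda = estimate_mass_kda(202)  # full-length C4orf51, 202 residues
duf4722_kda = estimate_mass_kda(168)  # DUF4722, first 168 residues

print(f"C4orf51: ~{c4orf51_kda:.1f} kDa (reported: 23 kDa)")
print(f"DUF4722: ~{duf4722_kda:.1f} kDa (reported: 19.3 kDa)")
```

Both estimates land within about 1 kDa of the reported values, which is the level of agreement expected from a length-only approximation.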

Secondary structure

Alpha helices are predicted to span amino acids 20–34 and 150–165 in C4orf51. [13] [14] [15] Amino acids 45–48 are predicted to form a beta sheet. [13] [14] No coiled coils are predicted in C4orf51. [16]

Structural analog of C4orf51, generated by I-TASSER and visualized with iCn3D. Conserved domain Clr2_transil, involved in transcriptional silencing, is labeled in yellow.

Tertiary and quaternary structure

The best-aligned structural analog of C4orf51, generated by I-TASSER, contains Clr2_transil, a domain involved in transcriptional silencing. [17] [18] According to OriGene, gel analysis with a C4orf51 rabbit polyclonal antibody produced bands at 23 kDa and at ~44–46 kDa, suggesting that C4orf51 may form a dimer. [19]
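
The dimer inference rests on simple arithmetic: two 23 kDa subunits give 46 kDa, which falls within the ~44–46 kDa band. A minimal sketch of that check follows; the function name and the upper limit of four subunits are assumptions made for illustration, and a mass match alone does not prove dimerization.

```python
MONOMER_KDA = 23.0             # monomer mass reported for C4orf51
UPPER_BAND_KDA = (44.0, 46.0)  # approximate range of the second band


def oligomer_states(monomer_kda, band_range, max_subunits=4):
    """Return subunit counts whose expected mass falls within the band range."""
    low, high = band_range
    return [
        n for n in range(1, max_subunits + 1)
        if low <= n * monomer_kda <= high
    ]


# Which oligomeric states are consistent with the upper band?
print(oligomer_states(MONOMER_KDA, UPPER_BAND_KDA))
```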

Post-translational modifications

Conceptual translation of C4orf51, with annotation key below. Exon-exon boundaries, transcription start and stop sites, and predicted post-translational modifications are marked.

C4orf51 is predicted to undergo several post-translational modifications, including phosphorylation, glycation, and acetylation. [20] [21] [22] Though SUMOylation and tyrosine sulfation are also predicted, the sites of these modifications are not conserved in distant C4orf51 orthologs. [23] [24]

Subcellular localization

C4orf51 is predicted to be localized to the cell nucleus. [25] The protein contains pat4, a motif commonly used to identify potential nuclear localization signals. This motif is conserved in the most distantly related C4orf51 ortholog known, found in Anolis carolinensis.
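
As commonly described for PSORT II, pat4 is a four-residue pattern made up either of four basic residues (K or R) or of three basic residues plus one H or P. The sketch below scans a sequence for such windows under that assumed definition. The function name is hypothetical, and the example peptide is invented for illustration; it is not the C4orf51 sequence.

```python
BASIC = set("KR")  # basic residues counted by the pat4 rule


def find_pat4(sequence: str):
    """Return (1-based start, window) for every 4-residue window matching pat4."""
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        n_basic = sum(res in BASIC for res in window)
        n_h_or_p = sum(res in "HP" for res in window)
        # Either four basic residues, or three basic residues plus one H or P.
        if n_basic == 4 or (n_basic == 3 and n_h_or_p == 1):
            hits.append((i + 1, window))
    return hits


# Hypothetical peptide for illustration only (not the C4orf51 sequence).
example = "MSTAPKKRKSLEQRRHRA"
for start, window in find_pat4(example):
    print(start, window)
```

A match only flags a candidate nuclear localization signal; checking whether the same window is conserved in orthologs, as the article does for Anolis carolinensis, is a separate step.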

Expression

C4orf51 expression is low in all tissues except the testes. [26] However, because the gene body of C4orf51 contains long terminal repeats (LTRs) of human endogenous retroviruses (HERVs), the gene has shown high levels of expression in differentiation-defective human induced pluripotent stem cells. [27] [28]

Promoter

There are two promoter regions predicted by Genomatix, but only one (GXP_921944) is located upstream of the transcription start site. [29] GXP_921944 spans 1910 base pairs on chromosome 4. There are 15 coding transcripts supporting this promoter, but none are experimentally verified. [29]

Interacting proteins

Experimentally determined protein interactions for C4orf51 have not yet been identified. [30] [31] [32]

Clinical significance

Vlaikou et al. (2014) report that a 4q deletion containing C4orf51 and six other genes causes growth failure and developmental delay, minor craniofacial dysmorphism, digital anomalies, and cardiac and skeletal defects. [33]

Homology

Paralogs

No paralogs or paralogous domains of C4orf51 have been identified. [7]

Orthologs

Orthologs of C4orf51 have been found in mammals and reptiles. [7] Within class Mammalia, orthologs have been identified in orders Primates, Scandentia, Lagomorpha, Rodentia, Perissodactyla, Chiroptera, Carnivora, Cetartiodactyla, Sirenia, and Proboscidea, as well as mammalian infraclass Marsupialia. The green anole (Anolis carolinensis) and Burmese python (Python bivittatus) contain the most distantly related orthologs of C4orf51. Both species diverged from humans an estimated 312 million years ago. C4orf51 orthologs have not yet been identified in bacteria, archaea, protists, plants, fungi, Trichoplax, invertebrates, bony or cartilaginous fish, amphibians, or birds.

C4orf51 Orthologs

| Genus and species | Common name | Taxonomic group | Estimated date of divergence (million years ago) | Accession number | Length (amino acids) | Sequence identity | Sequence similarity |
|---|---|---|---|---|---|---|---|
| Homo sapiens | Human | Mammalia (Primate) | 0 | NP_001074000.1 | 202 | 100.00% | 100% |
| Macaca mulatta | Rhesus macaque | Mammalia (Primate) | 29.44 | NP_001181807.1 | 202 | 94.55% | 97% |
| Callithrix jacchus | Common marmoset | Mammalia (Primate) | 43.6 | XP_008990874.1 | 217 | 79.72% | 88% |
| Tupaia chinensis | Chinese tree shrew | Mammalia (Scandentia) | 82 | XP_006143532.1 | 201 | 68.96% | 77% |
| Oryctolagus cuniculus | European rabbit | Mammalia (Lagomorpha) | 90 | XP_017202803.1 | 222 | 57.40% | 76% |
| Mus musculus | House mouse | Mammalia (Rodentia) | 90 | NP_080591.1 | 208 | 50.96% | 66% |
| Urocitellus parryii | Arctic ground squirrel | Mammalia (Rodentia) | 90 | XP_026248522.1 | 142 | 43.35% | 72% |
| Ceratotherium simum simum | Southern white rhinoceros | Mammalia (Perissodactyla) | 96 | XP_014635653.1 | 199 | 66.50% | 77% |
| Equus asinus | Donkey | Mammalia (Perissodactyla) | 96 | XP_014693612.1 | 201 | 64.71% | 75% |
| Pteropus vampyrus | Large flying fox | Mammalia (Chiroptera) | 96 | XP_023385935.1 | 200 | 62.56% | 71% |
| Enhydra lutris kenyoni | Sea otter | Mammalia (Carnivora) | 96 | XP_022368037.1 | 201 | 61.39% | 74% |
| Myotis brandtii | Brandt's bat | Mammalia (Chiroptera) | 96 | XP_014393999.1 | 199 | 59.11% | 69% |
| Callorhinus ursinus | Northern fur seal | Mammalia (Carnivora) | 96 | XP_025730051.1 | 146 | 50.50% | 68% |
| Vicugna pacos | Alpaca | Mammalia (Artiodactyla) | 96 | XP_006210007 | 158 | 50.50% | 59% |
| Balaenoptera acutorostrata scammoni | Minke whale | Mammalia (Cetacea) | 96 | XP_007189508.1 | 168 | 49.51% | 56% |
| Trichechus manatus latirostris | West Indian manatee | Mammalia (Sirenia) | 105 | XP_004378925.1 | 162 | 57.64% | 66% |
| Loxodonta africana | African bush elephant | Mammalia (Proboscidea) | 105 | XP_023412869.1 | 213 | 53.00% | 65% |
| Sarcophilus harrisii | Tasmanian devil | Mammalia (Marsupial) | 159 | XP_023361728.1 | 190 | 28.71% | 52% |
| Anolis carolinensis | Green anole | Reptilia | 312 | XP_003221711.1 | 194 | 21.53% | 46% |
| Python bivittatus | Burmese python | Reptilia | 312 | XP_025028520.1 | 176 | 19.43% | 51% |
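
The ortholog table shows sequence identity to the human protein falling as the estimated divergence date grows. One way to quantify that trend is a Pearson correlation over the tabulated values, sketched below. The choice of Pearson correlation and the function name are assumptions made for illustration; the source does not report such an analysis, and the data points are not phylogenetically independent.

```python
from math import sqrt

# (estimated divergence in million years ago, sequence identity in percent),
# copied from the C4orf51 ortholog table in table order.
ORTHOLOGS = [
    (0, 100.00), (29.44, 94.55), (43.6, 79.72), (82, 68.96),
    (90, 57.40), (90, 50.96), (90, 43.35), (96, 66.50),
    (96, 64.71), (96, 62.56), (96, 61.39), (96, 59.11),
    (96, 50.50), (96, 50.50), (96, 49.51), (105, 57.64),
    (105, 53.00), (159, 28.71), (312, 21.53), (312, 19.43),
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


r = pearson_r(ORTHOLOGS)
print(f"Pearson r between divergence date and identity: {r:.2f}")
```

A strongly negative coefficient is consistent with the expectation that more distantly related species carry more divergent C4orf51 sequences.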

References

  1. GRCh38: Ensembl release 89: ENSG00000237136 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000031682 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C4orf51 chromosome 4 open reading frame 51 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-06.
  6. "C4orf51 Gene (Protein Coding)". GeneCards. Retrieved 2019-02-03.
  7. "Homo sapiens chromosome 4 open reading frame 51 (C4orf51), mRNA". NCBI (National Center for Biotechnology Information): Nucleotide. May 2019.
  8. "DOG 2.0 - Protein Domain Structure Visualization". dog.biocuckoo.org. Retrieved 2019-05-02.
  9. "ExPASy - Compute pI/Mw tool". web.expasy.org. Retrieved 2019-04-21.
  10. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-04-21.
  11. "uncharacterized protein C4orf51 [Homo sapiens]". NCBI (National Center for Biotechnology Information): Protein.
  12. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-05-06.
  13. "CFSSP: Chou & Fasman Secondary Structure Prediction Server". www.biogem.org. Retrieved 2019-05-03.
  14. "NPS@ : GOR4 secondary structure prediction". npsa-prabi.ibcp.fr. Retrieved 2019-04-21.
  15. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2019-05-03.
  16. "COILS Server". embnet.vital-it.ch. Retrieved 2019-05-03.
  17. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2019-05-03.
  18. "iCn3D: Web-based 3D Structure Viewer". www.ncbi.nlm.nih.gov. Retrieved 2019-05-03.
  19. "C4orf51 Rabbit Polyclonal Antibody – TA335924 | OriGene". www.origene.com. Retrieved 2019-05-03.
  20. "GPS 3.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.org. Retrieved 2019-04-21.
  21. "NetGlycate 1.0 Server". www.cbs.dtu.dk. Retrieved 2019-04-21.
  22. "NetAcet 1.0 Server". www.cbs.dtu.dk. Retrieved 2019-04-21.
  23. "SUMOplot™ Analysis Program | Abgent". www.abgent.com. Retrieved 2019-05-03.
  24. "ExPASy - Sulfinator". web.expasy.org. Retrieved 2019-05-03.
  25. "PSORT II Prediction". psort.hgc.jp. Retrieved 2019-04-21.
  26. "C4orf51 chromosome 4 open reading frame 51 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-04-21.
  27. Koyanagi-Aoi M, Ohnuki M, Takahashi K, Okita K, Noma H, Sawamura Y, Teramoto I, Narita M, Sato Y, Ichisaka T, Amano N, Watanabe A, Morizane A, Yamada Y, Sato T, Takahashi J, Yamanaka S (December 2013). "Differentiation-defective phenotypes revealed by large-scale analyses of human pluripotent stem cells". Proceedings of the National Academy of Sciences of the United States of America. 110 (51): 20569–74. Bibcode:2013PNAS..11020569K. doi:10.1073/pnas.1319061110. PMC 3870695. PMID 24259714.
  28. Ohnuki M, Tanabe K, Sutou K, Teramoto I, Sawamura Y, Narita M, Nakamura M, Tokunaga Y, Nakamura M, Watanabe A, Yamanaka S, Takahashi K (August 2014). "Dynamic regulation of human endogenous retroviruses mediates factor-induced reprogramming and differentiation potential". Proceedings of the National Academy of Sciences of the United States of America. 111 (34): 12426–31. Bibcode:2014PNAS..11112426O. doi:10.1073/pnas.1413299111. PMC 4151758. PMID 25097266.
  29. "Genomatix: Gene2Promoter Result". www.genomatix.de. Retrieved 2019-04-21.
  30. "C4orf51 protein (human) - STRING interaction network". string-db.org. Retrieved 2019-04-21.
  31. "The Molecular INTeraction Database – An ELIXIR Core Resource". Retrieved 2019-05-06.
  32. "Mentha: the interactome browser". www.mentha.uniroma2.it. Retrieved 2019-05-06.
  33. Vlaikou AM, Manolakos E, Noutsopoulos D, Markopoulos G, Liehr T, Vetro A, Ziegler M, Weise A, Kreskowski K, Papoulidis I, Thomaidis L, Syrrou M (2014). "An interstitial 4q31.21q31.22 microdeletion associated with developmental delay: case report and literature review". Cytogenetic and Genome Research. 142 (4): 227–38. doi:10.1159/000361001. PMID 24733116. S2CID 32287205.