C19orf67

UPF0575 protein C19orf67 is a protein that in humans (Homo sapiens) is encoded by the C19orf67 gene. [1] Orthologs of C19orf67 are found in many mammals, some reptiles, and most jawed fish. [2] [3] The protein is expressed at low levels throughout the body, with the exception of the testis and breast tissue, where expression is high. [4] [5] Where it is expressed, the protein is predicted to localize to the nucleus. The highly conserved and slowly evolving DUF3314 region is predicted to form numerous alpha helices and may be vital to the function of the protein.

Gene

In humans, C19orf67 is located on the minus strand of chromosome 19 at 19p13.12 and spans 4,163 base pairs (bp). [1] [6]

The following genes are found near C19orf67 on the positive strand: [1]

The following genes are found near C19orf67 on the minus strand: [1]

mRNA transcript

C19orf67 has three transcript variants, although the second and third variants are predicted only by Ensembl analysis and have not been experimentally confirmed. [7] Only the first two variants are protein-coding transcripts.

The first transcript is 1,447 bp long, while the second and third are 751 bp and 656 bp long, respectively. [7] The mature mRNA of the longest isoform consists of six exons.

Splice Variants for C19orf67.

Protein

The unmodified protein has 358 amino acids, a predicted molecular weight of 40 kDa, a charge of -11, and an isoelectric point of 5. [8] [9] The sequence contains 44 prolines, which make up 12.3% of its residues; this proportion is higher than in 95% of comparable human proteins. Asparagine, however, is less abundant in the protein than in the human proteome as a whole. [9]
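The proline percentage quoted above follows directly from the residue count and the sequence length. The following is a minimal Python sketch of that calculation; the function names and the short toy sequence are illustrative assumptions and are not the actual C19orf67 sequence.

```python
from collections import Counter


def residue_percentage(count: int, length: int) -> float:
    """Return a residue's share of a sequence as a percentage."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * count / length


def composition(sequence: str) -> dict:
    """Return the percentage of each residue in a one-letter protein sequence."""
    counts = Counter(sequence.upper())
    return {aa: residue_percentage(n, len(sequence)) for aa, n in counts.items()}


# Figures reported in the text: 44 prolines in a 358-residue protein.
proline_pct = residue_percentage(44, 358)
print(f"Proline content: {proline_pct:.1f}%")

# Hypothetical toy sequence, used only to show the per-residue breakdown.
toy = "MPPAPLNP"
print(composition(toy)["P"])
```

Comparing such a percentage against the distribution across other human proteins, as in the cited analysis, requires a reference proteome and is outside the scope of this sketch.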

Domains

Both isoforms contain DUF3314. Although its function is not yet well understood, its conservation among numerous taxa indicates that it may be important to the function of the protein. [2] [10] The first isoform has a non-repeating proline-rich region from amino acids 12 to 80. [11] The function of this region is not well understood, but it may prevent helices from forming because of the rigid structure of proline. [12]


Secondary structure

A cross-program consensus predicted that UPF0575 protein C19orf67 forms four alpha helices and two beta sheets in the following locations in the amino acid sequence: [13] [14]

Helix: 52-62, 90-108, 115-125, 153-180
Sheet: 193-202, 210-217

Post-translational modifications

Acetylation is likely to occur at the N-terminus while the C-terminus is unlikely to be modified. [15] [16] O-Glycosylation is predicted to occur at amino acids 18 and 67. Several possible phosphorylation sites were identified along with the associated kinases: [17] [18]

Predicted Post-Translational Modifications and Domains of UPF0575 C19orf67.
Location | Amino acid | Kinase
67 | Serine | cdk5
127 | Threonine | PKC
169 | Threonine | PKC
196 | Serine | cdc2
204 | Serine | PKA
299 | Tyrosine | SRC
346 | Serine | PKA/PKG

Subcellular localization

UPF0575 protein C19orf67 is expected to be targeted to the nucleus, specifically the nucleolus. [19] [20]

Expression and regulation

Abundance of C19orf67 found throughout the body in humans relative to abundances of other human proteins

Regulation of gene expression

The promoter region is predicted to start 1,303 bp upstream of the 5' UTR and to span 1,447 bp, so that its last 144 bp overlap the 5' UTR. [21]
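The 144 bp overlap is the difference between the promoter length and its upstream offset. A minimal Python sketch of the arithmetic follows; the function name and the coordinate convention (position 0 at the first base of the 5' UTR) are assumptions made for illustration, and the sketch presumes the 5' UTR is at least as long as the computed overlap.

```python
def promoter_utr_overlap(upstream_offset_bp: int, promoter_length_bp: int) -> int:
    """Return how many bp of the promoter extend past the start of the 5' UTR.

    Positions are measured relative to the first base of the 5' UTR (position 0),
    so the promoter covers the half-open interval
    [-upstream_offset_bp, -upstream_offset_bp + promoter_length_bp).
    """
    promoter_end = -upstream_offset_bp + promoter_length_bp
    # A promoter that ends before position 0 does not overlap the UTR at all.
    return max(0, promoter_end)


# Figures reported in the text: starts 1,303 bp upstream, spans 1,447 bp.
print(promoter_utr_overlap(1303, 1447))  # 144
```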

A number of transcription factors, such as FOXP1, SOX5, SOX6, SOX4, and MZF1, are likely to bind the promoter, often acting to downregulate transcription. In regulating other genes, these transcription factors typically play a role in the development of tissues such as the heart and lung, and take part in the differentiation of early embryonic cells and red blood cells.

Transcriptional regulation

The mature mRNA of C19orf67 is predicted to form a stem loop in the 3' UTR that spans positions 1,296 to 1,350 of the transcript. [22]

Tissue expression

In humans, UPF0575 protein C19orf67 is highly expressed in the testis and breast tissue, although it is also expressed at low levels in the stomach, cerebral cortex, thyroid gland, and salivary gland. [8] [4] [5]

The protein product is less abundant than most of the human proteome in many tissues. [23]

Homology

Paralogs

There are no known paralogs of UPF0575 protein C19orf67, and no paralogous DUF3314 domains are known. [2] [3]

Orthologs

Orthologs of UPF0575 protein C19orf67 are present in a wide variety of mammals and are particularly well represented in Rodentia and primates. Orthologs were also found in each reptilian order, but they are much scarcer than in mammals. [2] [3] Many ray-finned fishes of various kinds have orthologs, fewer cartilaginous fish do, and no ortholog could be found in any jawless fish. [2]

No orthologs are known to be present in birds or amphibians. No invertebrates, fungi, bacteria, or lower species have known orthologs.

BLAT and BLAST were used to create the following table as a sample of the ortholog space for UPF0575 protein C19orf67. [2] [3] [24] [25] The table is not a complete list of orthologs; it is meant to show the span of species that have orthologs and the diversity of those species.

Genus and species | Common name | Accession number | Order | Divergence (MYA) | Sequence length (aa) | Identity | Similarity
Homo sapiens | Human | NP_001264307 | Primate | 0 | 358 | -- | --
Galeopterus variegatus | Sunda Flying Lemur | XP_008564240.1 | Dermoptera | 82 | 358 | 85% | 89%
Miniopterus natalensis | Long-fingered bat | XP_016077689.1 | Chiroptera | 94 | 356 | 84% | 89%
Ursus maritimus | Polar bear | XP_008709937.1 | Carnivores | 94 | 358 | 85% | 89%
Mus musculus | House mouse | NP_898920.2 | Rodentia | 88 | 300 | 71% | 74%
Gekko japonicus | Gekko | XP_015270669.1 | Squamata | 320 | 331 | 49% | 63%
Chelonia mydas | Green Turtle | XP_007069233.1 | Testudinata | 320 | 345 | 49% | 61%
Alligator mississippiensis | American alligator | XP_019353135.1 | Crocodilia | 320 | 297 | 45% | 57%
Latimeria chalumnae | West Indian Coelacanth | XP_005995930.1 | Coelacanthiformes | 440 | 414 | 37% | 50%
Salmo salar | Atlantic Salmon | XP_013986580.1 | Salmoniformes | 432 | 336 | 35% | 45%
Danio rerio | Zebrafish | NP_001083047.1 | Cypriniformes | 432 | 344 | 32% | 44%
Pygocentrus nattereri | Red piranha | XP_017554468.1 | Characiformes | 432 | 348 | 32% | 43%
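
The identity values in the table come from pairwise alignments of each ortholog against the human protein. As a rough illustration of what such a figure measures, the Python sketch below computes percent identity from two already-aligned sequences, counting identical columns over the full alignment length including gaps. The function name and the toy alignment are hypothetical and are not taken from the C19orf67 data; similarity additionally counts conservative substitutions under a scoring matrix and is not reproduced here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity for two aligned sequences of equal length.

    Gaps are written as '-'. A column counts as identical only when both
    sequences carry the same residue; the denominator is the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment (not real C19orf67 sequence data).
human_like = "MPPA-LNPK"
other_like = "MPSAGLNP-"
print(f"{percent_identity(human_like, other_like):.1f}% identity")
```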

Molecular evolution

UPF0575 protein C19orf67 belongs to a single family, and there are no apparent duplications in its evolutionary history. [2]

The DUF3314 region of the gene appears to have diverged more slowly than the rest of the gene, which suggests that the region has been under purifying selection because it plays an important role in the protein. [2] [24] [25]

Clinical significance

In one case study, C19orf67 was one of 29 genes on chromosome 19 lost due to deletions caused by chromosomal rearrangements. The rearrangements resulted in neural development issues and behavioral abnormalities, although it is not known whether C19orf67 played an active role in the resulting phenotypes. [26] In a different study, deletion of a portion of chromosome 19 that also included C19orf67 was associated with developmental issues such as intrauterine growth restriction, premature birth, and muscular hypotonia. [27]

C19orf67, among many other genes, may be used as a possible marker to detect mature beta cells. [28]


References

  1. "C19orf67 chromosome 19 open reading frame 67 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2017-03-02.
  2. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2017-03-02.
  3. "Human BLAT Search". genome.ucsc.edu. Retrieved 2017-03-02.
  4. EMBL-EBI Expression Atlas development team. "Expression Atlas < EMBL-EBI". www.ebi.ac.uk. Retrieved 2017-05-07.
  5. "77903326 - GEO Profiles - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2017-05-07.
  6. Grimwood J, Gordon LA, Olsen A, Terry A, Schmutz J, Lamerdin J, Hellsten U, Goodstein D, Couronne O, Tran-Gyamfi M, Aerts A, Altherr M, Ashworth L, Bajorek E, Black S, Branscomb E, Caenepeel S, Carrano A, Caoile C, Chan YM, Christensen M, Cleland CA, Copeland A, Dalin E, Dehal P, Denys M, Detter JC, Escobar J, Flowers D, Fotopulos D, Garcia C, Georgescu AM, Glavina T, Gomez M, Gonzales E, Groza M, Hammon N, Hawkins T, Haydu L, Ho I, Huang W, Israni S, Jett J, Kadner K, Kimball H, Kobayashi A, Larionov V, Leem SH, Lopez F, Lou Y, Lowry S, Malfatti S, Martinez D, McCready P, Medina C, Morgan J, Nelson K, Nolan M, Ovcharenko I, Pitluck S, Pollard M, Popkie AP, Predki P, Quan G, Ramirez L, Rash S, Retterer J, Rodriguez A, Rogers S, Salamov A, Salazar A, She X, Smith D, Slezak T, Solovyev V, Thayer N, Tice H, Tsai M, Ustaszewska A, Vo N, Wagner M, Wheeler J, Wu K, Xie G, Yang J, Dubchak I, Furey TS, DeJong P, Dickson M, Gordon D, Eichler EE, Pennacchio LA, Richardson P, Stubbs L, Rokhsar DS, Myers RM, Rubin EM, Lucas SM (2004). "The DNA sequence and biology of human chromosome 19". Nature. 428 (6982): 529–35. Bibcode:2004Natur.428..529G. doi:10.1038/nature02399. PMID 15057824.
  7. "Transcript: C19orf67-001 (ENST00000548523) - Domains & features - Homo sapiens - Ensembl genome browser 83". dec2015.archive.ensembl.org. Retrieved 2017-03-02.
  8. "Tissue expression of C19orf67 - Summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2017-03-02.
  9. Brendel, V.; Bucher, P.; Nourbakhsh, I. R.; Blaisdell, B. E.; Karlin, S. (1992-03-15). "Methods and algorithms for statistical analysis of protein sequences". Proceedings of the National Academy of Sciences of the United States of America. 89 (6): 2002–2006. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. ISSN 0027-8424. PMC 48584. PMID 1549558.
  10. "Pfam: Family: DUF3314 (PF11771)". pfam.xfam.org. Retrieved 2017-03-02.
  11. "Database of protein domains, families and functional sites". ExPASy. Retrieved 2017-02-27.
  12. Williamson, M.P. (1994). "The structure and function of proline-rich regions in proteins". Biochemical Journal. 297 (Pt 2): 249–260. doi:10.1042/bj2970249. PMC 1137821. PMID 8297327.
  13. Alva, Vikram; Nam, Seung-Zin; Söding, Johannes; Lupas, Andrei N. (2016-07-08). "The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis". Nucleic Acids Research. 44 (W1): W410–W415. doi:10.1093/nar/gkw348. ISSN 0305-1048. PMC 4987908. PMID 27131380.
  14. Chou, Peter Y.; Fasman, Gerald D. (1974-01-15). "Prediction of protein conformation". Biochemistry. 13 (2): 222–245. doi:10.1021/bi00699a002. ISSN 0006-2960. PMID 4358940.
  15. Falcone, Jean-Luc; Charpilloz, Christophe. "TERMINUS - Welcome to terminus". terminus.unige.ch. Retrieved 2017-04-24.
  16. Fankhauser, Niklaus; Mäser, Pascal (2005-05-01). "Identification of GPI anchor attachment signals by a Kohonen self-organizing map". Bioinformatics. 21 (9): 1846–1852. doi:10.1093/bioinformatics/bti299. ISSN 1367-4803. PMID 15691858.
  17. Blom, N.; Gammeltoft, S.; Brunak, S. (1999-12-17). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–1362. doi:10.1006/jmbi.1999.3310. ISSN 0022-2836. PMID 10600390.
  18. Blom, Nikolaj; Sicheritz-Pontén, Thomas; Gupta, Ramneek; Gammeltoft, Steen; Brunak, Søren (2004-06-01). "Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence". Proteomics. 4 (6): 1633–1649. doi:10.1002/pmic.200300771. ISSN 1615-9853. PMID 15174133. S2CID 18810164.
  19. "PSORT II server - GenScript". www.genscript.com. Retrieved 2017-04-27.
  20. Shen, Hong-Bin; Chou, Kuo-Chen (2007-11-01). "Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM". Protein Engineering, Design and Selection. 20 (11): 561–567. doi:10.1093/protein/gzm057. ISSN 1741-0126. PMID 17993650.
  21. "Genomatix - NGS Data Analysis & Personalized Medicine". www.genomatix.de. Archived from the original on 2001-02-24. Retrieved 2017-05-07.
  22. "The Mfold Web Server | mfold.rit.albany.edu". unafold.rna.albany.edu. Retrieved 2017-05-07.
  23. "C19orf67 protein abundance in PaxDb". pax-db.org. Retrieved 2017-04-30.
  24. EMBL-EBI. "EMBOSS Needle < Pairwise Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 2017-03-02.
  25. "TimeTree :: The Timescale of Life". www.timetree.org. Retrieved 2017-03-02.
  26. Marangi, Giuseppe; Orteschi, Daniela; Vigevano, Federico; Felie, Jillian; Walsh, Christopher A.; Manzini, M. Chiara; Neri, Giovanni (2012-04-01). "Expanding the spectrum of rearrangements involving chromosome 19: A mild phenotype associated with a 19p13.12–p13.13 deletion". American Journal of Medical Genetics Part A. 158A (4): 888–893. doi:10.1002/ajmg.a.35254. ISSN 1552-4833. PMC 3363957. PMID 22419660.
  27. Miller, David T.; Adam, Margaret P.; Aradhya, Swaroop; Biesecker, Leslie G.; Brothman, Arthur R.; Carter, Nigel P.; Church, Deanna M.; Crolla, John A.; Eichler, Evan E. (2010-05-14). "Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies". American Journal of Human Genetics. 86 (5): 749–764. doi:10.1016/j.ajhg.2010.04.006. ISSN 1537-6605. PMC 2869000. PMID 20466091.
  28. US 20140329704, Melton, Douglas A. & Hrvatin, Sinisa, "Markers for mature beta-cells and methods of using the same", published 2014-11-06.