C1orf141


Chromosome 1 open reading frame 141, or C1orf141, is a protein which, in humans, is encoded by the gene C1orf141. [1] It is a precursor protein that becomes active after cleavage. [2] Its function is not yet well understood, but it is suggested to be active during development. [3]


Gene

Locus

This gene is located on chromosome 1 at position 1p31.3. It is encoded on the antisense strand of DNA, spanning from 67,092,176 to 67,141,646, and has 10 exons in total. It overlaps slightly with the gene IL23R, which is encoded on the sense strand. [1]

Chromosome 1 spanning from 66,924,895 to 67,267,726.

Transcription regulation

A specific promoter region has not been predicted for C1orf141, so the 1000 base pairs upstream of the transcription start site were analyzed for transcription factor binding sites. [4] The transcription factors identified represent a subset of the binding sites found within this region and give an idea of the kinds of factors that could bind to the promoter. [4]
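
As a rough illustration of how such an upstream window can be derived from the locus coordinates given in the Locus section, the following sketch computes the gene span and the 1000 base pairs upstream of a gene on either strand. The function names are hypothetical. It assumes 1-based inclusive coordinates and that transcription of an antisense-strand gene begins at its highest coordinate; it is not the procedure used by the promoter analysis tool cited above.

```python
def gene_length(start, end):
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


def upstream_window(start, end, strand, size=1000):
    """Return (start, end) coordinates of the region upstream of a gene.

    For a sense-strand ("+") gene, upstream lies at lower coordinates.
    For an antisense-strand ("-") gene, transcription runs toward lower
    coordinates, so upstream lies at higher coordinates.
    """
    if strand == "-":
        return end + 1, end + size
    if strand == "+":
        return start - size, start - 1
    raise ValueError("strand must be '+' or '-'")


# Coordinates for C1orf141 as stated in the Locus section (antisense strand).
C1ORF141_START, C1ORF141_END = 67_092_176, 67_141_646

span = gene_length(C1ORF141_START, C1ORF141_END)
window = upstream_window(C1ORF141_START, C1ORF141_END, "-")
print(span)    # 49471
print(window)  # (67141647, 67142646)
```

Under these assumptions the gene spans 49,471 base pairs, and the analyzed window would sit immediately beyond the higher-coordinate end of the gene.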

mRNA

Alternative Splicing

The C1orf141 gene appears to have two common isoforms and seven less common transcript variants. [1]

C1orf141 Isoforms
Name | mRNA Length (base pairs) | Protein Length (amino acids)
C1orf141 Isoform 1 | 2177 | 400
C1orf141 Isoform 2 | 2203 | 217
C1orf141 Isoform X1 | 2348 | 471
C1orf141 Isoform X2 | 2265 | 458
C1orf141 Isoform X3 | 1875 | 333
C1orf141 Isoform X4 | 920 | 243
C1orf141 Isoform X5 | 612 | 154
C1orf141 Isoform X6 | 639 | 146
C1orf141 Isoform X7 | 514 | 138
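
The two length columns are related: a protein of N amino acids needs 3 × (N + 1) nucleotides of coding sequence once a stop codon is counted, and the rest of the mRNA is untranslated. The sketch below works this out for each isoform. The function names are hypothetical, and it assumes the listed protein lengths cover the full open reading frame with a single stop codon.

```python
# (mRNA length in base pairs, protein length in amino acids), from the isoform table.
ISOFORMS = {
    "Isoform 1": (2177, 400),
    "Isoform 2": (2203, 217),
    "Isoform X1": (2348, 471),
    "Isoform X2": (2265, 458),
    "Isoform X3": (1875, 333),
    "Isoform X4": (920, 243),
    "Isoform X5": (612, 154),
    "Isoform X6": (639, 146),
    "Isoform X7": (514, 138),
}


def coding_length(protein_length):
    """Nucleotides needed to encode the protein plus one stop codon."""
    return 3 * (protein_length + 1)


def untranslated_length(mrna_length, protein_length):
    """Nucleotides of the mRNA left over after the coding sequence."""
    return mrna_length - coding_length(protein_length)


for name, (mrna, protein) in ISOFORMS.items():
    print(name, coding_length(protein), untranslated_length(mrna, protein))
```

For Isoform 1 this gives a 1203-nucleotide coding sequence, leaving 974 nucleotides of the 2177-base-pair mRNA untranslated.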

Protein

The primary encoded precursor protein (C1orf141 Isoform 1) consists of 400 amino acid residues and is translated from an mRNA 2177 base pairs long. The transcript consists of 7 exons, and the protein contains a domain of unknown function, DUF4545. [5] Its predicted molecular mass is 54.4 kDa and its predicted isoelectric point is 9.63. [6]

Composition

Conceptual translation of C1orf141 showing predicted post-translational modifications.

The C1orf141 precursor protein has more lysine residues and fewer glycine residues than expected when compared to other human proteins. The sequence is 11.7% lysine and only 2.1% glycine. [6]
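
Composition percentages like these come from counting each residue and dividing by the sequence length. The following is a minimal sketch of that calculation, not the method of the sequence statistics tool cited above. The function name is hypothetical, and the example peptide is an invented toy sequence rather than the C1orf141 sequence.

```python
from collections import Counter


def percent_composition(sequence):
    """Return the percentage of each residue in a one-letter protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * count / total for residue, count in counts.items()}


# Hypothetical toy peptide for illustration only, NOT the C1orf141 sequence.
toy_peptide = "MKKLGKEKAK"
composition = percent_composition(toy_peptide)
print(composition["K"])  # 50.0
print(composition["G"])  # 10.0
```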

Post-translational modifications

C1orf141 is modified post-translationally to form a mature protein product. It is predicted to undergo O-linked glycosylation, sumoylation, glycation, and phosphorylation. [7] [8] [9] [10] One N-terminal cleavage occurs, followed by acetylation. Propeptide cleavage occurs at the start of the final exon. [2]

Model of the Tertiary Structure for precursor human protein C1orf141 as predicted by I-TASSER.

Structure

The secondary structure for uncleaved C1orf141 consists primarily of alpha helices with a few small segments of beta sheets. These helices can be seen in the model of the tertiary structure predicted by the I-TASSER program. [11] The program Phyre2 also predicts the protein to be made up primarily of alpha helices. [12] After propeptide cleavage of C1orf141, I-TASSER predicts that only alpha helices remain.

Interactions

There are currently no experimentally confirmed interactions for C1orf141. Through text mining, the STRING database of protein interactions identified ten proteins that potentially interact with C1orf141. [13] These are SALT1, C8orf74, SHCBP1L, ACTL9, RBM44, CCDC116, ADO, WDR78, ZNF365, and SPATA45. [14] [15] [16] [17] Examination of the papers from which these predictions were drawn did not reveal a clear link to any of the identified proteins.

Expression

Expression data for C1orf141 from HPA RNA-Seq normal tissues project.

C1orf141 is expressed in 30 different tissues but primarily in the testes. [1] Other tissues where expression is above baseline levels are the brain, lungs, and ovaries. [3]

Localization

C1orf141 is predicted to localize to the nucleus. There are two nuclear localization signals within the protein sequence, one of which remains after propeptide cleavage. [18]

Function

The function of C1orf141 is not yet fully understood and has not been experimentally confirmed. However, expression data suggest that the protein is active during some developmental stages. RNA-Seq data taken at different stages of development show expression at varying levels throughout. [3] In the gene's EST profile, expression is higher in the fetal developmental stage than in the adult. [19] Microarray data for cumulus cells during natural and stimulated in vitro fertilization show relatively high levels of expression. [20] There is no significant change in expression in adult tissue disease states. [19]

Homology

Paralogs

No paralogs of C1orf141 have been identified. [21]

Orthologs

Orthologous sequences are found primarily in other mammalian species. The most distant ortholog identified through an NCBI BLAST search is from a reptilian species, the only non-mammalian species found. [21] The table below lists a subset of the species identified as having orthologs, chosen to show the diversity of species in which orthologs can be found. Each sequence was compared to human C1orf141 isoform X1, the isoform that includes every coding exon. [1]

C1orf141 Orthologs
Genus and Species | Common Name | Taxonomic Group | Accession Number | Date of Divergence (millions of years) | Sequence Length (amino acids) | Sequence Identity | Sequence Similarity
Homo sapiens | Human | Primate | XP_011539768.1 | 0 | 471 | 100% | 100%
Gorilla gorilla gorilla | Western Lowland Gorilla | Primate | XP_018892062.1 | 8.61 | 469 | 97% | 98%
Otolemur garnettii | Northern Greater Galago | Primate | XP_023365656.1 | 84 | 457 | 59% | 70%
Tupaia chinensis | Northern Treeshrew | Scandentia | XP_006171456.1 | 88 | 468 | 62% | 74%
Oryctolagus cuniculus | European Rabbit | Lagomorpha | XP_017201685.1 | 88 | 470 | 56% | 68%
Fukomys damarensis | Damaraland Mole Rat | Rodentia | XP_010603404.1 | 88 | 479 | 54% | 66%
Chinchilla lanigera | Long-tailed Chinchilla | Rodentia | XP_013369940.1 | 94 | 476 | 50% | 65%
Ochotona princeps | American Pika | Lagomorpha | XP_012783463.1 | 94 | 450 | 50% | 67%
Miniopterus natalensis | Natal Long-fingered Bat | Chiroptera | XP_016064273.1 | 94 | 390 | 63% | 72%
Panthera pardus | Leopard | Carnivora | XP_019304485.1 | 94 | 450 | 62% | 74%
Enhydra lutris kenyoni | Sea Otter | Carnivora | XP_022351992.1 | 94 | 451 | 62% | 74%
Balaenoptera acutorostrata scammoni | Minke Whale | Cetacea | XP_007164359.1 | 94 | 432 | 60% | 60%
Delphinapterus leucas | Beluga Whale | Cetacea | XP_022436606.1 | 94 | 432 | 59% | 72%
Sus scrofa | Wild Boar | Cetartiodactyla | XP_005656203.1 | 94 | 442 | 56% | 70%
Pteropus vampyrus | Large Flying Fox | Chiroptera | XP_011367916.1 | 94 | 470 | 56% | 68%
Ovis aries | Sheep | Cetartiodactyla | XP_012026840.1 | 94 | 431 | 55% | 69%
Bos taurus | Cattle | Cetartiodactyla | NP_001070559.1 | 94 | 430 | 54% | 69%
Condylura cristata | Star-nosed Mole | Eulipotyphla | XP_012577585.1 | 94 | 432 | 52% | 64%
Desmodus rotundus | Common Vampire Bat | Chiroptera | XP_024421106.1 | 94 | 398 | 48% | 59%
Sarcophilus harrisii | Tasmanian Devil | Marsupialia | XP_012405605.1 | 160 | 356 | 43% | 63%
Phascolarctos cinereus | Koala | Marsupialia | XP_020848724.1 | 160 | 204 | 29% | 50%
Monodelphis domestica | Gray Short-tailed Opossum | Marsupialia | XP_007480481.1 | 160 | 524 | 25% | 48%
Pogona vitticeps | Central Bearded Dragon | Reptilia | XP_020661721.1 | 320 | 501 | 28% | 54%

Evolutionary History

Using the Molecular Clock Hypothesis, the m value (the number of corrected amino acid changes per 100 residues) was calculated for C1orf141 and plotted against the divergence of species. When compared to the same m value plot for hemoglobin, fibrinogen alpha chain, and cytochrome c, it is clear that the C1orf141 gene is evolving at a faster rate than all three.
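
The article does not state the formula used for m. A commonly used correction, assumed here, is m = -100 × ln(1 - n/100), where n is the observed number of amino acid differences per 100 residues. The sketch below applies it to a few rows of the ortholog table, taking n as 100 minus the sequence identity, which is a further simplification. The function name is hypothetical.

```python
import math


def corrected_changes(identity_percent):
    """Corrected amino acid changes per 100 residues (m).

    Assumes m = -100 * ln(1 - n/100) with n = 100 - identity_percent,
    which simplifies to 100 * ln(100 / identity_percent).
    """
    if not 0 < identity_percent <= 100:
        raise ValueError("identity must be in the range (0, 100]")
    return 100.0 * math.log(100.0 / identity_percent)


# (divergence in millions of years, sequence identity in percent),
# taken from a subset of rows in the ortholog table.
ORTHOLOGS = {
    "Gorilla gorilla gorilla": (8.61, 97),
    "Otolemur garnettii": (84, 59),
    "Sarcophilus harrisii": (160, 43),
    "Pogona vitticeps": (320, 28),
}

m_values = {
    species: corrected_changes(identity)
    for species, (_, identity) in ORTHOLOGS.items()
}

for species, (divergence, _) in ORTHOLOGS.items():
    print(f"{species}: {divergence} MYA, m = {m_values[species]:.2f}")
```

Plotting m against divergence time for several proteins on the same axes gives the kind of comparison described above, with a steeper trend indicating faster change.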


References

  1. "C1orf141 chromosome 1 open reading frame 141 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-03.
  2. "ProP 1.0 Server". www.cbs.dtu.dk. Retrieved 2019-05-03.
  3. "C1orf141 Gene Expression - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-03.
  4. "Genomatix: Gene2Promoter Subtasks". www.genomatix.de. Retrieved 2019-05-03.
  5. "C1orf141 Gene (Protein Coding)". www.genecards.org. Retrieved 2019-05-03.
  6. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-05-03.
  7. "NetOGlyc 4.0 Server". www.cbs.dtu.dk. Retrieved 2019-05-05.
  8. "SUMOplot™ Analysis Program | Abgent". www.abgent.com. Retrieved 2019-05-05.
  9. "NetGlycate 1.0 Server". www.cbs.dtu.dk. Retrieved 2019-05-05.
  10. "NetPhos 3.1 Server". www.cbs.dtu.dk. Retrieved 2019-05-05.
  11. "I-TASSER results". zhanglab.ccmb.med.umich.edu. Retrieved 2019-05-03.
  12. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2019-05-03.
  13. "C1orf141 protein (human) - STRING interaction network". string-db.org. Retrieved 2019-05-03.
  14. Sammut, Stephen J.; Feichtinger, Julia; Stuart, Nicholas; Wakeman, Jane A.; Larcombe, Lee; McFarlane, Ramsay J. (2014-05-06). "A novel cohort of cancer-testis biomarker genes revealed through meta-analysis of clinical data sets". Oncoscience. 1 (5): 349–359. doi:10.18632/oncoscience.37. ISSN 2331-4737. PMC 4278308. PMID 25594029.
  15. Swami, Meera (2014). "Genome-wide association study identifies three new melanoma susceptibility loci". Nature Medicine. 17 (11): 1357. doi:10.1038/nm.2568. hdl:2445/128818. ISSN 1078-8956. S2CID 42251944.
  16. Lu, Weining; Quintero-Rivera, Fabiola; Fan, Yanli; Alkuraya, Fowzan S.; Donovan, Diana J.; Xi, Qiongchao; Turbe-Doan, Annick; Li, Qing-Gang; Campbell, Craig G. (2007). "NFIA Haploinsufficiency Is Associated with a CNS Malformation Syndrome and Urinary Tract Defects". PLOS Genetics. 3 (5): e80. doi:10.1371/journal.pgen.0030080. ISSN 1553-7390. PMC 1877820. PMID 17530927.
  17. Yao, Fang; Zhang, Chi; Du, Wei; Liu, Chao; Xu, Ying (2015-09-16). "Identification of Gene-Expression Signatures and Protein Markers for Breast Cancer Grading and Staging". PLOS ONE. 10 (9): e0138213. Bibcode:2015PLoSO..1038213Y. doi:10.1371/journal.pone.0138213. ISSN 1932-6203. PMC 4573873. PMID 26375396.
  18. "Welcome to psort.org!!". www.psort.org. Retrieved 2019-05-03.
  19. "EST Profile - Hs.666621". www.ncbi.nlm.nih.gov. Retrieved 2019-05-03.
  20. "Modified natural and stimulated in vitro fertilization cycles: cumulus cells - GEO DataSets - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-03.
  21. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2019-05-03.