MGC50722


MGC50722, also known as uncharacterized protein LOC399693, is a protein that in humans is encoded by the MGC50722 gene (Mammalian Gene Collection Project Gene 50722 [1]). This 965 amino acid human protein has a molecular weight of 104.495 kDa and contains one domain of unknown function (DUF390). [2] The gene is generally conserved across mammals, is evolving rapidly, and shows relatively low expression in most human tissues except the testis. [3] [4]

Gene

The entire human gene is 40,364 base pairs in length, while the unprocessed mRNA is 25,960 base pairs long. After the introns are spliced out, the 10-exon gene yields a final mRNA of 3,596 base pairs that encodes 965 amino acids. [2] [5] [6]
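As a rough illustration of how these lengths relate to one another, the sketch below splits the transcript into intronic, coding, and untranslated portions. It assumes a single stop codon and the standard three-nucleotide codon. The function name and dictionary keys are hypothetical and do not come from any cited database.

```python
# Lengths quoted in the Gene section (pre-mRNA, mature mRNA, protein).
PRE_MRNA_NT = 25_960
MATURE_MRNA_NT = 3_596
PROTEIN_AA = 965


def transcript_breakdown(pre_mrna_nt, mature_mrna_nt, protein_aa):
    """Split a transcript into intronic, coding, and untranslated lengths.

    Assumption: the coding sequence is one codon per residue plus one
    stop codon, with three nucleotides per codon.
    """
    cds_nt = (protein_aa + 1) * 3
    return {
        "intron_nt": pre_mrna_nt - mature_mrna_nt,  # removed by splicing
        "cds_nt": cds_nt,                           # coding sequence incl. stop
        "utr_nt": mature_mrna_nt - cds_nt,          # 5' and 3' UTRs combined
    }


breakdown = transcript_breakdown(PRE_MRNA_NT, MATURE_MRNA_NT, PROTEIN_AA)
print(breakdown)
```

Under these assumptions, most of the unprocessed transcript is intronic, and the mature mRNA carries a few hundred nucleotides of untranslated sequence in addition to the coding region.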

Locus

Human MGC50722 is located on the minus strand of chromosome 9, in region q34 of the human genome (NCBI Gene ID: 399693). The best-characterized gene in this region is GPSM1, which encodes the G-protein-signaling modulator 1 protein. [7]

Homology and evolution

Figure: Divergence of the human MGC50722 gene graphed against Fibrinogen and Cytochrome C divergence. Each data point on the graph represents a different species and that species' homologous gene as identified through BLAST. BLAST searches were conducted using the human MGC50722, Fibrinogen, and Cytochrome C genes, and the percent identities were graphed against the actual divergence from humans for each species' homologous gene.

Paralogs

Centrosome-associated protein 350 (CEP350) was identified as the only possible paralog of protein MGC50722 in humans. CEP350 is 3,117 amino acids long and aligns with protein MGC50722 at its N-terminus. This suggests that MGC50722 split from CEP350 in the distant past.

Orthologs

Complete orthologs of protein MGC50722 are found only in mammals, with most conservation occurring within the N-terminus and DUF390.

Distant homologs

The most distant homolog detectable is in cartilaginous fish (462.5 MYA).

Homologous domains

The domain of unknown function 390 (pfam04094: DUF390) is part of a family of proteins that have only been identified within the rice genome. Although this domain's function is unknown, it may be some kind of transposable element. [8]

Protein

Primary sequence and isoforms

Human protein MGC50722 has a molecular weight of 104.495 kDa and an isoelectric point of 10.24. A mixed-charge cluster of amino acids is present between positions 146 and 182; it appears to be conserved in primates but is not present in other mammals. There are also six predicted isoforms in humans. [2]

Subcellular localization signals

The PSORT II server predicts five nuclear localization signals in the human protein MGC50722. When orthologs of the human protein were run through PSORT II, nuclear localization was the consensus prediction.

Predicted nuclear localization signals in human protein MGC50722

| Signal Type | Residue Span | Amino Acid Sequence |
| --- | --- | --- |
| pat4 | 46-49 | RPRK |
| pat4 | 148-151 | KPKR |
| pat7 | 43-49 | PQQRPRK |
| pat7 | 149-155 | PKRVKSS |
| pat7 | 302-308 | PSKRRLQ |
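The listed signals can be checked for internal consistency, since several of the spans overlap. The sketch below confirms that each residue span matches the length of its sequence and that overlapping hits agree on every shared position. It assumes that the digit in the signal name (pat4, pat7) is the pattern length. The helper name `check_hits` is hypothetical and is not part of PSORT II.

```python
# (signal type, first residue, last residue, sequence) as listed in the table.
NLS_HITS = [
    ("pat4", 46, 49, "RPRK"),
    ("pat4", 148, 151, "KPKR"),
    ("pat7", 43, 49, "PQQRPRK"),
    ("pat7", 149, 155, "PKRVKSS"),
    ("pat7", 302, 308, "PSKRRLQ"),
]


def check_hits(hits):
    """Return (problems, residue_at) for a list of predicted signals.

    problems   : human-readable descriptions of any inconsistency found
    residue_at : mapping of 1-based position -> residue implied by the hits
    """
    problems = []
    residue_at = {}
    for kind, start, end, seq in hits:
        # Span length must equal the number of residues given.
        if end - start + 1 != len(seq):
            problems.append(f"{kind} {start}-{end}: span/sequence length mismatch")
        # Assumption: the trailing digit of the name is the pattern length.
        if len(seq) != int(kind[3:]):
            problems.append(f"{kind} {start}-{end}: unexpected pattern length")
        # Overlapping hits must agree on every shared position.
        for offset, aa in enumerate(seq):
            pos = start + offset
            if residue_at.setdefault(pos, aa) != aa:
                problems.append(f"position {pos}: conflicting residues")
    return problems, residue_at


problems, residue_at = check_hits(NLS_HITS)
print(problems or "all listed signals are mutually consistent")
```

The five hits cover three distinct stretches of the protein, because each pat4 signal lies within or overlaps a pat7 signal.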

Post-translational modifications

The mouse ortholog of human MGC50722, the 4932418E24Rik protein, has experimentally determined phosphorylation sites at S588, S591, and S670 in the testis (pTestis ID: PT-MM-02686). [9] [10] [11] Prediction servers at ExPASy also predict additional phosphorylation sites (NetPhos 2.0 Server), an N-terminal acetylation site (NetAcet 1.0 Server), glycation sites (NetGlycate 1.0 Server), and a GalNAc O-glycosylation site (NetOGlyc 4.0 Server) at conserved residues in the human MGC50722 protein.

Secondary structure

Prediction models characterize protein MGC50722 as mostly disordered, with two predicted coiled-coil regions.

Protein internal structure and features

| Feature | Residue Span |
| --- | --- |
| Region of Low Complexity [6] | 11-26 |
| DUF390 [8] [12] | 405-690 |
| Region of Low Complexity [6] | 410-423 |
| Region of Low Complexity [6] | 546-556 |
| Coiled-Coil [6] | 546-566 |
| Region of Low Complexity [6] | 606-621 |
| Coiled-Coil [6] | 720-753 |
| Region of Low Complexity [6] | 771-791 |
| Region of Low Complexity [6] | 871-884 |

Figure: Protein MGC50722 primary amino acid sequence and its internal features/structures.
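To make the layout of these features easier to see, the sketch below treats each residue span as a closed interval, computes how much of the 965-residue protein DUF390 covers, and lists which other features fall entirely inside it. The helper names (`span_length`, `within`) are hypothetical, and the coordinates are taken as listed, with both ends inclusive.

```python
PROTEIN_LENGTH = 965  # residues, from the article text

# (feature, first residue, last residue), both ends inclusive.
FEATURES = [
    ("Region of Low Complexity", 11, 26),
    ("DUF390", 405, 690),
    ("Region of Low Complexity", 410, 423),
    ("Region of Low Complexity", 546, 556),
    ("Coiled-Coil", 546, 566),
    ("Region of Low Complexity", 606, 621),
    ("Coiled-Coil", 720, 753),
    ("Region of Low Complexity", 771, 791),
    ("Region of Low Complexity", 871, 884),
]


def span_length(start, end):
    """Number of residues in an inclusive span."""
    return end - start + 1


def within(inner, outer):
    """True if the inclusive span `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


duf = next(f for f in FEATURES if f[0] == "DUF390")
duf_length = span_length(duf[1], duf[2])
duf_fraction = duf_length / PROTEIN_LENGTH

# Features other than DUF390 itself that sit fully inside the domain.
inside_duf = [f for f in FEATURES if f is not duf and within(f[1:], duf[1:])]

print(f"DUF390 spans {duf_length} residues ({duf_fraction:.1%} of the protein)")
for name, start, end in inside_duf:
    print(f"  inside DUF390: {name} {start}-{end}")
```

On these coordinates, the first coiled-coil and three of the low-complexity regions lie within DUF390, while the second coiled-coil lies C-terminal to it.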

Potential function

The function of protein MGC50722 is unknown. Given that it is preferentially expressed in the testis and appears to be subcellularly localized in the nucleus, it could play an important role in gamete cells.

Interacting proteins

Because this gene and its protein were identified only recently, the interaction databases MINT, STRING, IntAct, and BioGRID do not list any interactions for it. More data would expand the characterization of MGC50722.

Expression

Expression levels of human MGC50722 appear to be low or absent in most cell types, with the highest expression found in the testis (GEO Profile IDs: 48997768 and 49895282). [13] A lung cancer study also showed that MGC50722 was expressed in CD4+ T cells from normal human tissue samples. [14]

Promoter

The transcriptional start site for MGC50722 aligns best with SPZ1, SORY, SP1F, and FAST [15] transcription factor binding sites.

Clinical significance

One notable GEO Profile for MGC50722 comes from a study of human male fertility that examined teratozoospermia (GEO Profile ID: 38113951). [13] Teratozoospermia is a condition in which sperm morphology is altered during the development of mature sperm cells, leading in some cases to male infertility. [16] The expression data show that MGC50722 is expressed in normal human subjects, while in subjects with teratozoospermia its expression drops significantly or is absent.

References

  1. Strausberg RL, Feingold EA, Grouse LH, Derge JG, Klausner RD, Collins FS, Wagner L, Shenmen CM, Schuler GD, Altschul SF, Zeeberg B, Buetow KH, Schaefer CF, Bhat NK, Hopkins RF, Jordan H, Moore T, Max SI, Wang J, Hsieh F, Diatchenko L, Marusina K, Farmer AA, Rubin GM, Hong L, Stapleton M, Soares MB, Bonaldo MF, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Prange C, Raha SS, Loquellano NA, Peters GJ, Abramson RD, Mullahy SJ, Bosak SA, McEwan PJ, McKernan KJ, Malek JA, Gunaratne PH, Richards S, Worley KC, Hale S, Garcia AM, Gay LJ, Hulyk SW, Villalon DK, Muzny DM, Sodergren EJ, Lu X, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madan A, Young AC, Shevchenko Y, Bouffard GG, Blakesley RW, Touchman JW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YS, Krzywinski MI, Skalska U, Smailus DE, Schnerch A, Schein JE, Jones SJ, Marra MA (Dec 2002). "Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences". Proceedings of the National Academy of Sciences of the United States of America. 99 (26): 16899–16903. Bibcode:2002PNAS...9916899M. doi:10.1073/pnas.242603899. PMC 139241. PMID 12477932.
  2. "Homo sapiens uncharacterized protein LOC399693". NCBI Protein.
  3. "Uncharacterized MGC50722 (MGC50722)". NCBI UniGene.
  4. Tang; et al. (2007). "Characteristics of 292 Testis-Specific Genes in Human". Biological and Pharmaceutical Bulletin. 30 (5): 865–872. doi:10.1248/bpb.30.865. PMID 17473427.
  5. "Homo sapiens uncharacterized MGC50722 (MGC50722), transcript variant 1, mRNA". NCBI Nucleotide. 2018-09-24.
  6. "Transcript: MGC50722-001 ENST00000569961 Protein Summary". Ensembl.
  7. "MGC50722 uncharacterized MGC50722 [ Homo sapiens (human) ]". NCBI Gene. Mar 2015.
  8. "Conserved Protein Domain Family DUF390".
  9. Qi L, Liu Z, Wang J, Cui Y, Guo Y, Zhou T, Zhou Z, Guo X, Xue Y, Sha J (Dec 2014). "Systematic analysis of the phosphoproteome and kinase-substrate networks in the mouse testis". Molecular & Cellular Proteomics. 13 (12): 3626–38. doi:10.1074/mcp.M114.039073. PMC 4256510. PMID 25293948.
  10. "4932418E24Rik".
  11. Diez-Roux G, Banfi S, Sultan M, Geffers L, Anand S, Rozado D, Magen A, Canidio E, Pagani M, Peluso I, Lin-Marq N, Koch M, Bilio M, Cantiello I, Verde R, De Masi C, Bianchi SA, Cicchini J, Perroud E, Mehmeti S, Dagand E, Schrinner S, Nürnberger A, Schmidt K, Metz K, Zwingmann C, Brieske N, Springer C, Hernandez AM, Herzog S, Grabbe F, Sieverding C, Fischer B, Schrader K, Brockmeyer M, Dettmer S, Helbig C, Alunni V, Battaini MA, Mura C, Henrichsen CN, Garcia-Lopez R, Echevarria D, Puelles E, Garcia-Calero E, Kruse S, Uhr M, Kauck C, Feng G, Milyaev N, Ong CK, Kumar L, Lam M, Semple CA, Gyenesei A, Mundlos S, Radelof U, Lehrach H, Sarmientos P, Reymond A, Davidson DR, Dollé P, Antonarakis SE, Yaspo ML, Martinez S, Baldock RA, Eichele G, Ballabio A (2011). "A high-resolution anatomical atlas of the transcriptome in the mouse embryo". PLOS Biology. 9 (1): e1000582. doi:10.1371/journal.pbio.1000582. PMC 3022534. PMID 21267068.
  12. "MGC50722 uncharacterized MGC50722 [Homo sapiens (human)]". NCBI Gene.
  13. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A (Jan 2013). "NCBI GEO: archive for functional genomics data sets--update". Nucleic Acids Research. 41 (Database issue): D991–D995. doi:10.1093/nar/gks1193. PMC 3531084. PMID 23193258.
  14. Ahn, Jung-Mo; et al. (Nov 2013). "Proteogenomic Analysis of Human Chromosome 9-Encoded Genes from Human Samples and Lung Cancer Tissues". Journal of Proteome Research. 13 (1): 137–146. doi:10.1021/pr400792p. PMC 3918476. PMID 24274035.
  15. Yeo CY, Chen X, Whitman M (Sep 1999). "The role of FAST-1 and Smads in transcriptional regulation by activin during early Xenopus embryogenesis". The Journal of Biological Chemistry. 274 (37): 26584–90. doi:10.1074/jbc.274.37.26584. PMID 10473623.
  16. Machev N, Gosset P, Viville S (2005). "Chromosome abnormalities in sperm from infertile men with normal somatic karyotypes: teratozoospermia". Cytogenetic and Genome Research. 111 (3–4): 352–357. doi:10.1159/000086910. PMID 16192715. S2CID 36272097.
