MGC50722
MGC50722, also known as uncharacterized protein LOC399693, is a protein that in humans is encoded by the MGC50722 gene (Mammalian Gene Collection Project Gene 50722 [1]). This 965 amino acid human protein has a molecular weight of 104.495 kDa and contains one domain of unknown function (DUF390). [2] The gene is generally conserved across mammals and is evolving quickly; its expression is relatively low in most human tissues, with the exception of the testis. [3] [4]
The entire human gene is 40,364 base pairs in length, while the unprocessed mRNA is 25,960 base pairs long. After the introns are spliced out, the 10-exon gene yields a mature mRNA of 3,596 base pairs that encodes 965 amino acids. [2] [5] [6]
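The lengths quoted above can be related with simple arithmetic. The following is a minimal sketch, not a published analysis: it assumes one stop codon follows the 965 codons, that everything outside the coding sequence in the mature mRNA is untranslated region, and that the difference between the unprocessed and mature mRNA is entirely intronic. All variable names are illustrative.

```python
# Figures taken from the text above.
UNPROCESSED_MRNA_BP = 25_960
MATURE_MRNA_BP = 3_596
PROTEIN_LENGTH_AA = 965

# Assumption: coding sequence = 965 codons plus a single stop codon.
cds_length_bp = (PROTEIN_LENGTH_AA + 1) * 3

# Assumption: the remainder of the mature mRNA is 5' plus 3' untranslated region.
untranslated_bp = MATURE_MRNA_BP - cds_length_bp

# Assumption: the length lost between unprocessed and mature mRNA is intronic.
intronic_bp = UNPROCESSED_MRNA_BP - MATURE_MRNA_BP
intronic_fraction = intronic_bp / UNPROCESSED_MRNA_BP

print(f"Coding sequence: {cds_length_bp} bp")
print(f"Untranslated:    {untranslated_bp} bp")
print(f"Intronic:        {intronic_bp} bp ({intronic_fraction:.1%} of the unprocessed mRNA)")
```

Under these assumptions, most of the unprocessed transcript is removed by splicing, and the coding sequence accounts for the bulk of the mature mRNA.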
Human MGC50722 is located on the minus strand of chromosome 9, in region q34 of the human genome (NCBI Gene ID: 399693). The best-characterized gene in this region is GPSM1, which encodes the G-protein-signaling modulator 1 protein. [7]
The centrosome-associated protein 350 (CEP350) is the only possible paralog of protein MGC50722 found in humans. CEP350 is 3,117 amino acids long and aligns with protein MGC50722 at its N-terminus. The limited alignment suggests that MGC50722 and CEP350 diverged from each other in the distant past.
Complete orthologs of protein MGC50722 are found only in mammals, where conservation is highest within the N-terminus and DUF390.
The most distant homolog detectable is in cartilaginous fish (462.5 MYA).
The domain of unknown function 390 (pfam04094: DUF390) is part of a family of proteins that have only been identified within the rice genome. Although this domain's function is unknown, it may be some kind of transposable element. [8]
Human protein MGC50722 has a molecular weight of 104.495 kDa and an isoelectric point of 10.24. A mixed-charge cluster of amino acids lies between positions 146 and 182; it appears to be conserved in primates but is absent in other mammals. Six isoforms are predicted in humans. [2]
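As a rough sanity check, the stated mass can be compared with the common rule of thumb of about 110 Da per amino acid residue. The sketch below is illustrative only; the 110 Da figure is a general approximation and not a value taken from the sources cited here.

```python
# Figures taken from the text above.
MASS_KDA = 104.495
LENGTH_AA = 965

# Assumption: ~110 Da is a typical average residue mass (rule of thumb).
RULE_OF_THUMB_DA_PER_RESIDUE = 110

mean_residue_mass_da = MASS_KDA * 1000 / LENGTH_AA
rule_of_thumb_kda = LENGTH_AA * RULE_OF_THUMB_DA_PER_RESIDUE / 1000

print(f"Mean residue mass from stated values: {mean_residue_mass_da:.1f} Da")
print(f"Rule-of-thumb mass for 965 residues:  {rule_of_thumb_kda:.2f} kDa")
```

The two estimates agree to within a few percent, so the stated mass and length are mutually consistent.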
The PSORT II server predicts 5 nuclear localization signals in the human protein MGC50722. When orthologs of the human protein were run through PSORT II, nuclear localization was the consensus prediction.
Signal Type | Residue Span | Amino Acid Sequence |
---|---|---|
pat4 | 46-49 | RPRK |
pat4 | 148-151 | KPKR |
pat7 | 43-49 | PQQRPRK |
pat7 | 149-155 | PKRVKSS |
pat7 | 302-308 | PSKRRLQ |
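The table can be checked for internal consistency: each residue span should be as long as its sequence, and signals whose spans overlap should carry identical residues in the shared positions. The sketch below performs only these checks on the tabulated values. It does not reimplement PSORT II, and the lysine/arginine count at the end is an illustrative summary rather than the server's scoring rule.

```python
from itertools import combinations

# (signal type, first residue, last residue, sequence), copied from the table above.
NLS_SIGNALS = [
    ("pat4", 46, 49, "RPRK"),
    ("pat4", 148, 151, "KPKR"),
    ("pat7", 43, 49, "PQQRPRK"),
    ("pat7", 149, 155, "PKRVKSS"),
    ("pat7", 302, 308, "PSKRRLQ"),
]

def span_matches_sequence(start, end, seq):
    """True if the inclusive span start..end has as many residues as seq."""
    return end - start + 1 == len(seq)

def overlaps_agree(a, b):
    """True if two signals show the same residues wherever their spans overlap."""
    _, a_start, a_end, a_seq = a
    _, b_start, b_end, b_seq = b
    lo, hi = max(a_start, b_start), min(a_end, b_end)
    if lo > hi:
        return True  # spans do not overlap, nothing to compare
    return a_seq[lo - a_start:hi - a_start + 1] == b_seq[lo - b_start:hi - b_start + 1]

all_lengths_ok = all(span_matches_sequence(s, e, q) for _, s, e, q in NLS_SIGNALS)
all_overlaps_ok = all(overlaps_agree(a, b) for a, b in combinations(NLS_SIGNALS, 2))

# Illustrative only: number of basic residues (K or R) in each signal.
basic_counts = [sum(seq.count(r) for r in "KR") for _, _, _, seq in NLS_SIGNALS]

print(all_lengths_ok, all_overlaps_ok, basic_counts)
```

Both checks pass for the values as tabulated: the pat4 signal at 46-49 sits inside the pat7 signal at 43-49, and the signals at 148-151 and 149-155 share the residues PKR.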
The mouse ortholog of human protein MGC50722, the 4932418E24Rik protein, has experimentally determined phosphorylation sites at S588, S591, and S670 in the testis (pTestis ID: PT-MM-02686). [9] [10] [11] Prediction servers at ExPASy also predict additional phosphorylation sites (NetPhos 2.0 Server), an N-terminal acetylation site (NetAcet 1.0 Server), glycation sites (NetGlycate 1.0 Server), and a GalNAc O-glycosylation site (NetOGlyc 4.0 Server) at conserved residues in the human MGC50722 protein.
Prediction models characterize protein MGC50722 as mostly disordered, with two coiled-coil regions.
Feature | Residue Span |
---|---|
Region of Low Complexity [6] | 11-26 |
DUF390 [8] [12] | 405-690 |
Region of Low Complexity [6] | 410-423 |
Region of Low Complexity [6] | 546-556 |
Coiled-Coil [6] | 546-566 |
Region of Low Complexity [6] | 606-621 |
Coiled-Coil [6] | 720-753 |
Region of Low Complexity [6] | 771-791 |
Region of Low Complexity [6] | 871-884 |
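To see how the annotated features relate to one another, the spans in the table can be compared directly. The sketch below is a minimal illustration using only the tabulated coordinates and the 965-residue length given earlier; the helper names are hypothetical. It reports which features lie entirely within DUF390 and how much of the protein the domain and the coiled-coils cover.

```python
PROTEIN_LENGTH_AA = 965

# (feature, first residue, last residue), copied from the table above.
FEATURES = [
    ("Low complexity", 11, 26),
    ("DUF390", 405, 690),
    ("Low complexity", 410, 423),
    ("Low complexity", 546, 556),
    ("Coiled-coil", 546, 566),
    ("Low complexity", 606, 621),
    ("Coiled-coil", 720, 753),
    ("Low complexity", 771, 791),
    ("Low complexity", 871, 884),
]

def span_length(start, end):
    """Number of residues in the inclusive span start..end."""
    return end - start + 1

duf = next(f for f in FEATURES if f[0] == "DUF390")
duf_length = span_length(duf[1], duf[2])
duf_fraction = duf_length / PROTEIN_LENGTH_AA

# Features whose whole span falls inside the DUF390 span.
inside_duf = [
    f for f in FEATURES
    if f is not duf and duf[1] <= f[1] and f[2] <= duf[2]
]

coiled_coil_residues = sum(
    span_length(s, e) for name, s, e in FEATURES if name == "Coiled-coil"
)

print(f"DUF390: {duf_length} residues ({duf_fraction:.1%} of the protein)")
print(f"Features inside DUF390: {inside_duf}")
print(f"Residues in coiled-coils: {coiled_coil_residues}")
```

By these coordinates, three low-complexity regions and the first coiled-coil fall within DUF390, while the second coiled-coil lies downstream of the domain.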
The function of protein MGC50722 is unknown. Because it is preferentially expressed in the testis and appears to localize to the nucleus, it could play an important role in gamete cells.
Because the gene and its protein were identified only recently, the interaction databases MINT, STRING, IntAct, and BioGRID list no interactions for MGC50722. Further experimental data are needed to characterize the protein.
Expression of human MGC50722 appears to be low or absent in most cell types, with the highest expression in the testis (GEO Profile IDs: 48997768 and 49895282). [13] A lung cancer study also showed that MGC50722 was expressed in CD4+ T cells from normal human tissue samples. [14]
The transcriptional start site for MGC50722 aligns best with SPZ1, SORY, SP1F, and FAST [15] transcription factor binding sites.
A notable GEO Profile for MGC50722 comes from a study of human male fertility that examined teratozoospermia (GEO Profile ID: 38113951). [13] Teratozoospermia is a condition in which sperm cells develop with abnormal morphology, which in some cases leads to male infertility. [16] The expression data show that MGC50722 is expressed in normal human subjects, whereas in subjects with teratozoospermia expression drops significantly or is shut off.