C22orf31

Identifiers
Aliases: C22orf31, HS747E2A, bK747E2.1, chromosome 22 open reading frame 31
External IDs: HomoloGene: 81840; GeneCards: C22orf31
Orthologs (Human | Mouse)
Entrez
Ensembl
UniProt
RefSeq (mRNA): NM_015370, NM_001386866 | n/a
RefSeq (protein): NP_056185 | n/a
Location (UCSC): Chr 22: 29.06 – 29.06 Mb | n/a
PubMed search: [2] | n/a

C22orf31 (chromosome 22, open reading frame 31) is a protein which in humans is encoded by the C22orf31 gene. The C22orf31 mRNA transcript has an upstream in-frame stop codon, while the protein has a domain of unknown function (DUF4662) spanning the majority of the protein-coding region. [3] The protein has orthologs with high percent similarity in mammals. [4] The most distant orthologs are found in species of bony fish, but C22orf31 is not found in any species of birds or amphibians.

C22orf31 is expressed most strongly in the testes. Analysis of in vivo matured oocytes has revealed increased levels of C22orf31, [5] while promoter analysis has identified binding sites for transcription factors that are active during myeloid cell differentiation. [6]

Gene

C22orf31 is located on the minus strand of chromosome 22 at 22q12.1. [7] The gene is 3,172 base pairs long, spanning chr22: 29,058,672 to 29,061,844. [8] C22orf31 contains 3 exons and is also known by the aliases bK747E2.1 and HS747E2A.
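The quoted gene length can be checked directly against the coordinates. The following is a minimal sketch; the function name is illustrative, and whether the interval should be counted inclusively depends on the coordinate convention of the source, which is not stated here.

```python
def span_length(start: int, end: int, inclusive: bool = False) -> int:
    """Length of a genomic interval in base pairs.

    With inclusive=False the result is simply end - start; with
    inclusive=True both endpoints are counted (end - start + 1).
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + (1 if inclusive else 0)


# Coordinates as given in the text (chr22, minus strand).
GENE_START = 29_058_672
GENE_END = 29_061_844

print(span_length(GENE_START, GENE_END))                  # 3172
print(span_length(GENE_START, GENE_END, inclusive=True))  # 3173
```

The 3,172 bp figure corresponds to the plain difference of the two coordinates.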

Transcript

There is one transcript of C22orf31. The mRNA sequence is 1,070 base pairs long and contains an upstream in-frame stop codon at nucleotides 122–124. [9]

Protein

Domain of unknown function (DUF4662) in Human C22orf31 protein.

General properties

The protein encoded by C22orf31 is 290 amino acids in length with a predicted molecular mass of 33 kDa. [10] The predicted isoelectric point of the protein is 10, indicating that the protein is basic. The C22orf31 protein contains a domain of unknown function (DUF4662) spanning amino acids 2–263. [11] The secondary and tertiary structures of this protein are not well characterized.
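As a rough plausibility check on the predicted mass, a protein's molecular mass can be approximated from its length alone. The sketch below assumes the common rule-of-thumb average of about 110 Da per residue; this is not the method used by the cited tool, which works from the actual amino acid composition.

```python
# Rule-of-thumb average residue mass in daltons (an assumption, not a value
# from the article).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int,
                    residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * residue_mass_da / 1000.0


print(f"{approx_mass_kda(290):.1f} kDa")  # 31.9 kDa
```

The estimate is of the same order as the 33 kDa reported from the sequence-based calculation.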

Isoforms

In addition to the primary protein, C22orf31 has two predicted protein isoforms (X1 and X2). [12] A comparison of these proteins is shown in the table below.

C22orf31 Isoforms
Protein | Accession # | Size (AA) | Features
C22orf31 [Homo sapiens] [13] | NP_056185 | 290 | DUF4662 (AA 2-263)
Uncharacterized protein C22orf31 isoform X1 [Homo sapiens] [14] | XP_016884230 | 249 | DUF4662 (AA 1-221)
Uncharacterized protein C22orf31 isoform X2 [Homo sapiens] [15] | XP_005261548 | 186 | DUF4662 (AA 40-158)
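The fraction of each protein covered by DUF4662 follows from the sizes and domain boundaries in the table above. A minimal sketch, with illustrative names and inclusive residue numbering assumed:

```python
# (accession, protein length in AA, domain start, domain end), from the table.
ISOFORMS = [
    ("NP_056185", 290, 2, 263),
    ("XP_016884230", 249, 1, 221),
    ("XP_005261548", 186, 40, 158),
]


def domain_coverage(length: int, dom_start: int, dom_end: int) -> float:
    """Fraction of the protein covered by the domain (inclusive bounds)."""
    return (dom_end - dom_start + 1) / length


for accession, length, start, end in ISOFORMS:
    print(f"{accession}: {domain_coverage(length, start, end):.1%}")
# NP_056185: 90.3%
# XP_016884230: 88.8%
# XP_005261548: 64.0%
```

In all three entries the domain covers well over half of the sequence.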

Composition

The protein derived from C22orf31 is considered somewhat rich in lysine and somewhat poor in phenylalanine compared to the composition of the average human protein. [16] There are no positive, negative, mixed, or uncharged segments in C22orf31. There are also no transmembrane components or signal peptides in the protein.

Regulation

Gene level regulation

Transcription factor binding sites

The C22orf31 promoter contains many predicted transcription factor binding sites. [6] Transcription factors predicted to bind the promoter are commonly found in immortalized liver cancer cell lines (HepG2) and immortalized myelogenous leukemia cell lines (K562). [17] The presence of a C/EBP epsilon site suggests a possible role for C22orf31 in myeloid cell differentiation. The presence of a site for ARNT, which is typically associated with hypoxia-inducible factor 1 alpha, suggests a possible role for C22orf31 in the formation of acute myeloblastic leukemia. [18]

Expression

C22orf31 shows moderate expression in the testes and low expression in the brain and ovaries. [19] The protein is expressed in fetal as well as adult tissues. C22orf31 also shows increased expression in in vivo matured oocytes compared with metaphase II oocytes. [5]

Transcript level regulation

No microRNA binding sites have been found in C22orf31. [20] Three functionally important stem loops are predicted in each of the 3' UTR and 5' UTR of C22orf31. [21]

Protein level regulation

Conceptual translation of C22orf31 including post-translational modifications, domains, and other features.

C22orf31 is predicted to undergo several types of post-translational modifications. With a high degree of certainty, it is predicted that C22orf31 undergoes O-glycosylation, [22] glycation, [23] phosphorylation, [24] and O-GlcNAcylation. [25] Only two phosphorylation sites are located in highly conserved regions of the protein. These modifications can be seen in the conceptual translation on the right.

Homology/evolution

Paralogs

No human paralogs for C22orf31 have been identified. [26]

Orthologs

Orthologs of the C22orf31 protein exist predominantly in mammals. [4] The most distant orthologs are found in bony fish, and no orthologs have been identified in amphibians or birds. The major taxonomic groups to which C22orf31 orthologs belong include Bovidae, Eulipotyphla, Cetacea, Diprotodontia, Vertebrata, and Rodentia.

A list of 22 C22orf31 orthologs, together with the human protein, is shown below, ordered first by ascending date of divergence and then by descending percent identity with human C22orf31.

C22orf31 Orthologs
Genus species | Common name | Taxon | Date of divergence (MYA) [27] | Accession # [4] | Length (AA) [4] | % identity w/ human [4] | % similarity w/ human
Homo sapiens | Human | Hominidae | 0 | NP_056185.1 | 290 | 100 | 100
Miniopterus natalensis | Natal long-fingered bat | Chiroptera | 94 | XP_016054130.1 | 301 | 78.45 | 82.1
Physeter catodon | Sperm whale | Cetacea | 94 | XP_023976708.1 | 307 | 75.68 | 78.8
Bison bison bison | Bison | Bovidae | 94 | XP_010827019.1 | 292 | 75 | 79.5
Mustela putorius furo | Domestic ferret | Mustelidae | 94 | XP_012918895.1 | 395 | 73.31 | 60.4
Ovis aries | Sheep | Bovidae | 94 | XP_027836065.1 | 315 | 73.2 | 72.7
Suricata suricatta | Meerkat | Carnivora | 94 | XP_029777390.1 | 296 | 72.39 | 81.1
Manis javanica | Malayan pangolin | Manidae | 94 | XP_017520770.1 | 302 | 72.3 | 78.2
Lagenorhynchus obliquidens | Pacific white-sided dolphin | Cetacea | 94 | XP_026981083.1 | 307 | 71.14 | 76
Orcinus orca | Killer whale | Cetacea | 94 | XP_004283847.1 | 271 | 68.62 | 72.6
Globicephala melas | Long-finned pilot whale | Cetacea | 94 | XP_030715704.1 | 287 | 68.28 | 74.1
Neophocaena asiaeorientalis | Yangtze finless porpoise | Cetacea | 94 | XP_024623713.1 | 324 | 66.04 | 70.2
Sorex araneus | European shrew | Eulipotyphla | 94 | XP_004615674.1 | 325 | 64.11 | 63.1
Condylura cristata | Star-nosed mole | Rodentia | 94 | XP_004690724.1 | 347 | 62.54 | 59.2
Loxodonta africana | African bush elephant | Paenungulates | 102 | XP_023415096.1 | 536 | 78.52 | 46.6
Chrysochloris asiatica | Cape golden mole | Rodentia | 102 | XP_006869362.1 | 460 | 77.7 | 53.9
Dasypus novemcinctus | Nine-banded armadillo | Xenarthrans | 102 | XP_023445504.1 | 305 | 75.44 | 79
Echinops telfairi | Small Madagascar hedgehog | Eulipotyphla | 102 | XP_012863338.2 | 300 | 68.01 | 73.4
Phascolarctos cinereus | Koala | Diprotodontia | 160 | XP_020852397.1 | 302 | 49.19 | 60.8
Vombatus ursinus | Common wombat | Diprotodontia | 160 | XP_027718888.1 | 378 | 48.87 | 48.8
Myripristis murdjan | Pinecone soldierfish | Vertebrata | 433 | XP_029922652.1 | 184 | 48.98 | 27
Cottoperca gobio | Cottoperca | Vertebrata | 433 | XP_029301846.1 | 171 | 34.04 | 22.4
Astyanax mexicanus | Mexican tetra | Vertebrata | 433 | XP_022533372.1 | 208 | 26.36 | 26.3
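The ordering used in the table above (ascending divergence date, then descending percent identity) can be reproduced with a two-part sort key. The sketch below uses a handful of rows from the table; the class and function names are illustrative.

```python
from typing import List, NamedTuple


class Ortholog(NamedTuple):
    species: str
    divergence_mya: int
    identity_pct: float


# A deliberately shuffled subset of the rows listed in the table.
ROWS = [
    Ortholog("Phascolarctos cinereus", 160, 49.19),
    Ortholog("Physeter catodon", 94, 75.68),
    Ortholog("Loxodonta africana", 102, 78.52),
    Ortholog("Miniopterus natalensis", 94, 78.45),
    Ortholog("Homo sapiens", 0, 100.0),
    Ortholog("Dasypus novemcinctus", 102, 75.44),
]


def sort_orthologs(rows: List[Ortholog]) -> List[Ortholog]:
    """Sort by ascending divergence date, then by descending identity."""
    return sorted(rows, key=lambda r: (r.divergence_mya, -r.identity_pct))


for row in sort_orthologs(ROWS):
    print(f"{row.divergence_mya:>4} MYA  {row.identity_pct:6.2f}%  {row.species}")
```

Negating the identity in the key keeps the sort to a single pass while reversing the order of the second criterion only.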

Divergence

Corrected percent divergence of protein orthologs from C22orf31, cytochrome c, and fibrinogen alpha chain over time.

Compared with fibrinogen alpha chain and cytochrome c, C22orf31 is a moderately evolving protein. This was determined by calculating the corrected percent divergence of orthologs of each protein, using molecular clock equations, [28] and comparing it with their dates of divergence. A graphical representation of this information can be seen in the divergence graph on the right.
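The article does not state which molecular clock equation was used. One widely used form is the Poisson correction, m = −ln(1 − d), where d is the observed fraction of differing residues; the sketch below assumes that form and derives d from percent identity, so it should be read as an illustration rather than the exact calculation behind the graph.

```python
import math


def corrected_divergence(identity_pct: float) -> float:
    """Poisson-corrected percent divergence from percent identity.

    Assumes m = -ln(1 - d) with d = 1 - identity/100. The correction
    accounts for multiple substitutions at the same site, so m exceeds
    the raw percent difference as sequences diverge.
    """
    if not 0 < identity_pct <= 100:
        raise ValueError("identity must be in the interval (0, 100]")
    observed = 1.0 - identity_pct / 100.0
    return -math.log(1.0 - observed) * 100.0


# Percent identities taken from the ortholog table.
for species, identity in [("Miniopterus natalensis", 78.45),
                          ("Phascolarctos cinereus", 49.19),
                          ("Astyanax mexicanus", 26.36)]:
    print(f"{species}: {corrected_divergence(identity):.2f}")
```

Plotting these values against the divergence dates for several proteins gives one line per protein, and a steeper line corresponds to a faster-evolving protein.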

Interacting Proteins

C22orf31 interacts physically with three proteins, according to the BioGRID, [29] Mentha, [30] and IntAct [31] protein interaction browsers: two histone deacetylases (HDAC1 and HDAC2) and the protein lacritin (LACRT). These interactions were determined using high-throughput affinity-purification mass spectrometry. [32] [33] A biochemical association between C22orf31 and F-box protein 7 (FBXO7) has also been detected by protein microarray. [29] All of these proteins, with additional information, are shown in the table below.

C22orf31 Interacting Proteins [29]
Protein name | Abbreviation | Interaction type | Score | Interaction detection method
Histone deacetylase 1 | HDAC1 | Physical association | 0.9017 | Affinity chromatography
Histone deacetylase 2 | HDAC2 | Physical association | 0.9213 | Affinity chromatography
Lacritin | LACRT | Physical association | 0.9886 | Affinity chromatography
F-box protein 7 | FBXO7 | Biochemical association | - | Protein microarray

The score for each protein in the table is the level of confidence in the predicted interaction with C22orf31 on a scale from 0 to 1, with 1 indicating the highest confidence.
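Because the scores are on a common 0 to 1 scale, the interactors can be ranked or filtered by confidence. A minimal sketch, with illustrative names and an arbitrary threshold; the entry without a reported score is represented as None and is excluded from the ranking.

```python
from typing import List, Optional, Tuple

# (gene symbol, confidence score or None when no score is reported)
INTERACTORS: List[Tuple[str, Optional[float]]] = [
    ("HDAC1", 0.9017),
    ("HDAC2", 0.9213),
    ("LACRT", 0.9886),
    ("FBXO7", None),
]


def rank_by_confidence(rows: List[Tuple[str, Optional[float]]],
                       threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Return scored interactors at or above the threshold, best first."""
    scored = [(name, score) for name, score in rows
              if score is not None and score >= threshold]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(rank_by_confidence(INTERACTORS, threshold=0.9))
# [('LACRT', 0.9886), ('HDAC2', 0.9213), ('HDAC1', 0.9017)]
```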

Clinical significance

Pathology

Increased in vivo expression of C22orf31 in mature oocytes suggests that the gene plays a role in oocyte development. [34]

Disease

The predicted transcription factor binding sites of C22orf31 suggest a possible role for the gene in myeloid cell differentiation and in the formation of acute myeloblastic leukemia. [6] [18]

References

  1. GRCh38: Ensembl release 89: ENSG00000100249 - Ensembl, May 2017
  2. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  3. "NCBI".
  4. "NCBI Blastp".
  5. "NCBI GEO Profile for record GDS3256, C22orf31". NCBI GEO.
  6. "Genomatix MatInspector transcription factor binding sites of C22orf31". Genomatix. [permanent dead link]
  7. "NCBI Gene results for human C22orf31". NCBI Nucleotide.
  8. "C22orf31 GeneCards Entry".
  9. "NCBI Nucleotide results for C22orf31". 2 September 2020.
  10. "ExPasy compute pI/Mw tool". ExPasy.
  11. "MotifFinder results for C22orf31 protein". MotifFinder.
  12. "NCBI protein search for C22orf31 isoforms".
  13. "NCBI protein entry for Human C22orf31".
  14. "NCBI protein entry for uncharacterized protein C22orf31 isoform X1 [Homo sapiens]".
  15. "NCBI protein entry for uncharacterized protein C22orf31 isoform X2 [Homo sapiens]".
  16. "SAPs compositional analysis tool result for C22orf31 protein". SAPs compositional analysis.
  17. "UCSC Genome browser results for C22orf31 protein". UCSC Genome Browser.
  18. Kallio PJ, Pongratz I, Gradin K, McGuire J, Poellinger L (May 1997). "Activation of hypoxia-inducible factor 1alpha: posttranscriptional regulation and conformational change by recruitment of the Arnt transcription factor". Proceedings of the National Academy of Sciences of the United States of America. 94 (11): 5667–72. Bibcode:1997PNAS...94.5667K. doi:10.1073/pnas.94.11.5667. PMC 20836. PMID 9159130.
  19. "Human Protein Atlas page on C22orf31". Human Protein Atlas.
  20. "miRDB microRNA prediction for C22orf31".
  21. "quickFold Web Server".
  22. "NetOGlyc mucin type GalNAc O-glycosylation site prediction for C22orf31 protein".
  23. "NetGlycate glycation site predictor for C22orf31 protein".
  24. "NetPhos phosphorylation prediction for C22orf31 protein".
  25. "YinOYang prediction for C22orf31 protein".
  26. "NCBI BLASTp of Human C22orf31". NCBI Blastp.
  27. "Time Tree: The Timescale of Life".
  28. Ho S (2008). "The molecular clock and estimating species divergence". Nature Education. 1 (1): 168.
  29. "BioGRID protein interaction browser results for C22orf31 protein".
  30. "Mentha interactome browser results for C22orf31 protein".
  31. "IntAct protein interaction browser results for C22orf31 protein".
  32. Huttlin EL, Ting L, Bruckner RJ, Gebreab F, Gygi MP, Szpyt J, et al. (July 2015). "The BioPlex Network: A Systematic Exploration of the Human Interactome". Cell. 162 (2): 425–440. doi:10.1016/j.cell.2015.06.043. PMC 4617211. PMID 26186194.
  33. Huttlin EL, Bruckner RJ, Paulo JA, Cannon JR, Ting L, Baltier K, et al. (May 2017). "Architecture of the human interactome defines protein communities and disease networks". Nature. 545 (7655): 505–509. Bibcode:2017Natur.545..505H. doi:10.1038/nature22366. PMC 5531611. PMID 28514442.
  34. Gonzalez-Muñoz E (2014). "Histone chaperone ASF1A is required for maintenance of pluripotency and cellular reprogramming". Science. 345 (6198): 822–825. Bibcode:2014Sci...345..822G. doi:10.1126/science.1254745. PMID 25035411. S2CID 34666170.