C22orf23 | |
---|---|
Identifiers | |
Aliases | C22orf23, EVG1, dJ1039K5.6, chromosome 22 open reading frame 23 |
External IDs | MGI: 1920774; HomoloGene: 13062; GeneCards: C22orf23 |
C22orf23 (chromosome 22 open reading frame 23) is a protein which in humans is encoded by the C22orf23 gene. Its predicted secondary structure consists of alpha helices and disordered/coil regions. It is expressed in many tissues, most highly in the testes, and it is conserved across many orthologs.
C22orf23 is a gene found in Homo sapiens. It is located on the minus strand of chromosome 22, at map position 22q13.1, and spans 10,620 base pairs. [5] [6] Its mRNA transcript is 1,988 base pairs long and has 7 exons. [7] Its predicted molecular function is protein binding. [5]
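The map position 22q13.1 follows the standard cytogenetic band notation: chromosome 22, long arm (q), region 1, band 3, sub-band 1. The sketch below splits such a string into those parts. The regular expression and the names `CytoBand` and `parse_band` are illustrative assumptions, and the pattern only covers the simple single-band form, not ranges or other nomenclature features.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Chromosome (1-22, X or Y), arm (p = short, q = long), one region digit,
# an optional band digit, and optional sub-band digits after a period.
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)?(?:\.(\d+))?$")


@dataclass
class CytoBand:
    chromosome: str
    arm: str
    region: str
    band: Optional[str]
    sub_band: Optional[str]


def parse_band(text: str) -> CytoBand:
    """Split a simple map position such as '22q13.1' into its components."""
    match = BAND_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised band string: {text!r}")
    return CytoBand(*match.groups())


if __name__ == "__main__":
    print(parse_band("22q13.1"))
```

The same reading explains other positions in this article, for example 6q25.3 is region 2, band 5, sub-band 3 on the long arm of chromosome 6.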
C22orf23's aliases are UPF0193 protein EVG1, dJ1039K5.6, EVG1, [8] FLJ32787, and LOC84645. [9]
The protein encoded by the mRNA sequence is 217 amino acids in length [6] and has a predicted molecular mass of 25 kDa. [8] [10] The predicted isoelectric point is 9.8. [11] It is located in the nucleus. [10]
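Predicted mass and isoelectric point are both computed from the amino acid sequence. The sketch below shows one common approach, assuming average residue masses and a textbook pKa set with the Henderson-Hasselbalch equation, then bisecting for the pH where the net charge is zero. The article does not say which tool or pKa values reference [11] used, so this sketch will not necessarily reproduce the reported value of 9.8; all function names are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.052, "A": 71.079, "S": 87.078, "P": 97.117, "V": 99.133,
    "T": 101.105, "C": 103.139, "L": 113.159, "I": 113.159, "N": 114.104,
    "D": 115.089, "Q": 128.131, "K": 128.174, "E": 129.116, "M": 131.193,
    "H": 137.141, "F": 147.177, "R": 156.188, "Y": 163.176, "W": 186.213,
}
WATER = 18.015  # added once for the free N- and C-termini

# Assumed textbook pKa values; prediction tools differ here.
PKA_N_TERM, PKA_C_TERM = 9.69, 2.34
PKA_BASIC = {"K": 10.5, "R": 12.48, "H": 6.0}
PKA_ACIDIC = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def molecular_mass(seq: str) -> float:
    """Average molecular mass of a linear peptide, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq.upper():
        if aa in PKA_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies above mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    peptide = "MGKRDEH"  # hypothetical test peptide, not C22orf23
    print(round(molecular_mass(peptide), 1), round(isoelectric_point(peptide), 2))
```

As a rough cross-check, 217 residues at an average of about 110 Da each gives roughly 24 kDa, in line with the predicted 25 kDa. A pI of 9.8 implies that basic residues outweigh acidic ones in the sequence.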
It is predicted to be an intracellular protein [10] and has no predicted transmembrane domains. [10] [12] [13] Given its predicted intracellular location and the lack of transmembrane domains, the protein is likely globular.
C22orf23 has many predicted post-translational modifications, including phosphorylation sites, [15] cell attachment sequences, N-myristoylation sites, [16] O-linked glycosylation sites, [17] glycation sites, [18] Ac-ASQK cleaved-acetylated sites, and sumoylation sites. [19] [20] Many of the predicted phosphorylation sites are also predicted O-linked glycosylation sites, so glycosylation could block phosphorylation at those residues and alter the structure or function of that domain.
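Some of these predictions, such as cell attachment sequences and N-myristoylation sites, come from scanning the sequence for short consensus motifs. A minimal sketch of that idea follows. The regular expressions are our own translations of the PROSITE patterns for N-myristoylation (PS00008, G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}) and cell attachment (PS00016, R-G-D); they are assumptions to verify against PROSITE, and the article does not state which tools references [15] to [20] used.

```python
import re

# PROSITE-style patterns rewritten as regular expressions (assumed
# translations). The zero-width lookahead lets overlapping hits be found.
MOTIFS = {
    "N-myristoylation": re.compile(r"(?=G[^EDRKHPFYW].{2}[STAGCN][^P])"),
    "cell attachment (RGD)": re.compile(r"(?=RGD)"),
}


def scan_motifs(seq: str) -> dict:
    """Return 1-based start positions of each motif found in `seq`."""
    seq = seq.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not the C22orf23 protein.
    print(scan_motifs("MGAAASRGDK"))
```

Pattern matches like these are only candidates: a short motif occurs by chance in many proteins, which is why such sites are described as predicted.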
The predicted secondary structure consists of alpha helices and disordered/coil regions. [14] [21] [22] [23] [24] The predicted secondary structure model covers 28% of the amino acid sequence with 42.9% confidence. [14]
There are currently no known paralogs to C22orf23. [25] [26] [27]
Orthologs are found in most major groups of organisms, from the most similar in primates to the most distant in a member of the phylum Chytridiomycota. These include mammals, reptiles, birds, amphibians, bony fish, cartilaginous fish, invertebrates, and fungi. Orthologs may have first appeared in plants or fungi; however, this is uncertain. [26]
This table lists several orthologs of C22orf23 with their species name, common name, taxonomic order, accession number, sequence length, sequence identity, [26] and estimated date of divergence. [28]
Genus and Species | Common Name | Taxonomic Group (Order) | Date of Divergence (MYA) | Accession # | Sequence Length (aa) | Sequence Identity (%) |
---|---|---|---|---|---|---|
Homo sapiens | Human | Primates | 0 | AAH31998.1 | 217 | 100 |
Piliocolobus tephrosceles | Ugandan Red Colobus | Primates | 29 | XP_023077818 | 217 | 95 |
Propithecus coquereli | Coquerel's Sifaka | Primates | 74 | XP_012493592 | 217 | 84 |
Marmota marmota marmota | Alpine Marmot | Rodentia | 90 | XP_015338208 | 217 | 73 |
Mus caroli | Ryukyu Mouse | Rodentia | 90 | XP_021038824 | 216 | 81 |
Physeter catodon | Sperm Whale | Artiodactyla | 96 | XP_007111804 | 217 | 85 |
Odocoileus virginianus texanus | White-tailed Deer | Artiodactyla | 96 | XP_020752151 | 217 | 86 |
Panthera pardus | Leopard | Carnivora | 96 | XP_019302406 | 217 | 85 |
Rousettus aegyptiacus | Egyptian Fruit Bat | Chiroptera | 96 | XP_016017249 | 217 | 83 |
Condylura cristata | Star-nosed Mole | Eulipotyphla | 96 | XP_004676507 | 217 | 80 |
Vombatus ursinus | Common Wombat | Diprotodontia | 159 | XP_027727589 | 263 | 61 |
Nothoprocta perdicaria | Chilean Tinamou | Tinamiformes | 312 | XP_025895660 | 234 | 61 |
Serinus canaria | Atlantic Canary | Passeriformes | 312 | XP_009084739 | 223 | 57 |
Notechis scutatus | Tiger Snake | Squamata | 312 | XP_026550684 | 234 | 56 |
Nanorana parkeri | High Himalaya Frog | Anura | 352 | XP_018428081 | 225 | 51 |
Salvelinus alpinus | Arctic Char | Salmoniformes | 435 | XP_023998646 | 217 | 49 |
Rhincodon typus | Whale Shark | Orectolobiformes | 473 | XP_020370272 | 232 | 48 |
Callorhinchus milii | Australian Ghostshark | Chimaeriformes | 473 | NP_001279734 | 232 | 46 |
Apostichopus japonicus | Sea Cucumber | Synallactida | 684 | PIK47438 | 221 | 48 |
Crassostrea virginica | Eastern Oyster | Ostreoida | 797 | XP_022313321.1 | 224 | 43 |
Capitella teleta | Segmented Annelid Worm | Capitellidae | 797 | ELU02060 | 221 | 39 |
Megachile rotundata | Leafcutter Bee | Hymenoptera | 797 | XP_003702438 | 230 | 36 |
Stylophora pistillata | Hood Coral | Scleractinia | 824 | XP_022780055 | 219 | 42 |
Pocillopora damicornis | Cauliflower Coral | Scleractinia | 824 | XP_027046963 | 220 | 42 |
Macrostomum lignano | Flatworm | Macrostomida | 824 | PAA47644 | 270 | 38 |
Trichoplax | Trichoplax | Tricoplaciformes | 948 | RDD45244 | 239 | 39 |
Spizellomyces punctatus | Chytrid Fungus | Spizellomycetales | 1105 | XP_016608264 [29] | 260 | 30 |
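The table shows sequence identity falling as divergence time grows, with a few local exceptions (for example, the Alpine marmot at 73% versus the Ryukyu mouse at 81%, both at 90 MYA). A rank correlation captures this monotonic trend without assuming linearity. The sketch below computes Spearman's rho on the table's own values, with tied values given average ranks; the function names are hypothetical and the choice of Spearman's rho is ours, not the article's.

```python
# (date of divergence in MYA, sequence identity in %) from the ortholog table.
ORTHOLOGS = [
    (0, 100), (29, 95), (74, 84), (90, 73), (90, 81), (96, 85), (96, 86),
    (96, 85), (96, 83), (96, 80), (159, 61), (312, 61), (312, 57), (312, 56),
    (352, 51), (435, 49), (473, 48), (473, 46), (684, 48), (797, 43),
    (797, 39), (797, 36), (824, 42), (824, 42), (824, 38), (948, 39),
    (1105, 30),
]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


divergence = [d for d, _ in ORTHOLOGS]
identity = [s for _, s in ORTHOLOGS]

if __name__ == "__main__":
    print(round(spearman(divergence, identity), 3))
```

A strongly negative rho is what one expects for a conserved protein whose orthologs accumulate substitutions roughly in step with time since divergence.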
The core promoter, GXP_7541220, lies on the minus strand at coordinates 37953445-37954669 and is 1225 base pairs long. [30]
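The stated length follows from treating the coordinates as a closed interval that includes both endpoints: 37954669 - 37953445 + 1 = 1225. A minimal sketch, assuming 1-based inclusive coordinates; in 0-based half-open formats such as BED, the length would instead be end minus start. The helper name is hypothetical.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length in base pairs of a closed interval [start, end] (1-based)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


if __name__ == "__main__":
    print(inclusive_length(37953445, 37954669))
```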
Protein expression is highest in the testes; however, the protein is also expressed at low levels in many other tissues, including the brain, kidney, stomach, skin, [31] thyroid, urinary bladder, placenta, endometrium, esophagus, appendix, bone marrow, adipose tissue, lung, [32] and ovary. [33]
In the ortholog from Rattus norvegicus, expression is primarily in the testes, with low levels of expression in the kidneys, lungs, heart, and uterus. [34] In Mus musculus, the ortholog is expressed primarily in the adrenal gland and testes, and is also notably expressed in the bladder, abdomen, heart, lungs, ovaries, and mammary gland. [35]
There are several predicted protein interactions: cyclin-D1-binding protein 1, which may regulate cell cycle progression; vacuolar protein sorting-associated protein 28 homolog, which is involved in regulating vesicular trafficking; UPF0739 protein C1orf74; and estrogen-related receptor gamma. These interactions were classified as either direct interactions or physical associations and were identified through a variety of detection methods, including affinity chromatography, two-hybrid prey pooling, and two-hybrid array. [36] [37] [38] C22orf23 also has predicted interactions with SH3 domain containing 19, EvC ciliary complex subunit 1, RIMS binding protein 3B, RIMS binding protein 3C, TSSK6-activating co-chaperone protein, V-set and immunoglobulin domain containing 8, family with sequence similarity 124 member B, small nucleolar RNA host gene 28, and transmembrane protein 200B. The evidence for a functional link in these interactions comes from co-mention in PubMed. [36] [39]
C22orf23 was identified as belonging to one of two groups of pooled serum samples in a study that analyzed the differences between serum glycoproteins from hepatocellular carcinoma and those of normal serum. [40] Deletions of parts of C22orf23 (exons 3 and 4) together with several other genes, including SOX10, have been observed in patients with peripheral demyelinating neuropathy, central demyelinating leukodystrophy, Waardenburg syndrome, and Hirschsprung disease, and C22orf23 has therefore been suggested as a potential factor in these conditions. [41] [42] C22orf23 was also mentioned in a study of mutation profiles from ER+ breast cancer samples taken from postmenopausal patients, which found mutations affecting C22orf23 among many other genes. [43] In a study of epigenetic alterations involved in coronary artery disease, C22orf23 showed altered epigenetic modifications, suggesting that it may be a novel gene involved in the disease. [44] In a study that attempted to predict imprinted genes that may be linked to human disorders, C22orf23 was identified as a homolog of imprinted gene candidates showing linkage to schizophrenia. [45] Another study listed it as a potently regulated protein in uterine leiomyoma. [46]
There are a total of 3340 SNPs within the 5’ and 3’ UTRs, introns, and exons, as well as in regions near the 5’ and 3’ UTRs. Of these, 225 SNPs lie within the coding sequence. Some of these SNPs occur at conserved amino acids within the coding sequence, and some have been reported with one or more types of validation. Some SNPs have high heterozygosity scores, indicating that both alleles are present at appreciable frequencies in the population. [47]
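The article does not say how the heterozygosity scores were computed. One common definition, assumed here, is expected heterozygosity, H = 1 - Σ p_i², where p_i are the allele frequencies. For a biallelic SNP it peaks at 0.5 when both alleles are equally common, which is why a high score implies that the variant is widespread rather than rare. The helper name is hypothetical.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for allele frequencies p_i that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    for freqs in ([0.5, 0.5], [0.9, 0.1], [0.99, 0.01]):
        print(freqs, round(expected_heterozygosity(freqs), 4))
```

A rare allele at 1% frequency gives H close to 0.02, while an even split gives the biallelic maximum of 0.5.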
Transmembrane protein 242 (TMEM242) is a protein that in humans is encoded by the TMEM242 gene. The TMEM242 gene is located on the long arm of chromosome 6, at band 6q25.3. This protein is also commonly called C6orf35, BM033, and UPF0463 transmembrane protein C6orf35. The TMEM242 gene is 35,238 base pairs long and contains 4 exons, and the encoded protein is 141 amino acids long. The function of this protein is not well understood by the scientific community. This protein contains a DUF1358 domain.
Transmembrane protein 251, also known as C14orf109 or UPF0694, is a protein that in humans is encoded by the TMEM251 gene. One notable feature of this protein is the presence of proline residues in one of its predicted transmembrane domains, a feature that is a determinant of the intramitochondrial sorting of inner membrane proteins.
Chromosome 9 open reading frame 152 is a protein that in humans is encoded by the C9orf152 gene. The exact function of the protein is not completely understood.
C9orf135 is a gene that encodes a 229-amino-acid protein. It is located on chromosome 9 of the Homo sapiens genome at 9q12.21. The protein has a transmembrane domain spanning amino acids 124-140 and a glycosylation site at amino acid 75. C9orf135 is annotated on chromosome 9 in the GRCh37 reference assembly and belongs to the domain of unknown function superfamily DUF4572. C9orf135 is also known as LOC138255, a provisional symbol derived from its NCBI Gene ID.
SMIM23, or small integral membrane protein 23, is a protein which in humans is encoded by the SMIM23 (C5orf50) gene. The longer mRNA isoform is 519 nucleotides long and translates to a protein of 172 amino acids. Researchers have identified this gene, along with a few others, as potentially playing a role in how facial morphology arises in humans.
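The 519-to-172 relationship is consistent with codon arithmetic if the 519 nucleotides are taken to be the coding sequence including a stop codon: 519 / 3 = 173 codons, and the stop codon adds no residue, leaving 172 amino acids. That reading is an assumption; the sketch below (hypothetical helper name) makes it explicit.

```python
def protein_length_from_cds(cds_nt: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by a coding sequence of `cds_nt` nucleotides."""
    if cds_nt % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_nt // 3
    # A terminal stop codon ends translation without adding a residue.
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(protein_length_from_cds(519))
```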
Chromosome 19 open reading frame 18 (C19orf18) is a protein which in humans is encoded by the C19orf18 gene. The gene is exclusive to mammals, and the protein is predicted to have a transmembrane domain and a coiled-coil stretch. The function of this protein is not yet fully understood by the scientific community.
Chromosome 4 open reading frame 51 (C4orf51) is a protein which in humans is encoded by the C4orf51 gene.
C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein is 1,984 amino acids long. The gene contains 1 exon and is located at 2p23.3. Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence.
Transmembrane protein 179 is a protein that in humans is encoded by the TMEM179 gene. The function of transmembrane protein 179 is not yet well understood, but it is believed to have a function in the nervous system.
C7orf26 is a gene in humans that encodes a protein also known as C7orf26. Based on the properties of C7orf26 and its conservation over a long evolutionary period, the protein is predicted to be targeted to the cytoplasm and to play a role in regulating transcription.
Chromosome 1 open reading frame 167 (C1orf167) is a protein which in humans is encoded by the C1orf167 gene. The NCBI accession number is NP_001010881. The protein is 1468 amino acids in length with a molecular weight of 162.42 kDa. The mRNA sequence was found to be 4689 base pairs in length.
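These figures can be cross-checked with simple arithmetic: 162.42 kDa over 1468 residues is about 110.6 Da per residue, close to the roughly 110 Da average commonly assumed for proteins, and a 1468-residue open reading frame plus a stop codon needs 4407 nucleotides, which fits within the 4689-nucleotide mRNA (the remainder presumably being untranslated region). The helper names below are hypothetical.

```python
def mean_residue_mass(mass_kda: float, length_aa: int) -> float:
    """Average mass per residue in daltons."""
    return mass_kda * 1000 / length_aa


def min_coding_length(length_aa: int) -> int:
    """Nucleotides needed for one codon per residue plus a stop codon."""
    return 3 * (length_aa + 1)


if __name__ == "__main__":
    print(round(mean_residue_mass(162.42, 1468), 1))
    print(min_coding_length(1468), "nt of a 4689 nt mRNA")
```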
Single-pass membrane and coiled-coil domain-containing protein 3 is a protein that is encoded in humans by the SMCO3 gene.
Chromosome 1 open reading frame 185, also known as C1orf185, is a protein that in humans is encoded by the C1orf185 gene. In humans, C1orf185 is expressed at low levels and has occasionally been found to be expressed in the circulatory system.
C16orf90, or chromosome 16 open reading frame 90, encodes the uncharacterized protein C16orf90 in Homo sapiens. The C16orf90 protein has four predicted alpha-helical domains; it is modestly expressed in the testes and expressed at low levels throughout the body. While the function of C16orf90 is not yet well understood by the scientific community, expression data from microarrays and post-translational modification data suggest involvement in the biological stress response and apoptosis.
C1orf122 is a gene in the human genome that encodes the cytosolic protein ALAESM. ALAESM is present in all tissues and is highly up-regulated in the brain, spinal cord, adrenal gland, and kidney, where the gene can be expressed at up to 2.5 times the level of the average gene. Although the function of C1orf122 is unknown, its protein is predicted to localize to mitochondria.
Uncharacterized protein C17orf78 is a protein encoded by the C17orf78 gene in humans. The name indicates that the parent gene is open reading frame 78 on human chromosome 17. The protein is highly expressed in the small intestine, especially the duodenum. The function of C17orf78 is not well defined.
Leucine rich single-pass membrane protein 2 is a single-pass membrane protein rich in leucine, that in humans is encoded by the LSMEM2 gene. The LSMEM2 protein is conserved in mammals, birds, and reptiles. In humans, LSMEM2 is found to be highly expressed in the heart, skeletal muscle and tongue.
C2orf74, also known as LOC339804, is a protein-coding gene located on the short arm of chromosome 2 at band 2p15. Isoform 1 of the gene is 19,713 base pairs long. C2orf74 has orthologs in 135 different species, primarily placental mammals and some marsupials.
SMIM19, also known as small integral membrane protein 19, encodes the SMIM19 protein. SMIM19 is a confirmed single-pass transmembrane protein, oriented with its N-terminus outside and its C-terminus inside. SMIM19 is expressed at high to medium levels across a wide variety of tissues and organs. Its function has not been validated, in part because its subcellular localization is uncertain. However, all proteins reported to interact with SMIM19 are associated with the endoplasmic reticulum (ER), suggesting that SMIM19 is also associated with the ER.
C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data shows low expression of the C13orf42 gene in a variety of tissues. The C13orf42 protein is predicted to be localized in the mitochondria, nucleus, and cytosol. Tertiary structure predictions for C13orf42 indicate multiple alpha helices.