GPATCH2L (G-Patch Domain Containing 2 Like) is a protein that in humans is encoded by the GPATCH2L gene, located at 14q24.3. [5] The human GPATCH2L mRNA (NM_017926) is 14,021 base pairs long, and the gene spans 62,422 nt at chr14:76,151,922-76,214,343. [6] GPATCH2L is on the positive strand. IFT43 is the gene directly before GPATCH2L on the positive strand, and LOC105370575 is an uncharacterized gene on the negative strand that is approximately one and a half times the size of GPATCH2L. Known aliases for GPATCH2L include C14orf118, FLJ20689, FLJ10033, and KIAA1152. The gene contains 28 distinct introns (27 gt-ag, 1 gc-ag) and produces 17 different mRNAs: 14 alternatively spliced variants and 3 unspliced forms. [7] It has 5 probable alternative promoters, 7 validated polyadenylation sites, and 6 predicted promoters of varying lengths.
There are 23 different transcript variants of human GPATCH2L. The most common is transcript variant 1, and each transcript variant uses a different set of exons.
Name | Accession Number | # of Exons | Size (bp) |
---|---|---|---|
Transcript Variant 1 | NM_017926 | 10 | 14,021 |
Transcript Variant 2 | NM_017972 | 9 | 12,396 |
Transcript Variant 3, non-coding RNA | NR_110314 | 10 | 12,511 |
Transcript Variant 4 | NM_001322026 | 7 | 3,263 |
Transcript Variant 5 | NM_001322027 | 4 | 4,485 |
Transcript Variant 6 | NM_001322028 | 10 | 1,797 |
Transcript Variant 7 | NM_001322029 | 9 | 1,825 |
Transcript Variant 8 | NM_001322030 | 10 | 3,232 |
Transcript Variant 9 | NM_001322031 | 3 | 5,278 |
Transcript Variant 10 | NM_001322032 | 7 | 3,014 |
Transcript Variant X1 | XM_017021427 | 11 | 14,388 |
Transcript Variant X2 | XM_017021428 | 11 | 14,373 |
Transcript Variant X3 | XM_017021429 | 11 | 3,472 |
Transcript Variant X4 | XM_006720191 | 10 | 14,146 |
Transcript Variant X5 | XM_017021430 | 11 | 3,457 |
Transcript Variant X6 | XM_017021431 | 11 | 2,927 |
Transcript Variant X7 | XM_017021432 | 11 | 2,924 |
Transcript Variant X8 | XM_017021433 | 10 | 2,685 |
Transcript Variant X9 | XR_001750414 | 10 | 14,302 |
Transcript Variant X10 | XR_001750415 | 10 | 2,841 |
Transcript Variant X11 | XR_001750416 | 10 | 3,386 |
Transcript Variant X12 | XR_001750417 | 12 | 1,758 |
Transcript Variant X13 | XR_001750418 | 12 | 14,488 |
The human GPATCH2L protein (NP_060396) consists of 482 amino acids, with a molecular weight of 54,260 Da and a predicted isoelectric point of 8.77. [8] It has 17 different isoforms, the most common of which is isoform 1. Every human GPATCH2L isoform has a GPATCH2L domain, but no other significant smaller repeats were found, as can be seen in the schematic illustration below. Also, immunohistochemical staining of human prostate tissue with the GPATCH2L antibody (HPA018856) from The Human Protein Atlas shows distinct positivity in glandular cells. [9]
Name | Accession Number | Size (aa) |
---|---|---|
Isoform 1 | NP_060396 | 482 |
Isoform 2 | NP_060442 | 477 |
Isoform 4 | NP_001308955 | 434 |
Isoform 5 | NP_001308956 | 304 |
Isoform 6 | NP_001308957 | 447 |
Isoform 7 | NP_001308958 | 446 |
Isoform 8 | NP_001308959 | 469 |
Isoform 9 | NP_001308960 | 271 |
Isoform 10 | NP_001308961 | 402 |
Isoform X1 | XP_016876916 | 495 |
Isoform X2 | XP_016876917 | 490 |
Isoform X3 | XP_016876918 | 482 |
Isoform X4 | XP_006720254 | 477 |
Isoform X5 | XP_016876919 | 477 |
Isoform X6 | XP_016876920 | 460 |
Isoform X7 | XP_016876921 | 459 |
Isoform X8 | XP_016876922 | 442 |
Two figures show the predicted tertiary structure of the human GPATCH2L protein from AlphaFold. [11] Four alpha-helix bundles and two beta sheets are observable, and these are annotated on the conceptual translation.
GPATCH2L human protein is known to interact with KRR1, DDX10, and NOL6 within the nucleolus and nucleus.
Name | Full Name | Function | Cellular Compartment | Experimental Validation | STRING-db Score |
---|---|---|---|---|---|
KRR1 | KRR1 small subunit processome component homolog | 1) Nucleolar protein required for rRNA synthesis and ribosomal assembly. 2) Enables RNA and protein binding. 3) Required for 40S ribosome biogenesis in the nucleolus. | Nucleolus | 1) Two-hybrid array assay. 2) Affinity chromatography technology assay. 3) Inferred by author. 4) Tandem affinity purification assay. | 0.650 |
DDX10 | Probable ATP-dependent RNA helicase DDX10 | 1) Promotes AIM2-inflammasome activation by maintaining AIM2 protein stability. 2) Promotes human lung carcinoma proliferation via the U3 small nucleolar ribonucleoprotein IMP4. | Nucleus | 1) Two-hybrid array assay. 2) Inferred by author. 3) Tandem affinity purification assay. | 0.548 |
NOL6 | Nucleolar protein 6 | A nucleolar RNA-associated protein. 1) Related to ribosome biogenesis in endometrial cancer. 2) Promotes the proliferation and migration of endometrial cancer cells by regulating TWIST1 expression. | Nucleus | 1) Two-hybrid array assay. 2) Inferred by author. 3) Tandem affinity purification assay. | 0.527 |
The human GPATCH2L gene has a promoter (GXP_207451) located at chr14:76,150,912-76,151,972. [13] The promoter is 1,061 bp long.
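The lengths quoted for the promoter and for the gene follow from their coordinates if both ranges are read as closed (1-based, inclusive) intervals. The sketch below is illustrative only: the helper names `interval_length` and `overlap_length` are hypothetical, and the inclusive-coordinate convention is an assumption rather than something the cited databases are quoted as stating here.

```python
# Minimal sketch: lengths of closed (inclusive) genomic intervals.
# Assumption: the chr14 coordinates in the text are 1-based and inclusive.

def interval_length(start: int, end: int) -> int:
    """Number of bases in the closed interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def overlap_length(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two closed intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Coordinates as given in the article text.
GENE = (76_151_922, 76_214_343)      # GPATCH2L gene span
PROMOTER = (76_150_912, 76_151_972)  # promoter GXP_207451

if __name__ == "__main__":
    print(interval_length(*GENE))            # 62422
    print(interval_length(*PROMOTER))        # 1061
    print(overlap_length(GENE, PROMOTER))    # 51
```

Under this convention the promoter region extends 51 bases into the annotated gene span, which is consistent with a promoter that includes the transcription start site.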
In the table below, 5 transcription factors (KLFS, HOMF, SP1F, ZF02, and NFKB) are predicted to bind within a conserved section of the transcriptional regulatory region. Unlike the other 19 transcription factors in the table, MAZF is specifically active only in cartilage and skeleton tissues. [13] PLAG is specifically active only in bone marrow cells, the digestive system, embryonic structures, the endocrine system, germ cells, and the hematopoietic system. Most of the transcription factors are active in the ovary, lung, brain, prostate, and bone marrow cells, which show the highest values in RNA-seq data from the Gene database record at NCBI. [6]
Element | Description/Full Name | Best Matrix Score | Number of Binding Sites in the Region |
---|---|---|---|
AP1F | AP1, Activating protein 1 | 0.903 | 1 |
NR2F | Nuclear receptor subfamily 2 factors | 0.826 | 5 |
LHXF | Lim homeodomain factors | 0.906 | 3 |
STAT | Signal transducer and activator of transcription | 0.896 | 2 |
HIFF | Hypoxia inducible factor, bHLH/PAS protein family | 0.989 | 5 |
HESF | Vertebrate homologues of enhancer of split complex | 0.985 | 2 |
KLFS | Krueppel like transcription factors | 0.912 | 13 |
GLIF | GLI zinc finger family | 0.914 | 5 |
PLAG | Pleomorphic adenoma gene | 0.845 | 4 |
SP1F | GC-Box factors SP1/GC | 0.855 | 9 |
EGRF | EGR/nerve growth factor-induced protein C & related factors | 0.930 | 4 |
HOMF | Homeodomain transcription factors | 0.980 | 5 |
CAAT | CCAAT binding factors | 0.926 | 1 |
AP2F | Activator protein 2 | 0.917 | 1 |
ZF02 | C2H2 zinc finger transcription factors 2 | 0.932 | 3 |
RXRF | RXR heterodimer binding sites | 0.850 | 9 |
SMAD | Vertebrate SMAD family of transcription factors | 0.994 | 2 |
CREB | cAMP-responsive element binding proteins | 0.844 | 6 |
NFKB | Nuclear factor kappa B/c-rel | 0.928 | 4 |
MAZF | Myc associated zinc fingers | 1.000 | 3 |
RNA-seq was performed on tissue samples from 95 human individuals representing 27 different tissues to determine the tissue specificity of protein-coding genes; the data are available at NCBI. [6] For human GPATCH2L mRNA, the RNA-seq data show high expression in bone marrow, testis, and brain tissue. Tissues with low expression are the pancreas, liver, and salivary glands.
Markedly different tissue expression is seen in a microarray-assessed tissue expression pattern (GDS596) for human GPATCH2L from NCBI GEO. [14] Expression in cerebellum, fetal brain, bone marrow, ovary, prostate, and lung tissues, which is high in the RNA-seq data, is extremely low in GDS596. However, expression in liver, pancreas, salivary gland, and fetal liver tissues remains low in every gene database record.
A graph in NCBI GEO can be interpreted as follows: a 'single channel' sample is a hybridization in which cDNA obtained from one biosource is combined with the array. [14] This method is typically used for membrane (filter) arrays with radionucleotide labels and high-density oligonucleotide arrays with fluorescent labels. This experiment type yields measurements of gene expression, defined as scaled/normalized signal count values, which correspond to "Value" in the tables below and the figures at right.
Sample/Tissue | Title | Value | Rank |
---|---|---|---|
GSM19012 / (Superior Cervical Ganglion) | 3AJZ02081478b_Superior_Cervical_Ganglion | 1408.7 | 81 |
GSM19014 / (Skeletal Muscle) | 3AJZ02083092b_Skeletal_Muscle_Psoas | 819.2 | 83 |
GSM19009 / (Dorsal Root Ganglion) | 3ARS02080736e_DRG | 787.4 | 81 |
Sample/Tissue | Title | Value | Rank |
---|---|---|---|
GSM18875 / (PB - CD 56 + NK cells) | 3AMH02082109_PB_CD56NKCells | 10.1 | 19 |
GSM18969 / (Cardiac Myocytes) | 3AJZ02053107_CardiacMyocytes | 10.1 | 12 |
GSM18881 / (PB - CD 19 + B cells) | 3AMH02082107_PB_CD19BCells | 10.5 | 28 |
The nucleic acid secondary structure of the human GPATCH2L 5' UTR shows one stem-loop, an in-frame stop codon, the start codon, and exon boundaries. In this stem-loop, g and g are not paired; however, they are conserved in the multiple sequence alignment of the stem-loop region. In the 3' UTR figure there are 10 stem-loops, which are enlarged in another figure. Although stem-loops 3-3), 3-6), and 3-9) show unusual structures (3: CCTT; 6: CAT, TTC; 9: GTG), every base is conserved in the multiple sequence alignment. Notably, stem-loop 3-8) includes hsa-miRNA-205, and every base of hsa-miRNA-205 is conserved in the multiple sequence alignment.
GPATCH2L protein is highly expressed in bone marrow tissue, according to immunohistochemical staining of human hematopoietic cells in bone marrow with the GPATCH2L antibody (HPA018856) from The Human Protein Atlas. [9] It has also been shown to be highly expressed in respiratory epithelial cells of the bronchus and in endometrial stromal and glandular cells of the endometrium.
The human GPATCH2L protein is mainly localized to the nucleoplasm, according to GPATCH2L antibody staining from The Human Protein Atlas and Thermo Fisher Scientific. [9] [15] Also, 82.6% of human GPATCH2L protein is predicted to be located in the nucleus, according to PSORT II [16] and the pI/MW tool from Expasy. [8] Compared to other human proteins, GPATCH2L is isoleucine poor (I−), serine rich (S+), and arginine rich (R+). [17] The post-translational modification sites [O-GalNAc (mucin-type) glycosylation, O-(beta)-GlcNAc, N-glycosylation, O-glycosylation, and phosphorylation] are annotated on the conceptual translation. The conceptual translation figures in Wikipedia include only 1,560 bp of mRNA and the 482 amino acids.
Human GPATCH2L has orthologs in Mammalia, Reptilia, Amphibia, Aves, fish, and invertebrates (including Mollusca and Arthropoda). The values [query cover (%), sequence identity (%), and sequence similarity (%)] decrease as the group becomes more distantly related to Homo sapiens, as in the invertebrates. However, frogs are unusual in that they have a very low query cover (36.8%-38.0%) and sequence identity (32.9%-35.4%). No fungi or bacteria containing GPATCH2L homologs could be found using NCBI Homologene. [18] The paralogs of human GPATCH2L were found by using NCBI Homologene. In the tables below, MYA stands for "Million Years Ago" and the corrected divergence is m = 100 × (−ln(sequence similarity)), with the sequence similarity expressed as a fraction of 1.
GPATCH2L | Genus, Species | Common Name | Taxonomic Group | Divergence Date (MYA) | Accession Number | Query Cover | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%) | Corrected Divergence [m] |
---|---|---|---|---|---|---|---|---|---|---|
Mammalia | Homo sapiens | Human | Primates | 0 | NP_060396.2 | 100.0% | 482 | 100.0% | 100.0% | 0 |
Mammalia | Mus musculus | House Mouse | Rodentia | 90 | XP_006516282.1 | 100.0% | 490 | 86.1% | 90.0% | 10.5 |
Mammalia | Ursus arctos horribilis | Grizzly Bear | Carnivora | 94 | XP_026375234.1 | 93.8% | 479 | 93.8% | 95.4% | 4.7 |
Mammalia | Equus caballus | Horse | Perissodactyla | 94 | XP_023483915.1 | 94.8% | 482 | 86.1% | 90.0% | 10.5 |
Reptiles | Chelonia mydas | Green Sea Turtle | Testudines | 318 | XP_007064235.1 | 85.8% | 485 | 85.8% | 90.3% | 10.2 |
Reptiles | Crocodylus porosus | Saltwater Crocodile | Crocodilia | 318 | XP_019407441.1 | 84.2% | 486 | 84.2% | 89.3% | 11.3 |
Aves | Dromaius novaehollandiae | Emu | Casuariiformes | 318 | XP_025976214.1 | 83.7% | 484 | 83.7% | 89.5% | 11.1 |
Aves | Falco cherrug | Saker Falcon | Falconiformes | 318 | XP_014139614.1 | 85.2% | 484 | 85.2% | 89.9% | 10.6 |
Amphibians | Xenopus laevis | African Clawed Frog | Anura | 352 | XP_018120469.1 | 36.8% | 481 | 32.9% | 44.1% | 81.9 |
Amphibians | Xenopus tropicalis | Western Clawed Frog | Anura | 352 | XP_004914844.1 | 37.4% | 481 | 33.7% | 46.5% | 76.6 |
Amphibians | Eleutherodactylus coqui | Common coquí (Frog) | Anura | 352 | KAG9484589.1 | 38.0% | 478 | 35.4% | 48.5% | 72.4 |
Fish | Paramormyrops kingsleyae | Elephantfish | Osteoglossiformes (bony fish) | 433 | XP_023683992.1 | 58.8% | 483 | 52.0% | 60.9% | 49.6 |
Fish | Oncorhynchus tshawytscha | Chinook Salmon | Salmoniformes (bony fish) | 433 | XP_024285887.1 | 56.2% | 481 | 50.6% | 61.3% | 48.9 |
Fish | Danio rerio | Zebrafish | Cypriniformes (Zebrafish) | 433 | XP_009293252.1 | 35.5% | 489 | 32.1% | 43.3% | 83.7 |
Fish | Carcharodon carcharias | Great White Shark | Lamniformes (sharks and rays) | 465 | XP_041071062.1 | 50.9% | 484 | 45.7% | 56.0% | 58.0 |
Invertebrates | Trachymyrmex septentrionalis | Ant | Arthropoda | 736 | XP_018345408.1 | 29.9% | 485 | 24.3% | 33.2% | 110.3 |
Invertebrates | Chionoecetes opilio | Snow Crab | Arthropoda | 736 | KAG0728066.1 | 31.3% | 478 | 21.0% | 32.1% | 113.6 |
Invertebrates | Amphibalanus amphitrite | Acorn Barnacle | Arthropoda | 736 | XP_043205896.1 | 30.1% | 489 | 23.8% | 34.9% | 105.3 |
Invertebrates | Owenia fusiformis | Tubeworm | Annelida | 736 | CAC9666901.1 | 33.3% | 480 | 27.8% | 41.0% | 89.2 |
Invertebrates | Pomacea canaliculata | Channeled Applesnail | Mollusca | 736 | XP_025105792.1 | 31.0% | 489 | 25.4% | 38.2% | 96.2 |
Invertebrates | Octopus sinensis | Asian Common Octopus | Mollusca | 736 | XP_036357334.1 | 34.2% | 462 | 26.2% | 36.6% | 100.5 |
Paralogs [Homo sapiens] | Accession Number | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%) | Corrected Divergence [m] |
---|---|---|---|---|---|
GPATCH2L | NP_060396.2 | 482 | 100.0% | 100.0% | 0 |
GPATCH2 | NP_060510.1 | 528 | 31.3% | 43.6% | 83.0 |
GPATCH1 | NP_060495 | 931 | 9.8% | 17.2% | 176.0 |
GPATCH3 | NP_071361 | 525 | 13.3% | 22.8% | 147.8 |
GPATCH4 | NP_056405 | 375 | 6.8% | 11.2% | 218.9 |
GPATCH8 | NP_001002909 | 1502 | 8.1% | 12.3% | 209.6 |
GPATCH11 | NP_777591 | 525 | 9.2% | 18.4% | 169.3 |
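The corrected divergence column in the two tables above can be reproduced from the sequence similarity column. The following is a minimal sketch, assuming the similarity is supplied as a percentage and converted to a fraction before taking the natural logarithm; the function name `corrected_divergence` is hypothetical and not taken from any cited tool.

```python
import math

# Minimal sketch: corrected divergence m = -100 * ln(S),
# where S is the sequence similarity as a fraction (percent / 100).
# Written as 100 * ln(100 / percent), which is algebraically identical.

def corrected_divergence(similarity_percent: float) -> float:
    """Return m for a sequence similarity given in percent (0 < value <= 100)."""
    if not 0 < similarity_percent <= 100:
        raise ValueError("similarity must be in the range (0, 100]")
    return 100.0 * math.log(100.0 / similarity_percent)


if __name__ == "__main__":
    # Similarity values taken from the tables above.
    for name, similarity in [
        ("Homo sapiens", 100.0),
        ("Mus musculus", 90.0),
        ("Ursus arctos horribilis", 95.4),
        ("Xenopus laevis", 44.1),
        ("GPATCH2 (paralog)", 43.6),
    ]:
        print(f"{name}: m = {corrected_divergence(similarity):.1f}")
```

Rounded to one decimal place, these inputs give 0.0, 10.5, 4.7, 81.9, and 83.0, matching the tabulated values.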
GPATCH2L evolves more slowly than Fibrinogen Alpha Chain but faster than Cytochrome C. [19] In the unrooted tree of GPATCH2L protein, only one mammal (human) is included, since the mammalian species are all very closely related to each other and show only short branches. Arthropoda among the invertebrates shows the longest branch, meaning that they have diverged the most.
The closest organisms that do and do not have GPATCH2L are as follows. In reptiles, GPATCH2L homologs were found in crocodile, turtle, snake, lizard, and gecko by using NCBI Blast, [21] while no homologs were found in skink, chameleon, or iguana. In amphibians, GPATCH2L homologs were found in frog, toad, and salamander by using NCBI Blast, [21] while other types of amphibians, such as caecilian, microsauria, and labyrinthodontia, do not contain the GPATCH2L gene. In invertebrates, NCBI Blast [21] shows that scallop, starfish, octopus, and spider have the GPATCH2L gene, but sponge and jellyfish do not. Lastly, there are no GPATCH2L homologs in fungi or bacteria, according to NCBI Blast. [21]
The function of GPATCH2L is still unknown; however, the paralog GPATCH3 has been shown to participate in the innate immune response in mammals. [22] GPATCH3 negatively regulates the RLR-mediated innate antiviral response by disrupting VISA signalosome assembly. [23] It has also been shown to participate in ocular and craniofacial development. [24]
An SNP (rs935332) within the human GPATCH2L region is related to scleroderma renal crisis (SRC), according to the validation cohort. [25] Immunostaining of renal biopsy sections demonstrated an increase in tubular expression of GPATCH2L, despite the absence of any genetic replication for the associated SNP. [25]
Retinitis Pigmentosa 24 is one of the diseases that is associated with this gene. [5] The expression of GPATCH2L in cancer is as follows: A few cases of pancreatic cancers exhibited strong immunoreactivity, while malignant lymphomas, colorectal, breast, and prostate cancers were negative or weakly stained. [26]
GPATCH2 is overexpressed in the great majority of breast cancer cases; it encodes a nuclear factor that may be important for tumor growth in breast cancer and for spermatogenesis. [27] An interaction between hPrp43 (an RNA-dependent ATPase) and the GPATCH2 protein greatly improves the ATPase activity of hPrp43 and causes a growth-promoting effect on mammalian cells. [28] Since GPATCH2 may be a novel cancer/testis antigen, according to northern blot analyses of normal human organs, targeting GPATCH2 or inhibiting the interaction between hPrp43 and GPATCH2 could be a therapeutic approach for breast cancer. [28]