ARMH1 | |
---|---|
Identifiers | |
Aliases | ARMH1, NCRNA00082, p40, chromosome 1 open reading frame 228, C1orf228, armadillo-like helical domain containing 1, armadillo like helical domain containing 1 |
External IDs | MGI: 2686507; HomoloGene: 28727; GeneCards: ARMH1; OMA: ARMH1 - orthologs |
Armadillo-like helical domain containing 1 (ARMH1) is a protein that in humans is encoded by the ARMH1 gene, also known as chromosome 1 open reading frame 228 (C1orf228). The gene is expressed at significantly higher levels in bone marrow, lymph nodes, and testis than in other tissues. [5] The function of the gene and its protein product remains uncertain.
The ARMH1 gene is found on the plus strand of chromosome 1 between base pairs 45,140,361 and 45,191,784. Other known aliases include P40, NCRNA00082, and, most commonly, C1orf228. The gene has 13 exons, most of which are concentrated near the poly-A site at the 3′ end of the gene, with two located upstream of the start codon. The gene is highly expressed in bone marrow and lymph nodes, which suggests a possible immunological function. [6]
RNA-seq data have been produced from multiple samples of human tissue at varying stages of development. One study of 20 separate human tissue samples showed significantly higher expression of ARMH1 in the thymus, trachea, and lungs. [7] A second study covered 27 different tissues sampled from 95 individuals; expression levels were significantly higher in bone marrow, lymph nodes, and testis. [8] A third study again showed high expression in white blood cells and testis, corroborating the previous studies. [9] A temporal study of expression at different stages of development collected 35 human fetal samples from 6 distinct tissues, between 10 and 20 weeks of gestation, and sequenced them using Illumina TruSeq Stranded Total RNA. The data slightly favored expression in the adrenal glands throughout development. In each of the other tissues there were no stark changes in expression over time, only a small decline in expression as development progressed. [10]
The ARMH1 gene produces a large number of isoforms that differ in size and, potentially, in function. Gene isoforms are mRNAs produced from the same locus that differ in their transcription start sites, protein-coding sequences, and/or untranslated regions, potentially altering gene function. All known isoforms are listed below, with information gathered from NCBI Gene [11] and a bioinformatics tool for calculating molecular weight. [12]
Protein isoform | Protein accession | Protein length | Molecular weight | mRNA isoform | mRNA accession | mRNA length |
---|---|---|---|---|---|---|
X1 | XP_047275293 | 446 aa | 49.58 kDa | X5 | XM_011541340 | 1693 bp |
X2 | XP_011539647 | 433 aa | 48.17 kDa | X7 | XM_011541345 | 1909 bp |
X3 | XP_047275308 | 431 aa | 47.39 kDa | X8 | XM_047419352 | 1782 bp |
X4 | XP_047275309 | 419 aa | 46.17 kDa | X9 | XM_047419353 | 1507 bp |
X5 | XP_047275314 | 405 aa | 44.49 kDa | X12 | XM_047419358 | 1588 bp |
X6 | XP_016856631 | 391 aa | 43.58 kDa | X13 | XM_017001142 | 1546 bp |
X7 | XP_047275318 | 379 aa | 41.32 kDa | X14 | XM_047419362 | 1393 bp |
X8 | XP_011539651 | 376 aa | 41.67 kDa | X15 | XM_011541349 | 1645 bp |
X9 | XP_016856632 | 365 aa | 40.47 kDa | X16 | XM_017001143 | 1468 bp |
X10 | XP_047275323 | 364 aa | 40.17 kDa | X17 | XM_047419367 | 1342 bp |
X11 | XP_054192270 | 338 aa | 37.06 kDa | X18 | XM_054336295 | 1264 bp |
X12 | XP_054192271 | 336 aa | 36.46 kDa | X19 | XM_054336296 | 1207 bp |
X13 | XP_054192272 | 333 aa | 36.84 kDa | X20 | XM_054336297 | 1474 bp |
X14 | XP_047275327 | 332 aa | 36.65 kDa | X21 | XM_047419371 | 1262 bp |
X15 | XP_054192274 | 274 aa | 30.61 kDa | X23 | XM_054336299 | 1670 bp |
X16 | XP_016856635 | 263 aa | 29.31 kDa | X24 | XM_017001146 | 1146 bp |
X17 | XP_054192276 | 242 aa | 27.05 kDa | X25 | XM_054336301 | 2306 bp |
X18 | XP_054192277 | 213 aa | 23.69 kDa | X26 | XM_054336302 | 1380 bp |
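The molecular weights in the table come from an unspecified bioinformatics tool. [12] As a minimal sketch of how such a figure is typically estimated, the Python below sums standard average residue masses and adds one water molecule for the free termini. The function names `molecular_weight_da` and `rough_weight_kda` are hypothetical, and the 110 Da per residue rule of thumb is an assumption for quick checks, not the method used by the cited tool.

```python
# Average residue masses in daltons (amino acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight_da(sequence: str) -> float:
    """Estimate the average molecular weight of an unmodified protein chain."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def rough_weight_kda(length_aa: int, mean_residue_da: float = 110.0) -> float:
    """Rule-of-thumb estimate from length alone (assumed mean residue mass)."""
    return length_aa * mean_residue_da / 1000.0


if __name__ == "__main__":
    # Length-only estimate for the 446 aa isoform X1 from the table.
    print(f"{rough_weight_kda(446):.2f} kDa")
```

The length-only estimate lands within about 1 kDa of the tabulated 49.58 kDa for isoform X1; an exact figure requires the full amino acid sequence, which is not reproduced here.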
The mRNA for this gene can be spliced in many different ways, giving rise to approximately 20 known isoforms. The most common mRNA is about 1693 nucleotides long and encodes a protein of 440 amino acids. [13] A comprehensive study of oral squamous cell carcinoma, the sixth most prevalent cancer worldwide, identified ARMH1 as a gene of interest by comparing mRNA from healthy subjects with that of affected individuals. Through mRNA inhibition of ARMH1, researchers demonstrated significantly reduced leukemic cell proliferation (P = .0041) and leukemic cell migration (P = .0001), as well as decreased resistance to the chemotherapy drug cytarabine. [14] [15]
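As a rough consistency check on the lengths quoted above, a protein of N amino acids requires a coding sequence of 3(N + 1) nucleotides once the stop codon is counted. Assuming the 1693-nucleotide transcript is the one that encodes the 440-amino-acid protein (the source does not state this explicitly), the remainder would be untranslated sequence:

```latex
L_{\mathrm{CDS}} = 3\,(N_{\mathrm{aa}} + 1) = 3 \times (440 + 1) = 1323\ \text{nt},
\qquad
L_{\mathrm{UTR}} = 1693 - 1323 = 370\ \text{nt}
```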
The protein encoded by the gene shares its name, armadillo-like helical domain containing 1. The isoelectric point of the ARMH1 protein is approximately 5.5. [16] The protein has two known major domains: a transmembrane domain and a coiled coil. [17] Within the coiled-coil domain, the ARMH1 protein has 24 alpha helices. [18] [19] [20] [21] The European Bioinformatics Institute's analysis of ARMH1 reveals a significantly enriched lysine content and a significantly depleted proline content. [22] The protein has one reported major interaction, with the human protein ABAT. [23] Gamma-aminobutyric acid transaminase (ABAT) catalyzes the conversion of gamma-aminobutyric acid (GABA) into succinic semialdehyde. Additionally, ABAT expression has been associated with glycolysis-related genes, infiltrated immune cells, immunoinhibitors, and immunostimulators in hepatocellular carcinoma (HCC). [24]
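The isoelectric point quoted above is the pH at which the protein carries no net charge. A minimal sketch of how such a value can be estimated from sequence alone follows: it sums Henderson-Hasselbalch charge contributions and bisects on pH. The pKa values are illustrative assumptions (published sets differ between tools), the names `net_charge` and `isoelectric_point` are hypothetical, and this is not necessarily the method behind the cited figure. [16]

```python
# Illustrative pKa values (an assumption; different tools use different sets).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of an unmodified chain at a given pH."""
    # Free amino terminus (positive) and free carboxyl terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection over pH 0 to 14."""
    if not sequence:
        raise ValueError("sequence is empty")
    low, high = 0.0, 14.0
    # Net charge falls monotonically as pH rises, so bisection converges.
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Under this model, a sequence rich in aspartate and glutamate relative to lysine, arginine, and histidine yields an acidic value such as the reported 5.5.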
The ARMH1 gene is widely conserved and is found in thousands of species, from primates to fungi, indicating that it has been evolutionarily relevant for hundreds of millions of years. In relatively close relatives such as cows, the sequence similarity to the human protein is 91%, whereas in fungi the similarity ranges between roughly 20 and 30%. [26] Searches for homologs in roundworms, flatworms, single-celled eukaryotes, prokaryotes, plants, and fungi other than chytrids found no significantly similar genes. Below is a table of orthologs, ordered by estimated date of divergence, with sequence similarity and identity measured against human ARMH1 isoform X1.
Species | Common name | Accession number | Date of divergence | Sequence length (AA) | Sequence similarity | Sequence Identity |
---|---|---|---|---|---|---|
Homo sapiens | Human | NP_001139108 | 0 mya | 440 | 100% | 100% |
Microcebus murinus | Grey Mouse Lemur | XP_012631405.1 | 74 mya | 441 | 88% | 82% |
Rattus norvegicus | Brown Rat | NP_001119769.2 | 87 mya | 441 | 80% | 78% |
Bos taurus | Cow | XP_005204913.1 | 94 mya | 442 | 91% | 83% |
Ornithorhynchus anatinus | Platypus | XP_028938784.1 | 180 mya | 459 | 75% | 60% |
Apteryx rowi | Okarito Kiwi | XP_025942684 | 319 mya | 419 | 73% | 59% |
Haliaeetus leucocephalus | Bald Eagle | XP_010581029 | 319 mya | 418 | 70% | 56% |
Gopherus flavomarginatus | Bolson Tortoise | XP_050817160 | 319 mya | 421 | 78% | 65% |
Xenopus tropicalis | Western Clawed Frog | XP_017949069 | 352 mya | 409 | 70% | 55% |
Danio rerio | Zebrafish | XP_001341083.1 | 429 mya | 410 | 71% | 53% |
Leucoraja erinacea | Little Skate | XP_055497706 | 462 mya | 406 | 69% | 53% |
Lytechinus pictus | Painted Urchin | XP_054764007 | 619 mya | 406 | 67% | 51% |
Owenia fusiformis | Segmented Worm | CAH1776102.1 | 686 mya | 410 | 71% | 51% |
Aplysia californica | California Sea Hare | XP_012936639.1 | 708 mya | 410 | 69% | 52% |
Adineta steineri | Rotifer | CAF4083605.1 | 708 mya | 420 | 56% | 37% |
Pocillopora verrucosa | Colonial Coral | XP_058955966.1 | 708 mya | 404 | 67% | 49% |
Geodia barretti | Sea Sponge | CAI8036895.1 | 758 mya | 404 | 50% | 35% |
Blastocladiella britannica | Chytrids | KAI9218662 | 1275 mya | 423 | 34% | 22% |
Borealophlyctis nickersoniae | Rhizophlyctidales | KAJ3289137 | 1275 mya | 453 | 19% | 11% |
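The identity and similarity columns summarize pairwise alignments of each ortholog against the human protein. The source does not say which alignment tool or scoring matrix was used, so the sketch below only illustrates the general idea: given two already-aligned sequences, count identical columns and "similar" columns. The residue groupings in `SIMILARITY_GROUPS`, the function name `alignment_scores`, and the choice to divide by the full alignment length (gaps included) are all assumptions.

```python
# Hypothetical grouping of chemically similar residues; real tools usually
# derive similarity from a substitution matrix such as BLOSUM62 instead.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "ST", "NQ"]


def _is_similar(a: str, b: str) -> bool:
    """True when two residues are identical or fall in the same group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def alignment_scores(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percent identity, percent similarity) for two aligned strings.

    Both inputs must already be aligned to equal length; gap columns count
    toward the denominator but never toward identity or similarity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        if a == b:
            identical += 1
            similar += 1
        elif _is_similar(a, b):
            similar += 1
    columns = len(aligned_a)
    return 100.0 * identical / columns, 100.0 * similar / columns
```

Because similarity counts conservative substitutions in addition to exact matches, it is always at least as high as identity, which matches the pattern in every row of the table.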
The ARMH1 gene and its protein product have been linked to leukemia, specifically T-cell acute lymphoblastic leukemia (T-ALL). [27] In cell lines derived mostly from lymphatic tissue, T-ALL showed dramatically increased expression of the ARMH1 gene. Bone marrow samples were taken at initial diagnosis and at the conclusion of treatment, and ARMH1, along with 5 other genes, was found to change dramatically in expression. Researchers Dr. Manoj Bhasin and Dr. Mojtaba Bakhtiari from Emory University's Aflac Cancer Center have identified ARMH1, a novel cancer-associated gene, as being highly expressed in malignant blast cells across several pediatric hematologic malignancies, including AML, T/B-ALL, and T/B-MPAL. Importantly, ARMH1 expression is significantly elevated in patients with relapsed disease or high-risk cytogenetic profiles (e.g., MLL rearrangements) compared to those with standard-risk markers (e.g., RUNX1, inv(16)). Additionally, ARMH1 expression strongly correlates with the pediatric leukemia stem cell score (LSC6), a six-gene signature linked to poor prognosis.
Functional studies involving ARMH1 perturbation (via knockdown and overexpression) in leukemia cell lines revealed substantial effects on cell proliferation and migration. RNA sequencing of these modified cells highlighted associations between ARMH1 and pathways related to mitochondrial fatty acid synthesis and the cell cycle. Pharmacological inhibition of CPT1A, a key regulator of fatty acid synthesis in the mitochondrial matrix, led to ARMH1 downregulation, along with reduced CPT1A levels, ATP production, and oxygen consumption rates. Furthermore, ARMH1 knockdown caused a notable decrease in cell cycle regulators, including CDCA7 and EZH2.
The research also uncovered that ARMH1 physically interacts with EZH2, a protein implicated in multiple cancers, suggesting its critical role in oncogenesis. These findings establish ARMH1 as a key player in mitochondrial metabolism and cell cycle regulation, positioning it as a potential target for therapeutic intervention in pediatric hematologic malignancies. [28]