ARMH1 identifiers | Value |
---|---|
Aliases | ARMH1, NCRNA00082, p40, chromosome 1 open reading frame 228, C1orf228, armadillo-like helical domain containing 1 |
External IDs | MGI: 2686507; HomoloGene: 28727; GeneCards: ARMH1; OMA: ARMH1 - orthologs |
Armadillo-like helical domain containing 1 (ARMH1) is a protein that in humans is encoded by the ARMH1 gene, also known as chromosome 1 open reading frame 228 (C1orf228). Expression of the gene is significantly higher in bone marrow, lymph nodes, and testis than in other tissues. [5] The function of the gene and its protein product remains uncertain.
The ARMH1 gene is located on the plus strand of chromosome 1, between base pairs 45,140,361 and 45,191,784. Known aliases include P40, NCRNA00082, and, most commonly, C1orf228. The gene has 13 exons; most are concentrated near the poly-A site at the end of the gene, and two lie upstream of the start codon. Its high expression in bone marrow and lymph nodes suggests an immunological function. [6]
RNA-seq data have been produced from multiple human tissue samples at varying stages of development. One study of 20 separate human tissue samples showed significantly higher expression of ARMH1 in the thymus, trachea, and lungs. [7] A second study covered 27 different tissues from 95 individuals and found significantly higher expression in bone marrow, lymph nodes, and testis. [8] A third showed high expression in white blood cells and, again, in testis, corroborating the previous studies. [9] A temporal study of expression during development collected 35 human fetal samples from 6 distinct tissues between 10 and 20 weeks of gestation and sequenced them using Illumina TruSeq Stranded Total RNA. The data slightly favored expression in the adrenal glands throughout development. In the other tissues, expression did not change markedly over time and showed only a small decline as development progressed. [10]
The ARMH1 gene produces many isoforms that differ in size and potentially in function. Isoforms are mRNAs produced from the same locus that differ in their transcription start sites, protein-coding sequences, and/or untranslated regions, potentially altering gene function. All known isoforms are listed below, with information gathered from NCBI Gene [11] and a bioinformatics tool for calculating molecular weight. [12]
Protein isoform | Protein accession | Protein length | Molecular weight | mRNA isoform | mRNA accession | mRNA length |
---|---|---|---|---|---|---|
X1 | XP_047275293 | 446 aa | 49.58 kDa | X5 | XM_011541340 | 1693 bp |
X2 | XP_011539647 | 433 aa | 48.17 kDa | X7 | XM_011541345 | 1909 bp |
X3 | XP_047275308 | 431 aa | 47.39 kDa | X8 | XM_047419352 | 1782 bp |
X4 | XP_047275309 | 419 aa | 46.17 kDa | X9 | XM_047419353 | 1507 bp |
X5 | XP_047275314 | 405 aa | 44.49 kDa | X12 | XM_047419358 | 1588 bp |
X6 | XP_016856631 | 391 aa | 43.58 kDa | X13 | XM_017001142 | 1546 bp |
X7 | XP_047275318 | 379 aa | 41.32 kDa | X14 | XM_047419362 | 1393 bp |
X8 | XP_011539651 | 376 aa | 41.67 kDa | X15 | XM_011541349 | 1645 bp |
X9 | XP_016856632 | 365 aa | 40.47 kDa | X16 | XM_017001143 | 1468 bp |
X10 | XP_047275323 | 364 aa | 40.17 kDa | X17 | XM_047419367 | 1342 bp |
X11 | XP_054192270 | 338 aa | 37.06 kDa | X18 | XM_054336295 | 1264 bp |
X12 | XP_054192271 | 336 aa | 36.46 kDa | X19 | XM_054336296 | 1207 bp |
X13 | XP_054192272 | 333 aa | 36.84 kDa | X20 | XM_054336297 | 1474 bp |
X14 | XP_047275327 | 332 aa | 36.65 kDa | X21 | XM_047419371 | 1262 bp |
X15 | XP_054192274 | 274 aa | 30.61 kDa | X23 | XM_054336299 | 1670 bp |
X16 | XP_016856635 | 263 aa | 29.31 kDa | X24 | XM_017001146 | 1146 bp |
X17 | XP_054192276 | 242 aa | 27.05 kDa | X25 | XM_054336301 | 2306 bp |
X18 | XP_054192277 | 213 aa | 23.69 kDa | X26 | XM_054336302 | 1380 bp |
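The molecular weights in the table come from a sequence-based calculator. As a minimal sketch of how such a tool typically works (not necessarily the method used by the cited tool), the weight of a protein can be approximated by summing the average mass of each residue and adding one water molecule for the free termini. The function name and the toy peptide below are hypothetical; the real ARMH1 sequences are not reproduced here.

```python
# Average residue masses in daltons (each amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight_da(sequence: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unknown = set(sequence).difference(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    toy_peptide = "MKKLEELLKK"  # hypothetical sequence, for illustration only
    print(f"{molecular_weight_da(toy_peptide) / 1000:.2f} kDa")
```

This estimate ignores post-translational modifications, so published values from different tools can differ slightly.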
Alternative splicing of the ARMH1 mRNA gives rise to approximately 20 known isoforms. The most common transcript is spliced down to an mRNA about 1693 nucleotides long, which encodes a protein of 440 amino acids. [13] A comprehensive study of oral squamous cell carcinoma, the sixth most prevalent cancer worldwide, identified ARMH1 as a gene of interest by comparing mRNA from healthy subjects with mRNA from affected individuals. Through mRNA inhibition of ARMH1, researchers demonstrated significantly reduced leukemic cell proliferation (P = .0041) and leukemic cell migration (P = .0001), as well as decreased resistance to the chemotherapy drug cytarabine. [14] [15]
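The transcript and protein lengths quoted above can be related with simple arithmetic: each amino acid is encoded by three nucleotides, and the coding sequence ends with one stop codon. The sketch below assumes that the 1693-nucleotide figure refers to the full mRNA rather than to the coding sequence alone; the function names are hypothetical.

```python
def cds_length_nt(protein_length_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein of the given length."""
    codons = protein_length_aa + (1 if include_stop else 0)
    return 3 * codons


def noncoding_length_nt(mrna_length_nt: int, protein_length_aa: int) -> int:
    """Nucleotides of the mRNA left over for untranslated regions."""
    return mrna_length_nt - cds_length_nt(protein_length_aa)


if __name__ == "__main__":
    print(cds_length_nt(440))              # 1323
    print(noncoding_length_nt(1693, 440))  # 370
```

Under that assumption, a 440-residue protein accounts for 1323 nucleotides of the transcript, leaving about 370 nucleotides of untranslated sequence.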
The protein encoded by the gene shares its name, armadillo-like helical domain containing 1. The isoelectric point of the ARMH1 protein is around pH 5.5. [16] The protein has two known major domains: a transmembrane domain and a coiled coil. [17] Within the coiled-coil domains, the ARMH1 protein has 24 alpha helices. [18] [19] [20] [21] The European Bioinformatics Institute's analysis of ARMH1 shows a significantly enriched lysine content and a significantly depleted proline content. [22] The protein has one known major interaction, with the human protein ABAT. [23] Gamma-aminobutyric acid transaminase (ABAT) catalyzes the conversion of gamma-aminobutyric acid (GABA) into succinic semialdehyde. Additionally, ABAT expression has been associated with glycolysis-related genes, infiltrating immune cells, immunoinhibitors, and immunostimulators in hepatocellular carcinoma (HCC). [24]
The ARMH1 gene is widely conserved and is found in thousands of species. From primates to fungi, the gene has been evolutionarily relevant for hundreds of millions of years. In near relatives such as cows, the sequence similarity to the human protein is 91%, whereas in fungi the similarity falls to roughly 20 to 35%. [26] Searches for homologs in roundworms, flatworms, single-celled eukaryotes, prokaryotes, plants, and fungi other than chytrids found no significantly similar genes. Below is a table of orthologous genes, ordered by estimated date of divergence from humans, with sequence similarity and identity measured against the human ARMH1 protein.
Species | Common name | Accession number | Date of divergence | Sequence length (aa) | Sequence similarity | Sequence identity |
---|---|---|---|---|---|---|
Homo sapiens | Human | NP_001139108 | 0 mya | 440 | 100% | 100% |
Microcebus murinus | Grey Mouse Lemur | XP_012631405.1 | 74 mya | 441 | 88% | 82% |
Rattus norvegicus | Brown Rat | NP_001119769.2 | 87 mya | 441 | 80% | 78% |
Bos taurus | Cow | XP_005204913.1 | 94 mya | 442 | 91% | 83% |
Ornithorhynchus anatinus | Platypus | XP_028938784.1 | 180 mya | 459 | 75% | 60% |
Apteryx rowi | Okarito Kiwi | XP_025942684 | 319 mya | 419 | 73% | 59% |
Haliaeetus leucocephalus | Bald Eagle | XP_010581029 | 319 mya | 418 | 70% | 56% |
Gopherus flavomarginatus | Bolson Tortoise | XP_050817160 | 319 mya | 421 | 78% | 65% |
Xenopus tropicalis | Western Clawed Frog | XP_017949069 | 352 mya | 409 | 70% | 55% |
Danio rerio | Zebrafish | XP_001341083.1 | 429 mya | 410 | 71% | 53% |
Leucoraja erinacea | Little Skate | XP_055497706 | 462 mya | 406 | 69% | 53% |
Lytechinus pictus | Painted Urchin | XP_054764007 | 619 mya | 406 | 67% | 51% |
Owenia fusiformis | Segmented Worm | CAH1776102.1 | 686 mya | 410 | 71% | 51% |
Aplysia californica | California Sea Hare | XP_012936639.1 | 708 mya | 410 | 69% | 52% |
Adineta steineri | Rotifer | CAF4083605.1 | 708 mya | 420 | 56% | 37% |
Pocillopora verrucosa | Colonial Coral | XP_058955966.1 | 708 mya | 404 | 67% | 49% |
Geodia barretti | Sea Sponge | CAI8036895.1 | 758 mya | 404 | 50% | 35% |
Blastocladiella britannica | Chytrids | KAI9218662 | 1275 mya | 423 | 34% | 22% |
Borealophlyctis nickersoniae | Rhizophlyctidales | KAJ3289137 | 1275 mya | 453 | 19% | 11% |
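The table reports both sequence similarity and sequence identity, and similarity is always at least as high as identity because it also counts conservative substitutions. The sketch below illustrates the distinction on a pre-aligned pair of toy sequences. The residue groupings are a simplified assumption for illustration (alignment tools usually score similarity with a substitution matrix such as BLOSUM62), the sequences are hypothetical, and the choice of denominator (full alignment length, gaps included) varies between tools.

```python
# Simplified groups of chemically similar residues (illustrative assumption).
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "STNQ"]


def percent_identity_similarity(aligned_a: str, aligned_b: str):
    """Return (identity %, similarity %) for two gapped, equal-length strings."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal length")
    identical = 0
    similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward length but never match
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real ARMH1 sequence.
    identity, similarity = percent_identity_similarity("MKVLDE-T", "MRVIDEST")
    print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

In this toy alignment, five of eight columns are identical and two more are conservative substitutions, giving 62.5% identity and 87.5% similarity.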
The ARMH1 gene and its protein product have been extensively linked to leukemia, specifically T-cell acute lymphoblastic leukemia (T-ALL). [27] In cell lines derived mostly from lymphatic tissue, T-ALL showed dramatically increased expression of the ARMH1 gene. In bone marrow samples taken at initial diagnosis and at the conclusion of treatment, ARMH1, along with 5 other genes, was found to change dramatically in expression. Corroborating these findings, ARMH1 showed a 1.8-fold increase in expression in samples taken after a diagnosis of leukemia. Higher ARMH1 expression was significantly associated with poor overall survival. [28]