ARMH3 | |
---|---|
Identifiers | |
Aliases | ARMH3, chromosome 10 open reading frame 76, C10orf76, armadillo-like helical domain containing 3, armadillo like helical domain containing 3 |
External IDs | MGI: 1918867; HomoloGene: 15843; GeneCards: ARMH3; OMA: ARMH3 - orthologs |
ARMH3 or Armadillo Like Helical Domain Containing 3, also known as UPF0668 and c10orf76, is a protein that in humans is encoded by the ARMH3 gene. [5] Its function is not currently known, but experimental evidence has suggested that it may be involved in transcriptional regulation. [6] The protein contains a conserved proline-rich motif, [5] [7] suggesting that it may participate in protein-protein interactions via an SH3-binding domain, [8] although no such interactions have been experimentally verified. The well-conserved gene appears to have emerged in Fungi approximately 1.2 billion years ago. [7] [9] The locus is alternatively spliced and predicted to yield five protein variants, three of which contain a protein domain of unknown function, DUF1741. [5] [10]
ARMH3 contains a potential SH3-binding domain, [5] [8] a type of motif known to participate in protein-protein binding interactions; however, no protein interactions have been experimentally verified with c10orf76. A 2007 gene expression study found c10orf76 expression to vary inversely with the expression of several other genes, including NFYB, CCR5, and NSBP1, suggesting that the protein may function as a transcriptional regulator. [6]
ARMH3 is well conserved throughout Eumetazoans. [5] [7] Some weakly similar orthologs (approximately 35% sequence identity) were identified in Parazoa (e.g., A. queenslandica) and in Fungi, specifically Ascomycetes (e.g., A. oryzae). [7]
The following table illustrates the sequence similarity between human c10orf76 protein and various orthologs. Similar sequences were identified with BLAST [7] and BLAT [11] tools.
Species | Organism common name | NCBI accession | Sequence identity | Sequence similarity | Length (AAs) | Gene common name |
---|---|---|---|---|---|---|
Homo sapiens | Human | NP_078817.2 | 100% | 100% | 689 | UPF0668 protein C10orf76 |
Mus musculus | House Mouse | NP_938038.2 | 99% | 99% | 689 | UPF0668 protein C10orf76 homolog |
Danio rerio | Zebrafish | NP_956913.2 | 85% | 93% | 689 | UPF0668 protein C10orf76 homolog |
Apis florea | Honey bee | XP_003695991.1 | 51% | 70% | 641 | Predicted: UPF0668 protein C10orf76 homolog |
Amphimedon queenslandica | Sponge | XP_003383350.1 | 46% | 67% | 667 | Predicted: UPF0668 protein C10orf76-like |
Acyrthosiphon pisum | Pea Aphid | XP_001952575.2 | 40% | 61% | 684 | Predicted: UPF0668 protein C10orf76 homolog isoform 1 |
Aspergillus oryzae | Fungus | XP_001820240 | 23% | 42% | 653 | hypothetical protein AOR_1_2042154 |
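The identity and similarity columns above can be illustrated with a minimal sketch that scores a pairwise alignment. It assumes the two sequences are already aligned (gaps written as `-`) and divides by the number of alignment columns; the grouping of "similar" residues is a hypothetical simplification, since BLAST decides which pairs count as positives from a substitution matrix such as BLOSUM62. The toy sequences are invented and are not taken from ARMH3.

```python
# Hypothetical groups of chemically similar residues (an assumption for
# illustration; BLAST uses a substitution matrix instead).
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def is_similar(a, b):
    """True if two residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    identical = 0
    similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
        if is_similar(a, b):
            similar += 1
    return 100 * identical / columns, 100 * similar / columns


# Toy alignment (hypothetical sequences)
identity, similarity = identity_and_similarity("MKTLLVDE", "MKSLIV-E")
print(identity, similarity)
```

Similarity is never lower than identity under this definition, which matches the pattern in the table.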
In humans, the ARMH3 gene, also known by the alias FLJ13114, spans 210,577 base pairs on the reverse strand of the long arm of chromosome 10. [10] Its 26 alternatively spliced exons encode 5 potential transcript variants, the largest of which is 4,101 base pairs in length. [10]
The human ARMH3 locus is flanked on the left and right sides by HPS6 and KCNIP2, respectively. [5] HPS6 is a protein that may play a role in organelle biogenesis, [12] and KCNIP2 is a voltage-gated potassium channel interacting protein. [13] The same pattern is observed in the orthologous locus in mice, [14] as well as most other vertebrates.
The NCBI (GenBank) gene profile for c10orf76 labels the start of the first transcribed exon as the beginning of the gene. [5] The primary promoter predicted by the El Dorado tool from Genomatix begins 519 base pairs upstream of this transcription start site. [15] This promoter is predicted to be 658 base pairs in length and thus includes the first transcribed exon at its 3 prime end. [5]
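The promoter coordinates above can be checked with a short calculation. The sketch below places the transcription start site at position 0, a convention chosen here for illustration, and uses only the two figures quoted in the text (519 bp upstream, 658 bp long). The length of the first exon is not given, so the result shows only how far the predicted promoter extends past the transcription start site.

```python
def promoter_overlap(upstream_offset, promoter_length, tss=0):
    """Return the number of promoter bases lying at or after the TSS."""
    promoter_start = tss - upstream_offset
    promoter_end = promoter_start + promoter_length  # exclusive end coordinate
    return max(0, promoter_end - tss)


overlap = promoter_overlap(upstream_offset=519, promoter_length=658)
print(overlap)  # 139
```

Under these assumptions, the last 139 bp of the predicted promoter lie within the transcribed region.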
The c10orf76 locus is thought to be alternatively spliced into at least five unique isoforms, although it is unclear how this splicing is regulated. [5] A second potential promoter, also predicted by El Dorado, likely drives expression of one of the shorter documented variants (positioned before exon 23). [10] [15]
The largest protein variant is 689 amino acids in length. [5] It has a predicted molecular mass of approximately 78.7 kDa and a predicted isoelectric point of 6.13. [16] It may be secreted via a non-classical pathway. [17] NCBI identifies a protein domain of unknown function between amino acids Asp435 and Leu671, known as DUF1741 (Domain of Unknown Function 1741). [5] This domain is not known to exist in any other proteins. [7]
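As a rough sanity check on these figures, the sketch below computes the span of DUF1741 from its boundary residues and estimates the protein mass from its length. The average residue mass of 110 Da is a common rule of thumb assumed here; it is not the method used by the cited tool, so the estimate is expected to differ somewhat from 78.7 kDa.

```python
AVERAGE_RESIDUE_MASS_DA = 110  # rule-of-thumb assumption, not from the cited tool


def domain_length(first_residue, last_residue):
    """Number of residues in a domain, counting both boundary residues."""
    return last_residue - first_residue + 1


protein_length = 689
duf1741_length = domain_length(435, 671)
duf1741_fraction = duf1741_length / protein_length
estimated_mass_kda = protein_length * AVERAGE_RESIDUE_MASS_DA / 1000

print(duf1741_length)                 # 237
print(round(duf1741_fraction, 2))     # 0.34
print(round(estimated_mass_kda, 1))   # 75.8
```

The domain therefore covers about a third of the largest variant, and the crude mass estimate lands within a few kilodaltons of the reported value.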
A potential stem loop region at the 3 prime end of the first exon (and thus, the end of the promoter) was predicted by the Dotlet program from ExPASy. [18] This could serve to regulate protein translation. [19] Also, an Alu segment in the 3 prime untranslated region of the mature mRNA could serve as a potential translational regulatory mechanism. [20]
The protein has been found to be differentially expressed in some medical conditions and in response to certain cellular signals. For example, decreased c10orf76 expression is observed in patients with chronic B-cell lymphocytic leukemia. [21] Decreased expression is also observed in cells treated with vascular endothelial growth factor. [22]
The protein is thought to be localized to the cytoplasm, [23] although this is uncertain. It has also been predicted to be a 3-pass transmembrane protein. [16] Also, a mitochondrial sorting signal was identified at the beginning of one of the protein isoforms using MitoProt II (located at Met416 of the largest protein variant). [24]
The structure of the c10orf76 protein has not been experimentally determined. The secondary structure is predicted to be predominantly helical, with intervening regions of protein disorder. [26] [27] The potential SH3-binding domain is located in a predicted region of disorder, further supporting a protein-protein binding function for c10orf76. A helical region between amino acids 610 and 655 was predicted to be a coiled coil motif. [28]
A Phyre2 [29] protein structure prediction suggested that the first 200 residues of c10orf76 may share strong structural similarities with Symplekin, [26] a nuclear-localized protein that is thought to be a scaffold component of the polyadenylation complex. [25]
The expression of c10orf76 mRNA has been found to be inversely correlated with expression of various other mRNAs, including NFYB, CCR5, and NSBP1. [6] Although this study and the predicted SH3-binding domain suggest that c10orf76 partakes in protein-protein binding interactions, none have been experimentally verified. A short search using IntAct, [30] MINT, [31] and STRING [32] also yielded zero predicted protein-protein interactions.
The protein may be secreted via a non-classical pathway, [17] which may underlie the functionality of some of the posttranslational modifications. There are ten conserved potential phosphorylation sites within the protein sequence. [33] Also, there are nine residues that are confidently (>90%) predicted by NetOGlyc [34] to undergo O-linked glycosylation, all residing within the low complexity region between Leu325 and Ser359.
The protein encoded by the largest mRNA variant of c10orf76 contains a proline-rich region with two PxxP motifs, where "P" represents a proline residue and "x" represents any other amino acid [5] (the sequence is shown below). Such motifs have been shown to participate in protein-protein binding interactions, specifically by binding SH3 domains. [8] The potential SH3-binding domain lies within a low complexity region with an unusually high number of amino acids with oxygen-containing side groups. A NetOGlyc analysis [34] of the region suggests that these residues are likely to undergo O-linked glycosylation and thus may serve to regulate binding to the potential SH3-binding domain. [35]
325 L V T TP V SP A PT TP V T P L G T T P P S S 359
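The PxxP pattern described above can be located in a sequence with a regular expression. The following is a minimal sketch, assuming "x" means any non-proline residue as stated in the text; the function name and the toy sequence are hypothetical and are not taken from ARMH3.

```python
import re

# P, two non-proline residues, then P. The lookahead makes the match
# zero-width so that overlapping motifs are all reported.
PXXP = re.compile(r"(?=(P[^P]{2}P))")


def find_pxxp(sequence):
    """Return (1-based start position, motif) for each PxxP occurrence."""
    return [(m.start() + 1, m.group(1)) for m in PXXP.finditer(sequence.upper())]


toy_sequence = "AAPTSPLLGPAVPKK"  # hypothetical sequence for illustration
print(find_pxxp(toy_sequence))  # [(3, 'PTSP'), (10, 'PAVP')]
```

A sequence match alone does not establish SH3 binding; it only flags candidate sites for experimental follow-up.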
An Alu element was identified in the 3′-UTR of the longest mRNA transcript variant. [5] It is unclear whether this sequence serves any functional or regulatory purpose, but there is existing evidence for Alu-mediated regulation of protein translation, so this cannot be ruled out in c10orf76. [20]
The N-terminus of a short transcript variant (exons 17-26) was predicted to have a mitochondrial sorting signal with 96% confidence using the MitoProt II tool. [24] It is unclear whether this is an independently transcribed variant or whether it results from cleavage of the full-size protein. There are no predicted alternative promoters upstream of this variant's first exon. [15]