ARMH3

Identifiers
Aliases: ARMH3, chromosome 10 open reading frame 76, C10orf76, armadillo-like helical domain containing 3
External IDs: MGI: 1918867; HomoloGene: 15843; GeneCards: ARMH3

Orthologs
Species | Human | Mouse
RefSeq (mRNA) | NM_024541 | NM_198296
RefSeq (protein) | NP_078817 | NP_938038
Location (UCSC) | Chr 10: 101.85 – 102.06 Mb | Chr 19: 45.81 – 45.99 Mb
PubMed search | [3] | [4]

ARMH3 or Armadillo Like Helical Domain Containing 3, also known as UPF0668 and c10orf76, is a protein that in humans is encoded by the ARMH3 gene. [5] Its function is not currently known, but experimental evidence has suggested that it may be involved in transcriptional regulation. [6] The protein contains a conserved proline-rich motif, [5] [7] suggesting that it may participate in protein-protein interactions via an SH3-binding domain, [8] although no such interactions have been experimentally verified. The well-conserved gene appears to have emerged in Fungi approximately 1.2 billion years ago. [7] [9] The locus is alternatively spliced and predicted to yield five protein variants, three of which contain a protein domain of unknown function, DUF1741. [5] [10]

Function

The ARMH3 protein contains a potential SH3-binding domain, [5] [8] a type of motif known to participate in protein-protein binding interactions; however, no protein interactions have been experimentally verified with c10orf76. A 2007 gene expression study found c10orf76 expression to vary inversely with the expression of several other genes, including NFYB, CCR5, and NSBP1, suggesting that the protein may function as a transcriptional regulator. [6]

Homology

ARMH3 is well-conserved throughout Eumetazoans. [5] [7] Some weakly similar orthologs (approximately 35% sequence identity) were identified in Parazoa (e.g., A. queenslandica) and in Fungi, specifically Ascomycetes (e.g., A. oryzae). [7]

The following table illustrates the sequence similarity between human c10orf76 protein and various orthologs. Similar sequences were identified with BLAST [7] and BLAT [11] tools.

Species | Organism Common Name | NCBI Accession | Sequence Identity | Sequence Similarity | Length (AAs) | Gene Common Name
Homo sapiens | Human | NP_078817.2 | 100% | 100% | 689 | UPF0668 protein C10orf76
Mus musculus | House Mouse | NP_938038.2 | 99% | 99% | 689 | UPF0668 protein C10orf76 homolog
Danio rerio | Zebrafish | NP_956913.2 | 85% | 93% | 689 | UPF0668 protein C10orf76 homolog
Apis florea | Honey bee | XP_003695991.1 | 51% | 70% | 641 | PREDICTED: UPF0668 protein C10orf76 homolog
Amphimedon queenslandica | Sponge | XP_003383350.1 | 46% | 67% | 667 | PREDICTED: UPF0668 protein C10orf76-like
Acyrthosiphon pisum | Pea Aphid | XP_001952575.2 | 40% | 61% | 684 | PREDICTED: UPF0668 protein C10orf76 homolog isoform 1
Aspergillus oryzae | Fungus | XP_001820240 | 23% | 42% | 653 | hypothetical protein AOR_1_2042154
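
As a rough illustration of how a sequence identity percentage such as those in the table can be derived, the sketch below counts identical columns in a pairwise alignment and divides by the alignment length. The function name and the toy aligned fragments are hypothetical and are not taken from ARMH3 or its orthologs. Sequence similarity additionally depends on a substitution matrix and is not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if the residues agree and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy aligned fragments (hypothetical, for illustration only).
query = "MKT-LLVA"
subject = "MKSALLVA"
print(f"{percent_identity(query, subject):.1f}% identity")  # 6 of 8 columns match
```

Gap columns count toward the alignment length here, so insertions and deletions lower the reported identity.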

Gene

Characteristics

In humans, the ARMH3 gene, also known by the alias FLJ13114, spans 210,577 base pairs on the reverse strand of the long arm of chromosome 10. [10] Its 26 alternatively spliced exons encode 5 potential transcript variants, the largest of which is 4,101 base pairs in length. [10]

Figure: A map of human chromosome 10 with c10orf76 marked with the red line.

The human ARMH3 locus is flanked on the left and right sides by the HPS6 and KCNIP2 genes, respectively. [5] HPS6 encodes a protein that may play a role in organelle biogenesis, [12] and KCNIP2 encodes a voltage-gated potassium channel-interacting protein. [13] The same pattern is observed in the orthologous locus in mice, [14] as well as most other vertebrates.

Expression

The NCBI (GenBank) gene profile for c10orf76 labels the start of the first transcribed exon as the beginning of the gene. [5] The primary promoter predicted by the El Dorado tool from Genomatix begins 519 base pairs upstream of this transcription start site. [15] This promoter is predicted to be 658 base pairs in length and thus includes the first transcribed exon at its 3′ end. [5]
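
The overlap between the predicted promoter and the transcribed region follows from simple arithmetic on the two figures above. The helper below is a hypothetical sketch, assuming the promoter is one contiguous interval that starts the stated distance upstream of the transcription start site (TSS).

```python
from typing import Tuple


def promoter_span(upstream_bp: int, length_bp: int) -> Tuple[int, int]:
    """Split a promoter into (bases upstream of the TSS, bases at or downstream of it)."""
    if upstream_bp < 0 or length_bp < 0:
        raise ValueError("distances must be non-negative")
    # Whatever does not fit upstream of the TSS extends into the transcribed region.
    downstream_bp = max(length_bp - upstream_bp, 0)
    return min(upstream_bp, length_bp), downstream_bp


upstream, downstream = promoter_span(upstream_bp=519, length_bp=658)
print(upstream, downstream)  # 519 139
```

Under these assumptions, 658 − 519 = 139 base pairs of the predicted promoter lie within the transcribed sequence.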

The c10orf76 locus is thought to be alternatively spliced into at least five unique isoforms, although it is unclear how this splicing is regulated. [5] A second potential promoter, also predicted by El Dorado, likely drives expression of one of the shorter documented variants (positioned before exon 23). [10] [15]

Protein

Characteristics

The largest protein variant is 689 amino acids in length. [5] It has a predicted molecular mass of approximately 78.7 kDa and a predicted isoelectric point of 6.13. [16] It may be secreted via a non-classical pathway. [17] NCBI identifies a protein domain of unknown function between amino acids Asp435 and Leu671, known as DUF1741 (Domain of Unknown Function 1741). [5] This domain is not known to exist in any other proteins. [7]
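
A mass of about 78.7 kDa over 689 residues corresponds to roughly 114 Da per residue. The sketch below shows one common way such an estimate can be made from a sequence: summing standard average residue masses and adding one water molecule for the free termini. It is a generic calculation, not necessarily the method used by the cited tool, and the function name and the example peptide are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_mass_da(sequence: str) -> float:
    """Approximate average molecular mass of an unmodified peptide, in daltons."""
    seq = sequence.replace(" ", "").upper()
    if not seq:
        raise ValueError("sequence is empty")
    try:
        return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None


# Hypothetical short peptide, for illustration only.
print(f"{molecular_mass_da('LVTTPVSP'):.1f} Da")
```

Posttranslational modifications such as the predicted glycosylation would add to this value and are ignored here.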

Expression

A potential stem-loop region at the 3′ end of the first exon (and thus the end of the promoter) was predicted by the Dotlet program from ExPASy. [18] This could serve to regulate protein translation. [19] In addition, an Alu element in the 3′ untranslated region of the mature mRNA could serve as a potential translational regulatory mechanism. [20]

The protein has been found to be differentially expressed in some medical conditions and in response to certain cellular signals. For example, decreased c10orf76 expression is observed in patients with chronic B-cell lymphocytic leukemia. [21] Decreased expression is also observed in cells treated with vascular endothelial growth factor. [22]

The protein is thought to be localized to the cytoplasm, [23] although this is uncertain. It has also been predicted to be a 3-pass transmembrane protein. [16] Also, a mitochondrial sorting signal was identified at the beginning of one of the protein isoforms using MitoProt II (located at Met416 of the largest protein variant). [24]

Structure

Figure: A structural prediction of c10orf76 protein from PHYRE2 protein folding software. This structure is similar to that of human Symplekin, a protein thought to recruit regulatory factors to the polyadenylation machinery.

The structure of the c10orf76 protein has not been experimentally explored. The secondary structure is predicted to be completely helical in nature, with intervening regions of protein disorder. [26] [27] The potential SH3-binding domain is located on a predicted region of disorder, further supporting a protein-protein binding function for c10orf76. A helical region between amino acids 610-655 was predicted to be a coiled coil motif. [28]

A PHYRE2 [29] protein structure prediction suggested that the first 200 residues of c10orf76 may share strong structural similarities with Symplekin, [26] a nuclear-localized protein that is thought to be a scaffold component of the polyadenylation complex. [25]

Predicted protein interactions

The expression of c10orf76 mRNA has been found to be inversely correlated with expression of various other mRNAs, including NFYB, CCR5, and NSBP1. [6] Although this study and the predicted SH3-binding domain suggest that c10orf76 partakes in protein-protein binding interactions, none have been experimentally verified. A short search using IntAct, [30] MINT, [31] and STRING [32] also yielded zero predicted protein-protein interactions.

Predicted posttranslational modifications

The protein may be secreted via a non-classical pathway, [17] which may underlie the functionality of some of the posttranslational modifications. There are ten conserved potential phosphorylation sites within the protein sequence. [33] In addition, nine residues are confidently (>90%) predicted by NetOGlyc [34] to undergo O-linked glycosylation, all residing within the low complexity region between Leu325 and Ser359.


Regions of potential research interest

The protein encoded by the largest mRNA variant of c10orf76 contains a proline-rich region with two PxxP motifs, where "P" represents a proline residue and "x" represents any other amino acid [5] (highlighted in blue below). These motifs have been shown to participate in protein-protein binding interactions, specifically via the SH3 protein binding domain. [8] The potential SH3-binding domain exists within a low complexity region with an unusually high number of amino acids with oxygen-containing side-groups (highlighted in green below). A NetOGlyc analysis [34] of the region suggests that these residues are likely to undergo O-linked glycosylation and thus may serve to regulate binding to the potential SH3-binding domain. [35]

325 L V T TP V SP A PT TP V T P L G T T P P S S 359
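
The PxxP pattern described above can be located with a short regular expression search. The sketch below is illustrative only: the function name is hypothetical, and it is run on the excerpt as printed above, which has fewer residues than the 325 to 359 range implies, so the reported positions are relative to the excerpt rather than true residue numbers. Overlapping matches are reported separately, so the count can differ from a count of non-overlapping motifs.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motifs (for example PxxPxxP) are all reported.
# [^P] enforces that each "x" is any residue other than proline.
PXXP = re.compile(r"(?=(P[^P]{2}P))")


def find_pxxp(sequence: str, first_residue: int = 1) -> List[Tuple[int, str]]:
    """Return (position, motif) pairs for every PxxP match in a protein sequence."""
    seq = sequence.replace(" ", "").upper()
    return [(m.start() + first_residue, m.group(1)) for m in PXXP.finditer(seq)]


# Excerpt of the low complexity region as printed in this article.
excerpt = "L V T TP V SP A PT TP V T P L G T T P P S S"
for position, motif in find_pxxp(excerpt):
    print(position, motif)
```

Passing a different `first_residue` shifts the reported positions, which is useful only when the full, correctly numbered sequence is available.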

An Alu element was identified in the 3′-UTR of the longest mRNA transcript variant. [5] It is unclear whether this sequence serves any functional or regulatory purpose, but there is existing evidence for Alu-mediated regulation of protein translation, so this cannot be ruled out for c10orf76. [20]

The N-terminus of a short transcript variant (exons 17-26) was predicted to have a mitochondrial sorting signal with 96% confidence using the MitoProt II tool. [24] It is unclear as to whether this is a uniquely transcribed variant or it results from protein cleavage of the full-size protein. There are no predicted alternative promoters upstream of this variant's first exon. [15]

Model organisms

Model organisms have been used in the study of C10orf76 function. A conditional knockout mouse line called 9130011E15Riktm1a(EUCOMM)Wtsi was generated at the Wellcome Trust Sanger Institute. [36] Male and female animals underwent a standardized phenotypic screen [37] to determine the effects of deletion. [38] [39] [40] [41] Additional screens performed:

- In-depth immunological phenotyping [42]
- In-depth bone and cartilage phenotyping [43]

References

  1. GRCh38: Ensembl release 89: ENSG00000120029 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000039901 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "Entrez Gene: Chromosome 10 open reading frame 76 (Human)". Retrieved 28 April 2013.
  6. Weinberg MS, Barichievy S, Schaffer L, Han J, Morris KV (2007). "An RNA targeted to the HIV-1 LTR promoter modulates indiscriminate off-target gene activation". Nucleic Acids Research. 35 (21): 7303–12. doi:10.1093/nar/gkm847. PMC 2175361. PMID 17959645.
  7. "NCBI BLAST Tool". Retrieved 2 April 2013.
  8. Jia CY, Nie J, Wu C, Li C, Li SS (Aug 2005). "Novel Src homology 3 domain-binding motifs identified from proteomic screen of a Pro-rich region". Molecular & Cellular Proteomics. 4 (8): 1155–66. doi:10.1074/mcp.M500108-MCP200. PMID 15929943.
  9. "TimeTree: The Timescale of Life". Retrieved 23 May 2013.
  10. "AceView: c10orf76". Retrieved 28 April 2013.
  11. UCSC Genome Bioinformatics. "Human BLAT Search Tool". Retrieved 18 March 2013.
  12. "Entrez Gene: HPS6 Hermansky-Pudlak syndrome 6". Retrieved 5 May 2013.
  13. Burgoyne RD (Mar 2007). "Neuronal calcium sensor proteins: generating diversity in neuronal Ca2+ signalling". Nature Reviews. Neuroscience. 8 (3): 182–93. doi:10.1038/nrn2093. PMC 1887812. PMID 17311005.
  14. "Entrez Gene: 9130011E15Rik cDNA (Mus musculus)". Retrieved 13 May 2013.
  15. "El Dorado Gene Promoter Analysis". Archived from the original on 22 May 2021. Retrieved 21 April 2013.
  16. SDSC Biology Workbench. "Biology WorkBench 3.2". Retrieved 1 May 2013.
  17. "SecretomeP". Retrieved 18 April 2013.
  18. "Sib Dotlet Sequence Alignment". Retrieved 13 May 2013.
  19. Pandey NB, Marzluff WF (Dec 1987). "The stem-loop structure at the 3' end of histone mRNA is necessary and sufficient for regulation of histone mRNA stability". Molecular and Cellular Biology. 7 (12): 4557–9. doi:10.1128/MCB.7.12.4557. PMC 368142. PMID 3437896.
  20. Häsler J, Strub K (2006). "Alu elements as regulators of gene expression". Nucleic Acids Research. 34 (19): 5491–7. doi:10.1093/nar/gkl706. PMC 1636486. PMID 17020921.
  21. "Geo Profile: Differential Expression of c10orf76 in B-cell leukemia". Retrieved 13 May 2013.
  22. "Geo Profile: Differential Expression of c10orf76 under VEGF-A conditions". Retrieved 13 May 2013.
  23. "SOSUI Localization Prediction". Archived from the original on 15 May 2012. Retrieved 24 April 2013.
  24. "MitoProt II - v1.101". Archived from the original on 30 August 2021. Retrieved 13 May 2013.
  25. Takagaki Y, Manley JL (Mar 2000). "Complex protein interactions within the human polyadenylation machinery identify a novel component". Molecular and Cellular Biology. 20 (5): 1515–25. doi:10.1128/MCB.20.5.1515-1525.2000. PMC 85326. PMID 10669729.
  26. "PHYRE2 results for c10orf76". Retrieved 18 April 2013. [permanent dead link]
  27. "PredictProtein - Sequence Analysis, Structure and Function Prediction". Retrieved 18 April 2013.
  28. Lupas A, Van Dyke M, Stock J (May 1991). "Predicting coiled coils from protein sequences". Science. 252 (5009): 1162–4. Bibcode:1991Sci...252.1162L. doi:10.1126/science.252.5009.1162. PMID 2031185. S2CID 2442386.
  29. "PHYRE2 Protein Fold Recognition Server". Retrieved 18 April 2013.
  30. "IntAct Interaction Database". Retrieved 3 May 2013.
  31. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G (Jan 2007). "MINT: the Molecular INTeraction database". Nucleic Acids Research. 35 (Database issue): D572–4. doi:10.1093/nar/gkl950. PMC 1751541. PMID 17135203.
  32. "STRING functional and predicted protein interactions". Retrieved 10 May 2013.
  33. "NetPhos". Retrieved 24 April 2013.
  34. "NetOGlyc". Retrieved 23 April 2013.
  35. Wells L, Vosseller K, Hart GW (Mar 2001). "Glycosylation of nucleocytoplasmic proteins: signal transduction and O-GlcNAc". Science. 291 (5512): 2376–8. Bibcode:2001Sci...291.2376W. doi:10.1126/science.1058714. PMID 11269319. S2CID 9397432.
  36. Gerdin AK (2010). "The Sanger Mouse Genetics Programme: high throughput characterisation of knockout mice". Acta Ophthalmologica. 88: 925–7. doi:10.1111/j.1755-3768.2010.4142.x. S2CID 85911512.
  37. "International Mouse Phenotyping Consortium".
  38. Skarnes WC, Rosen B, West AP, Koutsourakis M, Bushell W, Iyer V, Mujica AO, Thomas M, Harrow J, Cox T, Jackson D, Severin J, Biggs P, Fu J, Nefedov M, de Jong PJ, Stewart AF, Bradley A (Jun 2011). "A conditional knockout resource for the genome-wide study of mouse gene function". Nature. 474 (7351): 337–42. doi:10.1038/nature10163. PMC 3572410. PMID 21677750.
  39. Dolgin E (Jun 2011). "Mouse library set to be knockout". Nature. 474 (7351): 262–3. doi:10.1038/474262a. PMID 21677718.
  40. Collins FS, Rossant J, Wurst W (Jan 2007). "A mouse for all reasons". Cell. 128 (1): 9–13. doi:10.1016/j.cell.2006.12.018. PMID 17218247. S2CID 18872015.
  41. White JK, Gerdin AK, Karp NA, Ryder E, Buljan M, Bussell JN, Salisbury J, Clare S, Ingham NJ, Podrini C, Houghton R, Estabel J, Bottomley JR, Melvin DG, Sunter D, Adams NC, Tannahill D, Logan DW, Macarthur DG, Flint J, Mahajan VB, Tsang SH, Smyth I, Watt FM, Skarnes WC, Dougan G, Adams DJ, Ramirez-Solis R, Bradley A, Steel KP (Jul 2013). "Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes". Cell. 154 (2): 452–64. doi:10.1016/j.cell.2013.06.022. PMC 3717207. PMID 23870131.
  42. "Infection and Immunity Immunophenotyping (3i) Consortium".
  43. "OBCD Consortium".