UBALD1

Identifiers
Aliases: UBALD1, FAM100A, PP11303, UBA like domain containing 1
External IDs: MGI: 1916255; HomoloGene: 17055; GeneCards: UBALD1; OMA: UBALD1 - orthologs
Orthologs
Species: Human | Mouse
RefSeq (mRNA): NM_145253, NM_001330467 | NM_145359
RefSeq (protein): NP_001317396, NP_660296 | NP_663334
Location (UCSC): n/a | Chr 16: 4.69 – 4.7 Mb
PubMed search: [2] | [3]

UBALD1 (ubiquitin-associated like domain containing 1) is a protein encoded by the UBALD1 gene, located on chromosome 16 in humans. [4] UBALD1 is highly and ubiquitously expressed across tissues and localizes to the nucleus and cytoplasm. UBALD1 is conserved in animals, including invertebrates. An alias for UBALD1 is FAM100A.

Gene

The human UBALD1 gene is located on the minus strand of chromosome 16 at cytogenetic location 16p13.3. The gene contains three exons and two introns, with a total gene length of 6,145 base pairs. [5]

Transcripts

There are three isoforms of UBALD1 in humans, all of which contain three exons. UBALD1 isoform 1 has an mRNA sequence of 1,374 nucleotides and encodes the longest protein. Isoform 2 lacks 75 nucleotides at the start of exon 3, and isoform 3 contains an insertion of 185 nucleotides at the end of exon 1. [4]

Figure: Human UBALD1 isoforms

Human UBALD1 isoforms
| Transcript variant | mRNA length (nt) | Exon 1 length (nt) | Exon 2 length (nt) | Exon 3 length (nt) | Protein isoform | Protein length (AA) |
|---|---|---|---|---|---|---|
| 1 (NM_145253.3) | 1374 | 211 | 62 | 1099 | 1 (NP_660296.1) | 177 |
| 2 (NM_001330467.2) | 1299 | 211 | 62 | 1024 | 2 (NP_001317396.1) | 152 |
| 3 (NM_001411032.1) | 1559 | 396 | 62 | 1099 | 3 (NP_001397961.1) | 122 |

Figure: Conceptual translation of human UBALD1

Protein

Isoforms

UBALD1 isoform 1 is the longest protein, consisting of 177 amino acids. The isoform 2 protein sequence is 85.9% identical to isoform 1; it lacks 25 amino acids encoded by exon 3 and does not contain the PHA03247 domain. Isoform 3 differs greatly from isoforms 1 and 2, being 35.6% identical to isoform 1. It contains 122 amino acids, and its insertion in exon 1 shifts the reading frame and introduces an earlier stop codon. Isoform 3 also lacks the PHA03247 domain.
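The isoform lengths above follow from simple codon arithmetic, which can be checked with a short sketch. The lengths are taken from the article; the function names are hypothetical, and the sketch assumes the reported 85.9% identity is the number of shared residues divided by the length of isoform 1, which the source does not state explicitly.

```python
# Lengths taken from the article text and the transcript table.
ISOFORM1_AA = 177    # protein length of isoform 1
DELETION_NT = 75     # nucleotides absent from exon 3 in isoform 2
INSERTION_NT = 185   # nucleotides added to exon 1 in isoform 3


def codons_changed(nucleotides: int) -> tuple[int, int]:
    """Return (whole codons, leftover nucleotides) for an indel length."""
    return divmod(nucleotides, 3)


def preserves_frame(nucleotides: int) -> bool:
    """An indel keeps the reading frame only if its length is a multiple of 3."""
    return nucleotides % 3 == 0


# Isoform 2: an in-frame deletion removes whole codons.
deleted_codons, remainder = codons_changed(DELETION_NT)
isoform2_aa = ISOFORM1_AA - deleted_codons
# Assumption: identity = residues shared with isoform 1 / length of isoform 1.
coverage = round(100 * isoform2_aa / ISOFORM1_AA, 1)

print(deleted_codons, remainder)       # 25 0
print(isoform2_aa)                     # 152
print(coverage)                        # 85.9
print(preserves_frame(INSERTION_NT))   # False: 185 = 61 * 3 + 2
```

The 75-nucleotide deletion is a multiple of three, so it removes 25 codons without disturbing the frame, whereas the 185-nucleotide insertion leaves two extra nucleotides and is consistent with the frameshift described for isoform 3.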

Properties and domains

The protein encoded by UBALD1 isoform 1 has a predicted isoelectric point of 6.13 and a molecular weight of 19.0 kDa. [6] [7] The UBALD1 sequence is rich in alanine and proline and contains multiple doublets and triplets of these residues. Proline residues and multiplets are highly conserved, particularly within the PHA03247 domain. The protein contains one domain, PHA03247, also known as the large tegument protein UL36 domain. Tegument protein UL36 is the largest tegument protein found in herpes simplex virus 1 and has deubiquitinating activity. [8]

Structure

Figure: Tertiary structure of human UBALD1

The secondary structure of the UBALD1 protein consists mostly of coils, with four short alpha-helical regions (positions 5–22, 25–46, 76–83, and 170–173). [9] Its tertiary structure is accordingly largely coiled and globular-like.
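As a rough illustration of "mostly coils", the helical share of the protein can be computed from the residue ranges listed above. This is a minimal sketch: the ranges and the 177-residue length come from the article, the helper name is hypothetical, and the ranges are assumed to be inclusive.

```python
PROTEIN_LENGTH = 177  # isoform 1, in amino acids
# Alpha-helical regions from the text, assumed to be inclusive residue ranges.
HELICES = [(5, 22), (25, 46), (76, 83), (170, 173)]


def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range."""
    return end - start + 1


helix_residues = sum(span_length(start, end) for start, end in HELICES)
helix_percent = round(100 * helix_residues / PROTEIN_LENGTH, 1)
non_helix_residues = PROTEIN_LENGTH - helix_residues

print(helix_residues)      # 52
print(helix_percent)       # 29.4
print(non_helix_residues)  # 125
```

Under these assumptions, roughly 29% of residues fall in helices, leaving about 71% outside them.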

Regulation

Gene-level expression

UBALD1 is a highly expressed gene, expressed at 1.4 times the level of the average gene. [10] UBALD1 has ubiquitous expression, with its highest levels in the placenta, skeletal muscle, liver, and brain. [11] Within the brain, UBALD1 expression is highest in the hippocampal formation and olfactory regions. [12]

Figure: UBALD1 protein schematic including PHA03247 tegument domain, nuclear export signals, and phosphorylation sites

Protein-level regulation

UBALD1 protein is predicted to localize to the nucleus and cytoplasm. [13] Nuclear export signals are predicted with moderate strength at positions 79–85 and with high strength at positions 174–177. [14] UBALD1 has many predicted phosphorylation and glycosylation sites, with known phosphorylation sites at S88, S90, S93, and S96. [15]

Evolutionary history

Orthologs

The ortholog space for UBALD1 is large, with its most distant orthologs, found in invertebrates, diverging 694 million years ago. The orthologs include most vertebrates, such as mammals, birds, reptiles, amphibians, and fishes, as well as some invertebrates, such as arthropods, cnidaria, mollusks, echinoderms, and nematodes. There are no orthologs in fungi, plants, or bacteria. Closely related orthologs, including mammals, birds, and reptiles, range from 67% to 92% sequence similarity. Moderately related orthologs, including amphibians and fishes, range from 55% to 75% sequence similarity. Distantly related orthologs, including invertebrates, range from 29% to 50% sequence similarity.

UBALD1 Orthologs
| Taxonomic group | Genus and species | Common name | Date of divergence (MYA) | Sequence length (AA) | Sequence similarity (%) | Accession number |
|---|---|---|---|---|---|---|
| Mammals | Homo sapiens | human | 0 | 177 | 100 | NP_660296.1 |
| Mammals | Mus musculus | mouse | 87 | 176 | 92.1 | NP_663334.1 |
| Mammals | Phascolarctos cinereus | koala | 160 | 174 | 80.6 | XP_020834285.1 |
| Aves | Gallus gallus | chicken | 319 | 162 | 76.3 | XP_015150132.1 |
| Aves | Strigops habroptila | owl parrot | 319 | 162 | 74.6 | XP_030340656.1 |
| Reptiles | Chrysemys picta bellii | painted turtle | 319 | 165 | 76.3 | XP_005312202.1 |
| Reptiles | Varanus komodoensis | komodo dragon | 319 | 166 | 67.6 | XP_044289341.1 |
| Amphibians | Geotrypetes seraphini | gaboon caecilian | 353 | 161 | 70.4 | XP_033770542.1 |
| Amphibians | Bufo bufo | common toad | 353 | 142 | 65.4 | XP_040297017.1 |
| Fishes | Danio rerio | zebrafish | 431 | 155 | 68.0 | NP_001002488.1 |
| Fishes | Petromyzon marinus | sea lamprey | 599 | 179 | 64.7 | XP_032830618.1 |
| Invertebrates | Styela clava | tunicates | 603 | 111 | 42.9 | XP_039256608.1 |
| Invertebrates | Strongylocentrotus purpuratus | purple sea urchin | 619 | 183 | 36.2 | XP_785714.2 |
| Invertebrates | Trichoplax sp. H2 | trichoplax | 661 | 94 | 39.5 | RDD43865.1 |
| Invertebrates | Vanessa tameamea | kamehameha butterfly | 694 | 151 | 50.5 | XP_026499211.1 |
| Invertebrates | Gigantopelta aegis | deep sea snail | 694 | 128 | 42.1 | XP_041363274.1 |
| Invertebrates | Mercenaria mercenaria | hard clam | 694 | 122 | 42.5 | XP_045214261.1 |
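The similarity ranges quoted for close, moderate, and distant orthologs can be compared against the species listed in the table with a short sketch. The similarity values are copied from the table; the three group labels and the function name are hypothetical, and the human reference row (100%) is left out. Because the table lists only a sample of orthologs, the ranges it yields need not match the quoted ranges exactly.

```python
from collections import defaultdict

# (relatedness group, common name, sequence similarity %) from the ortholog table.
ORTHOLOGS = [
    ("close", "mouse", 92.1),
    ("close", "koala", 80.6),
    ("close", "chicken", 76.3),
    ("close", "owl parrot", 74.6),
    ("close", "painted turtle", 76.3),
    ("close", "komodo dragon", 67.6),
    ("moderate", "gaboon caecilian", 70.4),
    ("moderate", "common toad", 65.4),
    ("moderate", "zebrafish", 68.0),
    ("moderate", "sea lamprey", 64.7),
    ("distant", "tunicates", 42.9),
    ("distant", "purple sea urchin", 36.2),
    ("distant", "trichoplax", 39.5),
    ("distant", "kamehameha butterfly", 50.5),
    ("distant", "deep sea snail", 42.1),
    ("distant", "hard clam", 42.5),
]


def similarity_ranges(rows):
    """Return {group: (lowest, highest)} similarity for each relatedness group."""
    grouped = defaultdict(list)
    for group, _name, similarity in rows:
        grouped[group].append(similarity)
    return {group: (min(values), max(values)) for group, values in grouped.items()}


ranges = similarity_ranges(ORTHOLOGS)
for group, (low, high) in ranges.items():
    print(f"{group}: {low}-{high}%")
# close: 67.6-92.1%
# moderate: 64.7-70.4%
# distant: 36.2-50.5%
```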

Paralogs

The human UBALD1 protein has one paralog, UBALD2, which is present in vertebrates but not invertebrates. The two proteins are similar, sharing 63.1% sequence identity and 70.9% sequence similarity. The UBALD2 protein has a length of 164 amino acids, a predicted isoelectric point of 6.78, and a molecular weight of 17.7 kDa. [7]

The UL36 tegument protein domain of UBALD1 is partially conserved in the UBALD2 paralog. UBALD1 has monoallelic expression, whereas UBALD2 has biallelic expression.

Conservation

Exon 1 and the beginning of exon 3 (positions 64–79) are highly conserved among both closely and distantly related orthologs. In particular, the proline residues in exon 3 (P65, P69, P72, P73, P76) are strongly conserved.

Interacting proteins

UBALD1 physically associates with MED8, found in the nucleus, and RPL9, found in the cytoplasm. [16] UBALD1 also associates with EPRS and EEF1E1, which are known to interact with each other and to take part in the aminoacyl-tRNA synthetase multienzyme complex. [17]

Clinical significance

UBALD1 has been associated with various cancer types, [18] and was identified as a significantly elevated autoantibody in lymphoma patients. [19] Hypomethylation of the UBALD1 promoter region was associated with cell-cell adhesion, B cell activation, and lymphocyte activation, and is a potential biomarker for predicting primary resistance to platinum-based chemotherapeutics. [20] UBALD1 hypomethylation is also associated with gender incongruence: UBALD1 shows significant CpG hypomethylation in trans women assigned male at birth, before they receive gender-affirming hormone therapy, compared to cis men assigned male at birth. [21]

References

  1. GRCm38: Ensembl release 89: ENSMUSG00000039568. Ensembl, May 2017.
  2. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  3. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "UBALD1 UBA like domain containing 1 [Homo sapiens (Human)] - Gene - NCBI".
  5. "UBALD1 Gene - GeneCards | UBAD1 Protein | UBAD1 Antibody".
  6. "PhosphoSite".
  7. https://web.expasy.org/compute_pi/
  8. Schlieker, C., Korbel, G. A., Kattenhorn, L. M., & Ploegh, H. L. (2005). A deubiquitinating activity is conserved in the large tegument protein of the herpesviridae. Journal of virology, 79(24), 15582-15585.
  9. Wei Zheng, Chengxin Zhang, Yang Li, Robin Pearce, Eric W. Bell, Yang Zhang. Folding non-homology proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Reports Methods, 1: 100014 (2021).
  10. "NCBI AceView".
  11. "Gds3113 / 150432".
  12. "Gene Detail :: Allen Brain Atlas: Mouse Brain".
  13. Nakai, K. and Horton, P., PSORT: a program for detecting the sorting signals of proteins and predicting their subcellular localization, Trends Biochem. Sci, 24(1) 34-35 (1999).
  14. "Services".
  15. Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015 43:D512-20
  16. Stelzl, U., Worm, U., Lalowski, M., Haenig, C., Brembeck, F. H., Goehler, H., ... & Wanker, E. E. (2005). A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122(6), 957-968.
  17. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021 Jan 8;49(D1):D605-12.
  18. Chen, R., Mias, G. I., Li-Pook-Than, J., Jiang, L., Lam, H. Y., Chen, R., ... & Snyder, M. (2012). Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell, 148(6), 1293-1307.
  19. Tan, Q., Wang, D., Yang, J., Xing, P., Yang, S., Li, Y., ... & Shi, Y. (2020). Autoantibody profiling identifies predictive biomarkers of response to anti-PD1 therapy in cancer patients. Theranostics, 10(14), 6399.
  20. Hua, T., Kang, S., Li, X. F., Tian, Y. J., & Li, Y. (2021). DNA methylome profiling identifies novel methylated genes in epithelial ovarian cancer patients with platinum resistance. Journal of Obstetrics and Gynaecology Research, 47(3), 1031-1039.
  21. Ramirez, K., Fernández, R., Collet, S., Kiyar, M., Delgado-Zayas, E., Gómez-Gil, E., ... & Pásaro, E. (2021). Epigenetics is implicated in the basis of gender incongruence: an epigenome-wide association analysis. Frontiers in Neuroscience, 1074.