Protein FAM46B

TENT5B
Predicted tertiary structure of FAM46A
Identifiers
Aliases: TENT5B, family with sequence similarity 46 member B, terminal nucleotidyltransferase 5B, FAM46B, FAM46
External IDs: MGI: 2140500; HomoloGene: 24928; GeneCards: TENT5B
Orthologs
Species: Human, Mouse
Entrez
Ensembl
UniProt
RefSeq (mRNA): NM_052943 (human), NM_175307 (mouse)
RefSeq (protein): NP_443175 (human), NP_780516 (mouse)
Location (UCSC): Chr 1: 27.01 – 27.01 Mb (human), Chr 4: 133.21 – 133.22 Mb (mouse)
PubMed search [3] [4]

Protein FAM46B, also known as family with sequence similarity 46 member B, is a protein that in humans is encoded by the FAM46B gene. [5] FAM46B contains one protein domain of unknown function, DUF1693. [6] Yeast two-hybrid screening has identified three proteins that physically interact with FAM46B: ATXN1, PEPP2 (encoded by RHOXF2), and DAZAP2. [7] [8]

Gene

Overview

FAM46B is the most common name used for the gene encoding FAM46B. The aliases MGC16491 and RP11-344H11 have also been used to describe the same gene. [7] FAM46B is a 7,283 base pair gene located on the antisense strand of DNA on the short arm of chromosome 1 at the specific locus 1p36.11. Because it is on the antisense strand, FAM46B is transcribed in the direction opposite to the standard numbering of nucleotides along the chromosome. FAM46B starts at base 27,339,333 and ends at base 27,331,522.

The El Dorado program through Genomatix predicts the promoter region to be 1028 bases long, spanning bases 27,339,962 to 27,338,935. [9]
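As a quick check of the promoter coordinates quoted above, the following minimal Python sketch computes the length of the region and infers the strand from the order of the endpoints. It assumes the coordinates are 1-based and inclusive, which the source does not state explicitly; the helper name `span_length` is purely illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Inclusive length of a region given two 1-based chromosome coordinates.

    Works for either strand because it uses the absolute difference.
    """
    return abs(start - end) + 1


# Promoter endpoints as quoted in the text (assumed 1-based, inclusive).
promoter_start, promoter_end = 27_339_962, 27_338_935

promoter_length = span_length(promoter_start, promoter_end)

# A start coordinate larger than the end coordinate means the feature runs
# against the chromosome numbering, i.e. it lies on the reverse strand.
strand = "-" if promoter_start > promoter_end else "+"

print(promoter_length, strand)  # 1028 -
```

Under that assumption, the result matches the 1028 bases reported for the promoter, and the descending coordinates are consistent with a gene on the reverse strand.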

Exon structure and splice variants

FAM46B is composed of two exons with no alternative splicing. As evidenced by the direction of the arrows on the exons in comparison with the base pair numbers on the chromosome, FAM46B is on the reverse strand of Chromosome 1.

The FAM46B gene contains two exons, both of which are represented in the FAM46B protein. There is one main protein isoform, indicating no alternative splicing of FAM46B mRNA. [10]

Homology

Paralogs

FAM46B has three paralogs in Homo sapiens: FAM46A, FAM46C, and FAM46D. [7] Multiple sequence alignments of the four members of the FAM46 family show high levels of conservation, particularly toward the C-terminus. Amino acids conserved in all four paralogs indicate residues that make up the core of the FAM46 family.

Multiple Sequence Alignment of FAM46 Paralogs
FAM46B orthologs in vertebrates and more distant homologs in invertebrates

Orthologs

FAM46B was present in the common ancestor of animals and is found only in eukaryotes. Although strict orthologs of FAM46B are found only in a relatively small range of animals, such as insects and vertebrates, orthologs of the FAM46 paralogs have been identified in a broader range of species. Within vertebrates, FAM46B is highly conserved in fish, amphibians, and mammals. Common model organisms in which FAM46B has been identified are Danio rerio, Xenopus tropicalis, and Mus musculus. A strict ortholog of FAM46B is not found in reptiles or birds; however, both the FAM46A and FAM46C paralogs are found in Anolis carolinensis, and the FAM46C paralog is found in birds such as Gallus gallus. [6]

Distant homologs

Distant homologs of FAM46B are present in Drosophila and in nematodes such as Caenorhabditis elegans. There are no orthologs of FAM46B in plants, protists, or fungi. [11]

Phylogeny

This unrooted phylogenetic tree shows the relationship between human FAM46B and selected orthologs and homologs.

The phylogenetic tree of FAM46B mirrors a standard phylogenetic tree. As expected, the mammals are grouped together, with the primates clustered most tightly. The more distant homologs, such as those in Drosophila and Caenorhabditis, are on the left, representing greater divergence between the gene sequences.

Protein

The function of FAM46B has not yet been determined. The information below is based on bioinformatic analyses and predictions.

Properties/characteristics

The human form of FAM46B contains 425 amino acid residues, has an isoelectric point of 8.093, [12] and a molecular mass of 46,888 daltons. [7] FAM46B is a soluble protein predicted to be located in the cytosol. [13] [14]
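As a rough plausibility check on the figures above, the mean mass per residue can be computed from the quoted length and molecular mass. This is only a sketch using the two numbers given in the text; the comparison value of roughly 110 Da per residue is a common rule of thumb for proteins, not something taken from the cited sources.

```python
# Values quoted in the text for human FAM46B.
residues = 425
molecular_mass_da = 46_888

# Mean mass contributed by each amino acid residue.
mean_residue_mass = molecular_mass_da / residues

print(f"{mean_residue_mass:.1f} Da per residue")  # 110.3 Da per residue
```

The result sits close to the usual rule-of-thumb average, so the reported length and mass are mutually consistent.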

Domains and motifs

FAM46B contains only one identified domain: Domain of Unknown Function 1693 (DUF1693). DUF1693 has been identified as part of the nucleotidyltransferase superfamily and contains four nematode prion-like proteins, but the exact function remains unknown. [15] A SAPS protein analysis does not predict any unusual protein characteristics based on amino acid composition, internal repeats, charge clusters, or periodicities. [16]

Post-translational modifications

This diagram summarizes the major post-translational modifications of FAM46B. All of the individual images were generated using tools available through ExPASy.

FAM46B is not predicted to contain a signal peptide cleavage site, [17] glycosylphosphatidylinositol (GPI) anchors, or transmembrane regions. The absence of a signal peptide supports the prediction that FAM46B is located in the cytosol.

Tools at ExPASy were used to predict phosphorylation sites, O-linked glycosylation sites, and N-linked glycosylation sites. Although two sites in FAM46B are predicted as potential sites of N-linked glycosylation, FAM46B lacks a signal peptide and thus does not enter the lumen of the endoplasmic reticulum, where N-linked glycosylation occurs, so these sites are unlikely to be modified. Five sites were identified as possible O-linked glycosylation sites. [18] These are marked in the Conceptual Translation section below.

The most common post-translational modification predicted in FAM46B is phosphorylation. The program NetPhos 2.0 predicts 23 phosphorylation sites. Most of the predicted sites are on serine residues (14), with 6 on threonine and 3 on tyrosine. [19] These tend to be clustered together within the protein sequence. A comparison of predicted phosphorylation sites in human, mouse, and zebrafish shows that all three species have approximately the same number and distribution of phosphorylation sites (on serines vs. threonines vs. tyrosines).
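The per-residue counts quoted above can be tallied in a few lines of Python to confirm the total and to express each residue type as a share of all predicted sites. This is a minimal sketch that uses only the counts given in the text; it does not run NetPhos or reproduce its output format.

```python
# Predicted phosphorylation sites per residue type, as quoted in the text.
predicted_sites = {"serine": 14, "threonine": 6, "tyrosine": 3}

# Total number of predicted sites across all residue types.
total = sum(predicted_sites.values())

# Share of the total contributed by each residue type.
fractions = {residue: count / total for residue, count in predicted_sites.items()}

print(f"total predicted sites: {total}")
for residue, count in predicted_sites.items():
    print(f"{residue}: {count} sites ({fractions[residue]:.0%})")
```

The counts sum to the 23 sites reported, with serine accounting for roughly three fifths of them.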

Secondary structure

The exact structure of FAM46B has not been characterized. Predictive programs available through Biology Workbench, [20] such as GOR4, PELE, and CHOFAS, were used to predict secondary structure. The results obtained through programs at Biology Workbench were compared to the results obtained using Phyre2. [21] Since these programs are predictive and rely on different algorithms, each provides slightly different output. Consensus between programs suggests that FAM46B contains mainly alpha helices and random coils. Although present, only a few small sections of FAM46B are predicted to form beta sheets. Annotated results of both PELE and Phyre2 secondary structure predictions are outlined in the figure below.
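One simple way to combine the output of several secondary structure predictors is a per-residue majority vote. The sketch below illustrates that idea only; it is not the procedure used for the analysis described above. The input strings are made-up toy data (H = helix, E = strand, C = coil), not FAM46B predictions, and falling back to coil when there is no strict majority is an assumption chosen for illustration.

```python
from collections import Counter


def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over equal-length secondary structure strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must cover the same residues")
    result = []
    for states in zip(*predictions):
        top, top_count = Counter(states).most_common(1)[0]
        # Require a strict majority; otherwise fall back to coil (assumption).
        result.append(top if top_count > len(states) / 2 else "C")
    return "".join(result)


# Hypothetical toy predictions from three unnamed predictors.
toy_predictions = [
    "CHHHHHCCEEC",
    "CCHHHHHCEEC",
    "CHHHHCCCCCC",
]

print(consensus(toy_predictions))  # CHHHHHCCEEC
```

A vote like this keeps only the states on which most programs agree, which is why isolated strand predictions from a single program tend to drop out of a consensus.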

Conceptual translation

FAM46B Conceptual Translation for Wikipedia.png

Expression

Microarray-based gene expression of FAM46B in a variety of tissues. Image obtained from BioGPS.

Expression can be assessed in a variety of ways. Both expressed sequence tags and GEO profiles show the number of transcripts of a gene present in a given tissue type relative to the total number of gene transcripts. Microarrays are also useful in quantifying gene expression. Protein in-situ hybridization is a more accurate measure of expression than mRNA- or cDNA-based methods, as probes can be fused directly to the protein.

Expression of FAM46B, broken down by tissue type and health state. Data obtained from the NCBI UniGene page

According to some available microarray data, FAM46B is highly expressed in the tongue (levels 10x above mean gene expression for the tissue). [22] Outside of the tongue, FAM46B seems to be uniformly expressed across most tissues. In addition to gene expression in healthy tissues, EST data also highlights gene expression by health state. It appears FAM46B expression is elevated in cases of skin cancer and gliomas. [23]

Interacting proteins

Transcription factors that bind to regulatory sequences

The El Dorado program through Genomatix was used to predict a list of transcription factors that are likely to bind to the promoter region of FAM46B. Numerous E2F sites are predicted, in addition to numerous zinc finger transcription factor sites, several E-box binding factors, and TWIST homologs. The binding sites are not evenly distributed within the promoter region. The largest cluster of binding sites is located around base 177 of the promoter, which is about 600 base pairs upstream from the start of transcription for FAM46B. [9] The image below shows selected transcription factor binding sites for the top twenty matches identified by El Dorado that are on the antisense strand.

Transcription factor binding sites with high matrix match scores and located on the antisense strand. Data obtained from El Dorado

Confirmed protein-protein interactions and possible clinical significance

Yeast two-hybrid screening indicates that FAM46B physically interacts with the ataxin-1 protein, which is encoded by ATXN1. [8] The exact function of ATXN1 is not known, but it is thought to be involved in regulating aspects of protein production, particularly transcription. Since FAM46B physically interacts with ATXN1, it is possible that FAM46B also plays a role in regulating protein production and transcription. [24]

A second protein shown to physically interact with FAM46B is DAZAP2, a proline-rich, brain-expressed protein. [8] In combination with the information about ATXN1 above, this suggests that FAM46B interacts with brain-specific proteins. A third protein identified by yeast two-hybrid screening as a physical interactant of FAM46B is PEPP2, [8] a paired-like homeobox protein. If this interaction is significant, the interaction between FAM46B and PEPP2 may play a role in development and morphogenesis.

However, the protein interactome is not yet well understood, and not every program identifies interacting proteins in the same way. As an example, STRING identified ATXN1 as a strong interaction partner of FAM46B, but did not identify either PEPP2 or DAZAP2. The prediction network from STRING is shown in the adjacent image.

References

  1. GRCh38: Ensembl release 89: ENSG00000158246 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000046694 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "NCBI Gene: FAM46B family with sequence similarity 46, member B". Retrieved 23 April 2013.
  6. "NCBI BLAST". National Library of Medicine. National Center for Biotechnology Information. Retrieved 11 May 2013.
  7. "family with sequence similarity 46, member B". Retrieved 23 April 2013.
  8. "FAM46B Interaction Summary". BioGRID. Tyers Lab. Retrieved 11 May 2013.
  9. "Annotation and Analysis". El Dorado. Genomatix. Retrieved 4 May 2013.
  10. "Homo sapiens family with sequence similarity 46, member B (FAM46B), mRNA". Retrieved 23 April 2013.
  11. "family with sequence similarity 46, member B". Retrieved 23 April 2013.
  12. "Big-PI". IMP Bioinformatics. Archived from the original on 21 July 2020. Retrieved 11 May 2013.
  13. "SOSUI Prediction". Archived from the original on 20 March 2004. Retrieved 4 May 2013.
  14. "PSORT II". Retrieved 4 May 2013.
  15. "NCBI Conserved Domains: DUF1693 Superfamily". Retrieved 23 April 2013.
  16. Brendel V, Bucher P, Nourbakhsh IR, Blaisdell BE, Karlin S (March 1992). "Methods and algorithms for statistical analysis of protein sequences". Proc. Natl. Acad. Sci. U.S.A. 89 (6): 2002–6. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. PMC 48584. PMID 1549558.
  17. Petersen TN, Brunak S, von Heijne G, Nielsen H (2011). "SignalP 4.0: discriminating signal peptides from transmembrane regions". Nat. Methods. 8 (10): 785–6. doi:10.1038/nmeth.1701. PMID 21959131.
  18. "NetOGlyc". CBS Prediction Servers. Retrieved 11 May 2013.
  19. "NetPhos". CBS Prediction Servers. Retrieved 11 May 2013.
  20. "GOR4, CHOFAS, PELE". Protein Tools. San Diego Supercomputer Center. Retrieved 12 May 2013. [permanent dead link]
  21. Kelley LA, Sternberg MJ (2009). "Protein structure prediction on the Web: a case study using the Phyre server" (PDF). Nat Protoc. 4 (3): 363–71. doi:10.1038/nprot.2009.2. hdl:10044/1/18157. PMID 19247286. S2CID 12497300.
  22. "SymAtlas Expression FAM46B". BioGPS. The Scripps Research Institute. Retrieved 12 May 2013.
  23. "UniGene Data, FAM46B". EST Profile. National Center for Biotechnology Information. Retrieved 12 May 2013.
  24. "ATXN1 - ataxin 1". Genetic Home Reference, National Library of Medicine. Retrieved 11 May 2013.