FAM149B1


Family with sequence similarity 149 member B1 (FAM149B1) is an uncharacterized protein [1] encoded by the human FAM149B1 gene, which has one alias, KIAA0974. [2] [3] The protein resides in the cell nucleus. Its predicted secondary structure consists mostly of alpha-helices, with a few beta-sheet structures. The gene is conserved in mammals, birds, reptiles, fish, and some invertebrates. The protein contains a DUF3719 domain, which is conserved across its orthologues. [3] The protein is expressed at slightly below-average levels in most human tissues. Expression is high in brain, kidney, and testis tissue and relatively low in the pancreas. [4] [5]


Gene

This gene has up to 14 exons. It is located on the forward (positive) strand of chromosome 10 at 10q22.2. [6] The primary transcript, including the 5' and 3' UTRs, is 3149 base pairs long. The gene is flanked on the left by NUDT13 (nudix hydrolase 13) and on the right by DNAJC9-AS1 (DNAJC9 antisense RNA 1).

This figure shows the location of the FAM149B1 gene on Chromosome 10, and also displays the genes in the surrounding location.

Isoforms

Alternative splicing of the FAM149B1 gene gives rise to ten possible isoforms.

The various isoforms of the FAM149B1 gene
Isoform | Accession | Exons | Length (bp)
Primary transcript | NM_173348.1 | All (14) | 3149
X1 | XM_005269744.2 | All (14) | 3108
X2 | XM_011539737.2 | 13 | 2935
X3 | XM_005269745.2 | 13 | 3006
X4 | XM_017016164.1 | 12 | 2810
X5 | XM_017016165.1 | 11 | 2779
X6* | XM_017016166.1 | 9 | 2816
X6* | XM_005269747.3 | 9 | 2923
X7 | XM_017016167.1 | 9 | 1485
X8 | XM_011539740.2 | 9 | 1447

Protein

General properties

The primary protein encoded by the FAM149B1 gene is 583 amino acids long and has a molecular weight of 64 kDa. It contains a conserved domain, DUF3719, [8] [6] at amino acids 115–179. The isoelectric point of the protein before post-translational modification is 6.3. [9] This value is relatively conserved among the protein's isoforms, especially those whose exon composition most closely matches the primary transcript. The protein is serine-rich, meaning its serine content is higher than that of typical human proteins. [10] [11] The same high serine content is seen in the gene's orthologues.
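The isoelectric point above was computed with a dedicated tool. As a rough illustration of how such a value is obtained, the sketch below finds, by bisection, the pH at which the summed Henderson-Hasselbalch charges of the ionizable groups cancel. The pKa values are an assumed textbook-style set chosen for illustration, not those used by the cited program; pI tools differ in their pKa tables, so this sketch would not necessarily reproduce 6.3 for FAM149B1.

```python
# Minimal isoelectric-point sketch (illustrative; all pKa values are assumptions).

# Assumed pKa values; published tables and pI tools differ.
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Positive groups: N-terminus plus basic side chains (fraction protonated).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for residue, pka in PKA_POSITIVE.items():
        charge += sequence.count(residue) / (1.0 + 10 ** (ph - pka))
    # Negative groups: C-terminus plus acidic side chains (fraction deprotonated).
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue, pka in PKA_NEGATIVE.items():
        charge -= sequence.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        # Net charge falls as pH rises, so keep the half containing the sign change.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

For example, `isoelectric_point("GG")` returns approximately 5.5, the midpoint of the two assumed terminal pKa values, because glycine has no ionizable side chain.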

Splice variants

The splice variants share some properties with the protein translated from the primary transcript. Because each isoform has a different length and a different combination of exons, their isoelectric points and molecular weights vary. The isoforms closest to the primary transcript in length and exon composition generally have the most similar values. Isoforms X5 and X6 lack the conserved DUF3719 domain, which is encoded within exons 3–6.

Isoform | Accession | Molecular weight (kDa) | Length (aa) | Isoelectric point
Primary transcript | NP_775483.1 | 64 | 582 | 6.3
X1 | XP_005269801.1 | 63.7 | 574 | 6.3
X2 | XP_011538039.1 | 62.6 | 560 | 7.5
X3 | XP_005269802.1 | 59.8 | 540 | 6.4
X4 | XP_016871653.1 | 57.8 | 518 | 7.7
X5 | XP_016871654.1 | 53 | 476 | 6.8
X6* | XP_016871655.1 | 46.6 | 419 | 7.5
X6* | XP_005269804.1 | 46.6 | 419 | 7.5
X7 | XP_016871656.1 | 41 | 368 | 5.1
X8 | XP_011538042.1 | 38 | 348 | 5.2
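The molecular weights in the isoform table can be sanity-checked against the lengths. A common rule of thumb puts the average mass of an amino-acid residue at about 110 Da, so mass can be estimated from length alone. The sketch below applies that assumption to the table values; the 110 Da figure is an approximation and not the method used by the cited tools.

```python
# Rough molecular-weight check for the isoforms in the table above.
# Assumption: about 110 Da average mass per residue (a rule of thumb).
AVERAGE_RESIDUE_DA = 110.0

# (isoform, length in aa, reported molecular weight in kDa), copied from the table.
ISOFORMS = [
    ("Primary transcript", 582, 64.0),
    ("X1", 574, 63.7),
    ("X2", 560, 62.6),
    ("X3", 540, 59.8),
    ("X4", 518, 57.8),
    ("X5", 476, 53.0),
    ("X6", 419, 46.6),
    ("X7", 368, 41.0),
    ("X8", 348, 38.0),
]


def estimated_kda(length_aa: int) -> float:
    """Estimate protein mass in kDa from its length in residues."""
    return length_aa * AVERAGE_RESIDUE_DA / 1000.0


for name, length, reported in ISOFORMS:
    estimate = estimated_kda(length)
    print(f"{name:<20} estimated {estimate:5.1f} kDa, reported {reported:5.1f} kDa")
```

Every estimate falls within roughly 1 kDa of the reported value; for the primary isoform, 582 × 110 Da ≈ 64.0 kDa.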

Structure

There is a negative charge cluster from amino acids 212 to 239. Negative charge clusters are often involved in binding calcium, magnesium, or zinc ions, as in mannose-binding protein and aminopeptidase. [12] The protein contains no positive or mixed charge clusters. Its secondary structure is predicted to consist mostly of alpha-helices, with a few beta-sheet structures.
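The charge cluster was reported by a charge-statistics program whose scoring is more involved than shown here. The sketch below only illustrates the underlying idea with a simplified sliding-window scan that flags windows whose net charge is strongly negative. It is not the SAPS algorithm, and the charge assignments, window length, and threshold are all illustrative assumptions.

```python
# Simplified negative-charge-cluster scan (not the SAPS algorithm).
# Assumptions: D/E count as -1, K/R as +1, histidine is ignored;
# the default window size and threshold are illustrative choices.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def negative_clusters(sequence: str, window: int = 20, threshold: int = -8):
    """Return 1-based (start, end) spans of windows with net charge <= threshold.

    Overlapping qualifying windows are merged into a single span.
    """
    spans = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        net = sum(CHARGE.get(residue, 0) for residue in segment)
        if net <= threshold:
            begin, end = start + 1, start + window
            if spans and begin <= spans[-1][1]:
                # Overlaps the previous span, so extend it instead of adding a new one.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((begin, end))
    return spans
```

For a toy sequence of ten alanines, ten glutamates, and ten alanines, `negative_clusters(seq, window=10)` reports a single merged span covering the acidic stretch and its immediate flanks.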

This figure shows the predicted 3D structure of the human FAM149B1 protein. The secondary structures contributing to the tertiary structure are alpha-helices and one predicted beta-sheet turn.

Subcellular localization

Immunofluorescent staining shows FAM149B1 concentrated in the nucleus.

The subcellular location of the protein is the nucleus. [13] There is a leucine zipper pattern in the protein beginning at amino acid 347. [14]
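Leucine zippers are usually recognized by leucines recurring every seventh residue. The sketch below scans a sequence with a regular expression for the widely used PROSITE leucine-zipper pattern L-x(6)-L-x(6)-L-x(6)-L (PS00029). Whether the cited prediction tool applies exactly this rule is an assumption.

```python
import re

# Leucine-zipper motif L-x(6)-L-x(6)-L-x(6)-L: four leucines with heptad spacing.
# A lookahead is used so that overlapping matches are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zipper_starts(sequence: str) -> list[int]:
    """Return 1-based start positions of every leucine-zipper pattern match."""
    return [match.start() + 1 for match in LEUCINE_ZIPPER.finditer(sequence)]
```

Using a lookahead matters here: a run of five correctly spaced leucines contains two overlapping motifs, and both start positions are returned.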

Post-translational modifications

Acetylation

The third amino acid in the protein sequence, serine, is predicted to be acetylated. [15]

Phosphorylation

Multiple phosphorylation sites on serine, threonine, and tyrosine residues are predicted for this protein. [16] Seven of these predicted sites lie within the conserved DUF3719 domain.

Sumoylation

One predicted sumoylation site was identified in the protein sequence at K267. [17]
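Sumoylation sites often match the consensus ψ-K-x-E, where ψ is a large hydrophobic residue and x is any residue, with D sometimes accepted in place of E. The sketch below is a plain consensus scan under that assumption. The hydrophobic set (I, L, V, M, F) is an illustrative choice, and SUMOplot additionally scores its matches, so this is a simplification rather than a reimplementation of the cited tool.

```python
import re

# Assumed SUMO consensus: psi-K-x-[ED], with psi a large hydrophobic residue.
# The hydrophobic set (I, L, V, M, F) is an illustrative assumption.
SUMO_CONSENSUS = re.compile(r"(?=[ILVMF]K.[ED])")


def sumo_candidate_lysines(sequence: str) -> list[int]:
    """Return 1-based positions of lysines that sit in a consensus motif."""
    # Each match begins at the hydrophobic residue; the lysine follows it.
    return [match.start() + 2 for match in SUMO_CONSENSUS.finditer(sequence)]
```

Reporting the lysine position (rather than the motif start) mirrors how sites such as K267 are named.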

A schematic diagram of the FAM149B1 protein showing predicted post-translational modifications and the DUF3719 domain.

Expression

Overall, this gene is expressed at levels slightly below the average for human genes. [18] The protein is expressed in most human cell types. [19] Most datasets show higher expression in kidney, testis, and brain tissue, with very low expression in pancreatic tissue. [4] [5] In most cancerous tissues, the gene is expressed at lower levels than in the corresponding normal tissue. Expression is highest in fetal and infant tissues. [20]

DNA microarray data

A DNA microarray experiment showing the varying expression levels of FAM149B1 before and after depleting beta-catenin levels in the sample.

DNA microarray experiments compare the expression of FAM149B1 with that of many other genes in a sample. In a multiple myeloma cell line, FAM149B1 was expressed at a lower level than most other genes, and its expression rose to near-average levels after beta-catenin was depleted. [21]

A microarray experiment showing lowered expression of FAM149B1 in an ovarian cancer cell line after treatment with the anticancer drug NSC319726.

FAM149B1 expression also fell below average gene expression levels in an ovarian cancer cell line after treatment with the anticancer drug NSC319726. [13]

Transcriptional regulation

The gene has nine identified promoter regions, which correspond to its various isoforms. The promoter of the primary transcript contains binding sites for a variety of transcription factors.

Interacting proteins

Current data support interactions between FAM149B1 and six different proteins.

One of these interactions was identified by affinity chromatography.

The other five proteins were predicted to interact with FAM149B1 through text mining.

Homology/Evolution

Paralog

There is one known paralog, FAM149A, [27] located on human chromosome 4 at 4q35.1. The function of the protein it encodes is not well understood, but it also contains the DUF3719 domain. This protein is 482 amino acids long and shares 21.2% identity [28] with the FAM149B1 protein.

Orthologs

An unrooted phylogenetic tree of selected orthologues and the paralog of FAM149B1, based on sequence identity to the human protein.
The graph compares the rate of sequence divergence across the orthologs of FAM149B1 with the rates for fibrinogen and cytochrome c.

This gene has orthologues in mammals, birds, reptiles, fish, and some invertebrates. [3] Conservation is high in mammals, moderate in most other vertebrate orthologues, and low in the few invertebrate orthologues. [29] [28]

# | Genus species | Common name | Time of divergence (MYA) [30] | Accession number | Length (aa) | Identity [28]
1 | Homo sapiens | Human | - | NP_775483.1 | 582 | 100%
2 | Pongo abelii | Sumatran orangutan | 15.76 | XP_009243761.1 | 587 | 93.0%
3 | Papio anubis | Baboon | 29.4 | XP_003903829.1 | 582 | 93.6%
4 | Mus musculus | Mouse | 90 | XP_006518391.1 | 544 | 73.5%
5 | Bos mutus | Domestic yak | 96 | XP_005910201.1 | 584 | 86.0%
6 | Orcinus orca | Killer whale (orca) | 96 | XP_004273176.1 | 585 | 87.0%
7 | Ailuropoda melanoleuca | Giant panda | 96 | XP_011224744.1 | 590 | 82.7%
8 | Orycteropus afer afer | Aardvark | 105 | XP_007938812.1 | 583 | 84.0%
9 | Monodelphis domestica | Short-tailed opossum | 159 | XP_007478430.1 | 587 | 73.5%
10 | Sarcophilus harrisii | Tasmanian devil | 159 | XP_012396086.1 | 588 | 72.0%
11 | Ornithorhynchus anatinus | Platypus | 177 | XP_007658720.1 | 506 | 48.1%
12 | Gallus gallus | Chicken | 312 | XP_004942035.1 | 602 | 50.4%
13 | Lepidothrix coronata | Blue-crowned manakin | 312 | XP_017688171.1 | 576 | 47.5%
14 | Haliaeetus albicilla | White-tailed eagle | 312 | XP_009911204.1 | 589 | 49.4%
15 | Falco peregrinus | Peregrine falcon | 312 | XP_005235226.1 | 597 | 49.2%
16 | Chrysemys picta bellii | Western painted turtle | 312 | XP_008169104.1 | 596 | 56.1%
17 | Pelodiscus sinensis | Chinese softshell turtle | 312 | XP_014433498.1 | 487 | 47.1%
18 | Alligator mississippiensis | American alligator | 312 | XP_014464842.1 | 596 | 55.0%
19 | Xenopus tropicalis | Western clawed frog | 352 | NP_001278638.1 | 561 | 39.8%
20 | Danio rerio | Zebrafish | 435 | NP_001074134.1 | 644 | 37.7%
21 | Lepisosteus oculatus | Spotted gar | 435 | XP_015202055.1 | 647 | 37.9%
22 | Oreochromis niloticus | Nile tilapia | 435 | XP_005474333.1 | 683 | 34.3%
23 | Callorhinchus milii | Australian ghostshark | 473 | XP_007897395.1 | 638 | 36.8%
24 | Ciona intestinalis | Sea squirt | 676 | XP_002129894.1 | 807 | 24.5%
25 | Aplysia californica | California sea slug | 797 | XP_012945921.1 | 312 | 16.9%
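The identity values in the ortholog table were produced with the ALIGN program. Percent identity depends on how gaps are counted, so figures from different tools are not always directly comparable. The sketch below shows one common convention: it assumes the two sequences have already been aligned (gap character "-") and divides the number of identical, non-gap columns by the total alignment length. This convention is an assumption and may differ from the one ALIGN uses.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical, non-gap columns divided by the total
    number of alignment columns (columns with a gap count as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy example (not FAM149B1 data): 4 of 5 columns are identical.
print(percent_identity("MKT-S", "MKTAS"))
```

Dividing by the shorter ungapped sequence length instead would give a higher value for the same alignment, which is one reason identities reported by different programs can disagree.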

Clinical significance

Although the function of the gene is not well understood, it has been associated with several types of cancerous tumors. [31] [32]

The FAM149B1 gene is also included in a region of 11 genes that comprises one of 15 regions containing mutations associated with the African Pygmy phenotype. [33] [34]


References

  1. "Homo sapiens family with sequence similarity 149 member B1 (FAM149B1), mRNA". Retrieved 6 February 2017.
  2. "FAM149B1 Family with sequence similarity 149 member B1". Retrieved 6 February 2017.
  3. "Gene: FAM149B1". Ensembl. Retrieved 6 February 2017.
  4. The Human Protein Atlas, FAM149B1. http://www.proteinatlas.org/ENSG00000138286-FAM149B1/tissue
  5. NCBI, National Center for Biotechnology Information, H. sapiens FAM149B1 ETS. https://www.ncbi.nlm.nih.gov/gene/317662
  6. "FAM149B1 Gene". GeneCards. Retrieved 6 February 2017.
  7. "FAM149B1 family with sequence similarity 149 member B1 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 6 May 2017.
  8. "Protein of unknown function DUF3719 (IPR022194)". InterPro.
  9. PI, Biology Workbench. Program by Luca Toldo, developed at http://www.embl-heidelberg.de; modified by Bjoern Kindler to also print the lowest net charge found. Available at the EMBL WWW Gateway to Isoelectric Point Service, http://www.embl-heidelberg.de/cgi/pi-wrapper.pl (archived 26 October 2008 at https://web.archive.org/web/20081026062821/http://www.embl-heidelberg.de/cgi/pi-wrapper.pl). Retrieved 10 May 2014.
  10. AASTATS, Biology Workbench. Jack Kramer, 1990.
  11. SAPS, Biology Workbench. Algorithm: Brendel, V.; Bucher, P.; Nourbakhsh, I. R.; Blaisdell, B. E.; Karlin, S. (1992). "Methods and algorithms for statistical analysis of protein sequences". Proc. Natl. Acad. Sci. U.S.A. 89: 2002–2006. Program: Volker Brendel, Department of Mathematics, Stanford University, Stanford CA 94305, U.S.A.
  12. Zhu, Z. Y.; Karlin, S. (6 August 1996). "Clusters of charged residues in protein three-dimensional structures". Proceedings of the National Academy of Sciences. 93 (16): 8350–8355. Bibcode:1996PNAS...93.8350Z. doi:10.1073/pnas.93.16.8350. ISSN 0027-8424. PMC 38674. PMID 8710874.
  13. "GDS4877 / 213463_s_at". www.ncbi.nlm.nih.gov. Retrieved 7 May 2017.
  14. "PSORT prediction tool". GenScript.
  15. Kiemer, Lars; Bendtsen, Jannick Dyrløv; Blom, Nikolaj (2004). "NetAcet: Prediction of N-terminal acetylation sites". Accepted in Bioinformatics. Accessed through ExPASy.
  16. Blom, N.; Sicheritz-Ponten, T.; Gupta, R.; Gammeltoft, S.; Brunak, S. (June 2004). "NetPhos: Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence". Proteomics (review). 4 (6): 1633–1649.
  17. SUMOplot Analysis Program. Developed by Abgent, copyright 2013–2017. http://www.abgent.com/sumoplot
  18. "FAM149B1 search results". pax-db.org. Retrieved 7 May 2017.
  19. NCBI GEO Profile, Homo sapiens FAM149B1, Profile GDS596. https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS596%3A213463_s_at
  20. Schuler Group. "EST Profile - Hs.408577". www.ncbi.nlm.nih.gov. Retrieved 7 May 2017.
  21. "GDS3578 / 213463_s_at". www.ncbi.nlm.nih.gov. Retrieved 7 May 2017.
  22. Huttlin, E.; Ting, L.; Bruckner, R.; Gebreab, F.; Gygi, M.; Szpyt, J.; ...; Gygi, S. (2015). "The BioPlex Network: A Systematic Exploration of the Human Interactome". Cell. 162 (2): 425–440. doi:10.1016/j.cell.2015.06.043.
  23. "ABHD8_HUMAN". UniProt.
  24. "METTL16 Gene". GeneCards. Archived from the original on 16 November 2011.
  25. "SLC6A17 Gene". GeneCards. Archived from the original on 7 November 2011.
  26. "TM2D1 Gene". GeneCards. Archived from the original on 30 November 2011.
  27. "Homo sapiens family with sequence similarity 149 member A (FAM149A), t - Nucleotide - NCBI". www.ncbi.nlm.nih.gov. Retrieved 6 May 2017.
  28. "ALIGN". Biology Workbench. Retrieved 26 February 2017.
  29. "NCBI BLAST". National Center for Biotechnology Information. Retrieved 26 February 2017.
  30. "Time Tree". Time Tree. Retrieved 26 February 2017.
  31. Ikeda, A.; Shimizu, T.; Matsumoto, Y. (19 September 2013). "Leptin receptor somatic mutations are frequent in HCV-infected cirrhotic liver and associated with hepatocellular carcinoma". Gastroenterology. 146 (1): 222–232. doi:10.1053/j.gastro.2013.09.025. hdl:2433/180778. PMID 24055508.
  32. Hadj-Hamou, N. S.; Lae, M.; Almeida, A. (24 April 2012). "A transcriptome signature of endothelial lymphatic cells coexists with the chronic oxidative stress signature in radiation-induced post-radiotherapy breast angiosarcomas". Carcinogenesis. 33 (7): 1399–1405. doi:10.1093/carcin/bgs155. PMID 22532251.
  33. Amorim, Carlos Eduardo G.; Daub, Josephine T.; Salzano, Francisco M.; Foll, Matthieu; Excoffier, Laurent (7 April 2015). "Detection of Convergent Genome-Wide Signals of Adaptation to Tropical Forests in Humans". PLoS ONE. 10 (4): e0121557.
  34. Mendizabal, Isabel; Marigorta, Urko; Lao, Oscar; Comas, David (2012). "Adaptive evolution of loci covarying with the human African Pygmy phenotype". Human Genetics. 131 (8): 1305–1317.