Family with sequence similarity 149 member B1 is an uncharacterized protein [1] encoded by the human FAM149B1 gene, also known by the alias KIAA0974. [2] [3] The protein resides in the nucleus of the cell. The predicted secondary structure of the protein consists mostly of alpha-helices, with a few beta-sheets. The gene is conserved in mammals, birds, reptiles, fish, and some invertebrates. The encoded protein contains a DUF3719 domain, which is conserved across its orthologues. [3] The protein is expressed at slightly below average levels in most human tissue types, with high expression in brain, kidney, and testis and relatively low expression in pancreas. [4] [5]
This gene has a possible 14 exons. It is located on the positive (forward) strand of chromosome 10 at 10q22.2. [6] The primary transcript, including the 5' and 3' UTRs, is 3149 base pairs long. The gene is flanked on the left by NUDT13 (nudix hydrolase 13) and on the right by DNAJC9-AS1 (DNAJC9 antisense RNA 1).
The FAM149B1 gene has 10 possible transcript variants, which arise through alternative splicing.
Isoform Name | Accession | Exons | Length (bp) |
---|---|---|---|
Primary Transcript | NM_173348.1 | All (14) | 3149 |
X1 | XM_005269744.2 | All (14) | 3108 |
X2 | XM_011539737.2 | 13 | 2935 |
X3 | XM_005269745.2 | 13 | 3006 |
X4 | XM_017016164.1 | 12 | 2810 |
X5 | XM_017016165.1 | 11 | 2779 |
X6* | XM_017016166.1 | 9 | 2816 |
X6* | XM_005269747.3 | 9 | 2923 |
X7 | XM_017016167.1 | 9 | 1485 |
X8 | XM_011539740.2 | 9 | 1447 |
The primary protein encoded by the FAM149B1 gene is 582 amino acids in length and has a molecular weight of 64 kDa. The protein contains a conserved domain, DUF3719, [8] [6] located at amino acids 115–179. The isoelectric point of the protein before post-translational modifications is 6.3, [9] and this value is relatively conserved among the protein's isoforms, especially those with the most similar exon composition. The protein is considered serine rich, in that its serine content is higher than that of other human proteins. [10] [11] This high serine content is also seen in the gene's orthologues.
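The length, molecular weight, and serine content quoted above are all simple functions of the amino acid sequence. The following is a minimal sketch of how such figures can be estimated, not the method used by the cited tools. The function names and the toy sequence are hypothetical; the residue masses are standard average values, and the 110 Da mean residue mass is a common rule of thumb.

```python
# Average residue masses in daltons (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added for the free N- and C-termini


def residue_fraction(seq, residue):
    """Fraction of positions in seq occupied by the given residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count(residue.upper()) / len(seq)


def molecular_weight_kda(seq):
    """Unmodified molecular weight in kDa, summed from average residue masses."""
    return (sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER) / 1000


def rough_weight_kda(length, mean_residue_da=110.0):
    """Rule-of-thumb estimate: chain length times an assumed mean residue mass."""
    return length * mean_residue_da / 1000


# Hypothetical toy sequence, NOT the FAM149B1 sequence.
toy = "MSSRSLSAPSESG"
print(residue_fraction(toy, "S"))
print(molecular_weight_kda(toy))

# A 582-residue chain (the length listed for NP_775483.1) at ~110 Da per residue.
print(round(rough_weight_kda(582), 1))  # 64.0
```

The rule-of-thumb estimate for a 582-residue chain agrees with the reported 64 kDa. Computing the serine fraction of the real protein would require the NP_775483.1 sequence, which is not reproduced here.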
The splice variants share some properties with the protein translated from the primary transcript. Because each isoform differs in length and in its combination of exons, the isoelectric point and molecular weight vary among them. The isoforms closest in weight and exon composition to the primary transcript generally share its characteristics. Isoforms X5 and X6 lack the conserved DUF3719 domain, which is encoded within exons 3–6.
Isoform Name | Accession | Molecular Weight (kDa) | Length (aa) | Isoelectric point |
---|---|---|---|---|
Primary Transcript | NP_775483.1 | 64 | 582 | 6.3 |
X1 | XP_005269801.1 | 63.7 | 574 | 6.3 |
X2 | XP_011538039.1 | 62.6 | 560 | 7.5 |
X3 | XP_005269802.1 | 59.8 | 540 | 6.4 |
X4 | XP_016871653.1 | 57.8 | 518 | 7.7 |
X5 | XP_016871654.1 | 53 | 476 | 6.8 |
X6* | XP_016871655.1 | 46.6 | 419 | 7.5 |
X6* | XP_005269804.1 | 46.6 | 419 | 7.5 |
X7 | XP_016871656.1 | 41 | 368 | 5.1 |
X8 | XP_011538042.1 | 38 | 348 | 5.2 |
There is a negative charge cluster from amino acids 212 to 239. Negative charge clusters often coordinate calcium, magnesium, or zinc ions, for example in mannose-binding protein or aminopeptidase. [12] The protein contains no positive or mixed charge clusters. The secondary structure of the protein is predicted to consist mostly of alpha-helices, with a few beta-sheets.
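A negative charge cluster is a stretch of sequence unusually rich in aspartate (D) and glutamate (E). The sketch below illustrates the idea with a plain sliding window; it is a simplified stand-in, not the statistical test used by the tool cited above. The function name, the 0.4 threshold, and the toy sequence are assumptions. The default window of 28 residues simply matches the span 212–239.

```python
def acidic_windows(seq, window=28, min_fraction=0.4):
    """Return (start, end, fraction) for each window rich in D/E.

    Positions are 1-based and inclusive. The threshold is an arbitrary
    illustrative choice, not a published cutoff.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        fraction = (chunk.count("D") + chunk.count("E")) / window
        if fraction >= min_fraction:
            hits.append((i + 1, i + window, fraction))
    return hits


# Hypothetical toy sequence, NOT the FAM149B1 sequence.
toy = "AAAADEDEAAAA"
print(acidic_windows(toy, window=4, min_fraction=1.0))  # [(5, 8, 1.0)]
```

Overlapping hits would normally be merged into a single reported cluster; that step is omitted here for brevity.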
The subcellular location of the protein is the nucleus. [13] There is a leucine zipper pattern in the protein beginning at amino acid 347. [14]
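A leucine zipper pattern is conventionally described as four leucines spaced exactly seven residues apart, written in PROSITE notation as L-x(6)-L-x(6)-L-x(6)-L. The sketch below scans a sequence for that pattern with a regular expression. It assumes the cited prediction used a pattern of this kind, which the text does not state; the function name and toy sequence are hypothetical.

```python
import re

# Four leucines with six arbitrary residues between each pair. The lookahead
# makes overlapping matches visible to finditer.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zipper_starts(seq):
    """Return the 1-based start position of every pattern match in seq."""
    return [m.start() + 1 for m in LEUCINE_ZIPPER.finditer(seq.upper())]


# Hypothetical toy sequence, NOT the FAM149B1 sequence.
toy = "AAA" + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "A" * 6 + "L"
print(leucine_zipper_starts(toy))  # [4]
```

A match to this pattern is only suggestive, since the spacing can occur by chance in sequences that do not form a zipper.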
The third amino acid in the protein sequence, serine, is predicted to be acetylated. [15]
Multiple phosphorylation sites are predicted on serine, tyrosine, and threonine residues of this protein sequence. [16] The conserved DUF3719 domain contains 7 predicted phosphorylation sites.
One predicted sumoylation site was identified in the protein sequence at K267. [17]
Overall, this gene is expressed at levels slightly below the average for human genes. [18] The protein is expressed in most cell types of the human body. [19] Most experiments show higher expression of this protein in kidney, testis, and brain tissues, with very low expression in pancreas. [4] [5] In most cancerous tissues, the gene is expressed below its normal level. Expression is highest in fetal and infant tissues. [20]
DNA microarray experiments compare the expression of FAM149B1 with that of other genes in a sample. In a multiple myeloma cell line, FAM149B1 was expressed at a lower level than most other genes, and its expression rose to near the average level after beta-catenin was depleted from the sample. [21]
FAM149B1 expression also fell below the average gene expression level in an ovarian cancer cell line after treatment with the anticancer drug NSC319726. [13]
The gene has nine identified promoter regions, which correspond to its various isoforms. The promoter for the primary transcript has binding sites for a variety of transcription factors.
Current data support interactions between the FAM149B1 protein and six other proteins.
One protein was determined to be an interacting protein with FAM149B1 through affinity chromatography techniques.
The other five proteins predicted to interact with the FAM149B1 protein were found through text mining.
There is one known paralog, FAM149A. [27] It is located on human chromosome 4 at 4q35.1. The function of the protein encoded by this gene is not well understood, but it also contains the DUF3719 domain. The FAM149A protein shares 21.2% identity [28] with the FAM149B1 protein and is 482 amino acids in length.
This gene has orthologues across mammals, birds, reptiles, fish, and some invertebrates. [3] Conservation is high in mammals, moderate in many of the other vertebrate orthologues, and low in the few invertebrate orthologues. [29] [28]
# | Genus species | Common Name | Time of Divergence (MYA) [30] | Accession Number | Length (aa) | Identity [28] |
---|---|---|---|---|---|---|
1 | Homo sapiens | Human | - | NP_775483.1 | 582 | 100% |
2 | Pongo abelii | Sumatran orangutan | 15.76 | XP_009243761.1 | 587 | 93.0% |
3 | Papio anubis | Baboon | 29.4 | XP_003903829.1 | 582 | 93.6% |
4 | Mus musculus | Mouse | 90 | XP_006518391.1 | 544 | 73.5% |
5 | Bos mutus | Wild yak | 96 | XP_005910201.1 | 584 | 86.0% |
6 | Orcinus orca | Killer whale, Orca | 96 | XP_004273176.1 | 585 | 87.0% |
7 | Ailuropoda melanoleuca | Giant Panda | 96 | XP_011224744.1 | 590 | 82.7% |
8 | Orycteropus afer afer | Aardvark | 105 | XP_007938812.1 | 583 | 84.0% |
9 | Monodelphis domestica | Short-Tailed Opossum | 159 | XP_007478430.1 | 587 | 73.5% |
10 | Sarcophilus harrisii | Tasmanian Devil | 159 | XP_012396086.1 | 588 | 72.0% |
11 | Ornithorhynchus anatinus | Platypus | 177 | XP_007658720.1 | 506 | 48.1% |
12 | Gallus gallus | Chicken | 312 | XP_004942035.1 | 602 | 50.4% |
13 | Lepidothrix coronata | Blue-crowned manakin | 312 | XP_017688171.1 | 576 | 47.5% |
14 | Haliaeetus albicilla | White-tailed eagle | 312 | XP_009911204.1 | 589 | 49.4% |
15 | Falco peregrinus | Peregrine falcon | 312 | XP_005235226.1 | 597 | 49.2% |
16 | Chrysemys picta bellii | Western painted turtle | 312 | XP_008169104.1 | 596 | 56.1% |
17 | Pelodiscus sinensis | Chinese softshell turtle | 312 | XP_014433498.1 | 487 | 47.1% |
18 | Alligator mississippiensis | American alligator | 312 | XP_014464842.1 | 596 | 55.0% |
19 | Xenopus tropicalis | Western clawed frog | 352 | NP_001278638.1 | 561 | 39.8% |
20 | Danio rerio | Zebrafish | 435 | NP_001074134.1 | 644 | 37.7% |
21 | Lepisosteus oculatus | Spotted gar | 435 | XP_015202055.1 | 647 | 37.9% |
22 | Oreochromis niloticus | Nile tilapia | 435 | XP_005474333.1 | 683 | 34.3% |
23 | Callorhinchus milii | Australian ghostshark | 473 | XP_007897395.1 | 638 | 36.8% |
24 | Ciona intestinalis | Sea squirt | 676 | XP_002129894.1 | 807 | 24.5% |
25 | Aplysia californica | California sea slug | 797 | XP_012945921.1 | 312 | 16.9% |
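The identity column reports how many aligned positions hold the same residue in the human protein and in each orthologue. The sketch below shows one common way to compute percent identity from a pairwise alignment. Tools differ in the denominator they use, and the text does not say which convention produced the values in the table, so the choice here (all aligned columns, including gapped ones) is an assumption. The function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences share a residue.

    Both inputs must be rows of the same pairwise alignment, so they have
    equal length. Columns where both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment, NOT real FAM149B1 data: 4 of 6 columns match.
print(round(percent_identity("MKT-LS", "MRTALS"), 1))  # 66.7
```

Dividing by the length of the shorter sequence instead of the alignment length would give higher values for orthologues that differ greatly in length, such as the 312-residue sea slug protein.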
Although the gene is not well understood, it has been associated with a wide range of cancerous tumors. [31] [32]
The FAM149B1 gene is also included in a region of 11 genes that comprises one of 15 regions containing mutations associated with the African Pygmy phenotype. [33] [34]