Family with sequence similarity 149, member A is a protein that in humans is encoded by the FAM149A gene (also known as MSTP119, MST119 and DKFZP564J102). [5] It is well conserved in primates, dog, cow, mouse, rat, and chicken. It has one paralog, FAM149B.
FAM149A is found in normal cardiac tissue of Homo sapiens and was submitted to the Molecular Medicine Center for Cardiovascular Disease in 1999. This suggests that it may play a role in normal heart regulation, although no such role has been demonstrated. No variation report or information on clinical significance has been found for this gene, according to NCBI. According to the Basic Local Alignment Search Tool (BLAST), FAM149A is similar to cDNA FLJ32604 (98% query cover), which is found in stomach tissue and has no known function. FAM149A is also similar to cDNA FLJ58677 (86% query cover), which is found in fetal kidney tissue and has no known function.
Information acquired from:
https://www.ncbi.nlm.nih.gov/
FAM149A consists of 2721 base pairs, encodes a protein of 482 amino acids, and is located at 4q35.1 on the positive strand of chromosome 4. Nearby genes on the same chromosome include TLR3, CYP4V2, FLJ38576, ORAOV1P1, and SORBS2. [6]
FAM149A possesses one major paralog, FAM149B. Little is currently known about FAM149B beyond its membership in the FAM149 family of genes.
Genes sharing conserved regions with FAM149A include BRTD and its four isoforms, ECCHC11 and ALMS1. These genes are all found in humans, so they are homologs within the human genome rather than orthologs in the strict sense.
Species | Common name | Accession number | Length | Protein identity | Protein similarity | Date of divergence (millions of years ago) |
---|---|---|---|---|---|---|
Homo sapiens | Human | NP_001073963.1 | 482aa | 100% | 100% | 0 |
Pongo abelii | Orangutan | XP_002815398.2 | 481aa | 93.2% | 95.0% | 15.7 |
Nomascus leucogenys | Northern white-cheeked gibbon | XP_004093218.1 | 482aa | 92.7% | 95.0% | 20.4 |
Equus ferus caballus | Horse | XP_001490414.3 | 480aa | 72.0% | 81.0% | 94.2 |
Taeniopygia guttata | Zebra finch | XP_002193183 | 485aa | 46.0% | 62.0% | 296 |
Monodelphis domestica | Opossum | XP_001368447.2 | 1133aa | 19.5% | 61.0% | 162.6 |
Xenopus tropicalis | Western clawed frog | XP_002934449 | 427aa | 22.0% | 65.0% | 371.2 |
FAM149A contains a conserved domain of unknown function (DUF), DUF3719, about which very little is known. The domain is found only in eukaryotic organisms, is 70 amino acids long, and contains a conserved HLR sequence motif. Below is an image showing DUF3719 on FAM149A.
The following image from the Sanger Institute shows the species in which this family exists. The purple color indicates that DUF3719 exists only in eukaryotic organisms; other colors, such as green, would indicate that DUF3719 exists in bacteria. When the diagram is used interactively on the website, it states that 23 species in Eukaryota have the domain. [7]
The human FAM149A lineage diverged from that of amphibians around 400 million years ago, from birds around 300 million years ago, and from non-primate mammals around 94 million years ago. The most recent divergence from other primates occurred around 5 million years ago. [8]
As previously stated, FAM149A is made up of 482 amino acids. The amino acids produced by translation of the FAM149A gene are shown below, along with the matching base pairs. The protein-coding region lies between bp 534 and bp 1982.
Several programs were used to predict post-translational modifications in FAM149A. [9] The tests and results for each are listed below.
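The coordinates above can be checked against the stated protein length with a little arithmetic. The following is a minimal sketch, assuming the positions are 1-based and inclusive and that the span includes the stop codon; neither assumption is stated in the source, and the function name is illustrative.

```python
def cds_summary(start_bp: int, end_bp: int) -> dict:
    """Summarize a coding region given 1-based, inclusive coordinates (assumed)."""
    length = end_bp - start_bp + 1
    if length % 3 != 0:
        raise ValueError("coding region length is not a multiple of three")
    codons = length // 3
    return {
        "length_bp": length,
        "codons": codons,
        # Assumes the final codon is a stop codon that adds no residue.
        "residues": codons - 1,
    }


summary = cds_summary(534, 1982)
print(summary)
```

Under these assumptions the span covers 1449 bases, or 483 codons, which is consistent with a 482-residue protein followed by one stop codon.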
NetPhos: Predicts phosphorylation sites within a protein on serines, threonines, and tyrosines. Each site receives a score indicating the quality of the prediction: a good score is close to 1.0, while a low score is close to zero. Results: phosphorylation sites predicted: Ser: 20, Thr: 16, Tyr: 2. All of these predicted sites had scores above 0.514, and most scored between 0.8 and 0.9. Image generated:
Sulfinator: Predicts tyrosine sulfation sites, a modification made as proteins pass through the secretory pathway. There were no results for FAM149A, so no tyrosine sulfation sites are predicted.
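As a rough illustration of how the NetPhos counts can be tallied and how score-based filtering works, here is a minimal sketch. The site counts come from the text; the 0.5 cutoff, the `passing_sites` helper, and the example records are assumptions for illustration and are not actual NetPhos output.

```python
# Site counts reported in the text.
predicted_sites = {"Ser": 20, "Thr": 16, "Tyr": 2}
total_sites = sum(predicted_sites.values())
shares = {res: round(100 * n / total_sites, 1) for res, n in predicted_sites.items()}


def passing_sites(records, threshold=0.5):
    """Keep (position, residue, score) records whose score exceeds the threshold.

    The default threshold of 0.5 is an assumption, not taken from the source.
    """
    return [r for r in records if r[2] > threshold]


# Hypothetical records for illustration only; not real predictions for FAM149A.
example_records = [(12, "S", 0.91), (40, "T", 0.48), (77, "Y", 0.514)]
kept = passing_sites(example_records)

print(total_sites, shares)
print(kept)
```

The tally gives 38 predicted sites in total, of which serine accounts for a little over half.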
NetAcet: Predicts N-terminal acetylation sites.
Here are the results:
According to NetAcet, there are no N-terminal acetylation sites for FAM149A.
SUMOplot/SUMOsp: Used to predict potential sumoylation sites. These may explain larger molecular weights than expected on SDS gels due to attachment of SUMO proteins.
The results can be seen below:
The secondary structure of the FAM149A protein describes its local three-dimensional structure. The structures analyzed include the α-helix, β-strand, β-turn, and random coil. Results were obtained using GOR4 and PELE [10] from Biology WorkBench. GOR4 is a simplified predictor, while PELE compares predicted structures from other programs.
Here is the promoter for the FAM149A gene provided by ElDorado [11] and the sequence extracted from the information.
Segment | Start Location | Stop Location | Strand | Length | Reference Number | Information |
---|---|---|---|---|---|---|
Promoter Region | 187065495 | 187066181 | + | 687 bp | GXP_210035 | Promoter for GXT_23739713, GXT_23739714, GXT_2803949 Locus: FAM149A/GXL_175098 |
Primary Transcript | 187065995 | 187093817 | + | 27283 bp | GXT_2803949, GXL_175098 | FAM149A Homo sapiens family with sequence similarity 149, member A (FAM149A), transcript variant 1, mRNA. GeneID:25854/NM_015398 |
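The lengths in the table can be cross-checked against the start and stop coordinates. The sketch below assumes the coordinates are 1-based and inclusive, which the source does not state; the helper name is illustrative.

```python
def inclusive_length(start: int, stop: int) -> int:
    """Length of a segment with 1-based, inclusive coordinates (assumed)."""
    return stop - start + 1


# (start, stop, length listed in the table)
segments = {
    "Promoter Region": (187065495, 187066181, 687),
    "Primary Transcript": (187065995, 187093817, 27283),
}

report = {}
for name, (start, stop, listed) in segments.items():
    computed = inclusive_length(start, stop)
    report[name] = {"computed": computed, "listed": listed, "match": computed == listed}
    print(name, report[name])

# Promoter bases lying upstream of the primary transcript start.
upstream_bp = segments["Primary Transcript"][0] - segments["Promoter Region"][0]
print("promoter bases upstream of transcript start:", upstream_bp)
```

Under this assumption the promoter coordinates reproduce the listed 687 bp, with 500 bp upstream of the transcript start. The primary transcript coordinates give 27823 bp rather than the listed 27283 bp, so one of those values may contain a transposed digit.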
The following is a FASTA formatted version of the FAM149A promoter.
Through the NCBI website, an additional 1000 base pairs were added to the selected region on chromosome 4 containing FAM149A. Once the start and end positions were established, the positions were transferred to the ECR Browser to create an alignment across other species.
According to the results, there are 14 exons within FAM149A, which are conserved in the monkey, dog, mouse, and opossum. The chicken, frog, and fish show little to no conservation. Within the first 1000 base pairs upstream of the transcription start site, there appears to be no notable conservation across species. Only the dog contains what is considered an Evolutionary Conserved Region (ECR). [12]
Based on the graphs on the right, the highest levels of expression occur in the trigeminal ganglion, superior cervical ganglion, atrioventricular node (heart), and kidney. However, at least a small amount appears to be expressed in almost all tissues in the human body. Using the same microarrays provided by BioGPS, [13] expression of FAM149A was found to vary during the shedding of the endometrium in menstruation. This opens a possible avenue for exploring the function of the gene.
A search was performed on the Allen Brain Atlas using FAM149A. According to the levels of expression provided by the Atlas, FAM149A is not expressed at notable levels within the mouse brain. However, on visual inspection of the figure, FAM149A could be found in the ventral posterior complex of the thalamus. This can be seen as the dark vertical line in the center of the sagittal brain slice in the image below. As a comparison, the expression of the protein actin is used to show what a mouse brain looks like with high levels of expression. [14]
The data from the figure below indicate that FAM149A is highly expressed in the brain, nerves, pancreas, adrenal gland, and kidney. There is no expression in the heart. From the information in the second table, conditions associated with FAM149A expression include adrenal tumors, pancreatic tumors, colorectal tumors, and ovarian tumors. [15]
FAM149A has two transcript variants, transcript variant 1 (TV1) and transcript variant 2 (TV2). Both code for the same FAM149A protein. Differences include additional base pairs in the 5' untranslated region as well as the 3' untranslated region. One of two differences in the translated region is a G instead of an A at bp 1590 in TV1 and bp 1337 in TV2. The other difference is a C instead of an A at bp 2214 in TV1 and bp 1961 in TV2.
As stated above, FAM149A is made up of 482 amino acids. The most common amino acid is serine, which makes up 9.8% of the protein. The least common amino acids are tryptophan and cysteine, which each make up only 1.2% of the protein. The only recurring combination of amino acids in the protein is SLAS, which occurs at amino acids 234-237 and 324-327. In addition, the isoelectric point of FAM149A is approximately 9.89. [16]
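The two pairs of positions quoted above can be compared to see whether the variants are related by a constant coordinate shift. This is a minimal sketch using only the four positions given in the text; the variable names are illustrative.

```python
# Positions of the two reported differences in each transcript variant.
differences = [
    {"tv1_bp": 1590, "tv2_bp": 1337},
    {"tv1_bp": 2214, "tv2_bp": 1961},
]

# Offset between variant 1 and variant 2 coordinates for each difference.
offsets = [d["tv1_bp"] - d["tv2_bp"] for d in differences]
constant_shift = len(set(offsets)) == 1

print(offsets, constant_shift)
```

Both differences are shifted by the same 253 positions, which would be consistent with variant 1 carrying 253 additional bases upstream of these sites, although the source does not state this directly.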
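The composition figures and the repeated SLAS motif can be reproduced with a few lines of code. The sketch below is illustrative only: the toy sequence is hypothetical and is not the FAM149A sequence, and the helper names are assumptions.

```python
from collections import Counter, defaultdict


def composition(seq: str) -> dict:
    """Percentage of each residue in a protein sequence."""
    counts = Counter(seq)
    return {res: 100 * n / len(seq) for res, n in counts.items()}


def repeated_kmers(seq: str, k: int = 4) -> dict:
    """Map each k-mer that occurs more than once to its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Hypothetical toy sequence for illustration; not the real FAM149A protein.
toy = "MSLASKRSLASW"
print(composition(toy))
print(repeated_kmers(toy, 4))

# Approximate residue counts implied by the percentages quoted in the text.
protein_length = 482
serine_count = round(protein_length * 0.098)
rare_count = round(protein_length * 0.012)
print(serine_count, rare_count)
```

Applied to the percentages in the text, this implies roughly 47 serines and about 6 each of tryptophan and cysteine in the 482-residue protein.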
The following is an analysis of the promoter region for FAM149A. It shows a number of transcription factor binding sites that may contribute strongly to regulating gene expression. The image below shows the locations of the binding sites. The binding sites were analyzed to find any possible unique functions.
There were many results, but the ones with the highest similarity and highest abundance were chosen, as they are most likely to be present on the actual gene. Matrix families of interest include the Huntington's disease gene regulatory region, nerve growth factor, nuclear respiratory factor, pleomorphic adenoma gene, zinc finger transcription factors, and an E2F-myc activator/cell cycle regulator. Many of them involved zinc finger proteins, which suggests these may be important for FAM149A. [17]
FAM149A has potential interactions with ZNF385D, C10orf10, PNMAL1, CPN2, C10orf72, VPS13D, and RBMS3. [18] In the binding site analysis, many of the sites involved zinc finger proteins. According to the results from STRING, the second strongest associating protein is zinc finger protein 385D. However, it cannot be concluded that these are the only interacting proteins, as there is little to no research on FAM149A interactions. The Molecular Interaction Database (MINT) was used as an additional source for protein interactions, but FAM149A was not in the database. The top 5 functional partners listed by STRING are also not in the MINT database. Another interaction database, I2D Protein-Protein Interaction, [19] showed a possible interaction with the protein PRKAG1, although the interaction was weak.
Below is the list of proteins that potentially interact with FAM149A.
While not conclusively linked, FAM149A has been identified as one of 15 candidate genes that may contribute to the development of cancer and dysplastic lesions. [20] The same paper also noted downregulation of the gene in oral cancer, providing a possible route of study.
Uncharacterized protein KIAA1109 is a protein that in humans is encoded by the KIAA1109 gene.
C11orf16, or chromosome 11 open reading frame 16, is a protein that in humans is encoded by the C11orf16 gene. The gene has 7 exons, and the protein is 467 amino acids long.
Transmembrane protein 242 (TMEM242) is a protein that in humans is encoded by the TMEM242 gene. The tmem242 gene is located on chromosome 6, on the long arm, in band 2 section 5.3. This protein is also commonly called C6orf35, BM033, and UPF0463 Transmembrane Protein C6orf35. The tmem242 gene is 35,238 base pairs long, and the protein is 141 amino acids in length. The tmem242 gene contains 4 exons. The function of this protein is not well understood by the scientific community. This protein contains a DUF1358 domain.
Protein FAM46B also known as family with sequence similarity 46 member B is a protein that in humans is encoded by the FAM46B gene. FAM46B contains one protein domain of unknown function, DUF1693. Yeast two-hybrid screening has identified three proteins that physically interact with FAM46B. These are ATX1, PEPP2 and DAZAP2.
Coiled-coil domain containing 109B (CCDC109B) is a potential calcium uniporter protein found in the membrane of human cells and is encoded by the CCDC109B gene. While CCDC109B is a transmembrane protein it is unclear if it is located within the cell membrane or mitochondrial membrane.
The FAM185A is a protein that in humans is encoded by the FAM185A gene. The FAM185A gene is found on the positive strand of Chromosome 7 at 7q22.1. The gene begins 102,389,399bp from the p-terminus of the chromosome and ends at 102,449,672bp from the p-terminus; it covers a total of 73,308 basepairs. The protein encoded by this gene is characterized by the presence of multiple copies of DUF4098 near its C-terminus. It is described as a Long Interspersed Nuclear Element (LINE), a subclass of penaeid repetitive elements (PREs).
Family with Sequence Similarity 203, Member B (FAM203B) is a protein encoded by the FAM203B gene (8q24.3) in humans. While FAM203B is only found in humans and possibly non-human primates, its paralog, FAM203A, is highly conserved. The FAM203B protein contains two conserved domains of unknown function, DUF383 and DUF384, and no transmembrane domains. This protein has no known function yet, although the homolog of FAM203A in Caenorhabditis elegans (Y54H5A.2) is thought to help regulate the actin cytoskeleton.
Protein FAM214A, also known as protein family with sequence similarity 214, A (FAM214A) is a protein that, in humans, is encoded by the FAM214A gene. FAM214A is a gene with unknown function found at the q21.2-q21.3 locus on Chromosome 15 (human). The protein product of this gene has two conserved domains, one of unknown function (DUF4210) and another one called Chromosome_Seg. Although the function of the FAM214A protein is uncharacterized, both DUF4210 and Chromosome_Seg have been predicted to play a role in chromosome segregation during meiosis.
Coiled-coil domain containing 94 (CCDC94), is a protein that in humans is encoded by the CCDC94 gene. The CCDC94 protein contains a coiled-coil domain, a domain of unknown function (DUF572), an uncharacterized conserved protein (COG5134), and lacks a transmembrane domain.
Family with sequence similarity 98, member A, or FAM98A, is a gene that in the human genome encodes the FAM98A protein. FAM98A has two paralogs in humans, FAM98B and FAM98C. All three are characterized by DUF2465, a conserved domain shown to bind to RNA. FAM98A is also characterized by a glycine-rich C-terminal domain. FAM98A also has homologs in vertebrates and invertebrates and has distant homologs in choanoflagellates and green algae.
Family with sequence similarity 63, member A is a protein that, in humans, is encoded by the FAM63A gene. It is located on the minus strand of chromosome 1 at locus 1q21.3.
Transmembrane protein 251, also known as C14orf109 or UPF0694, is a protein that in humans is encoded by the TMEM251 gene. One notable feature of this protein is the presence of proline residues in one of its predicted transmembrane domains, which is a determinant of the intramitochondrial sorting of inner membrane proteins.
PRP36 is an extracellular protein in Homo sapiens that is encoded by the PRR36 gene that contains a domain of unknown function, DUF4596, towards the C terminus of the protein. The function of PRP36 is unknown, but high gene expression has been observed in various regions of the brain such as the prefrontal cortex, cerebellum, and the amygdala. PRP36 has one alias: Putative Uncharacterized Protein FLJ22184.
The coiled-coil domain containing 142 (CCDC142) is a gene which in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2, spans 4339 base pairs and contains 9 exons. The gene codes for the coiled-coil domain containing protein 142 (CCDC142), whose function is not yet well understood. There are two known isoforms of CCDC142. CCDC142 proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting protein movement between the cytosol and nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. The protein contains a coiled-coil domain and a RINT1_TIP1 motif located within the coiled-coil domain.
Uncharacterized protein C12orf60 is a protein that in humans is encoded by the C12orf60 gene. The gene is also known as LOC144608 or MGC47869. The protein lacks transmembrane domains and transmembrane helices, but it is rich in alpha-helices. It is predicted to localize in the nucleus.
The Family with sequence similarity 149 member B1 is an uncharacterized protein encoded by the human FAM149B1 gene, with one alias, KIAA0974. The protein resides in the nucleus of the cell. The predicted secondary structure of the protein contains multiple alpha-helices, with a few beta-sheet structures. The gene is conserved in mammals, birds, reptiles, fish, and some invertebrates. The protein encoded by this gene contains a DUF3719 protein domain, which is conserved across its orthologues. The protein is expressed at slightly below average levels in most human tissue types, with high expression in brain, kidney, and testes tissues and relatively low expression in pancreas tissues.
Zinc Finger Protein 800 or ZNF800 is a protein that in humans is encoded by the ZNF800 gene. The specific function of ZNF800 is not yet well understood by the scientific community.
C11orf42 is an uncharacterized protein in Homo sapiens that is encoded by the C11orf42 gene. It is also known as chromosome 11 open reading frame 42 and uncharacterized protein C11orf42, with no other aliases. The gene is mostly conserved in mammals, but it has also been found in reptiles, fish and worms.
Chromosome 9 open reading frame 50 is a protein that in humans is encoded by the C9orf50 gene. C9orf50 has one other known alias, FLJ35803. In humans the gene coding sequence is 10,051 base pairs long, transcribing an mRNA of 1,624 bases that encodes a 431 amino acid protein.
FAM71E2, also known as Family With Sequence Similarity 71 Member E2, is a protein that, in humans, is encoded by the FAM71E2 gene. Aliases include C19orf16, Protein FAM71E2, Chromosome 19 open reading frame 16, and Putative Protein FAM71E2. The gene is primarily conserved in mammals, but it is also conserved in two reptile species.