FAM98A | |
---|---|
Aliases | FAM98A, family with sequence similarity 98 member A |
External IDs | MGI: 1919972; HomoloGene: 41042; GeneCards: FAM98A; OMA: FAM98A - orthologs |
Family with sequence similarity 98, member A, or FAM98A, is a gene that in the human genome encodes the FAM98A protein. FAM98A has two paralogs in humans, FAM98B and FAM98C. All three are characterized by DUF2465, a conserved domain shown to bind to RNA. [5] FAM98A is also characterized by a glycine-rich C-terminal domain. [6] FAM98A also has homologs in vertebrates and invertebrates and has distant homologs in choanoflagellates and green algae.
The FAM98A gene is located on 2p22.3 in humans on the "-" (minus) strand. Including the 5' and 3' UTR, the gene spans 15,634 bases and contains 8 exons. [7]
The mRNA is 2,745 bp long and comprises the 8 exons. The coding sequence starts at base 75 and continues until base 1631. The polyadenylation signal is a six-nucleotide sequence at bases 2725-2730, 20 bases from the 3' end of the transcript, and the polyA site is at base 2745. [8]
FAM98A is 518 amino acids in length with a molecular weight of 55.3 kDa, without modifications. Residues 10-329 comprise the DUF2465, and the remainder of the protein is a diglycine-rich C terminus. Glycine makes up approximately 20% of the protein, with the majority of these in the last 200 residues. [9]
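The coordinates quoted above can be cross-checked with simple interval arithmetic. The following is a minimal sketch, assuming all positions are 1-based and inclusive and that the last codon of the coding sequence is a stop codon; the variable and function names are illustrative only.

```python
# Coordinates quoted in the text (assumed 1-based, inclusive).
MRNA_LENGTH = 2745
CDS = (75, 1631)
POLYA_SIGNAL = (2725, 2730)
PROTEIN_LENGTH = 518
DUF2465 = (10, 329)


def span_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


cds_length = span_length(*CDS)
codons, leftover = divmod(cds_length, 3)
encoded_residues = codons - 1           # assumption: the final codon is a stop codon
utr5_length = CDS[0] - 1                # bases before the coding sequence
utr3_length = MRNA_LENGTH - CDS[1]      # bases after the coding sequence
signal_length = span_length(*POLYA_SIGNAL)
duf_length = span_length(*DUF2465)
c_terminal_length = PROTEIN_LENGTH - DUF2465[1]  # residues after DUF2465

print("CDS length (bases):", cds_length)
print("Codons:", codons, "leftover bases:", leftover)
print("Encoded residues:", encoded_residues)
print("5' UTR:", utr5_length, "3' UTR:", utr3_length)
print("polyA signal length:", signal_length)
print("DUF2465 residues:", duf_length, "C-terminal residues:", c_terminal_length)
```

Under these assumptions the coding sequence is 1,557 bases (519 codons), which agrees with a 518-residue protein plus a stop codon, and the region after DUF2465 is 189 residues, in line with the statement that most glycines fall in roughly the last 200 residues.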
FAM98A has six strongly predicted phosphorylation sites in DUF2465: S169, T178, S236, T243, S276, and S285, all predicted to be phosphorylated by protein kinase C. [10] GPS also predicts phosphorylation by protein kinase C at S285 and T178. [11] FAM98A is likely sumoylated at K183 and K195. [12] Sumoylation may allow the cell to re-localize FAM98A between the nucleus and the cytoplasm. [13] The glycine-rich C terminus has repeated GRG sequences, which have been shown to be susceptible to symmetric or asymmetric methylation of the arginine. [14] Arginine methylation affects biochemical functions such as transcription activation and repression, mRNA splicing, nuclear-cytosolic shuttling, and DNA repair. [15]
The N terminus is predicted to have multiple alpha helices, whereas the C terminus is likely only coiled. [16] The alpha helices do not form any channel, and FAM98A is not a transmembrane protein.
The structure of FAM98A was predicted with the program Phyre2. The predicted structure has an N-terminal region containing several alpha helices and a C-terminal coiled region corresponding to the glycine-rich C terminus. These two regions of the protein are connected by a long alpha helix of approximately 50 residues, spanning residues 200-256. Phyre2 found the most similar protein to be the human NDC80 kinetochore complex component, a nuclear protein that binds to microtubules. [17]
FAM98A has a domain of unknown function 2465 (DUF2465) spanning amino acids 10-329. Within the DUF2465, there is a heptapeptide (VPDRGGR) near the C-terminal end that is conserved in all species tested. The C-terminal end is a glycine-rich domain (glycine makes up about 40% of the C terminus) with GGRGGR repeats. [9] At residues 149-155, there is a predicted nuclear export signal, with the sequence ICIALGM (generally [LIVFM]-X-[LIVFM]-X-[LIVFM]-X-[LIVFM]). [18] Residues 173-176 are predicted to be a nuclear localization signal, KKLK (K-[K/R]-X-[K/R]). [19]
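The two consensus patterns quoted above translate directly into regular expressions. The sketch below is illustrative only: it checks the quoted fragments against the consensus and reports 1-based coordinates on a made-up toy sequence. The toy sequence and the helper name `find_motifs` are assumptions for demonstration and are not the FAM98A sequence or any published tool.

```python
import re

# Consensus patterns as quoted in the text; "X" (any residue) becomes "." in regex syntax.
NES_PATTERN = re.compile(r"[LIVFM].[LIVFM].[LIVFM].[LIVFM]")
NLS_PATTERN = re.compile(r"K[KR].[KR]")


def find_motifs(pattern, sequence):
    """Return (start, end, match) tuples with 1-based inclusive coordinates.

    Matches are non-overlapping, as produced by re.finditer.
    """
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]


# The fragments quoted in the text fit their consensus patterns exactly.
print(NES_PATTERN.fullmatch("ICIALGM") is not None)
print(NLS_PATTERN.fullmatch("KKLK") is not None)

# Hypothetical toy sequence (NOT the real protein) embedding both fragments.
toy_sequence = "AAAICIALGMAAAKKLKAAA"
print(find_motifs(NES_PATTERN, toy_sequence))
print(find_motifs(NLS_PATTERN, toy_sequence))
```

Short patterns like these match many unrelated sequences by chance, so a hit is only a prediction and says nothing on its own about whether the signal is functional.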
FAM98A has two paralogs: FAM98B and FAM98C. FAM98A is the longest of the three paralogous protein products, with 518 amino acids. It is more similar to FAM98B, whose glycine-rich C terminus is much shorter than that of FAM98A. FAM98C is less similar to FAM98A than FAM98B is: it all but lacks a C terminus after DUF2465 and has more amino acid differences within the DUF2465. All three protein products have been shown experimentally to associate non-specifically with RNA: FAM98A binds to mRNA and FAM98B is incorporated into a tRNA-splicing complex. [5]
Orthologs of FAM98A have been found in vertebrates. In insects and molluscs, there are predicted proteins for a FAM98A gene. Because humans have three FAM98 paralogs, these genes must share a common ancestor. Whether a given gene is a strict ortholog, that is, orthologous to FAM98A specifically rather than to the entire FAM98 family, is less clear. FAM98A has not yet been thoroughly studied and many genomes have yet to be sequenced, which makes it difficult to determine whether the predicted FAM98A gene in mosquitoes is a strict ortholog (the split of FAM98 into FAM98A, FAM98B, and FAM98C occurred before the species diverged) or a more general homolog ("FAM98A" in mosquitoes is the ancestral FAM98 gene).
Sequence Number | Genus species (Gsp) | Common Name | Date of Divergence (MYA) (from TimeTree) | Accession # (from NCBI) | Sequence Length (AA) | Identity (%) | Similarity (%) |
---|---|---|---|---|---|---|---|
1 | Homo sapiens (Hsa) | Human | 0 | | 518 | 100 | 100 |
2 | Mus musculus (Mmu) | Mouse | 92.3 | | 515 | 95 | 96 |
3 | Camelus ferus (Cfe) | Bactrian Camel | 94.2 | | 517 | 97 | 98 |
4 | Pantholops hodgsonii (Pho) | Tibetan Antelope | 94.2 | | 521 | 96 | 97 |
5 | Elephantulus edwardii (Eed) | Cape Elephant Shrew | 98.7 | | 517 | 94 | 96 |
6 | Geospiza fortis (Gfo) | Medium Ground Finch | 296 | | 648 | 84 | 88 |
7 | Pseudopodoces humilis (Phu) | Ground Tit | 296 | | 545 | 84 | 88 |
8 | Alligator mississippiensis (Ami) | American Alligator | 296 | | 556 | 81 | 86 |
9 | Pelodiscus sinensis (Psi) | Chinese Soft-shelled Turtle | 296 | | 549 | 85 | 88 |
10 | Chrysemys picta bellii (Cpi) | Western Painted Turtle | 296 | | 549 | 85 | 88 |
11 | Xenopus tropicalis (Xtr) | Western Clawed Frog | 371.2 | | 520 | 79 | 86 |
12 | Anoplopoma fimbria (Afi) | Sablefish | 400.1 | | 353 | 31 | 48 |
13 | Ictalurus punctatus (Ipu) | Channel Catfish | 400.1 | | 543 | 67 | 75 |
14 | Camponotus floridanus (Cfl) | Florida Carpenter Ant | 782.7 | | 516 | 41 | 53 |
15 | Culex quinquefasciatus (Cqu) | Mosquito | 782.7 | | 498 | 38 | 52 |
16 | Ceratitis capitata (Cca) | Medfly | 782.7 | | 454 | 35 | 51 |
17 | Lepeophtheirus salmonis (Lsa) | Salmon Louse | 782.7 | | 467 | 29 | 45 |
18 | Crassostrea gigas (Cgi) | Pacific Oyster | 782.7 | | 422 | 45 | 59 |
19 | Clonorchis sinensis (Csi) | Chinese Liver Fluke | 792.4 | | 378 | 35 | 47 |
20 | Echinococcus granulosus (Egr) | Dog Tapeworm | 792.4 | | 1177 | 39 | 56 |
Genes homologous to FAM98A are predicted to occur in many taxa within Animalia, but taxa outside of Animalia may also have homologous FAM98 genes in their genomes. Eukaryotes such as the opisthokonts Monosiga brevicollis (XP_00174505.1) and Capsaspora owczarzaki (XP_004346371.1), and even the protist Chlorella variabilis (XP_005845167.1), a green alga, may contain FAM98 in their genomes. [20]
The homologous domain in FAM98A is the DUF2465 (Domain of Unknown Function 2465) domain. The function of this domain, like the gene itself, is largely unknown, though it has been reported that it preferentially binds to RNA, targeting mRNA in FAM98A and tRNA in FAM98B. [5]
The promoter (GXP_90934) assigned to the human FAM98A transcript (GXT_24436545) [21] is 915 bp long and overlaps the transcript by 243 bp. Nuclear respiratory factor 1 (NRF1) is a transcription factor with seven predicted binding sites on the promoter, four of which had a "matrix similarity - optimum" score greater than or equal to 0.085. The two highest-scoring predicted transcription factor sites were both NRF1 sites, with scores of 0.204 and 0.199. [22]
In a GEO large-scale human transcriptome, FAM98A was ubiquitously, though not uniformly, expressed. Expression was highest in many parts of the brain (cortex, amygdala, thalamus, corpus callosum, and pituitary gland), the testis, the uterus, and smooth muscle. [23] According to AceView, FAM98A is expressed at 3.9 times the level of the average gene. Eleven transcripts have been identified by AceView, five of which encode "good", complete proteins (both N and C termini fully translated). From the transcripts, there are apparently two main parts of FAM98A: the first four exons and the second four exons. These parts correspond roughly to the tertiary structure of the protein: exons 1-4 to the N-terminal alpha helices, and exons 5-8 to the long alpha-helical arm and the C-terminal coils. [24]
The function of FAM98A has not been experimentally determined, though its DUF2465 has been shown to bind mRNA. [5] Kiraga et al. have noted that basic proteins bind nucleic acids. [25] Consistent with this, FAM98A (and its orthologs) has an unmodified isoelectric point of approximately 9. [26]
FAM98A has been experimentally shown to interact with UBC, DDX1, C14orf166, and SUMO3, and it is coexpressed with DDX1, C14orf166, and RBM25. [27] These latter three proteins interact with mRNA, as FAM98A is also predicted to do. DDX1 is a putative ATP-dependent RNA helicase in a spliceosome, likely releasing the RNA from the splicing complex. [28] C14orf166 is a polymerase II binding factor, [29] and RBM25 regulates alternative splicing. [30] All of these interactions suggest that FAM98A is a nuclear protein. FAM98A also interacts with SUMO3, which sumoylates lysines in the protein to facilitate transport across the nuclear membrane between the nucleus and cytosol. [13] FAM98A also binds mRNA nonspecifically, suggesting that it may act as a shuttle carrying mRNA out of the nucleus to the ribosomes. [5]
One study measured expression levels of certain genes (including FAM98A) in young and old men on high- or low-protein diets, reporting expression as a ratio of low- to high-protein diet within each age group. FAM98A had increased expression on the low-protein diet in both young and old men, with ratios of 1.01 and 1.20, respectively. Only one other gene in the study showed the same trend of increased expression on the lower-protein diet in both groups: THOC4. [31] THOC4, THO Complex 4 or Aly/REF export factor, dimerizes to form a larger complex and chaperones spliced mRNA, assisting with processing and export of the mRNA. [32] The paper notes that up-regulation of mRNA in older individuals is associated with RNA binding/splicing, signaling proteins, and protein degradation; consistent with this, the older men showed a larger increase in FAM98A expression on the low-protein diet than the younger men. [31]
Research on a population in Taiwan found an association between young-onset hypertension and two SNPs upstream of four genes at the locus 2p22.3. One of these four genes was FAM98A, though more research is needed to verify that FAM98A is the gene responsible for the hypertension. [33] FAM98A is expressed at moderately high levels (roughly the 75th percentile) in smooth muscle and cardiac myocytes. [23]