FAM214A

Identifiers
Aliases: FAM214A, KIAA1370, family with sequence similarity 214 member A
External IDs: MGI: 2387648; HomoloGene: 35065; GeneCards: FAM214A

Orthologs
Species | Human | Mouse
RefSeq (mRNA) | NM_001286495, NM_019600 | NM_001113283, NM_153584, NM_001359816
RefSeq (protein) | NP_001273424, NP_062546 | NP_001106754, NP_705812, NP_001346745
Location (UCSC) | Chr 15: 52.58 – 52.71 Mb | Chr 9: 74.86 – 74.94 Mb
PubMed search | [3] | [4]

FAM214A, also known as protein family with sequence similarity 214, A, is a protein that, in humans, is encoded by the FAM214A gene. The gene, whose function is unknown, is located at the q21.2-q21.3 locus on human chromosome 15. [5] Its protein product has two conserved domains: a domain of unknown function (DUF4210) and a domain called Chromosome_Seg. [6] Although the function of the FAM214A protein is uncharacterized, both DUF4210 and Chromosome_Seg have been predicted to play a role in chromosome segregation during meiosis. [7]

Gene

Overview

The FAM214A gene is located on the negative DNA strand (see Sense (molecular biology)) of chromosome 15, between positions 52,873,514 and 53,002,014, and is reported to be 97,303 base pairs (bp) long. [5] [8] [9] FAM214A has previously been known by two aliases, KIAA1370 and FLJ10980. [5] The gene is predicted to contain 12 exons, which make up the final 4231 bp mRNA transcript. [10] Transcription of this mRNA is driven by the promoter sequence and its transcription factors, and the mRNA is then translated into the FAM214A protein. The promoter for FAM214A was predicted and analyzed with the El Dorado program on Genomatix. [11] It is 601 base pairs long and spans a portion of the 5' UTR. [11]
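
As a rough arithmetic check, the span implied by the two coordinates above can be computed directly. The sketch below is illustrative only: the variable and function names are hypothetical, and counting both endpoints (an inclusive span) is an assumption about the coordinate convention. The coordinates give a span of about 128.5 kb, which is longer than the 97,303 bp figure quoted in the text; the source does not state which annotation that shorter figure was taken from.

```python
# Coordinates and reported length as quoted in the text above.
START = 52_873_514
END = 53_002_014
REPORTED_LENGTH_BP = 97_303


def inclusive_span(start: int, end: int) -> int:
    """Number of bases from start to end, counting both endpoints (assumed convention)."""
    return end - start + 1


span = inclusive_span(START, END)
difference = span - REPORTED_LENGTH_BP

print(f"Span from coordinates: {span:,} bp")
print(f"Reported gene length:  {REPORTED_LENGTH_BP:,} bp")
print(f"Difference:            {difference:,} bp")
```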

[Figure: FAM214A location on chromosome 15]
[Figure: Diagram of the FAM214A gene including introns and exons on chromosome 15]

Gene expression

[Figure: Expression data for FAM214A obtained from Gene Cards]

FAM214A is considered to be ubiquitously expressed (or very nearly so) at low levels according to sources such as BioGPS and the Expression Atlas. [12] [13] [14] The BioGPS expression profile shows a markedly higher expression level in immune-related cells and tissues, which suggests an immune role; however, there has been no specific in situ evidence to support this claim. Because the expression data were collected from a number of studies that each covered a large range of genes, some of the data are contradictory.

[Figure: Expression data for FAM214A obtained from BioGPS]

Protein

Overview

The function of the FAM214A protein in humans is still unknown; however, The Gene Ontology lists term associations for it in three categories, "biological process," "cellular component," and "molecular function," which offer predictions about its primary function in vivo. [15] [16] The protein product of FAM214A consists of 1076 amino acids (aa), has a predicted molecular mass of 121,700 daltons, and has an isoelectric point around pH 7.7. [6] [17] [18] The protein is predicted to remain in the nucleus after translation, based on its lack of a signal peptide sequence and the predictions of the program PSORT II. [19] Due to alternative splicing, two other isoforms (Q32MH5-2 and Q32MH5-3) have been observed; they differ slightly from the primary product. [20] Isoform 2 has four different amino acids at residues 960-963 and is missing the end of the sequence, residues 964-1076. [20] Isoform 3 has seven extra amino acids added to the beginning of the sequence after the methionine. [20]
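
The isoform descriptions above imply simple length differences relative to the 1076-residue primary product. The following sketch works them out; the names are hypothetical, and it assumes that isoform 2 ends immediately before the first missing residue (964) and that isoform 3 differs from the primary product only by the seven inserted residues. The same block derives the average mass per residue from the quoted molecular mass.

```python
# Values quoted in the text above.
CANONICAL_LENGTH = 1076          # residues in the primary product
MOLECULAR_MASS_DA = 121_700      # predicted molecular mass in daltons
FIRST_MISSING_RESIDUE = 964      # isoform 2 lacks residues 964-1076
EXTRA_RESIDUES_ISOFORM_3 = 7     # inserted after the initial methionine

# Assumption: isoform 2 keeps every residue before the first missing one.
isoform_2_length = FIRST_MISSING_RESIDUE - 1

# Assumption: isoform 3 is the primary product plus the inserted residues.
isoform_3_length = CANONICAL_LENGTH + EXTRA_RESIDUES_ISOFORM_3

# Average mass per residue of the primary product.
mean_residue_mass = MOLECULAR_MASS_DA / CANONICAL_LENGTH

print(f"Isoform 2 length: {isoform_2_length} aa")
print(f"Isoform 3 length: {isoform_3_length} aa")
print(f"Mean residue mass: {mean_residue_mass:.1f} Da")
```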

After being translated, the FAM214A protein is predicted to remain in the nucleus by more than one subprogram of PSORT II. [19] The protein has a pat4 signal, one of the two "classical" nuclear localization signals (NLSs), starting at residue 709. [21] Although it has neither the second "classical" NLS, pat7, nor the "non-classical" bipartite NLS, it is still predicted to be targeted to the nucleus by the NCNN score. [21] [22] This score predicts whether a protein is targeted to the nucleus or the cytoplasm based on its amino acid sequence. [21] [22] For the FAM214A protein, the NCNN score predicted nuclear localization with 94.1% certainty. [21] [22] Based on this information, PSORT II generates an overall prediction of the protein's subcellular localization. For FAM214A, the predicted values were 69.6% for the nucleus, compared with 13.0% for the mitochondria, 8.7% for the cytoplasm, and 4.3% each for the secretory vesicles and the endoplasmic reticulum. [19]
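
The overall percentages quoted above are consistent with a vote among 23 items, which would fit a k-nearest-neighbor style prediction with 23 neighbors. The sketch below checks that reading. The vote counts, and the reading that secretory vesicles and endoplasmic reticulum receive 4.3% each, are inferences from the rounded percentages rather than values reported in the source, and the names are hypothetical.

```python
# Hypothetical vote counts inferred from the rounded percentages in the text.
votes = {
    "nucleus": 16,
    "mitochondria": 3,
    "cytoplasm": 2,
    "secretory vesicles": 1,
    "endoplasmic reticulum": 1,
}

total_votes = sum(votes.values())

# Convert each count to a percentage rounded to one decimal place.
percentages = {
    site: round(100 * count / total_votes, 1) for site, count in votes.items()
}

for site, pct in percentages.items():
    print(f"{site}: {pct}%")
```

If this reading is right, the rounded values add up to 99.9% only because of rounding.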

Post-translational modifications

[Figure: Predicted phosphorylated sites found within the FAM214A protein]

This protein most likely does not undergo a significant number of post-translational modifications, because it lacks a signal peptide sequence according to NetNGlyc and NetOGlyc on the ExPASy web server. [24] [25] Much of the intracellular machinery that performs post-translational modifications requires the protein to move through organelles such as the endoplasmic reticulum and Golgi apparatus. Without a signal peptide sequence, the protein generally does not leave the nucleus, which is the localization predicted by PSORT II as described above. [19]

A SAPS analysis of this protein was performed against the swp23s.q database, which indicated an abnormally large number of serine residues and an abnormally small number of alanine residues. [17] According to a review article by Fayard et al., phosphoinositide-dependent kinase 2 (PDK2) is a serine/threonine kinase that is important for regulating the cell cycle. Because the FAM214A protein has more serines than is considered normal, PDK2 may have an important effect on this protein. [26] To determine whether these serines are predicted to be phosphorylated, the protein sequence was run through the program NetPhos on the ExPASy web server. [23] The program predicted the phosphorylation of 69 serines, 14 threonines, and 9 tyrosines. [23] According to the SAPS analysis above, there are a total of 134 serines, so approximately half are predicted to be phosphorylated in vivo. A diagram of the phosphorylation predictions is shown in the accompanying figure.
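
The "approximately half" estimate follows directly from the counts quoted above. A minimal sketch of the arithmetic, using hypothetical variable names and the 1076-residue protein length given in the protein overview:

```python
# Counts quoted in the text (NetPhos predictions and SAPS serine total).
predicted_sites = {"serine": 69, "threonine": 14, "tyrosine": 9}
total_serines = 134
protein_length = 1076  # residues, from the protein overview

# Total number of predicted phosphorylation sites of all three types.
total_predicted_sites = sum(predicted_sites.values())

# Share of all serines that are predicted to be phosphorylated.
serine_fraction_phosphorylated = predicted_sites["serine"] / total_serines

# Share of the whole protein that is serine.
serine_content = total_serines / protein_length

print(f"Predicted sites in total: {total_predicted_sites}")
print(f"Serines predicted phosphorylated: {serine_fraction_phosphorylated:.1%}")
print(f"Serine content of the protein: {serine_content:.1%}")
```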

One other type of post-translational modification was predicted for the FAM214A protein by the program NetCorona on ExPASy. [27] The program predicted a single cleavage site between positions 214 and 215 of the FAM214A protein sequence. [27]

Protein interactions

There are a number of transcription factor binding sites predicted for the FAM214A promoter sequence. [11] A few of those with the highest predicted confidence are listed in the table below. [11]

Possible Transcription Factors Predicted to Bind to the FAM214A Promoter Sequence

Predicted Transcription Factor | Start | End | Strand | Confidence
Transcription factor II B (TFIIB) recognition element | 97 | 103 | Negative | 1.0
Myeloid zinc finger protein MZF1 | 151 | 161 | Negative | 1.0
Myelin transcription factor 1-like, neuronal C2HC zinc finger factor 1 | 388 | 400 | Negative | 0.945
Androgen receptor binding site, IR3 sites | 495 | 513 | Negative | 0.923
Wilms Tumor Suppressor | 1 | 17 | Positive | 0.968
Non-palindromic nuclear factor I binding sites | 27 | 47 | Positive | 0.988
Alternative splicing variant of FOXP1, activated in ESCs | 383 | 383 | Positive | 1.0
Pleomorphic adenoma gene 1 | 488 | 510 | Positive | 1.0
ETS-like gene 1 (ELK-1) | 569 | 589 | Positive | 0.961
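
The predicted sites in the table can be treated as simple intervals on the 601 bp promoter. The sketch below is one hypothetical way to hold them in a data structure and ask two questions: which sites have a confidence of 1.0, and which sites overlap one another. The class and function names are illustrative, the start and end values are read from the table on the assumption that each row holds one start and one end position, and 1-based promoter coordinates are also an assumption.

```python
from dataclasses import dataclass
from itertools import combinations

PROMOTER_LENGTH = 601  # promoter length in bp, as stated in the gene overview


@dataclass(frozen=True)
class BindingSite:
    factor: str
    start: int
    end: int
    strand: str  # "Negative" or "Positive", as in the table
    confidence: float


SITES = [
    BindingSite("Transcription factor II B (TFIIB) recognition element", 97, 103, "Negative", 1.0),
    BindingSite("Myeloid zinc finger protein MZF1", 151, 161, "Negative", 1.0),
    BindingSite("Myelin transcription factor 1-like", 388, 400, "Negative", 0.945),
    BindingSite("Androgen receptor binding site, IR3 sites", 495, 513, "Negative", 0.923),
    BindingSite("Wilms Tumor Suppressor", 1, 17, "Positive", 0.968),
    BindingSite("Non-palindromic nuclear factor I binding sites", 27, 47, "Positive", 0.988),
    BindingSite("Alternative splicing variant of FOXP1", 383, 383, "Positive", 1.0),
    BindingSite("Pleomorphic adenoma gene 1", 488, 510, "Positive", 1.0),
    BindingSite("ETS-like gene 1 (ELK-1)", 569, 589, "Positive", 0.961),
]


def within_promoter(site: BindingSite) -> bool:
    """True if the site lies inside the promoter (1-based coordinates assumed)."""
    return 1 <= site.start <= site.end <= PROMOTER_LENGTH


def overlaps(a: BindingSite, b: BindingSite) -> bool:
    """True if the two sites share at least one position, regardless of strand."""
    return a.start <= b.end and b.start <= a.end


top_confidence = [s.factor for s in SITES if s.confidence == 1.0]
overlapping_pairs = [
    (a.factor, b.factor) for a, b in combinations(SITES, 2) if overlaps(a, b)
]

print("Confidence 1.0:", top_confidence)
print("Overlapping sites:", overlapping_pairs)
```

Under these assumptions, the only overlapping pair is the androgen receptor site and the pleomorphic adenoma gene 1 site, which lie on opposite strands.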
[Figure: FAM214A non-transcription factor predicted protein interaction]

The only protein predicted by STRING to interact with the FAM214A protein is MFSD6L. This protein belongs to the major facilitator superfamily and is predicted to be a transmembrane protein. Like FAM214A, its function has not yet been characterized experimentally. [28] [29] Because MFSD6L is the only FAM214A protein interaction predicted with any certainty, its sequence was run through the PSORT II program. The NLS subprogram predicted a single pat4 and two pat7 NLS sequences, indicating possible nuclear localization. [19] [21] The NCNN score, on the other hand, predicted cytoplasmic localization with 94.1% certainty, leaving the overall PSORT II prediction at 39.1% plasma membrane, 39.1% endoplasmic reticulum, 4.3% vacuolar, 4.3% vesicles of the secretory system, 4.3% Golgi, 4.3% mitochondrial, and 4.3% nuclear. [21] [22] This contradicts the presence of three nuclear localization signals, but the strongly transmembrane nature of the MFSD6L protein may be interfering with these predictions. [21]

[Figure: Small percentage of FAM214A tertiary structure]

Secondary and tertiary structure

The secondary structure of the FAM214A protein consists of a number of alpha helices and beta sheets, as predicted by Biology Workbench and the Protein Homology/analogY Recognition Engine (PHYRE). [30] [31] PHYRE predicts that 66 percent of the FAM214A secondary structure is disordered and therefore cannot be analyzed and converted into a tertiary structure prediction. [30] It was, however, able to predict approximately 10 percent of the protein's structure with 95 percent significance. [30] The diagram for this is shown in the accompanying figure. [30]

Conservation

Paralog

A single paralogous gene, FAM214B (family with sequence similarity 214, B), has been found on chromosome 9 in Homo sapiens. [32] Although FAM214B is considered a paralog, its protein sequence differs significantly from that of FAM214A. When the two were compared with NCBI's BLAST, the only significant similarity was within the last 200 amino acids (where the DUF4210 and Chromosome_Seg domains are located). [33] Although the similarity between FAM214A and FAM214B is low, the two proteins are in the same protein family and contain the same two conserved domains. [7] [34]

Orthologs

The FAM214A protein has a significant number of orthologs across a large number of taxonomic groups, including Mammalia, Aves, Reptilia, Amphibia, Actinopterygii, Echinoidea, Insecta, Trematoda, Crustacea, Tricoplacia, Anthozoa, and Eurotiomycetes. [35] This indicates that the FAM214A protein is well conserved within eukaryotes but does not appear to be conserved in Bacteria or Archaea. In all orthologs, the most-conserved region was near the end of the protein, where the conserved domains are located (see below). The most distant orthologs found for the human FAM214A protein were in Tuber melanosporum, Talaromyces stipitatus, and Aspergillus nidulans, whose lineages diverged from the human lineage approximately 1215 million years ago.

Orthologs for the FAM214A Protein

Genus and species | Common name | Divergence from human lineage (MYA) [36] | NCBI protein accession number | Sequence length (aa) | Percent identity to human sequence [33] | Common gene name
Homo sapiens | Human | - | NP_062546.2 | 1076 | 100 | FAM214A
Pan troglodytes | Common Chimpanzee | 6.3 | XP_003314724 | 1083 | 99 | FAM214A
Pan paniscus | Bonobo | 6.3 | XP_003827895.1 | 1076 | 100 | FAM214A
Rattus norvegicus | Rat | 92.3 | NP_001100308 | 1074 | 100 | LOC300836
Bos taurus | Cow | 94.2 | XP_601152 | 1087 | 100 | KIAA1370
Canis lupus familiaris | Dog | 94.2 | XP_544682 | 1081 | 100 | KIAA1370
Ornithorhynchus anatinus | Platypus | 167.4 | XP_001515207 | 1169 | 95 | KIAA1370
Gallus gallus | Chicken | 296.0 | NP_001005811 | 1093 | 99 | FAM214A
Taeniopygia guttata | Zebra Finch | 296.0 | XP_002196177 | 1112 | 99 | FAM214A
Anolis carolinensis | Carolina Anole | 296.0 | XP_003227400 | 1086 | 99 | KIAA1370
Xenopus tropicalis | Tropical Clawed Frog | 371.2 | NP_001015702 | 946 | 98 | FAM214A
Danio rerio | Zebrafish | 400.1 | NP_001189349 | 1021 | 75 | FAM214A
Apis mellifera | Honey Bee | 782.7 | XP_393903 | 1339 | 45 | LOC410423
Strongylocentrotus purpuratus | Sea Urchin | 742.9 | XP_799179 | 297 | 27 | FAM214A-like
Drosophila melanogaster | Fruit Fly | 782.7 | NP_610688 | 1297 | 27 | CG9005
Schistosoma mansoni | Schistosome Parasite | 792.4 | XP_002579285 | 766 | 26 | Hypothetical Protein
Daphnia pulex | Common Water Flea | 782.7 | EFX87516 | 200 | 18 | Hypothetical Protein DAPPUDRAFT_207300
Nematostella vectensis | Sea Anemone | 855.3 | XP_001633540 | 191 | 18 | Hypothetical Protein
Tuber melanosporum | Truffle | 1215.8 | XP_002841833 | 622 | 15 | Hypothetical Protein
Talaromyces stipitatus | - | 1215.8 | XP_002478567 | 797 | 25 | Conserved Hypothetical Protein
Aspergillus nidulans | Filamentous Fungus | 1215.8 | XP_658605 | 728 | 15 | hypothetical protein AN1001.2
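
The table suggests that sequence identity to the human protein falls off sharply between the vertebrate orthologs and the more distant ones. The sketch below makes that comparison explicit. It is illustrative only: the names are hypothetical, the length and identity values assume that the last two numeric columns of each row are sequence length followed by percent identity, and the 500 MYA cut-off used to split the two groups is an arbitrary choice, not a value from the source.

```python
# (species, divergence from human lineage in MYA, length in aa, percent identity)
ORTHOLOGS = [
    ("Pan troglodytes", 6.3, 1083, 99),
    ("Pan paniscus", 6.3, 1076, 100),
    ("Rattus norvegicus", 92.3, 1074, 100),
    ("Bos taurus", 94.2, 1087, 100),
    ("Canis lupus familiaris", 94.2, 1081, 100),
    ("Ornithorhynchus anatinus", 167.4, 1169, 95),
    ("Gallus gallus", 296.0, 1093, 99),
    ("Taeniopygia guttata", 296.0, 1112, 99),
    ("Anolis carolinensis", 296.0, 1086, 99),
    ("Xenopus tropicalis", 371.2, 946, 98),
    ("Danio rerio", 400.1, 1021, 75),
    ("Apis mellifera", 782.7, 1339, 45),
    ("Strongylocentrotus purpuratus", 742.9, 297, 27),
    ("Drosophila melanogaster", 782.7, 1297, 27),
    ("Schistosoma mansoni", 792.4, 766, 26),
    ("Daphnia pulex", 782.7, 200, 18),
    ("Nematostella vectensis", 855.3, 191, 18),
    ("Tuber melanosporum", 1215.8, 622, 15),
    ("Talaromyces stipitatus", 1215.8, 797, 25),
    ("Aspergillus nidulans", 1215.8, 728, 15),
]

CUTOFF_MYA = 500.0  # arbitrary split between closer and more distant orthologs

closer = [row for row in ORTHOLOGS if row[1] <= CUTOFF_MYA]
distant = [row for row in ORTHOLOGS if row[1] > CUTOFF_MYA]

lowest_closer_identity = min(identity for _, _, _, identity in closer)
highest_distant_identity = max(identity for _, _, _, identity in distant)

print(f"{len(closer)} orthologs within {CUTOFF_MYA} MYA, lowest identity {lowest_closer_identity}%")
print(f"{len(distant)} orthologs beyond {CUTOFF_MYA} MYA, highest identity {highest_distant_identity}%")
```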

Phylogeny

[Figure: The evolutionary relationship between FAM214A and its orthologous proteins]

An unrooted phylogenetic tree of 20 orthologs was generated by the CLUSTALW program on Biology Workbench to demonstrate the evolutionary relationship between FAM214A and its orthologs. [31]

Conserved domains

Within the FAM214A protein, there are three well-conserved regions: a region near the N-terminus of the protein and two conserved domains near the C-terminus, the Domain of Unknown Function 4210 (DUF4210) and a Chromosome_Seg domain. [7] A schematic diagram of these three regions is shown below. The well-conserved region near the N-terminus is not predicted to contain any known domains or motifs; however, the cleavage site predicted by NetCorona (see above) is located within this region and is well conserved in a majority of the proteins orthologous to FAM214A. [27] Based on evolutionary history, the two conserved domains at the end of the protein are its most important portion. All organisms in the ortholog table above except the platypus (which is missing the Chromosome_Seg domain) contain both of these conserved domains in their protein sequence. [7]

[Figure: Schematic of the Homo sapiens FAM214A protein diagramming well-conserved regions and their locations]

References

  1. GRCh38: Ensembl release 89: ENSG00000047346 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000034858 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "Gene Cards: FAM214A family with sequence similarity 214, A".
  6. "Protein FAM214A". NCBI. Retrieved 2 Feb 2013.
  7. "NCBI Conserved Domains".
  8. "Gene Loc Map Region around Gene FAM214a". Gene Cards.
  9. "FAM214A family with sequence similarity 214, A". NCBI.
  10. "Homo sapiens family with sequence similarity 214, member A (FAM214A), mRNA". NCBI. 2013-04-17.
  11. "Genomatix: El Dorado". Genomatix.
  12. "FAM214A Gene Expression from Gene Cards". Gene Cards.
  13. "FAM214A Gene Expression from BioGPS". BioGPS.
  14. "FAM214A Gene Expression From Expression Atlas". Archived from the original on 2013-06-16.
  15. "The Gene Ontology".
  16. "The Gene Ontology: Term Associations".
  17. "Biology Workbench: SAPS".
  18. Kozlowski, LP (2016). "IPC - Isoelectric Point Calculator". Biology Direct. 11 (1): 55. doi:10.1186/s13062-016-0159-9. PMC 5075173. PMID 27769290.
  19. "PSORT II Prediction".
  20. "Protein FAM214A - Homo sapiens (Human)". UniProt.
  21. "PSORT II NLS". PSORT.
  22. Reinhardt A, Hubbard T (May 1998). "Using neural networks for prediction of the subcellular location of proteins". Nucleic Acids Research. 26 (9): 2230–6. doi:10.1093/nar/26.9.2230. PMC 147531. PMID 9547285.
  23. "NetPhos". ExPASy.
  24. "NetNGlyc". ExPASy.
  25. "NetOGlyc". ExPASy.
  26. Fayard E, Tintignac LA, Baudry A, Hemmings BA (December 2005). "Protein kinase B/Akt at a glance". Journal of Cell Science. 118 (Pt 24): 5675–8. doi:10.1242/jcs.02724. PMID 16339964. S2CID 8984112.
  27. "NetCorona". ExPASy.
  28. "Gene Cards MFSD6L". Gene Cards.
  29. "UniProt MFSD6L". UniProt.
  30. "PHYRE Protein Fold Recognition Server".
  31. "Biology Workbench".
  32. "Gene Cards-Paralogs". Gene Cards.
  33. "NCBI BLAST". NCBI.
  34. "Conserved Domains FAM214B". NCBI.
  35. "Gene Cards Orthologs". Gene Cards.
  36. Hedges, SB; Dudley J; Kumar S (2006). "TimeTree: a public knowledge-base of divergence times among organisms". pp. 2971–2972.