FAM98C

Identifiers
Aliases: FAM98C, family with sequence similarity 98 member C
External IDs: MGI: 1921083; HomoloGene: 45483; GeneCards: FAM98C

Orthologs (human; mouse)
RefSeq (mRNA): NM_174905, NM_001351675 (human); NM_001146023, NM_028661 (mouse)
RefSeq (protein): NP_777565, NP_001338604 (human); n/a (mouse)
Location (UCSC): Chr 19: 38.4 – 38.41 Mb (human); Chr 7: 28.85 – 28.86 Mb (mouse)
PubMed search: [3] (human); [4] (mouse)

Family with sequence similarity 98, member C, or FAM98C, is a gene that in humans encodes the FAM98C protein. FAM98C has two aliases, FLJ44669 and hypothetical protein LOC147965. [5] FAM98C has two paralogs in humans, FAM98A and FAM98B. FAM98C is characterized as a leucine-rich protein. Its function has not yet been defined. FAM98C has orthologs in mammals, reptiles, and amphibians, with its most distant orthologs found in Rhinatrema bivittatum and Nanorana parkeri.

Gene

Locus

The FAM98C gene is located at 19q13.2 in humans, on the "+" strand. FAM98C spans from 38,403,135 to 38,409,088 bp. The primary mRNA transcript for the FAM98C gene is 5,954 base pairs in length. [5] Neighboring genes include RASGRP4 and RYR1. [5]
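As a quick consistency check, the quoted transcript length follows from the quoted coordinates if they are read as a 1-based, inclusive interval. That convention is an assumption here, since the text does not state it, and the function name is illustrative only.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive genomic interval (assumed convention)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1

# Coordinates quoted in the text for FAM98C on chromosome 19.
FAM98C_START = 38_403_135
FAM98C_END = 38_409_088

fam98c_length = span_length(FAM98C_START, FAM98C_END)
print(fam98c_length)  # 5954
```

Under a 0-based, half-open convention the same coordinates would give 5,953 bp, so the match with 5,954 bp suggests the inclusive reading.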

Transcripts

FAM98C has two known transcript variants. [6] The first variant encodes the longer isoform of 349 amino acids. [7] The second variant encodes a shorter isoform of 267 amino acids. [8] FAM98C is composed of eight exons. [7]

Proteins

The FAM98C protein is 349 amino acids in length, with a predicted molecular weight of 37.3 kDa and a predicted isoelectric point of 6.89. [9] The composition of the FAM98C protein is notable for its abundance of leucine (16%) and its lysine-rich C-terminus. FAM98C contains a high-scoring positively charged segment with 6 consecutive lysine residues. [9]
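The two composition features described here, residue abundance and a run of consecutive lysines, can be computed from any protein sequence. The following is a minimal sketch, not the method of the cited analysis. The function names are illustrative, and the toy peptide is hypothetical and is not the FAM98C sequence.

```python
import re
from collections import Counter

def residue_fraction(sequence: str, residue: str) -> float:
    """Fraction of positions in `sequence` occupied by `residue` (one-letter code)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return Counter(seq)[residue.upper()] / len(seq)

def longest_run(sequence: str, residue: str) -> int:
    """Length of the longest stretch of consecutive `residue` in `sequence`."""
    runs = re.findall(f"{re.escape(residue.upper())}+", sequence.upper())
    return max((len(run) for run in runs), default=0)

# Hypothetical toy peptide for illustration only (NOT the FAM98C sequence).
toy_peptide = "MLLAELGLKKKKKK"

print(f"Leucine fraction: {residue_fraction(toy_peptide, 'L'):.1%}")
print(f"Longest lysine run: {longest_run(toy_peptide, 'K')}")
```

Applied to the real 349-residue sequence, a leucine fraction of 16% would correspond to roughly 56 leucine residues.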

Domains and motifs

FAM98C contains a domain of unknown function 2465 (DUF2465) spanning amino acids 18-334. [10] This domain is unique to the FAM98 family and is conserved in all orthologs. [11] The function of DUF2465 is poorly characterized, but it is proposed to bind RNA. In the paralog FAM98A the domain binds mRNA, while FAM98B is involved in tRNA splicing. [12]
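To put the domain boundaries in proportion, the share of the protein covered by DUF2465 can be worked out from the residue range and the 349-residue isoform length. This sketch assumes the range 18-334 is inclusive at both ends, and the helper name is illustrative.

```python
def domain_coverage(start: int, end: int, protein_length: int) -> float:
    """Fraction of a protein covered by a domain given inclusive residue positions."""
    if not 1 <= start <= end <= protein_length:
        raise ValueError("domain must lie within the protein")
    return (end - start + 1) / protein_length

# DUF2465 boundaries and isoform 1 length as quoted in the text.
coverage = domain_coverage(18, 334, 349)
print(f"DUF2465 covers {coverage:.1%} of FAM98C")  # about 90.8%
```

Under these assumptions the domain spans 317 of 349 residues, so most of the protein lies within DUF2465.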

Structure

The secondary structure of FAM98C is predicted to be composed of approximately 46% alpha helix, 46% random coil and 7% extended strand. [13] [14] However, no beta strands were found in any of the predicted secondary structures. [15] The I-TASSER software predicts a tertiary structure containing 10 alpha helices. [16] [17]

Gene level regulation

Promoter

The FAM98C promoter region (GXP_7536558) is 1,254 base pairs in length. Both the E2F-myc activator/cell cycle regulator and the Krueppel-like transcription factors had nineteen predicted binding sites on the promoter. [18]

Expression pattern

A GEO profile of multiple normal tissues showed that FAM98C is ubiquitously, though not uniformly, expressed. [19] [20] The highest expression levels are in the jejunum, liver, and kidney. [19]

Sub-cellular localization

The subcellular localization of FAM98C was predicted using the PSORT II tool. [21] FAM98C is predicted to be localized in the nucleus (60.9%), followed by the mitochondria (21.7%) and then the cytoplasm (17.4%).
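One way to read these percentages is as vote shares among a fixed number of nearest neighbours. The sketch below assumes k = 23 neighbours, which is an assumption not stated in the text, and checks whether each percentage corresponds to a whole number of votes. The function name is illustrative.

```python
def votes_from_percentages(percentages: dict[str, float], k: int) -> dict[str, int]:
    """Convert rounded vote shares (in %) back to whole vote counts out of k."""
    votes = {}
    for label, pct in percentages.items():
        n = round(pct * k / 100)
        # Accept only counts that reproduce the percentage to one decimal place.
        if abs(100 * n / k - pct) > 0.05:
            raise ValueError(f"{pct}% is not a whole number of votes out of {k}")
        votes[label] = n
    if sum(votes.values()) != k:
        raise ValueError("vote counts do not add up to k")
    return votes

# Percentages quoted in the text; k = 23 is an assumed neighbour count.
predicted = {"nucleus": 60.9, "mitochondria": 21.7, "cytoplasm": 17.4}
votes = votes_from_percentages(predicted, k=23)
print(votes)
```

Under that assumption the figures are consistent with 14, 5 and 4 neighbours respectively, so the nuclear prediction reflects a majority of the neighbours rather than a unanimous result.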

Protein level regulation

Post-translational modifications

Phosphorylation

FAM98C has three predicted phosphorylation sites, located at amino acid positions 225, 239, and 300, that are conserved in distant orthologs. [22] The site at position 225 is a predicted tyrosine kinase site; tyrosine phosphorylation can function as an "on" and "off" switch. The site at position 239 is a predicted calmodulin-dependent protein kinase site. [23]

Schematic illustration of FAM98C showing DUF2465 and the three phosphorylation sites. FAM98C is mostly composed of the domain of unknown function 2465. The lysine-rich C-terminus is also shown in the illustration.

SUMOylation

SUMOylation is a post-translational modification that regulates many proteins. The GPS-SUMO tool from the CUCKOO workgroup predicted SUMOylation sites at positions 347, 348 and 349. [24] These residues are conserved even in the most distant FAM98C orthologs.

Homology

Paralogs

FAM98C has only two paralogs, FAM98A and FAM98B. [5]

Orthologs

Orthologs of FAM98C have been found in mammals, reptiles and amphibians. The most distant orthologs are in amphibians, which diverged from humans an estimated 351.8 million years ago (MYA). FAM98C is found only in metazoans and is not present in protozoa. Below is a table of a variety of orthologs of human FAM98C, ordered from the most recent to the most distant date of divergence. [25]

Sequence Number | Genus species | Common Name | Taxonomic Group | Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity | Sequence Similarity
1 | Homo sapiens | Human | Primates | 0 | NP_777565.3 | 349 | 100% | 100%
2 | Pan troglodytes | Chimpanzee | Primates | 6.7 | XP_524252.3 | 350 | 99% | 99%
3 | Microcebus murinus | Gray mouse lemur | Primates | 73.8 | XP_012630183.1 | 353 | 84% | 88%
4 | Octodon degus | Common degu | Rodentia | 90 | XP_023577316.1 | 352 | 78% | 84%
5 | Ochotona princeps | American pika | Lagomorpha | 90 | XP_004595135.1 | 353 | 77% | 83%
6 | Mus musculus | Mouse | Rodentia | 90 | NP_001139495.1 | 344 | 74% | 79%
7 | Rattus norvegicus | Brown rat | Rodentia | 90 | NP_001185513.1 | 344 | 73% | 80%
8 | Bos taurus | Cattle | Artiodactyla | 96 | XP_002695017.1 | 353 | 81% | 85%
9 | Canis lupus familiaris | Dog | Carnivora | 96 | XP_541643.2 | 353 | 80% | 83%
10 | Leptonychotes weddellii | Weddell seal | Carnivora | 96 | XP_006739473.1 | 345 | 79% | 84%
11 | Monodon monoceros | Narwhal | Artiodactyla | 96 | XP_029092965.1 | 352 | 78% | 84%
12 | Desmodus rotundus | Common vampire bat | Chiroptera | 96 | XP_024433437.1 | 355 | 77% | 83%
13 | Chrysochloris asiatica | Cape golden mole | Afrosoricida | 105 | XP_006871606.1 | 348 | 75% | 81%
14 | Vombatus ursinus | Common wombat | Diprotodontia | 159 | XP_027711296.1 | 358 | 64% | 73%
15 | Phascolarctos cinereus | Koala | Diprotodontia | 159 | XP_020834255.1 | 358 | 64% | 73%
16 | Ornithorhynchus anatinus | Platypus | Monotremata | 177 | XP_028920793.1 | 338 | 57% | 65%
17 | Chelonoidis abingdonii | Pinta Island tortoise | Testudines | 312 | XP_032660367.1 | 329 | 44% | 57%
18 | Podarcis muralis | Common wall lizard | Squamata | 312 | XP_028597878.1 | 330 | 43% | 55%
19 | Python bivittatus | Burmese python | Squamata | 312 | XP_015745259.1 | 318 | 42% | 58%
20 | Nanorana parkeri | High Himalaya frog | Anura | 351.8 | XP_018411523.1 | 351 | 38% | 55%
21 | Rhinatrema bivittatum | Two-lined caecilian | Gymnophiona | 351.8 | XP_029475031.1 | 338 | 38% | 51%
Rate of divergence of FAM98C compared to fibrinogen and cytochrome c.

Rate of Evolution

FAM98C is evolving rapidly, with a rate of divergence faster than that of both cytochrome c, a slowly evolving gene, and fibrinogen, a rapidly evolving gene.
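The relationship between sequence identity and divergence time in the ortholog table can be explored with a simple distance estimate. The sketch below uses the Poisson-corrected distance, d = -ln(1 - p), where p is the fraction of differing residues. This is a standard correction chosen here for illustration and is not necessarily the method used for the comparison above. The rows are a subset copied from the table, and no values for cytochrome c or fibrinogen are included because the article does not give them.

```python
import math

# (species, divergence time in MYA, sequence identity in %) from the ortholog table.
ORTHOLOGS = [
    ("Pan troglodytes", 6.7, 99),
    ("Mus musculus", 90, 74),
    ("Bos taurus", 96, 81),
    ("Ornithorhynchus anatinus", 177, 57),
    ("Python bivittatus", 312, 42),
    ("Nanorana parkeri", 351.8, 38),
]

def poisson_distance(identity_percent: float) -> float:
    """Poisson-corrected distance from percent identity (assumed correction)."""
    p = 1.0 - identity_percent / 100.0  # observed fraction of differing residues
    if not 0.0 <= p < 1.0:
        raise ValueError("identity must be in the range (0, 100]")
    return -math.log(1.0 - p)

for species, mya, identity in ORTHOLOGS:
    d = poisson_distance(identity)
    print(f"{species:26s} {mya:6.1f} MYA  d = {d:.3f}")
```

Plotting d against divergence time for each protein and comparing the slopes is one way such rate comparisons are commonly made.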

Interacting proteins

FAM98C has been predicted to interact with DR1, LRRCC1, FAM83F, TMEM256, PRDM16 and SPRED1. [26] [27] LRRCC1 and TMEM256 were both reported alongside FAM98C as potentially novel genes associated with ciliopathies. [28]

Clinical significance

In a bioinformatics study, FAM98C and 9 other novel genes were identified as being associated with the prognosis of cholangiocarcinoma. [29]

References

  1. GRCh38: Ensembl release 89: ENSG00000130244 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000030590 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "FAM98C Gene - GeneCards | FA98C Protein | FA98C Antibody". www.genecards.org. Retrieved 2020-12-19.
  6. "FAM98C family with sequence similarity 98 member C [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-15.
  7. "protein FAM98C isoform 1 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  8. "protein FAM98C isoform 2 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  9. Brendel V, Bucher P, Nourbakhsh IR, Blaisdell BE, Karlin S (March 1992). "Methods and algorithms for statistical analysis of protein sequences". Proceedings of the National Academy of Sciences of the United States of America. 89 (6): 2002–6. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. PMC 48584. PMID 1549558.
  10. "HomoloGene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  11. "CDD Conserved Protein Domain Family: DUF2465". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  12. Dürnberger G, Bürckstümmer T, Huber K, Giambruno R, Doerks T, Karayel E, et al. (July 2013). "Experimental characterization of the human non-sequence-specific nucleic acid interactome". Genome Biology. 14 (7): R81. doi:10.1186/gb-2013-14-7-r81. PMC 4053969. PMID 23902751.
  13. Prof. T. Ashok Kumar. "CFSSP: Chou & Fasman Secondary Structure Prediction Server". www.biogem.org. Retrieved 2020-12-16.
  14. "NPS@ : GOR4 secondary structure prediction". npsa-prabi.ibcp.fr. Retrieved 2020-12-16.
  15. "Bioinformatics Toolkit". toolkit.tuebingen.mpg.de. Retrieved 2020-12-16.
  16. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2020-12-19.
  17. Roy A, Kucukural A, Zhang Y (April 2010). "I-TASSER: a unified platform for automated protein structure and function prediction". Nature Protocols. 5 (4): 725–38. doi:10.1038/nprot.2010.5. PMC 2849174. PMID 20360767.
  18. "ElDorado: Annotation & Analysis". www.genomatix.de. Retrieved 2020-12-16.
  19. "GEO DataSet Browser". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  20. "GEO Accession viewer". www.ncbi.nlm.nih.gov. Retrieved 2020-12-19.
  21. "PSORT II Prediction". psort.hgc.jp. Retrieved 2020-12-16.
  22. "GPS 5.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.cn. Retrieved 2020-12-16.
  23. "GPS 5.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.cn. Retrieved 2020-12-19.
  24. "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interaction Motifs". sumosp.biocuckoo.org. Retrieved 2020-12-16.
  25. "TimeTree :: The Timescale of Life". www.timetree.org. Retrieved 2020-12-19.
  26. "FAM98C protein (human) - STRING interaction network". string-db.org. Retrieved 2020-12-19.
  27. "PSICQUIC View". www.ebi.ac.uk. Retrieved 2020-12-19.
  28. Shaheen R, Szymanska K, Basu B, Patel N, Ewida N, Faqeih E, et al. (November 2016). "Characterizing the morbid genome of ciliopathies". Genome Biology. 17 (1): 242. doi:10.1186/s13059-016-1099-5. PMC 5126998. PMID 27894351.
  29. Da Z, Gao L, Su G, Yao J, Fu W, Zhang J, et al. (2020-04-22). "Bioinformatics combined with quantitative proteomics analyses and identification of potential biomarkers in cholangiocarcinoma". Cancer Cell International. 20 (1): 130. doi:10.1186/s12935-020-01212-z. PMC 7178764. PMID 32336950.