Family with Sequence Similarity 203, Member B (FAM203B) is a protein encoded by the FAM203B gene (8q24.3) in humans. [1] [2] While FAM203B is only found in humans and possibly non-human primates, its paralog, FAM203A, [3] is highly conserved. [4] The FAM203B protein contains two conserved domains of unknown function, DUF383 and DUF384, [4] and no transmembrane domains. [5] This protein has no known function yet, although the homolog of FAM203A in Caenorhabditis elegans (Y54H5A.2) is thought to help regulate the actin cytoskeleton. [6]
HGH1 | |
---|---|
Aliases | HGH1, BRP16, BRP16L, C8orf30A, C8orf30B, FAM203A, FAM203B, HGH1 homolog |
External IDs | MGI: 1930628; HomoloGene: 48742; GeneCards: HGH1 |
FAM203B is located on the positive DNA strand of the long arm of chromosome 8 at band 24.3 (8q24.3), spanning positions 76,368,898–76,371,411 in the human genome. Its mRNA is 2,402 bp long, and the gene has 6 predicted exons. [2] [11] There are no known isoforms.
The pseudogene TSSK5P2 is located on the negative strand opposite FAM203B (145,440,975 - 145,443,775), [13] while LOC377711 is located immediately downstream on the positive strand (145,448,755 - 145,485,896). [14] FAM203A, MROH1, and SCXB are located upstream of FAM203B. [11] [15]
Expression Profile: FAM203B mRNA has been detected at similar levels in many tissue types (immune, nervous, muscle, internal, secretory, and reproductive), which suggests that the gene may be ubiquitously expressed. [12]
Promoter: The predicted promoter region of FAM203B is located between 145,437,380 and 145,438,015 on Chromosome 8 and has a length of 636 bp. [16]
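The stated promoter length follows from counting both endpoints of the interval. A minimal Python sketch, assuming the coordinates are 1-based and inclusive (the convention the 636 bp figure implies); the function name `inclusive_length` is hypothetical:

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of bases in a closed genomic interval [start, end].

    Assumes 1-based coordinates in which both endpoints belong to the interval.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Predicted FAM203B promoter on chromosome 8 (coordinates from the text above)
promoter_bp = inclusive_length(145_437_380, 145_438_015)
print(promoter_bp)  # 636
```

Under a 0-based, half-open convention (as in BED files), the same region would be written with a start one lower, and its length would simply be end minus start.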
The function of FAM203B is not currently understood. The FAM203B protein has 390 amino acids, [1] a molecular weight of 42.1 kDa, [5] and an isoelectric point of 4.56. [17]
FAM203B contains two domains of unknown function: DUF383 (residues 110–288) and DUF384 (residues 292–349). [1] The protein is rich in alanine, proline, and leucine, but poor in serine, asparagine, threonine, isoleucine, lysine, and phenylalanine. The primary sequence contains the following internal repeats: LPFL (26–29, 245–248), ELAP (70–73), GRAL (54–57, 111–114), and LAADPGL (88–94, 99–105). There are no positive, negative, mixed-charge, or hydrophobic clusters; no transmembrane domains; and no clusters of amino acid multiplets. [5] The secondary structure predicted by the Phyre 2.0 server consists only of α-helices, almost all of which have high confidence values; the overall confidence value of the model is 99.5%. [18]
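Internal repeats such as LPFL (26–29, 245–248) are reported as 1-based, inclusive residue ranges. The sketch below shows one simple way to list such repeats by scanning every window of k residues. It is not necessarily the method used by the cited analysis tool, and the toy sequence is hypothetical, since the FAM203B sequence itself is not reproduced here.

```python
from collections import defaultdict


def repeated_kmers(sequence: str, k: int) -> dict[str, list[tuple[int, int]]]:
    """Map every k-residue substring that occurs more than once to its
    1-based, inclusive (start, end) positions, overlapping hits included."""
    positions: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append((i + 1, i + k))
    # Keep only substrings seen at least twice
    return {kmer: hits for kmer, hits in positions.items() if len(hits) > 1}


# Hypothetical toy sequence containing LPFL twice (not the real FAM203B sequence)
toy = "MLPFLAGRALKLPFLE"
print(repeated_kmers(toy, 4))  # {'LPFL': [(2, 5), (12, 15)]}
```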
There are at least six predicted phosphorylation sites in FAM203B: S17, S153, Y167, T223, S259, and S320. [19] The FAM203B protein is also predicted to localize to the cytoplasm. [20]
The FAM203B promoter contains many possible transcription factor binding sites. The table below lists the strongest candidates, chosen for high confidence values, evolutionary conservation, and/or multiple possible binding sites in the promoter. [16]
Table of Possible Transcription Factor Binding Sites in Predicted FAM203B Promoter: [16]
Transcription Factor | Start | End | Strand | Sequence |
---|---|---|---|---|
Winged-helix transcription factor IL-2 enhancer binding factor, forkhead box K2 | 6 | 22 | - | gacaggacAACAcaggg |
Hypermethylated in cancer 1 | 49 | 61 | + | ccgTGCCagcctg |
Zinc finger transcription factor ZBP-89 | 94 | 116 | + | tggccactCCCCcattcagccct |
Kidney-enriched Krüppel-like factor, KLF15 | 142 | 158 | + | gagccGGGGcgcgggcc |
Transcription factor II B recognition element | 149 | 155 | - | ccgCGCC |
Glial cells missing homolog 1, chorion-specific transcription factor GCMα | 159 | 173 | + | tcagaCCCTcagggc |
Transcription factor AP-2α | 161 | 175 | - | gggcCCTGagggtct |
Smad4 transcription factor involved in TGFβ signaling | 245 | 255 | - | gtaGTCTcggc |
Nuclear factor 1 | 278 | 298 | - | gatTTGGccgcctgccgcgtc |
ZF5 POZ domain zinc finger, zinc finger protein 161 | 295 | 309 | + | aatCGCGccgggcct |
Smad3 transcription factor involved in TGFβ signaling | 365 | 375 | - | ggcGTCTggcc |
Myeloid zinc finger protein MZF1 | 384 | 394 | - | gcGGGGagtta |
X-linked zinc finger protein | 397 | 407 | + | gcGGCCtggcc |
Myeloid zinc finger protein MZF1 | 406 | 416 | - | gaGGGGagggg |
Core promoter-binding protein with 5 Krüppel-type zinc fingers | 423 | 445 | + | ccggtcCCGCcccttgagcccag |
X gene core promoter element 1 | 424 | 434 | - | ggGCGGgaccg |
Zinc finger and BTB domain-containing 7A | 479 | 501 | - | cgcaaCCCCgcccaccagaggag |
Krüppel-like factor 7 | 483 | 499 | + | tctggtgGGCGgggttg |
Erythroid Krüppel-like factor | 533 | 549 | + | ggcaccggtcGGGTggc |
Hypermethylated in cancer 1 | 541 | 553 | - | tgcTGCCacccga |
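Where two rows overlap on opposite strands, for example GCMα at 159–173 (+) and AP-2α at 161–175 (−), the − strand sequence is the reverse complement of the + strand sequence over the shared positions. This suggests, although the source does not state it, that Start and End are positions on the + strand of the predicted promoter and that − strand sites are written 5'→3' on the − strand. The Python sketch below checks that reading for three overlapping pairs from the table. Upper-case letters, apparently the core match, are lower-cased before comparison, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def plus_strand(seq: str, strand: str) -> str:
    """Return a table sequence as it reads on the + strand (lower case)."""
    seq = seq.lower()
    return seq if strand == "+" else seq.translate(COMPLEMENT)[::-1]


def overlap_agrees(site_a: tuple, site_b: tuple) -> bool:
    """Check that two (start, end, strand, sequence) rows agree on shared positions."""
    (s1, e1, st1, q1), (s2, e2, st2, q2) = site_a, site_b
    # Each sequence length should match its inclusive coordinate span
    if len(q1) != e1 - s1 + 1 or len(q2) != e2 - s2 + 1:
        raise ValueError("sequence length does not match start/end")
    p1, p2 = plus_strand(q1, st1), plus_strand(q2, st2)
    lo, hi = max(s1, s2), min(e1, e2)
    if lo > hi:
        return True  # no shared positions to compare
    return p1[lo - s1:hi - s1 + 1] == p2[lo - s2:hi - s2 + 1]


# Rows copied from the table above
gcm = (159, 173, "+", "tcagaCCCTcagggc")
ap2 = (161, 175, "-", "gggcCCTGagggtct")
klf15 = (142, 158, "+", "gagccGGGGcgcgggcc")
tf2b = (149, 155, "-", "ccgCGCC")
eklf = (533, 549, "+", "ggcaccggtcGGGTggc")
hic1 = (541, 553, "-", "tgcTGCCacccga")

for a, b in [(gcm, ap2), (klf15, tf2b), (eklf, hic1)]:
    print(overlap_agrees(a, b))  # True for each pair
```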
Several other proteins may interact directly with FAM203B, including C1orf112, HEATR3, MRTO4, BYSL, GINS1, DKC1, TXNDC12, PWP2, IMP4, and NIP7. [21]
FAM203A is 99% identical to FAM203B, differing by only one amino acid (E264Q) as the result of a point mutation (G857C). [1] [15] [22] This suggests that the duplication event that produced FAM203B, 242,266 bp downstream of FAM203A, [11] occurred very recently in evolutionary history. The FAM203A protein is highly conserved, with orthologs in primates, rodents, ungulates, marsupials, amphibians, fish, fungi, plants, and at least one monotreme, one reptile, and one hemichordate. [4] [23]
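The figures in this paragraph can be checked with simple arithmetic. One mismatch over 390 aligned residues gives 389/390 ≈ 99.7% identity. Glutamate is encoded by GAA or GAG and glutamine by CAA or CAG, so a single G→C change at the first position of the codon is enough to turn E into Q. The Python sketch below illustrates both points. The toy sequences are hypothetical stand-ins rather than the real FAM203A and FAM203B sequences, and the codon table includes only the four codons needed here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (equal-length, gap-free)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Only the codons relevant to E264Q (standard genetic code)
CODON_TO_AA = {"GAA": "E", "GAG": "E", "CAA": "Q", "CAG": "Q"}


def substitute(codon: str, position: int, base: str) -> str:
    """Replace the base at a 0-based position within a codon."""
    return codon[:position] + base + codon[position + 1:]


# Hypothetical 390-residue toy proteins differing only at residue 264 (index 263)
seq_a = "E" * 390
seq_b = seq_a[:263] + "Q" + seq_a[264:]
print(round(percent_identity(seq_a, seq_b), 2))  # 99.74

for codon in ("GAA", "GAG"):
    mutant = substitute(codon, 0, "C")  # G -> C at the first codon position
    print(codon, CODON_TO_AA[codon], "->", mutant, CODON_TO_AA[mutant])  # e.g. GAA E -> CAA Q
```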
Table of FAM203B Paralog and Homologs:
Scientific Name | Common Name | Divergence from Humans (MYA) [24] | NCBI Protein Accession | Gene Name | Protein Length | Sequence Similarity |
---|---|---|---|---|---|---|
Homo sapiens | Human | 0.0 | NP_057542 | FAM203A | 390 | 100% |
Macaca mulatta | Rhesus macaque | 29.2 | XM_001090013 | BRP16L | 396 | 94% |
Pan troglodytes | Chimpanzee | 6.3 | XP_520011 | FAM203A | 395 | 98% |
Mus musculus | Mouse | 92.3 | NP_067530 | FAM203A | 393 | 86% |
Sus scrofa | Wild boar | 94.2 | XP_003125495 | FAM203A-like | 406 | 85% |
Monodelphis domestica | Gray short-tailed opossum | 162.6 | XP_003340757 | FAM203A-like | 483 | 78% |
Columba livia | Rock dove | 296.0 | EMC87403 | BRP16 (partial) | 194 | 64% |
Danio rerio | Zebrafish | 400.1 | NP_001002522 | FAM203A | 377 | 70% |
Xenopus tropicalis | Western clawed frog | 371.2 | AAI60980 | LOC100145412 | 377 | 70% |
Xenopus tropicalis | Western clawed frog | 371.2 | NP_001007916 | FAM203A | 359 | 68% |
Strongylocentrotus purpuratus | Purple sea urchin | 742.9 | XP_793139 | FAM203A-like | 372 | 62% |
Anolis carolinensis | Carolina anole | 301.7 | XP_003228921 | BRP16L | 286 | 57% |
Saccoglossus kowalevskii | Acorn worm | 661.2 | XP_002739897 | BRP16L | 362 | 61% |
Danio rerio | Zebrafish | 400.1 | XP_002665502 | BRP16L | 181 | 57% |
Saccharomyces cerevisiae | Budding yeast | 1369.0 | NP_011703 | Hgh1p | 394 | 52% |
Arabidopsis thaliana | Thale cress | 1369.0 | NP_172882 | Armadillo/beta-catenin-like repeats-containing | 339 | 49% |
There is one known ortholog of FAM203B, brain protein 16-like (BRP16L), in Macaca mulatta, [4] [23] although no other primates appear to have orthologous proteins. There are two possible explanations for this anomaly: (1) the genomes of other primates have not been sequenced thoroughly in the region where a FAM203B ortholog would be located, or (2) FAM203B is the result of a gene duplication event unique to humans, meaning that BRP16L in M. mulatta resulted from an earlier duplication event unique to that species. The second explanation is supported by the following evidence:
However, because FAM203A and FAM203B are so similar, it is difficult to determine whether these proteins are true orthologs of FAM203B or simply homologs.
The phylogenetic tree of FAM203B and its homologs is consistent with the divergence times of the respective lineages. [22] [24]
Every ortholog and homolog of FAM203B has a DUF383 domain and a DUF384 domain, except the Anolis carolinensis protein, which lacks DUF384 because of a large 3' truncation. [23] [27] The start of the DUF383 domain varies considerably among placental mammals, marsupials, and monotremes, whereas this variation is smaller in reptiles, amphibians, fish, invertebrates, plants, and fungi. The DUF383 domain ends at the same position in all homologs, and the DUF384 domain starts and ends at roughly the same positions in all homologs. Homology is high across the DUF384 domain (residues 292–349) and across residues 154–288 of the DUF383 domain. Several amino acids are completely conserved in vertebrates, invertebrates, plants, and fungi, including Arg190, Gly219, Asn226, Lys273, and Lys338. Other highly conserved amino acids include Asn87, Lys88, Arg216, and Phe229. [4] [22]
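Statements such as "Arg190 is completely conserved" rest on a multiple sequence alignment. A residue counts as invariant when every sequence has the same amino acid, and no gap, in that alignment column. The column is then reported by its residue number in the human protein, which can differ from the column number when the human row contains gaps. The Python sketch below illustrates both steps. The toy alignment is hypothetical; a real analysis would start from the output of an alignment program.

```python
from typing import Optional


def invariant_columns(alignment: list[str]) -> list[int]:
    """Return 1-based alignment columns where all sequences share one residue and none has a gap."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [
        col + 1
        for col in range(width)
        if len({row[col] for row in alignment}) == 1 and alignment[0][col] != "-"
    ]


def residue_number(row: str, column: int) -> Optional[int]:
    """Convert a 1-based alignment column to a 1-based residue number in `row`
    (None if `row` has a gap in that column)."""
    if row[column - 1] == "-":
        return None
    return sum(1 for ch in row[:column] if ch != "-")


# Hypothetical toy alignment; the first row plays the role of the human sequence
toy_alignment = [
    "MR-GNK",
    "MRAANK",
    "LRAG-K",
]
human = toy_alignment[0]
for col in invariant_columns(toy_alignment):
    # column number, residue, residue number in the human row
    print(col, human[col - 1], residue_number(human, col))
```

In this toy alignment, the invariant lysine sits in column 6 but is residue 5 of the first sequence, because that sequence has a gap in column 3.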