FAM200A | |
---|---|
Aliases | FAM200A, C7orf38, family with sequence similarity 200 member A |
External IDs | HomoloGene: 89159; GeneCards: FAM200A |
Location (UCSC) | Human: Chr 7: 99.55 – 99.56 Mb; Mouse: n/a |
C7orf38 is a gene located on chromosome 7 in the human genome. [3] The gene is expressed in nearly all tissue types at very low levels. [4] Evolutionarily, it can be found throughout the kingdom Animalia. While the function of the protein is not fully understood, bioinformatic tools have shown that the protein bears much similarity to zinc finger or transposase proteins. Many of its orthologs, paralogs, and neighboring genes have been shown to possess zinc finger domains. [5] The protein contains a hAT dimerization domain near its C-terminus. [6] This domain is highly conserved in transposase enzymes. [7]
C7orf38 is located on chromosome 7 at q22.1. Its genomic sequence contains 5,612 bp. The predominant transcript contains two exons and is 2,507 bp in length. [8] The translated protein contains 573 amino acids. [9]
The 573-amino-acid protein has a molecular weight of 66,280.05 Da. [10] Its isoelectric point occurs at a pH of 5.775, about 1.6 pH units lower than the average human pH. [11] Two deviations from prototypical human proteins are evident: the protein contains fewer glycine residues than expected and is rich in leucine residues. [12] There are no regions of strong hydrophobicity or hydrophilicity, so it is not predicted to be a transmembrane protein.
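The hydropathy argument in the preceding paragraph can be illustrated with a sliding-window scan. The following is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6 as a rough heuristic for candidate transmembrane segments. The source does not state which tool or parameters were used, and the two test sequences are invented for illustration; they are not the C7orf38 protein.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(seq, window=19, cutoff=1.6):
    """True if any window averages above the cutoff (assumed heuristic)."""
    return any(score > cutoff for score in hydropathy_profile(seq, window))


# Hypothetical toy sequences, not taken from C7orf38.
soluble_like = "MSDEKRSTNQGSDEKRSTNQGSDEKRSTNQ"
membrane_like = "MSDEKR" + "LIVLAVLIFLLVAILGLVI" + "KRDES"

print(has_candidate_tm_segment(soluble_like))
print(has_candidate_tm_segment(membrane_like))
```

A protein with no window above the cutoff, as described for C7orf38, would be reported like the first toy sequence. Dedicated transmembrane predictors use more elaborate models than this single threshold.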
The four genes in closest proximity to C7orf38 on chromosome 7 have similar functions, and many of them are transcription factors. [13]
Name | Location and orientation | Function |
---|---|---|
ZNF789 | Start: 98,908,451 bp from pter End: 98,923,153 bp from pter Size: 14,703 bases Orientation: plus strand | The gene encodes zinc finger protein 789. It has been proposed to participate in the regulation of transcription and is predicted to bind zinc ions. |
ZNF394 | Start: 98,928,790 bp from pter End: 98,935,813 bp from pter Size: 7,024 bases Orientation: minus strand | The gene encodes zinc finger protein 394. Overexpression of ZNF394 inhibits the transcription of c-jun and AP-1, suggesting that it is a transcriptional repressor. |
ZKSCAN5 | Start: 98,940,209 bp from pter End: 98,969,381 bp from pter Size: 29,173 bases Orientation: plus strand | The gene encodes zinc finger with KRAB and SCAN domains 5, a zinc finger protein of the Kruppel family. The protein contains a SCAN box and a KRAB A domain. |
ZNF655 | Start: 98,993,981 bp from pter End: 99,012,012 bp from pter Size: 18,032 bases Orientation: plus strand | The gene encodes zinc finger protein 655. Numerous alternatively spliced transcripts encoding distinct isoforms have been discovered. |
Mihuya | Start: 99,149,738 bp from pter End: 99,149,626 bp from pter Size: 112 bases Orientation: plus strand | The Mihuya gene does not encode a large or known functional protein. Its antisense relationship to C7orf38 raises the possibility that it regulates C7orf38 expression. |
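The sizes listed in the table can be cross-checked against the start and end coordinates. The sketch below assumes that both endpoints are counted (size = |end − start| + 1); the source does not state its counting convention. The names `GENES`, `inclusive_size`, and `check_row` are illustrative, and the coordinates are copied from the table.

```python
# (start, end, listed size) in bp from pter, copied from the table.
GENES = {
    "ZNF789": (98_908_451, 98_923_153, 14_703),
    "ZNF394": (98_928_790, 98_935_813, 7_024),
    "ZKSCAN5": (98_940_209, 98_969_381, 29_173),
    "ZNF655": (98_993_981, 99_012_012, 18_032),
    "Mihuya": (99_149_738, 99_149_626, 112),
}


def inclusive_size(start, end):
    """Number of bases spanned when both endpoints are counted (assumed)."""
    return abs(end - start) + 1


def check_row(start, end, listed):
    """Return (computed size, matches listed size, start precedes end)."""
    size = inclusive_size(start, end)
    return size, size == listed, start <= end


for name, (start, end, listed) in GENES.items():
    size, matches, ordered = check_row(start, end, listed)
    print(f"{name}: computed={size} listed={listed} "
          f"match={matches} start_before_end={ordered}")
```

Under this assumption the four zinc finger rows agree with their listed sizes. The Mihuya row has its start after its end and a listed size one base short of the inclusive count, so its coordinates may be worth checking against the cited source.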
Eight paralogs are found in the human proteome. [5] As with the neighboring genes, many of the paralogs are zinc finger proteins or transcription factors.
Name | NCBI Accession Number | Length (AA) | % Identity to C7orf38 | % Similarity to C7orf38 |
---|---|---|---|---|
hypothetical protein LOC285550 | NP_001138663.1 | 657 | 79 | 91 |
zinc finger MYM-type protein 6 | NP_009098.3 | 1325 | 38 | 60 |
SCAN domain-containing protein 3 | NP_443155.1 | 1325 | 39 | 60 |
zinc finger BED domain-containing protein 5 | NP_067034.2 | 692 | 35 | 57 |
transposon-derived Buster3 transposase-like | NP_071373.2 | 594 | 32 | 53 |
general transcription factor II-I repeat domain-containing protein 2B | NP_001003795.1 | 949 | 25 | 46 |
GTF2I repeat domain containing 2 | NP_775808.2 | 949 | 24 | 45 |
EPM2A interacting protein 1 | NP_055620.1 | 607 | 22 | 42 |
Orthologs of C7orf38 can be traced back evolutionarily as far as plants. [5] The following is not an exhaustive list of orthologs; it is intended to provide an evolutionary overview of the conservation of C7orf38.
Common name | Genus & species | NCBI accession number | Length (AA) | % Identity to C7orf38 | % Similarity to C7orf38 |
---|---|---|---|---|---|
Chimp | Pan troglodytes | XP_001139775.1 | 573 | 99 | 99 |
Macaque monkey | Macaca fascicularis | BAE01234.1 | 573 | 96 | 98 |
Horse | Equus caballus | XP_001915370.1 | 573 | 81 | 84 |
Pig | Sus scrofa | XP_001929194 | 1323 | 39 | 61 |
Cow | Bos taurus | XP_875656.2 | 1320 | 38 | 61 |
Mouse | Mus musculus | CAM15594.1 | 1157 | 37 | 60 |
Domestic dog | Canis lupus familiaris | ABF22701.1 | 609 | 37 | 60 |
Rat | Rattus rattus | NP_001102151.1 | 1249 | 37 | 59 |
Opossum | Monodelphis domestica | XP_001372983.1 | 608 | 37 | 59 |
Chicken | Gallus gallus | XP_424913.2 | 641 | 37 | 58 |
Frog | Xenopus (Silurana) tropicalis | ABF20551.1 | 656 | 37 | 56 |
Zebrafish | Danio rerio | XP_001340213.1 | 609 | 37 | 56 |
Pea aphid | Acyrthosiphon pisum | XP_001943527.1 | 659 | 36 | 54 |
Beetle | Tribolium castaneum | ABF20545.1 | 599 | 35 | 55 |
Sea squirt | Ciona intestinalis | XP_002119512.1 | 524 | 34 | 52 |
Hydra | Hydra magnipapillata | XP_002165429.1 | 572 | 29 | 52 |
Puffer fish | Tetraodon nigroviridis | CAF95678.1 | 539 | 28 | 47 |
Mosquito | Anopheles gambiae | XP_558399.5 | 591 | 28 | 47 |
Sea urchin | Strongylocentrotus purpuratus | ABF20546.1 | 625 | 27 | 47 |
Grass plant | Sorghum bicolor | XP_002439156.1 | 524 | 25 | 40 |
Broad leaf tree | Populus trichocarpa | XP_002319808.1 | 788 | 21 | 39 |
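The identity and similarity columns in the tables above can be read as counts over a pairwise alignment. The following is a minimal sketch, assuming two already-aligned sequences with `-` for gaps and the full alignment length as the denominator. The source does not state which alignment tool or settings produced its values. The `SIMILAR_GROUPS` sets are a hypothetical stand-in for a substitution matrix such as BLOSUM62, and the example sequences are invented.

```python
# Hypothetical groups of residues treated as "similar"; real alignment
# tools decide this from a substitution matrix instead.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"),
    set("DE"), set("NQ"), set("ST"),
]


def is_similar(a, b):
    """True if the residues are identical or share an assumed group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) over the alignment length."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns add to the length but never match
        if a == b:
            identical += 1
        if is_similar(a, b):
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


# Invented toy alignment, not real C7orf38 data.
print(identity_and_similarity("MKLV-DESTA", "MRLIADE-TA"))
```

Because identical residues also count as similar, the similarity value is never lower than the identity value, which matches the pattern in the tables.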
CBLAST was used to identify a related protein with an experimentally determined structure. The Hermes DNA transposase, of the Hermes DBD superfamily, was found to be structurally similar (E-value: 1E-6). [14]
hAT Dimerization Domain | |
---|---|
Symbol | hAT |
Pfam | PF05699 |
InterPro | IPR008906 |
The hAT dimerization domain is found at the C-terminus of the transposases of elements belonging to the Activator superfamily (hAT element superfamily). The isolated dimerization domain forms extremely stable dimers in vitro. [7]
The MFOLD program, available at the Rensselaer BioInformatics Server, was used to predict the secondary structure of the mature mRNA sequence. [15] The primary sequence underlying the predicted secondary structures was highly conserved among orthologs, suggesting that these structures are important.
The gene appears to be expressed in most tissue types. [16] EST profiles show very low levels of expression, and no differences were observed across health or developmental states.