FAM200A

Identifiers
Aliases: FAM200A, C7orf38, family with sequence similarity 200 member A
External IDs: HomoloGene: 89159; GeneCards: FAM200A
Orthologs
Species | Human | Mouse
RefSeq (mRNA) | NM_145111 | n/a
RefSeq (protein) | NP_659802 | n/a
Location (UCSC) | Chr 7: 99.55 – 99.56 Mb | n/a
PubMed search | [2] | n/a

C7orf38 is a gene located on chromosome 7 in the human genome. [3] The gene is expressed at very low levels in nearly all tissue types. [4] Evolutionarily, it is found throughout the animal kingdom. Although the function of the encoded protein is not yet understood, bioinformatic analyses indicate that it bears considerable similarity to zinc finger and transposase proteins. Many of its orthologs, paralogs, and neighboring genes have been shown to possess zinc finger domains. [5] The protein contains a hAT dimerization domain near its C-terminus, [6] a domain that is highly conserved in transposase enzymes. [7]

Gene

C7orf38 is located on chromosome 7 at band q22.1. Its genomic sequence spans 5,612 bp. The predominant transcript contains two exons and is 2,507 bp long. [8] The encoded protein is 573 amino acids long. [9]
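
The transcript and protein lengths can be related with simple codon arithmetic: 573 residues need 573 codons plus one stop codon. The sketch below assumes, since the source does not say, that the 2,507-bp transcript length includes both untranslated regions (UTRs); the function names are illustrative only.

```python
# Codon arithmetic relating protein length to transcript length.
# Assumption (not stated in the source): the 2,507-bp transcript length
# counts both UTRs, and the coding sequence ends in a single stop codon.


def coding_length_nt(protein_length_aa: int) -> int:
    """Nucleotides needed to encode a protein: one codon per residue plus a stop codon."""
    return 3 * protein_length_aa + 3


def untranslated_length_nt(transcript_nt: int, protein_length_aa: int) -> int:
    """Combined 5' and 3' UTR length implied by the transcript and protein lengths."""
    return transcript_nt - coding_length_nt(protein_length_aa)


cds = coding_length_nt(573)              # 1722 nt
utr = untranslated_length_nt(2507, 573)  # 785 nt
print(f"CDS: {cds} nt, UTRs combined: {utr} nt, coding fraction: {cds / 2507:.1%}")
```

Under that assumption, roughly two thirds of the mature transcript is coding sequence.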

Figure: location of the C7orf38 gene on chromosome 7.

Protein composition

The 573-amino-acid protein has a predicted molecular weight of 66,280.05 Da. [10] Its predicted isoelectric point is pH 5.775, about 1.6 pH units below physiological pH (about 7.4). [11] Its composition deviates from that of typical human proteins in two ways: it contains fewer glycine residues than expected and is rich in leucine. [12] The protein has no strongly hydrophobic or hydrophilic segments, so it is not predicted to be a transmembrane protein.
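
Statistics like those above (residue composition, average molecular weight, isoelectric point) can be estimated from the sequence alone. The following is a minimal sketch, not the Biology WorkBench code cited: it assumes standard average residue masses and EMBOSS-style pKa values, so applied to the real sequence it need not reproduce 66,280.05 Da or pH 5.775 exactly. The toy peptide is hypothetical.

```python
# Estimating composition, average molecular weight, and isoelectric point.
# Assumptions: standard average residue masses (Da) and EMBOSS-style pKa values;
# the tools cited in the text may have used different constants.
from collections import Counter

RESIDUE_MASS = {  # average mass of each residue within a chain (Da)
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water for the free N- and C-termini

PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def composition(seq: str) -> dict[str, float]:
    """Percentage of each residue type in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da): sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the chain at a given pH."""
    counts = Counter(seq)
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


peptide = "MLLKDLEG"  # hypothetical toy sequence, not taken from C7orf38
print(f"{molecular_weight(peptide):.2f} Da, pI {isoelectric_point(peptide):.2f}")
print(composition(peptide)["L"])  # 37.5 (percent leucine)
```

Bisection works here because the net charge decreases monotonically with pH, so there is exactly one crossing of zero.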

Figure: transmembrane prediction for the C7orf38 protein.
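
A common way to look for membrane-spanning segments is a sliding-window hydropathy scan: a transmembrane helix appears as a long run of windows with high average hydrophobicity. The sketch below uses the Kyte–Doolittle scale and assumes a 19-residue window with a 1.6 cutoff, a frequently used setting. The source does not state which tool or parameters produced its prediction, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle sliding-window hydropathy scan.
# Assumptions: 19-residue window and 1.6 cutoff for candidate transmembrane
# segments; the source does not say which tool or settings it used.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window, indexed by the window's start position."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def transmembrane_candidates(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds the cutoff."""
    return [i for i, score in enumerate(hydropathy_profile(seq, window)) if score > cutoff]


toy = "DKSE" * 5 + "L" * 22 + "EKDS" * 5  # hypothetical: polar flanks around a leucine run
print(len(transmembrane_candidates(toy)) > 0)  # True
```

A protein with no windows above the cutoff, as reported for C7orf38, would return an empty list.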

Gene neighborhood

The four protein-coding genes nearest to C7orf38 on chromosome 7 have related functions: all encode zinc finger proteins, several of which act as transcription factors. [13]

Figure: genes neighboring C7orf38 on chromosome 7.
Name | Start (bp from pter) | End (bp from pter) | Size (bases) | Orientation | Function
ZNF789 | 98,908,451 | 98,923,153 | 14,703 | plus strand | Encodes zinc finger protein 789. It has been proposed to participate in the regulation of transcription and is predicted to have zinc ion binding activity.
ZNF394 | 98,928,790 | 98,935,813 | 7,024 | minus strand | Encodes zinc finger protein 394. Overexpression of ZNF394 inhibits the transcription of c-jun and AP-1, suggesting that it is a transcriptional repressor.
ZKSCAN5 | 98,940,209 | 98,969,381 | 29,173 | plus strand | Encodes zinc finger with KRAB and SCAN domains 5, a zinc finger protein of the Kruppel family that contains a SCAN box and a KRAB A domain.
ZNF655 | 98,993,981 | 99,012,012 | 18,032 | plus strand | Encodes zinc finger protein 655. Numerous alternatively spliced transcripts encoding distinct isoforms have been identified.
Mihuya | 99,149,738 | 99,149,626 | 112 | plus strand | Does not encode a large or known functional protein. Its antisense relationship to C7orf38 raises the possibility that it regulates C7orf38 expression.
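
The sizes in the table follow from inclusive coordinates, size = |end − start| + 1. The short check below uses the four zinc finger gene coordinates exactly as listed (the genome assembly they refer to is not stated) and also reports the gap between consecutive genes, which shows how tightly the cluster is packed. The helper names are illustrative.

```python
# Gene sizes and intergenic gaps from inclusive coordinates (bp from pter),
# copied from the neighborhood table; the assembly version is not stated.

NEIGHBORS = {
    "ZNF789": (98_908_451, 98_923_153),
    "ZNF394": (98_928_790, 98_935_813),
    "ZKSCAN5": (98_940_209, 98_969_381),
    "ZNF655": (98_993_981, 99_012_012),
}


def span_size(start: int, end: int) -> int:
    """Number of bases covered by an inclusive coordinate range."""
    return abs(end - start) + 1


def gaps(genes: dict[str, tuple[int, int]]) -> list[tuple[str, str, int]]:
    """Bases strictly between consecutive genes, ordered by their lower coordinate."""
    ordered = sorted(genes.items(), key=lambda item: min(item[1]))
    return [
        (a, b, min(cb) - max(ca) - 1)
        for (a, ca), (b, cb) in zip(ordered, ordered[1:])
    ]


for name, (start, end) in NEIGHBORS.items():
    print(name, span_size(start, end))
print(gaps(NEIGHBORS))
```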

Paralogs

Eight paralogs are found in the human proteome. [5] Like the neighboring genes, many of the paralogs are zinc finger proteins or transcription factors.

Name | NCBI accession number | Length (aa) | Identity to C7orf38 (%) | Similarity to C7orf38 (%)
hypothetical protein LOC285550 | NP_001138663.1 | 657 | 79 | 91
zinc finger MYM-type protein 6 | NP_009098.3 | 1325 | 38 | 60
SCAN domain-containing protein 3 | NP_443155.1 | 1325 | 39 | 60
zinc finger BED domain-containing protein 5 | NP_067034.2 | 692 | 35 | 57
transposon-derived Buster3 transposase-like | NP_071373.2 | 594 | 32 | 53
general transcription factor II-I repeat domain-containing protein 2B | NP_001003795.1 | 949 | 25 | 46
GTF2I repeat domain containing 2 | NP_775808.2 | 949 | 24 | 45
EPM2A interacting protein 1 | NP_055620.1 | 607 | 22 | 42
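
Percent identity and percent similarity are both computed over the columns of a pairwise alignment. As a hedged sketch: BLAST counts a column as similar (a "positive") when its substitution-matrix score is greater than zero; here a few conservative-substitution groups, whose members all score positively against each other in BLOSUM62, stand in for the full matrix, so this version undercounts positives. The group list is an assumption made for illustration.

```python
# Percent identity and similarity over a pairwise alignment (gaps as "-").
# Assumption: the groups below replace a full substitution matrix; all pairs
# within a group score > 0 in BLOSUM62, but many other positive pairs are omitted.

SIMILAR_GROUPS = [set("ILMV"), set("FWY"), set("KR"), set("DE"), set("ST")]


def percent_identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (identity %, similarity %) over all alignment columns, gaps included."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns count toward the length but never match
        if x == y:
            identical += 1
            similar += 1
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            similar += 1
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n


print(percent_identity_similarity("MKLV-E", "MRIVDE"))  # hypothetical toy alignment
```

Because every identical column is also counted as similar, similarity is always at least as high as identity, as in the tables here.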

Orthologs

Orthologs of C7orf38 can be traced as far back in evolution as plants. [5] The following list is not exhaustive; it is intended to give an overview of how conserved C7orf38 is across evolution.

Common name | Genus and species | NCBI accession number | Length (aa) | Identity to C7orf38 (%) | Similarity to C7orf38 (%)
Chimp | Pan troglodytes | XP_001139775.1 | 573 | 99 | 99
Macaque monkey | Macaca fascicularis | BAE01234.1 | 573 | 96 | 98
Horse | Equus caballus | XP_001915370.1 | 573 | 81 | 84
Pig | Sus scrofa | XP_001929194 | 1323 | 39 | 61
Cow | Bos taurus | XP_875656.2 | 1320 | 38 | 61
Mouse | Mus musculus | CAM15594.1 | 1157 | 37 | 60
Domestic dog | Canis lupus familiaris | ABF22701.1 | 609 | 37 | 60
Rat | Rattus rattus | NP_001102151.1 | 1249 | 37 | 59
Opossum | Monodelphis domestica | XP_001372983.1 | 608 | 37 | 59
Chicken | Gallus gallus | XP_424913.2 | 641 | 37 | 58
Frog | Xenopus (Silurana) tropicalis | ABF20551.1 | 656 | 37 | 56
Zebrafish | Danio rerio | XP_001340213.1 | 609 | 37 | 56
Pea aphid | Acyrthosiphon pisum | XP_001943527.1 | 659 | 36 | 54
Beetle | Tribolium castaneum | ABF20545.1 | 599 | 35 | 55
Sea squirt | Ciona intestinalis | XP_002119512.1 | 524 | 34 | 52
Hydra | Hydra magnipapillata | XP_002165429.1 | 572 | 29 | 52
Pufferfish | Tetraodon nigroviridis | CAF95678.1 | 539 | 28 | 47
Mosquito | Anopheles gambiae | XP_558399.5 | 591 | 28 | 47
Sea urchin | Strongylocentrotus purpuratus | ABF20546.1 | 625 | 27 | 47
Sorghum | Sorghum bicolor | XP_002439156.1 | 524 | 25 | 40
Black cottonwood | Populus trichocarpa | XP_002319808.1 | 788 | 21 | 39

Structure

Protein

CBLAST was used to identify a structurally related protein with an experimentally determined structure. Hermes DNA transposase, of the Hermes DBD superfamily, was found to be structurally similar (E-value: 1E-6). [14]

hAT dimerization domain
Identifiers: Symbol hAT; Pfam PF05699; InterPro IPR008906

The hAT dimerization domain is found at the C-terminus of transposases encoded by elements of the hAT (hobo/Activator/Tam3) superfamily. The isolated dimerization domain forms extremely stable dimers in vitro. [7]

Figure: CBLAST structural match between C7orf38 and Hermes DNA transposase.

mRNA

The MFOLD program, available on the Rensselaer Bioinformatics Web Server, was used to predict the secondary structure of the mature mRNA. [15] The sequences that form the predicted secondary structures are highly conserved among orthologs, suggesting that these structures are functionally important.

Figure: predicted secondary structure of the C7orf38 mRNA.
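
MFOLD predicts structure by minimizing free energy under a nearest-neighbor energy model. The sketch below is not MFOLD: it is the simpler Nussinov dynamic program, which only maximizes the number of nested base pairs, and it is shown to illustrate the recursion that this family of folding algorithms builds on. The allowed pairs (Watson–Crick plus G–U wobble) and the minimum hairpin loop of three bases are assumptions.

```python
# Nussinov base-pair maximization, a simplified relative of the free-energy
# minimization performed by MFOLD. Assumptions: Watson-Crick and G-U pairs only,
# and at least three unpaired bases inside every hairpin loop.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested (non-crossing) base pairs in an RNA sequence."""
    n = len(rna)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within rna[i..j]
    for length in range(min_loop + 1, n):  # length = j - i, shortest spans first
        for i in range(n - length):
            j = i + length
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: base i pairs with k
                if (rna[i], rna[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, inside + 1 + after)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # 3: a three-pair stem closing an AAA loop
```

Replacing the pair count with stacking and loop energies, and the maximum with a minimum, gives the general shape of energy-based methods such as the one MFOLD implements.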

Tissue distribution

The gene appears to be expressed in most tissue types. [16] EST profiles show very low expression levels, with no difference between health states or developmental stages.

Figure: EST expression profiles of C7orf38 by tissue site, health state, and developmental stage.
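
EST-based expression profiles, such as those from NCBI UniGene, typically normalize counts as transcripts per million (TPM): the number of ESTs assigned to the gene divided by the total number of ESTs sequenced from a tissue pool, times one million. A minimal sketch of that normalization follows; the tissue pools and counts are hypothetical, not the UniGene data for C7orf38.

```python
# Transcripts per million from EST counts: gene ESTs / total ESTs in pool * 1e6.
# The pools and counts below are hypothetical placeholders, not real UniGene data.


def transcripts_per_million(gene_ests: int, total_ests: int) -> float:
    """Normalize a gene's EST count by the size of the tissue's EST pool."""
    if total_ests <= 0:
        raise ValueError("pool must contain at least one EST")
    return gene_ests * 1_000_000 / total_ests


pools = {  # hypothetical (gene ESTs, total ESTs) per tissue pool
    "tissue_a": (2, 400_000),
    "tissue_b": (0, 150_000),
    "tissue_c": (1, 250_000),
}
for tissue, (hits, total) in pools.items():
    print(f"{tissue}: {transcripts_per_million(hits, total):.1f} TPM")
```

Normalizing by pool size is what allows tissues sequenced to very different depths to be compared; with single-digit EST counts, as for a weakly expressed gene, the estimates remain noisy.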

References

  1. GRCh38: Ensembl release 89: ENSG00000221909 - Ensembl, May 2017.
  2. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  3. "University of California Santa Cruz". Retrieved 2010-05-10.
  4. "NCBI UniGene". Retrieved 2010-05-10.
  5. "NCBI BLAST". Retrieved 2010-05-10.
  6. "KEGG". Retrieved 2010-05-10.
  7. Essers L, Adolphs RH, Kunze R (2000). "A highly conserved domain of the maize activator transposase is involved in dimerization". Plant Cell. 12 (2): 211–224. doi:10.2307/3870923. JSTOR 3870923. PMC 139759. PMID 10662858.
  8. "Fam200A". Retrieved 2010-05-10.
  9. "NCBI Protein Accession Number". Retrieved 2010-05-10.
  10. "AAStats. SDSC Biology WorkBench". Retrieved 2010-05-10. [permanent dead link]
  11. "IP. SDSC Biology WorkBench". Retrieved 2010-05-10. [permanent dead link]
  12. "SAPS. SDSC Biology WorkBench". Retrieved 2010-05-10. [permanent dead link]
  13. "AceView". Retrieved 2010-05-10.
  14. "Hermes DNA Transposase". Retrieved 2010-05-10.
  15. "Fam200A". Archived from the original on 2010-05-22. Retrieved 2010-05-10.
  16. "NCBI UniGene". Retrieved 2010-04-22.