C3orf62 | |
---|---|
Identifiers | |
Aliases | C3orf62, chromosome 3 open reading frame 62, MAPS |
External IDs | MGI: 2148248; HomoloGene: 14230; GeneCards: C3orf62 |

C3orf62 | |
---|---|
Identifiers | |
Symbol | C3orf62 |
Alt. names | CC062, FLJ43654 |
NCBI gene | 375341 |
HGNC | 24771 |
RefSeq | NM_198562.21 |
UniProt | Q6ZUJ4 |
Other data | |
Locus | Chr. 3 p21.31 |

Chromosome 3 open reading frame 62 (C3orf62) is a protein that in humans is encoded by the C3orf62 gene. Relative to other proteins encoded in the human genome, C3orf62 is depleted in glycine. [5] C3orf62 has a KKXX-like motif and is predicted to be localized in the nucleus. [6] Expression of C3orf62 is highest in whole blood. [7]
C3orf62 is mapped to the reverse strand of chromosome 3 at 3p21.31 and spans 9,313 bases. [8] It starts 49,268,597 base pairs from the terminus of the short arm (pter) and ends 49,277,909 base pairs from pter. The gene is known to have 3 exons, 4 transcripts, and 37 orthologues. [9] [7] [10] [11] [12]
C3orf62 is flanked by Ubiquitin Specific Protease 4 (USP4) and Coiled-Coil Domain Containing 36 (CCDC36).
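The 9,313-base span is consistent with the two coordinates when both endpoints are counted. A minimal Python check using only the figures quoted in the text; the function and constant names are illustrative and not part of any genome-browser API:

```python
def inclusive_span(start: int, end: int) -> int:
    """Number of bases from start to end, counting both endpoints."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates quoted in the text, in base pairs from pter.
GENE_START = 49_268_597
GENE_END = 49_277_909

span = inclusive_span(GENE_START, GENE_END)
print(span)  # 9313
```

Counting only the distance between the endpoints would give 9,312; the quoted figure therefore appears to use inclusive coordinates.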
C3orf62 possesses the following alternate names and synonyms: CC062; FLJ43654. [10] [13]
The human C3orf62 protein (Q6ZUJ4) is 267 amino acids long and has a molecular mass of 30,194 daltons. [9] The unmodified protein is glycine-depleted relative to other proteins encoded in the human genome; [5] the glycine residues that are present are distributed evenly along the sequence rather than clustered. Before post-translational modification, C3orf62 is an acidic protein with an isoelectric point of 5.211 (roughly 5.2). [14] No charge clusters are present, and the cysteine residues show no characteristic spacing.
Name | Ensembl Transcript ID [11] [7] | Base Pairs | Protein | Biotype | CCDS | UniProt | RefSeq |
---|---|---|---|---|---|---|---|
C3orf62-001 | ENST00000343010.7 | 4235 | 267 aa | Protein coding | CCDS2792 | Q6ZUJ4 | NM_198562, NP_940964 |
C3orf62-004 | ENST00000436325.1 | 581 | 190 aa | Protein coding | - | C9JW57 | - |
C3orf62-003 | ENST00000424960.1 | 602 | 98 aa | Nonsense-mediated decay | - | H7BZX3 | - |
C3orf62-002 | ENST00000479673.1 | 3330 | No protein | Retained intron | - | - | - |
There are no known transmembrane domains for C3orf62. [13] C3orf62 has a KKXX-like motif at its C-terminus; motifs of this type are associated with the retrieval of endoplasmic reticulum (ER) membrane proteins from the Golgi apparatus. [15]
Roughly 7 alpha helices are predicted for C3orf62 by PELE protein structure prediction, and this prediction is supported by secondary structure predictions for orthologous sequences from Ali2D. [13] [16]
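A C-terminal dilysine motif can be checked with a short pattern match. The sketch below assumes the canonical definitions (lysines at positions -4 and -3 for KKXX, or -5 and -3 for KXKXX); the tool behind the "KKXX-like" call in the text may use looser criteria. The function name and the test sequences are hypothetical and are not the C3orf62 sequence.

```python
import re

# Canonical C-terminal dilysine patterns, where "." matches any residue:
#   KKXX  -> lysine at positions -4 and -3
#   KXKXX -> lysine at positions -5 and -3
DILYSINE = re.compile(r"(KK|K.K)..$")


def has_dilysine_motif(protein_seq: str) -> bool:
    """Return True if the sequence ends in a KKXX or KXKXX pattern."""
    seq = protein_seq.strip().upper()
    return DILYSINE.search(seq) is not None


# Made-up sequences for illustration only.
print(has_dilysine_motif("MAAAKKLE"))   # ends in KKLE
print(has_dilysine_motif("MAAAKLKTE"))  # ends in KLKTE
print(has_dilysine_motif("MAAKKLEG"))   # lysines too far from the C-terminus
```

A match only indicates that the sequence pattern is present; it does not establish that the protein is actually retrieved to the ER.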
C3orf62 is predicted to be localized in the nucleus. [6] The k-nearest neighbors algorithm predicts C3orf62 to be classified as follows: k=9/23; 69.6% nuclear, 13.0% mitochondrial, 13.0% cytoskeletal, 4.3% cytoplasmic. [6]
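The percentages above are what a simple majority vote over 23 nearest neighbors would produce. The sketch below assumes neighbor counts of 16, 3, 3, and 1, which are inferred from the reported percentages rather than taken from the prediction output; the function name is illustrative.

```python
from collections import Counter


def vote_shares(neighbor_labels):
    """Percentage of neighbors carrying each label, rounded to one decimal."""
    counts = Counter(neighbor_labels)
    total = len(neighbor_labels)
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# Assumed composition of the 23 nearest neighbors (inferred, not reported).
neighbors = (
    ["nuclear"] * 16
    + ["mitochondrial"] * 3
    + ["cytoskeletal"] * 3
    + ["cytoplasmic"] * 1
)

shares = vote_shares(neighbors)
predicted = max(shares, key=shares.get)
print(shares)
print(predicted)
```

Under these assumed counts the nuclear class wins the vote, matching the predicted localization.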
C3orf62 is expressed in more than 30 different tissues, with the highest expression in whole blood. [10] [7] [9] Other tissues with high expression include lung, tonsil, trachea, small intestine, mammary gland, and salivary gland. Across various microarray studies, C3orf62 shows consistently high expression compared to other genes tested in the datasets. [17] C3orf62 has low expression in brain tissues.
C3orf62 possesses two post-translational modifications, both phosphorylation sites, at amino acids 210 and 224. [9] A natural variant is found at amino acid 110 (glutamic acid (E) → lysine (K)). [12] [11]
C3orf62 may have a YinOYang site at residue 115, meaning that this threonine residue is predicted to be O-GlcNAcylated as well as phosphorylated. This site may be reversibly and dynamically modified by O-GlcNAc or phosphate groups at different times in the cell. [18]
Transcription of C3orf62 is reported to produce 5 alternatively spliced variants and 1 unspliced form. [10] Of the four transcripts annotated in Ensembl, two are protein coding, one is subject to nonsense-mediated decay, and one is a retained intron. QIAGEN denotes the following as transcription factor binding sites in the C3orf62 promoter: TFCP2, Pax-6, p53, MyoD, YY1, Ik-2, AREB6, IRF-7A3. [7]
The function of C3orf62 is not currently understood.
Upwards of 12 interacting proteins have been predicted for C3orf62. [20] [21] [22] The predicted interactors with the highest confidence include: HAUS augmin-like complex subunit 1 (HAUS-1), Inhibitor of growth protein 5 (ING5), Thioredoxin domain-containing protein 9 (TXNDC9), and MORF4-family associated proteins (MORF4L1, MRFAP1).
Chemicals known to interact with C3orf62 include the following: Aflatoxin B1, Hydralazine, Valproic acid, and Decitabine. [10]
Interstitial deletions of chromosome 3 are rare, and only a few patients with a microdeletion of 3p21.31 have been reported to date. Characteristic clinical features found in patients with a microdeletion of 3p21.31 include developmental delay and distinctive facial features (including arched eyebrows, hypertelorism, epicanthus, and micrognathia). [23] [24] [25]
In the gene region, NCBI SNP identified 1,326 SNPs on the reverse (minus) strand of C3orf62. [26] In the coding region, NCBI SNP identified 147 common SNPs.
There are no known paralogs of C3orf62. [27]
The ortholog space of C3orf62 is fairly narrow, with the majority of orthologs found in mammals. [27] A small fraction of orthologs have also been found in the following classes: Reptilia, Sarcopterygii, and Actinopterygii.
Nearly all mammalian ortholog sequences of C3orf62 fall within an E-value range of 2e-94 to 1e-169 and a similarity range of 56-84%. Mammals in this group consist largely of primates but also include the following orders: Perissodactyla, Rodentia, Carnivora, Proboscidea, Cetartiodactyla, Cingulata, Artiodactyla, Eulipotyphla, Didelphimorphia, and Afrosoricida. [27]
More distantly related ortholog sequences of C3orf62 come from the classes Reptilia, Sarcopterygii, and Actinopterygii, with E-values ranging from 8e-10 to 3e-59 and similarity of 24-39%. [27] Organisms in this grouping belong to the orders Testudines, Coelacanthiformes, Squamata, and Osteoglossiformes. No ortholog sequences of C3orf62 were found in bacteria, archaea, protists, plants, fungi, Trichoplax, invertebrates, amphibians, or birds.
Genus and Species | Common Name | Class | Accession | Percent Identity |
---|---|---|---|---|
Homo sapiens | Human | Mammalia | NP_940964 | 100 |
Microcebus murinus | Grey Mouse Lemur | Mammalia | XP_012626718 | 88 |
Propithecus coquereli | Coquerel's sifaka (lemur) | Mammalia | XP_012510880 | 86.9 |
Equus caballus | Horse | Mammalia | NP_001295877 | 84.3 |
Loxodonta africana | African elephant | Mammalia | XP_003409711 | 83.2 |
Castor canadensis | North American Beaver | Mammalia | XP_020037316 | 81.6 |
Otolemur garnettii | Garnett's Greater Galago | Mammalia | XP_003800633 | 81.6 |
Camelus bactrianus | Bactrian camel | Mammalia | XP_010967491.1 | 78.3 |
Ailuropoda melanoleuca | Giant Panda | Mammalia | XP_019656626 | 77.7 |
Canis lupus familiaris | Dog | Mammalia | XP_003432924 | 77.2 |
Vicugna pacos | Alpaca | Mammalia | XP_006196356 | 77.2 |
Condylura cristata | Star-nosed mole | Mammalia | XP_012575760 | 76.8 |
Felis catus | Cat | Mammalia | XP_003982269 | 75.1 |
Pteropus vampyrus | Large flying fox | Mammalia | XP_011373720 | 73.3 |
Pantholops hodgsonii | Tibetan antelope | Mammalia | XP_005969318 | 72.6 |
Ictidomys tridecemlineatus | Thirteen-lined ground squirrel | Mammalia | XP_005326967 | 71 |
Sorex araneus | Common Shrew | Mammalia | XP_012789682 | 69.5 |
Monodelphis domestica | Gray short-tailed opossum | Mammalia | XP_001367907 | 65.4 |
Echinops telfairi | Lesser Hedgehog Tenrec | Mammalia | XP_004715283 | 63.7 |
Orcinus orca | Killer whale | Mammalia | XP_004283985 | 61.2 |
Dasypus novemcinctus | Nine-banded armadillo | Mammalia | XP_004451950 | 58.2 |
Dipodomys ordii | Ord's Kangaroo Rat | Mammalia | XP_012883511 | 56.3 |
Myotis lucifugus | Little Brown Myotis | Mammalia | XP_006107033 | 39.3 |
Pelodiscus sinensis | Chinese softshell turtle | Reptilia | XP_014426235 | 38.5 |
Chelonia mydas | Green Sea Turtle | Reptilia | XP_007061837 | 37.1 |
Latimeria chalumnae | West Indian Ocean coelacanth (fish) | Sarcopterygii | XP_005992740 | 35.3 |
Anolis carolinensis | Green anole (lizard) | Reptilia | XP_008103227 | 33.1 |
Gekko japonicus | Japanese Gecko | Reptilia | XP_015262861 | 30.1 |
The most distant orthologs of C3orf62 are found in reptiles and fish. Orthologs of C3orf62 are not seen in amphibians, birds, invertebrates, or bacteria. [27]
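The source does not state how the percent identity values in the ortholog table were computed, and tools differ in the denominator they use. As a rough illustration, the sketch below counts identical residues over the alignment columns in which neither sequence has a gap. The function name and the two aligned fragments are hypothetical and are not taken from C3orf62 or its orthologs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up aligned fragments: 6 ungapped columns, 5 of them identical.
identity = percent_identity("MKT-AYLE", "MRTQAY-E")
print(round(identity, 1))  # 83.3
```

Using the full alignment length or the shorter sequence length as the denominator would give lower values for the same alignment, so figures from different tools are not directly comparable.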