SMCO3 | | |
---|---|---|
Identifiers | | |
Aliases | SMCO3, C12orf69, single-pass membrane protein with coiled-coil domains 3 | |
External IDs | MGI: 2443451; HomoloGene: 79087; GeneCards: SMCO3 | |
Orthologs | | |
Species | Human | Mouse |
Location (UCSC) | Chr 12: 14.8 – 14.81 Mb | Chr 6: 136.83 – 136.84 Mb |
PubMed search | [3] | [4] |
Single-pass membrane and coiled-coil domain-containing protein 3 is a protein that is encoded in humans by the SMCO3 gene.
SMCO3 has 2 aliases, C12orf69 and LOC440087.
SMCO3 is located on the negative strand of chromosome 12 (12p12.3) and spans 10,460 base pairs (chr12:14,803,723-14,814,182) [5]. It has two exons that flank a single intron [5].
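The quoted span can be reproduced from the coordinates. The sketch below assumes the interval is 1-based and inclusive at both ends (an assumption about the coordinate convention, not something the source states); the function name `span_length` is illustrative only.

```python
def span_length(start: int, end: int) -> int:
    """Length of a genomic interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates of SMCO3 on chromosome 12 as given in the text.
SMCO3_START = 14_803_723
SMCO3_END = 14_814_182

smco3_span = span_length(SMCO3_START, SMCO3_END)
print(f"SMCO3 spans {smco3_span:,} base pairs")
```

Under that assumption the result matches the 10,460 base pairs stated in the text.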
SMCO3 is flanked by WW domain binding protein 11 (WBP11) and ecto-ADP-ribosyltransferase 4 (ART4) on the minus strand and overlaps with C12orf60 on the plus strand [6]. The gene has only a single isoform.
SMCO3 is expressed at very low levels in several human tissues, including cervix, connective tissue, eye, lung and prostate [7]. The highest expression of SMCO3 is seen in the kidney, liver and spleen [8]. SMCO3 is also expressed at higher levels in cancers, especially chondrosarcoma and clear-cell renal cell carcinoma [7] [9]. SMCO3 expression is seen only in the fetus and the adult, and not in embryoid bodies, blastocysts, or the infant and juvenile stages of development [7].
The expression of SMCO3 appears to depend on the species: the Mus musculus homolog of SMCO3 is expressed at much higher levels in the eye than the human gene.
The promoter region of SMCO3 is 1,100 base pairs long and begins 961 base pairs upstream of the 5' UTR, with the end of the promoter completely overlapping the first exon [10].
There are 2,152 known nucleotide-level variants, of which 27 are coding synonymous single nucleotide polymorphisms (SNPs) [11]. The vast majority of SNPs occur within the intron, with only about a quarter occurring in the exons and untranslated regions. No SMCO3 variants are known to be associated with any disorder.
Region | Number of SNPs | % of SNPs |
---|---|---|
3' UTR | 299 | 13.9% |
5' UTR | 16 | <1% |
Exons | 234 | 10.9% |
Intron | 1603 | 74.5% |
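The percentages in the table follow from the raw counts. The short sketch below recomputes each region's share from the counts listed above; the variable and function names are illustrative and are not taken from any cited tool.

```python
# SNP counts per region, copied from the table above.
snp_counts = {
    "3' UTR": 299,
    "5' UTR": 16,
    "Exons": 234,
    "Intron": 1603,
}

total_snps = sum(snp_counts.values())


def percent_share(count: int, total: int) -> float:
    """Share of `count` in `total`, expressed as a percentage."""
    return 100.0 * count / total


shares = {region: percent_share(n, total_snps) for region, n in snp_counts.items()}

for region, share in shares.items():
    print(f"{region}: {share:.1f}%")

# Everything outside the intron: exons plus both untranslated regions.
non_intronic_share = 100.0 - shares["Intron"]
print(f"Outside the intron: {non_intronic_share:.1f}%")
```

The counts sum to the 2,152 variants reported in the text, and roughly a quarter of them fall outside the intron.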
The mRNA transcript of SMCO3 is 2,104 base pairs long. There are no mRNA variants of SMCO3 [12].
The SMCO3 promoter has binding sites for many transcription factors, including cartilage homeoprotein 1, cAMP-responsive element binding proteins, the PAR/bZIP family and vertebrate TATA binding protein factor.
SMCO3 is 225 amino acids long with a predicted molecular weight of 24.9 kDa [13]. It is a slightly basic protein with a predicted isoelectric point of 8.3 [14].
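As a rough plausibility check, the predicted molecular weight can be compared with the common rule of thumb of about 110 Da per amino acid residue. This is only an approximation under that assumption; it is not the method used by the cited prediction, which works from the actual sequence.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This is an approximation, not a value taken from the cited source.
AVERAGE_RESIDUE_MASS_DA = 110.0

SMCO3_LENGTH_AA = 225  # protein length given in the text


def estimate_mass_kda(length_aa: int, residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Crude molecular weight estimate in kilodaltons from length alone."""
    return length_aa * residue_mass_da / 1000.0


estimated_kda = estimate_mass_kda(SMCO3_LENGTH_AA)
print(f"Estimated mass: {estimated_kda:.2f} kDa")
```

The estimate lands close to the predicted value of 24.9, which is consistent with that figure being expressed in kilodaltons.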
SMCO3 is comparatively enriched in lysine and comparatively poor in proline and phenylalanine relative to other human proteins [15]. SMCO3 contains several long, uncharged segments but does not have any significantly charged segments. Despite being a transmembrane protein, it has no significantly hydrophobic or significantly hydrophilic regions [15].
SMCO3 has a single domain, DUF4344 (aa15-221), which is currently uncharacterised [16]. C12orf60 also contains this domain. SMCO3 contains a single transmembrane region (aa155-175) and two coiled-coil regions (aa62-92, aa183-207) [17]. The C-terminus of SMCO3 contains a KKXX-like motif, suggesting endoplasmic reticulum localisation [18].
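The annotated regions can be written down as simple intervals to check how they relate to one another. The sketch below uses the positions given in the text and adds a strict KKXX test (lysines at positions -4 and -3 from the C-terminus). The example sequences are invented toy strings, not the SMCO3 sequence, and because the text describes the SMCO3 motif only as "KKXX-like", the real protein may not satisfy this strict pattern.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Positions taken from the text above.
DOMAIN = Feature("DUF4344", 15, 221)
FEATURES = [
    Feature("coiled coil 1", 62, 92),
    Feature("transmembrane", 155, 175),
    Feature("coiled coil 2", 183, 207),
]


def within(inner: Feature, outer: Feature) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two inclusive intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def has_strict_kkxx(sequence: str) -> bool:
    """True if the last four residues match K-K-X-X (X = any residue)."""
    tail = sequence.upper()[-4:]
    return len(tail) == 4 and tail[:2] == "KK"


all_inside_domain = all(within(f, DOMAIN) for f in FEATURES)
any_overlap = any(
    overlaps(a, b) for i, a in enumerate(FEATURES) for b in FEATURES[i + 1:]
)
print(all_inside_domain, any_overlap)

# Hypothetical toy sequences, used only to exercise the motif test.
print(has_strict_kkxx("MAAGLSKKTE"))  # tail is K-K-T-E
print(has_strict_kkxx("MAAGLSTEKK"))  # tail is T-E-K-K
```

With the positions as listed, the transmembrane region and both coiled coils sit inside DUF4344 without overlapping each other.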
The secondary structure of SMCO3 consists of several α-helices and a single β-pleated sheet interspersed with disordered coiled-coil regions [19]. Orthologs of SMCO3 similarly show secondary structure dominated by α-helices. There are no disulfide bridges predicted in the tertiary structure. [20]
The function of the SMCO3 protein is currently unknown.
The N-terminus of SMCO3 is cleaved: the first methionine residue is removed and the resulting N-terminus is acetylated to improve stability. [21] Additionally, there are several sites that are likely phosphorylated and a single N-linked glycosylation site, which is typical of ER integral membrane proteins. [22] Unlike typical ER integral membrane proteins, SMCO3 has no signal sequence. [23] [24]
SMCO3 contains a transmembrane domain (aa155-175). Additionally, the KKXX-like motif strongly suggests that it is an endoplasmic reticulum integral membrane protein [18].
Two-hybrid assays have identified five proteins that interact with SMCO3: FUS RNA binding protein (FUS), mitogen-activated protein kinase 9 (MAPK9), STN1 subunit of CST complex (OBFC1), protein phosphatase 2 catalytic subunit alpha (PPP2CA) and tripartite motif containing 39 (TRIM39) [25]. However, SMCO3 is not known to take part in any pathway, although its structure indicates that it takes part in protein-protein interactions. [26] PPP2CA, OBFC1, FUS and MAPK9 are all either implicated in cancer or show altered expression in cancer, which suggests that SMCO3 may be useful as an eQTL for certain cancers.
Only 3.4% of SNPs were predicted to be deleterious, of which none had any clinical significance. [27]
GWAS showed no significant associations of SMCO3 with any disease or trait. SMCO3 is not known to be implicated in any disease. SMCO3 is expressed at higher levels in certain cancers, especially chondrosarcoma and clear-cell renal cell carcinoma [7] [9].
The amino acid sequence of SMCO3 is highly conserved compared to other human proteins. There are dramatically lower levels of sequence divergence than expected, even compared to proteins known to have low levels of sequence divergence over time.
SMCO3 is largely conserved in amniotes. Orthologs have been identified in many mammals, reptiles and birds [28]. The closest ortholog is found in Pan troglodytes and has 99.7% sequence similarity. More distant homologs have also been identified in a select few bony fish, but orthologs are not seen in cartilaginous fish, insects or other invertebrates. No paralogs of SMCO3 in humans have been identified [28].
Species | Common Name | Estimated Time of Divergence (MYA) | NCBI Accession Number | Sequence Length (aa) | Sequence Identity (%) |
---|---|---|---|---|---|
Homo sapiens | Humans | 0 | XP_016874801.1 | 225 | 100 |
Rhinopithecus roxellana | Golden snub-nosed monkey | 29.44 | XP_010366768.1 | 225 | 94.7 |
Oryctolagus cuniculus | European rabbit | 90 | XP_002712692.1 | 225 | 91.1 |
Delphinapterus leucas | Beluga whale | 96 | XP_022433365.1 | 225 | 92.0 |
Phascolarctos cinereus | Koala | 159 | XP_020849872.1 | 225 | 80 |
Pygoscelis adeliae | Adélie penguin | 312 | XP_009320673.1 | 225 | 59.6 |
Anolis carolinensis | Green anole | 312 | XP_016849216.1 | 227 | 53.8 |
Lepisosteus oculatus | Spotted gar | 435 | XP_015199541.1 | 215 | 39.9 |
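The trend in the table can be summarised numerically. The sketch below takes only the divergence times and sequence identities listed above and computes a Pearson correlation coefficient between them. It is a simple illustration over eight data points, not a formal evolutionary-rate analysis, and the helper name `pearson` is illustrative.

```python
import math

# (species, estimated divergence in MYA, sequence identity in %), from the table above.
orthologs = [
    ("Homo sapiens", 0.0, 100.0),
    ("Rhinopithecus roxellana", 29.44, 94.7),
    ("Oryctolagus cuniculus", 90.0, 91.1),
    ("Delphinapterus leucas", 96.0, 92.0),
    ("Phascolarctos cinereus", 159.0, 80.0),
    ("Pygoscelis adeliae", 312.0, 59.6),
    ("Anolis carolinensis", 312.0, 53.8),
    ("Lepisosteus oculatus", 435.0, 39.9),
]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


times = [t for _, t, _ in orthologs]
identities = [i for _, _, i in orthologs]

r = pearson(times, identities)
print(f"Pearson r between divergence time and identity: {r:.3f}")

# Sequence divergence is the complement of identity.
for species, t, identity in orthologs:
    print(f"{species}: {t} MYA, {100.0 - identity:.1f}% divergence")
```

The coefficient is strongly negative: identity falls steadily as the estimated divergence time increases.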