SNED1 (Sushi, Nidogen, and EGF-like Domains 1) is an extracellular matrix (ECM) protein expressed at low levels in a wide range of tissues. The gene encoding SNED1 is located on human chromosome 2 at locus q37.3. The corresponding mRNA was isolated from the spleen and is 6834 bp in length, and the corresponding protein is 1413 amino acids long. The mouse ortholog of SNED1 was cloned in 2004 from the embryonic kidney by Leimeister et al. [1] SNED1 contains domains characteristic of ECM proteins, including an amino-terminal NIDO domain, several calcium-binding EGF-like domains (EGF_CA), a Sushi domain, also known as a complement control protein (CCP) domain, and three type III fibronectin (FN3) domains in the carboxy-terminal region.
SNED1 is located on the plus strand of chromosome 2 at locus 2q37.3. The RefSeq identification number is NM_001080437.3. The genomic DNA sequence of SNED1 spans 98,159 bp, and the longest spliced mRNA predicted by AceView is 7048 bp long and contains 31 exons. Nine predicted splice variants of SNED1 returned protein structure matches with the Phyre2 server, as discussed under "Tertiary and Quaternary Structure". [2]
SNED1 is an acronym for Sushi, Nidogen, and EGF-like Domains 1. Obsolete aliases for SNED1 include Snep, SST3, and IRE-BP1. [2]
SNED1 is highly conserved across vertebrates, including fish, reptiles, amphibians, birds, and mammals. [3] It is unclear whether SNED1 is conserved in invertebrates, but protein domains found in SNED1 are also found in invertebrates. [3] The abundance of cysteine residues, mostly located within EGF-like domains where they form disulfide bonds, appears to be very highly conserved, suggesting that cysteine richness is an important feature of this protein. [4]
SNED1 has several paralogs within the human genome, each of which covers only a small portion of the entire peptide sequence. Genes encoding proteins that share domains (EGF-like, Sushi) with SNED1 include the neurogenic locus notch homolog (NOTCH) proteins, the jagged proteins, the eyes shut homolog protein, the crumbs homolog proteins, the delta and notch-like epidermal growth factor receptors, the sushi, von Willebrand factor type A protein (SVEP1), and the slit homolog 3 protein.
The protein knowledgebase UniProt reports that the full-length SNED1 protein is 1413 amino acids long (UniProt Q8TER0).
The full sequence, obtained by an NCBI BLAST search, can be accessed with the reference ID NP_001073906.1. A notable feature of this protein is that it is extraordinarily cysteine rich, with 107 cysteines total, giving an overall cysteine composition of 13.2%. [5]
SNED1 is a secreted protein of the extracellular matrix. It contains a signal peptide (amino acids 1–24) that directs the protein to the secretory pathway. [5]
Precise prediction of domain boundaries can be obtained using the InterPro domain database or SMART.
The protein contains several notable domains. [1] [3] [5] The first, shown in pink in the annotated sequence above, is the NIDO domain, which is also found in nidogen-1 (also known as entactin). Other than SNED1, this domain is shared with only four human proteins: the basement membrane proteins nidogen-1, nidogen-2, and alpha-tectorin; and mucin-4, which has been demonstrated to play a role in promoting pancreatic cancer metastasis. [6] [7]
The second regions of interest, underlined in the annotated sequence, are the calcium-binding EGF domains (EGF-CA). The sequence contains many of these domains, which are present in a large number of membrane-bound and extracellular proteins. The EGF-CA domains may suggest a "sticky" nature for this protein, since ECM proteins often require calcium cations to form homo- and heterodimeric complexes with other ECM proteins. The Sushi domain, or complement control protein (CCP) motif, is annotated in green in the figure; this domain has been identified in many proteins involved in the complement system. It is also known as a short consensus repeat (SCR), and the name Sushi domain is the one from which the protein gets its name. The fibronectin type III (FN3) domain is annotated in blue, and its presence may suggest that the protein is involved in cell adhesion. SNED1 also contains an RGD and an LDV sequence, motifs that are important for the binding of ECM proteins to integrins, which are proteins found in cell membranes that mediate cell-ECM interactions. [5]
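Short integrin-binding motifs such as RGD and LDV can be located in a protein sequence with a plain substring scan. The following is a minimal sketch of that idea, not the method used in the cited work; the function name `find_motifs` and the toy peptide are hypothetical, and the toy peptide is not the SNED1 sequence.

```python
def find_motifs(sequence, motifs=("RGD", "LDV")):
    """Return {motif: [1-based start positions]} for exact matches.

    Overlapping occurrences are reported because the search restarts
    one residue after each hit.
    """
    seq = sequence.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert 0-based index to 1-based residue number
            start = seq.find(motif, start + 1)
        hits[motif] = positions
    return hits


# Hypothetical toy peptide for illustration only (NOT the SNED1 sequence).
toy_peptide = "MKTRGDAALDVCCRGDPL"
print(find_motifs(toy_peptide))
```

The presence of such a motif in a sequence does not by itself show that it is accessible or functional in the folded protein; a scan like this only identifies candidates.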
Thirteen N-glycosylation sites are predicted in the sequence of SNED1, and the presence of N-linked glycosylation has been confirmed experimentally. [5] SNED1 also has several predicted attachment sites for O-linked glycans and glycosaminoglycans, but these have not yet been validated experimentally.
Only a few kinase-dependent phosphorylation sites scored above 0.8 with the NetPhosK program from the ExPASy bioinformatics suite of proteomics tools. These sites are annotated with yellow highlight in the conceptual translation above. All of these sites are predicted to be phosphorylated by either protein kinase A (PKA) or protein kinase C (PKC). Experimental evidence exists for phosphorylation at 12 residues: 5 serine, 5 threonine, and 2 tyrosine residues. [5]
The amino acid sequence of the longest variant is very cysteine rich, which presumably results in extensive disulfide bond formation. In the conceptual translation, the beta sheets are annotated as purple text and the alpha helices as red text.
The percentage of intrinsic disorder of processed human SNED1 (residues 25–1413) predicted by IUPred2A is 15.3%. [5] A large proportion of random coil (73%) was predicted in SNED1, together with 26% β-strand and 1% helix, the latter corresponding to a sequence found in the amino-terminal region of SNED1. [5]
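Candidate N-glycosylation sites are conventionally recognized by the sequon N-X-S/T, where X is any residue except proline. The sketch below scans a sequence for that pattern. It is an illustration under stated assumptions: the source does not say which predictor produced the 13 sites, the function name `find_n_glyc_sequons` is hypothetical, and the toy peptide is not the SNED1 sequence.

```python
import re

# N, then any residue except proline, then serine or threonine.
# The lookahead makes the match zero-width so overlapping sequons are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glyc_sequons(sequence):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy peptide for illustration only (NOT the SNED1 sequence).
# N6-P7-T8 is skipped because proline occupies the X position.
toy_peptide = "MNGSANPTQNNTSK"
print(find_n_glyc_sequons(toy_peptide))
```

A sequon is necessary but not sufficient for glycosylation, so a scan like this tends to over-report compared with dedicated predictors or experimental mapping.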
The program Phyre2 was used to predict the structures of the conserved domain regions NIDO, CCP, and FN3, as well as of each of the splice variants. Some of the results were consistent with the proposed function of an extracellular "sticky" protein possibly involved in cell-cell adhesion or in clotting. The protein matches found with Phyre2 comprise an array of proteins with functions including clotting, hydrolysis, plasminogen activation, hormone or growth factor activity, protein binding, and cell adhesion, as well as ECM proteins. Splice variants a, b, and e have >99% structural similarity to the protein neurexin 1-alpha (NRXN1). Neurexins are cell adhesion molecules that often contain EGF binding domains, which promote the formation of junctions between cells. NRXN1 is also proposed to play a role in angiogenesis. Alpha-neurexins interact with neurexophilins and possibly function in the synaptic junctions of the vertebrate nervous system. Alpha-neurexins often utilize alternate promoters and splice sites, resulting in many different transcripts from one gene, which may explain this gene's abundance of alternative transcripts. Splice variant d has a 100% structural match to low-density lipoprotein receptor-related protein 4 (LRP4). This protein is involved in SOST-mediated inhibition of bone formation and in inhibition of Wnt signaling. LRP4 also plays an important role in the formation of neuromuscular junctions. Splice variants f and g have >99% similarity to fibrillin-1, an ECM protein that is a structural component of calcium-binding microfibrils. Splice variant i and the conserved domain CCP are >99% structurally similar to tissue plasminogen activator (PLAT). PLAT is secreted by vascular endothelial cells and acts as a serine protease that converts plasminogen to plasmin. Plasmin is a fibrinolytic enzyme that aids in the breakdown of blood clots, and recombinant tissue plasminogen activator is used clinically for that purpose.
The conserved domain NIDO was >99% similar to coagulation factor IX, also known as Factor IX (F9). F9 is a secreted coagulation factor involved in the clotting cascade that requires activation by multiple other coagulation factors within the cascade. The three consecutive conserved FN3 domains together are 100% similar, with 100% coverage, to anosmin-1. Anosmin-1 is an ECM glycoprotein responsible for normal neural development of the brain, spinal cord, and kidney.
Computational prediction by several databases, focusing on secreted proteins and membrane proteins, resulted in the prediction of 114 unique interactions by at least one algorithm, including SNED1 auto-interaction. [5] More than half of the protein partners of SNED1 were annotated as membrane proteins in UniProtKB. Forty-seven extracellular proteins were identified as SNED1 binding partners, including 30 core matrisome proteins, [8] 10 matrisome-associated proteins, and seven secreted proteins. Among the 30 matrisome proteins are 6 collagens: COL6A3, found in basement membranes and other ECMs; COL7A1; and four Fibril-Associated Collagens with Interrupted Triple helices (FACITs), all containing a thrombospondin domain (COL12A1, COL14A1, COL16A1, and COL20A1). They also include a number of ECM glycoproteins: 4 tenascins (TNC, TNN, TNR, and TNXB), fibronectin (FN1), the latent-TGFβ binding protein 2 (LTBP2), and the basement membrane glycoproteins nidogens 1 and 2. [5]
Independently, the STRING database of known and predicted protein interactions was used to identify possible interaction partners. The candidates were somatostatin (SST), somatostatin receptor 2 (SSTR2) and a variety of other somatostatin receptors, [9] spermine synthase (SMS), and TMEM132C. All of the somatostatin-related proteins are involved in the inhibition of hormones. Very little is known about TMEM132C, and all publications related to the protein are large-scale genomic screens. The protein expression profile of TMEM132C is very similar to that of SNED1, with protein abundance found in blood plasma, platelets, and liver. All of the interacting proteins described are expressed in these three common areas.
SNED1 is ubiquitously expressed at low to intermediate levels in adult tissues, so RNA expression profiles do not make clear which cells secrete SNED1 in tissues. Experimental data obtained in mice have shown that the Sned1 promoter is broadly active during embryogenesis, particularly in the limb buds, tail, sclerotome, vertebrae and ribs, lung, kidney, adrenal gland, cerebellum, choroid plexus, and head mesenchyme. [1] [3] The protein expression profiles of SNED1 reported in MOPED (Multi-Omics Profiling Expression Database) and PaxDb (Protein Abundance Across Organisms database) indicate that the protein is found in blood serum, blood plasma, blood T-lymphocytes, platelets, HEK-293 kidney cells, liver, and, at low levels, the brain.
The program AceView was used to predict transcript variants, shown in Figure 6. There are 9 spliced forms and 3 unspliced forms. Three of the transcript variants, b, c, and e, contain green regions that represent upstream open reading frames (uORFs), indicating that these transcripts contain regulatory elements. All of the spliced transcript variants a-i were analyzed with the Phyre2 server to predict protein structure (see "Tertiary and Quaternary Structure"). The existence of the splice variants has not yet been validated experimentally.
The promoter was predicted and analyzed for transcription factor binding sites using the ElDorado software on the Genomatix software suite. There were alternative promoters downstream of the selected 845bp promoter.
The following transcription factors were found with a matrix similarity of 1.00 and the entire binding domain was matched in the ElDorado predicted promoter.
Matrix Family | Detailed Family Information | Matrix | Detailed Matrix information | Strand | Matrix similarity | Sequence |
---|---|---|---|---|---|---|
BRAC | Brachyury gene, mesoderm developmental factor | TBX20.01 | T-box transcription factor TBX20 | (-) | 1.00 | gcatcgcggAGGTgtgcgggcgg |
TF2B | RNA polymerase II transcription factor II B | BRE.01 | Transcription factor II B (TFIIB) recognition element | (-/+) | 1.00 | ccgCGCC |
XCPE | Activator-, mediator-, and TBP-dependent core promoter element for RNA polymerase II transcription from TATA-less promoters | XCPE1.01 | X gene core promoter element 1 | (-) | 1.00 | ggGCGGgaccg |
ZF02 | C2H2 zinc finger transcription factors 2 | ZKSCAN3.01 | Zinc finger with KRAB and SCAN domains 3 | (+) | 1.00 | catggCCCCaccacagggcgcgc |
SP1F | GC-Box factors SP1/GC | SP1.03 | Stimulating protein 1, ubiquitous zinc finger transcription factor | (-) | 1.00 | cggggGGGCggggccat |
PLAG | Pleomorphic adenoma gene | PLAG1.02 | Pleomorphic adenoma gene 1 | (+) | 1.00 | aaGGGGgcagcacggaacgggtt |
Selected datasets in NCBI's GEO Profiles highlight clinically relevant changes in SNED1 expression levels in response to certain conditions. In aldosterone-producing adenoma versus control lung tissue, SNED1 expression decreased about 25-fold in the adenoma tissue. In a developmental study of the transition from oligodendrocyte precursors to mature oligodendrocytes, expression decreased almost 100-fold upon differentiation into mature oligodendrocytes. It may be informative to explore SNED1 expression in clotting disorders or other blood-related diseases. A seminal study published in 2014 demonstrated that SNED1 is a promoter of breast cancer metastasis. [10] [11]
The recent generation of a Sned1 knockout mouse model is also shedding light on the multiple roles of SNED1 in development and physiology. [3] The global Sned1 knockout leads to early post-natal lethality and severe craniofacial and skeletal anomalies, indicating that Sned1 is an essential gene. [3]
Integrins are transmembrane receptors that facilitate cell-cell and cell-extracellular matrix (ECM) adhesion. Upon ligand binding, integrins activate signal transduction pathways that mediate cellular signals such as regulation of the cell cycle, organization of the intracellular cytoskeleton, and movement of new receptors to the cell membrane. The presence of integrins allows rapid and flexible responses to events at the cell surface.
Nidogen-1 (NID-1), formerly known as entactin, is a protein that in humans is encoded by the NID1 gene. Both nidogen-1 and nidogen-2 are essential components of the basement membrane alongside other components such as type IV collagen, proteoglycans, laminin and fibronectin.
Laminins are high-molecular weight proteins of the extracellular matrix. They are a major component of the basal lamina, a protein network foundation for most cells and organs. The laminins are an important and biologically active part of the basal lamina, influencing cell differentiation, migration, and adhesion.
Fibulin (FY-beau-lin) is the prototypic member of a multigene family, currently with seven members. Fibulin-1 is a calcium-binding glycoprotein. In vertebrates, fibulin-1 is found in blood and extracellular matrices. In the extracellular matrix, fibulin-1 associates with basement membranes and elastic fibers. The association with these matrix structures is mediated by its ability to interact with numerous extracellular matrix constituents including fibronectin, proteoglycans, laminins and tropoelastin. In blood, fibulin-1 binds to fibrinogen and incorporates into clots.
Neurexins (NRXN) are a family of presynaptic cell adhesion proteins that have roles in connecting neurons at the synapse. They are located mostly on the presynaptic membrane and contain a single transmembrane domain. The extracellular domain interacts with proteins in the synaptic cleft, most notably neuroligin, while the intracellular cytoplasmic portion interacts with proteins associated with exocytosis. Neurexin and neuroligin "shake hands," resulting in the connection between the two neurons and the production of a synapse. Neurexins mediate signaling across the synapse, and influence the properties of neural networks by synapse specificity. Neurexins were discovered as receptors for α-latrotoxin, a vertebrate-specific toxin in black widow spider venom that binds to presynaptic receptors and induces massive neurotransmitter release. In humans, alterations in genes encoding neurexins are implicated in autism and other cognitive diseases, such as Tourette syndrome and schizophrenia.
Matrilysin, also known as matrix metalloproteinase-7 (MMP-7), pump-1 protease (PUMP-1), or uterine metalloproteinase, is an enzyme in humans that is encoded by the MMP7 gene. The enzyme has also been known as matrin, putative metalloproteinase-1, matrix metalloproteinase pump 1, PUMP-1 proteinase, PUMP, metalloproteinase pump-1, putative metalloproteinase, and MMP. Human MMP-7 has a molecular weight of around 30 kDa.
Fibrillin-1 is a protein that in humans is encoded by the FBN1 gene, located on chromosome 15.
The EGF-like domain is an evolutionarily conserved protein domain, which derives its name from the epidermal growth factor where it was first described. It comprises about 30 to 40 amino-acid residues and has been found in a large number of mostly animal proteins. Most occurrences of the EGF-like domain are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted. An exception to this is the prostaglandin-endoperoxide synthase. The EGF-like domain includes 6 cysteine residues which in the epidermal growth factor have been shown to form 3 disulfide bonds. The structures of 4-disulfide EGF-domains have been solved from the laminin and integrin proteins. The main structure of EGF-like domains is a two-stranded β-sheet followed by a loop to a short C-terminal, two-stranded β-sheet. These two β-sheets are usually denoted as the major (N-terminal) and minor (C-terminal) sheets. EGF-like domains frequently occur in numerous tandem copies in proteins: these repeats typically fold together to form a single, linear solenoid domain block as a functional unit.
Neurexin-1-alpha is a protein that in humans is encoded by the NRXN1 gene.
Fibulin-2 is a protein that in humans is encoded by the FBLN2 gene.
EGF-containing fibulin-like extracellular matrix protein 2 is a protein that in humans is encoded by the EFEMP2 gene.
Neurexin-2-alpha is a protein that in humans is encoded by the NRXN2 gene.
Neurexin-3-alpha is a protein that in humans is encoded by the NRXN3 gene.
Neuroligin (NLGN), a type I membrane protein, is a cell adhesion protein on the postsynaptic membrane that mediates the formation and maintenance of synapses between neurons. Neuroligins act as ligands for β-Neurexins, which are cell adhesion proteins located presynaptically. Neuroligin and β-neurexin "shake hands", resulting in the connection between two neurons and the production of a synapse. Neuroligins also affect the properties of neural networks by specifying synaptic functions, and they mediate signalling by recruiting and stabilizing key synaptic components. Neuroligins interact with other postsynaptic proteins to localize neurotransmitter receptors and channels in the postsynaptic density as the cell matures. Additionally, neuroligins are expressed in human peripheral tissues and have been found to play a role in angiogenesis. In humans, alterations in genes encoding neuroligins are implicated in autism and other cognitive disorders.
YWTD repeats are four-stranded beta-propeller repeats found in low-density lipoprotein receptors (LDLR). The six YWTD repeats together fold into a six-bladed beta-propeller. Each blade of the propeller consists of four antiparallel beta-strands; the innermost strand of each blade is labeled 1 and the outermost strand, 4. The sequence repeats are offset with respect to the blades of the propeller, such that any given 40-residue YWTD repeat spans strands 2–4 of one propeller blade and strand 1 of the subsequent blade. This offset ensures circularization of the propeller because the last strand of the final sequence repeat acts as an innermost strand 1 of the blade that harbors strands 2–4 from the first sequence repeat. The repeat is found in a variety of proteins, including the vitellogenin receptor from Drosophila melanogaster, the low-density lipoprotein (LDL) receptor, preproepidermal growth factor, and nidogen (entactin).
ARMH3 or Armadillo Like Helical Domain Containing 3, also known as UPF0668 and c10orf76, is a protein that in humans is encoded by the ARMH3 gene. Its function is not currently known, but experimental evidence has suggested that it may be involved in transcriptional regulation. The protein contains a conserved proline-rich motif, suggesting that it may participate in protein-protein interactions via an SH3-binding domain, although no such interactions have been experimentally verified. The well-conserved gene appears to have emerged in Fungi approximately 1.2 billion years ago. The locus is alternatively spliced and predicted to yield five protein variants, three of which contain a protein domain of unknown function, DUF1741.
Transmembrane protein 8A is a protein that in humans is encoded by the TMEM8A gene (16p13.3). Evolutionarily, TMEM8A orthologs are found in primates and other mammals and in a few more distantly related species. TMEM8A contains five transmembrane domains and one EGF-like domain, all of which are highly conserved among its orthologs. Although there is no confirmed function of TMEM8A, analysis of expression and experimental data predicts that TMEM8A is an adhesion protein that plays a role in keeping T-cells in their resting state.
Megf8, also known as Multiple Epidermal Growth Factor-like Domains 8, is a protein-coding gene that encodes a single-pass membrane protein known to participate in developmental regulation and cellular communication. It is located on chromosome 19 at the 49th open reading frame in humans (19q13.2). There are two isoform constructs known for MEGF8, which differ by a 67 amino acid indel. The isoform 2 splice version is 2785 amino acids long and is predicted to be 296.6 kDa in mass. Isoform 1 is composed of 2845 amino acids and is predicted to weigh 303.1 kDa. Using BLAST searches, orthologs were found primarily in mammals, but MEGF8 is also conserved in invertebrates and fishes, and rarely in birds, reptiles, and amphibians. A notable paralog of MEGF8 is ATRNL1, which is also a single-pass transmembrane protein with several of the same key features and motifs as MEGF8, as indicated by the Simple Modular Architecture Research Tool (SMART), which is hosted by the European Molecular Biology Laboratory in Heidelberg, Germany. MEGF8 has been predicted to be a key player in several developmental processes, such as left-right patterning and limb formation. Researchers have found MEGF8 SNP mutations to be the cause of Carpenter syndrome subtype 2.
TM6SF2 is the Transmembrane 6 superfamily 2 human gene which codes for a protein by the same name. This gene is otherwise called KIAA1926. Its exact function is currently unknown.
Coiled-coil domain 47 (CCDC47) is a gene located on human chromosome 17, specifically at locus 17q23.3, which encodes the protein CCDC47. The gene has several aliases including GK001 and MSTP041. The protein itself contains coiled-coil domains, the SEEEED superfamily, a domain of unknown function (DUF1682) and a transmembrane domain. The function of the protein is unknown, but it has been proposed that CCDC47 is involved in calcium ion homeostasis and the endoplasmic reticulum overload response.