MEGF8, also known as Multiple Epidermal Growth Factor-like Domains 8, is a protein-coding gene that encodes a single-pass membrane protein known to participate in developmental regulation and cellular communication. [5] In humans it is located on chromosome 19 at the 49th open reading frame (19q13.2). [6] Two isoforms are known for MEGF8, which differ by a 67 amino acid indel. The isoform 2 splice version (analyzed throughout this page) is 2785 amino acids long and has a predicted mass of 296.6 kDa. Isoform 1 is composed of 2845 amino acids and has a predicted mass of 303.1 kDa. BLAST searches found orthologs primarily in mammals, but MEGF8 is also conserved in invertebrates and fishes, and rarely in birds, reptiles, and amphibians. A notable paralog of MEGF8 is ATRNL1 (Attractin-like 1), which is also a single-pass transmembrane protein and shares several key features and motifs with MEGF8, as indicated by the Simple Modular Architecture Research Tool (SMART) [7], which is hosted by the European Molecular Biology Laboratory in Heidelberg, Germany. MEGF8 has been predicted to be a key player in several developmental processes, such as left-right patterning and limb formation. MEGF8 SNP mutations have been identified as the cause of Carpenter syndrome subtype 2.
MEGF8 is fairly highly conserved, with orthologs ranging from P. paniscus to N. vectensis. Orthologs are found in mammals, amphibians, fish, insects, crustaceans, and other invertebrates. [8] Organizing the data showed that sequence identity decreased as the time since divergence between humans and the ortholog species increased.
Genus/Species | Organism Common Name | Accession Number | Sequence Identity | Sequence Similarity | Length (AAs) |
Pan paniscus | Pygmy Chimpanzee | XP_003811808 | 99% | 99% | 2778 |
Bos mutus | Yak | XP_005909034 | 79% | 82% | 2842 |
Orcinus orca | Orca Whale | XP_004271289 | 93% | 94% | 2789 |
Trichechus manatus latirostris | Florida Manatee | XP_004388865 | 88% | 89% | 2708 |
Leptonychotes weddellii | Weddell Seal | XP_006748348 | 91% | 92% | 2068 |
Rattus norvegicus | Rat | NP_446080.1 | 88% | 89% | 2789 |
Mus musculus | Mouse | NP_001153872.1 | 89% | 90% | 2789 |
Ophiophagus hannah | King Cobra | ETE71721 | 63% | 70% | 404 |
Alligator mississippiensis | American Alligator | XP_006273703 | 63% | 71% | 2793 |
Alligator sinensis | Chinese Alligator | XP_006038171 | 67% | 75% | 2465 |
Xenopus tropicalis | Western clawed frog | XP_002936442 | 56% | 67% | 2730 |
Neolamprologus brichardi | African Cichlid | XP_006808273 | 55% | 67% | 2813 |
Danio rerio | Zebrafish | XP_005158088 | 54% | 66% | 2870 |
Ictalurus punctatus | Channel Catfish | AHI50432 | 54% | 77% | 2875 |
Oryzias latipes | Japanese Rice Fish | XP_004078282 | 54% | 67% | 2952 |
Apis mellifera | Western Honey Bee | XP_006568067 | 31% | 45% | 2913 |
Ceratitis capitata | Mediterranean Fruit Fly | JAB95791 | 32% | 45% | 2959 |
Daphnia pulex | Common Water Flea | EFX84934 | 35% | 48% | 2888 |
Strongylocentrotus purpuratus | Purple Sea Urchin | XP_789561 | 37% | 51% | 194 |
Nematostella vectensis | Starlet Sea Anemone | XP_001635521 | 38% | 51% | 2534 |
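The identity column above can be illustrated with a minimal sketch of how percent identity is commonly computed from a pairwise alignment. The function name `percent_identity`, the choice to count gap columns as mismatches, and the toy sequences are assumptions made for illustration; they are not MEGF8 data and not necessarily the exact formula used to produce the table.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues.

    Assumption: both strings come from the same pairwise alignment,
    use '-' for gaps, and gap columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment, used only to show the calculation.
reference = "CQCNGHAC-T"
ortholog = "CQCNGQACST"
print(percent_identity(reference, ortholog))  # 8 of 10 columns match -> 80.0
```

Sequence similarity is computed the same way, except that conservative substitutions (as scored by a substitution matrix) also count as matches, which is why the similarity column is never lower than the identity column.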
MEGF8 has one known paralog: ATRNL1. The ATRNL1 protein is approximately half the length of MEGF8 and contains several of the same conserved domains, including the CUB domain and transmembrane sequence. Notably, ATRNL1 is found in many birds and amphibians, whereas MEGF8 is not found in any birds and is found in only one amphibian.
Genomatix's ElDorado (http://www.genomatix.de/), a gene promoter database, predicted ten possible promoters for MEGF8. The promoter with promoter ID GXP_1262882 and transcript ID GXT_22531930 was predicted with the highest confidence. This promoter is located on the plus strand of chromosome 19, spanning nucleotides 42829077 to 42830497, which makes it 1421 nucleotides long. The promoter sequence overlaps the transcription start site of the gene.
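The promoter length follows from treating the two coordinates as an inclusive interval. A minimal sketch of that arithmetic, using a hypothetical helper name `inclusive_length`:

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a genomic interval whose start and end are both included."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates quoted in the text for promoter GXP_1262882.
promoter_start, promoter_end = 42829077, 42830497
print(inclusive_length(promoter_start, promoter_end))  # 1421
```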
More than one hundred transcription factor binding sites were predicted to be found in the megf8 promoter region through Genomatix. The top twenty most confidently predicted factors include the following:
MEGF8 is composed of either 2845 amino acids (Isoform 1) or 2778 amino acids (Isoform 2). Isoform 2 lacks 67 amino acids at positions 700-766, which accounts for its shorter length; otherwise, the two isoforms are identical. Amino acid bias was determined using SAPS (Statistical Analysis of Protein Sequences) [9]. Isoform 1 is rich in cysteine and glycine and deficient in isoleucine and lysine. Isoform 2 has very high levels of cysteine, moderately high levels of glycine, and low levels of isoleucine and lysine. The high number of cysteine residues contributes to the numerous disulfide bonds found in the mature protein's folded structure. Overall, MEGF8 has an isoelectric point (pI) between 6.4 and 7.0, depending on the organism's sequence; human MEGF8 has a pI of 6.4. This nearly neutral value may help the protein fold properly and resist denaturation. The twenty most conserved amino acids, identified through a multiple sequence alignment of 20 orthologs, are located in the CUB and transmembrane domains.
Prediction software PELE [10] from UCSC Biology Workbench indicated that MEGF8 is primarily composed of beta sheets, with occasional short alpha-helix segments. PELE compares the output of eight different prediction programs, which increases confidence in the result. The beta sheets occur in many of the key domains, including the EGF domains, kelch domains, and EGF-laminin domains. The PELE results also agree with the secondary structure and 3D structure predictions made by PHYRE2. [11]
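A compositional bias of this kind starts from per-residue frequencies. The sketch below computes them for any protein sequence; it is not the SAPS algorithm, which additionally applies statistical significance tests, and the function name `composition` is an assumption. The demonstration uses the six-residue CQCNGH internal repeat reported for MEGF8 rather than the full sequence.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Return the percentage of each residue in a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Short motif used only to demonstrate the calculation.
percentages = composition("CQCNGH")
for residue, pct in sorted(percentages.items()):
    print(f"{residue}: {pct:.1f}%")
```

Running the same function on a full-length sequence and comparing the result against typical proteome-wide frequencies would show which residues, such as cysteine here, are over- or under-represented.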
MEGF8 is predicted to contain several different types of features, domains, and motifs that play a key role in the protein's function, structure, and location. These are listed in Table 1. Functions, found through SMART [7] analysis, as well as NCBI Conserved Domains Search [12] include:
Feature, Domain, or Motif Name | Number in MEGF8 | Amino Acid Location Range (1-2785) |
Signal Peptide | 1 | 1-34 |
CUB Domain | 1 | 40-147 |
Epidermal Growth Factor (EGF) Domain | 6 | 148-177; 180-210; 1057-1100; 2121-2160; 2162-2190; 2200-2240 |
D1k3ia Structural Domain | 2 | 233-550; 1449-1801 |
Kelch Repeat | 9 | 241-276; 340-388; 454-504; 519-575; 1450-1492; 1505-1552; 1724-1764; 1780-1820; 2239-2255 |
Leucine Zipper Pattern | 1 | 1698-1719 |
PSI Domain | 6 | 787-839; 889-931; 945-1013; 1864-1919; 2008-2058; 2060-2117 |
EGF_Ca Domain | 1 | 1014-1055 |
EGF_Like Domain | 4 | 1103-1148; 1346-1485; 2244-2317; 2320-2381 |
EGF_LAM Domain | 1 | 1151-1199 |
Transmembrane Region | 1 | 2588-2610 |
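A table like this is often queried by residue position, for example to ask which annotated feature a given amino acid falls in. The following is a minimal sketch of such a lookup; the `Feature` type and `features_at` function are hypothetical names, and only a subset of the ranges from the table is included to keep the example short.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, inclusive


# Subset of the ranges listed in the table above.
FEATURES = [
    Feature("Signal Peptide", 1, 34),
    Feature("CUB Domain", 40, 147),
    Feature("EGF Domain", 148, 177),
    Feature("EGF Domain", 180, 210),
    Feature("D1k3ia Structural Domain", 233, 550),
    Feature("Kelch Repeat", 241, 276),
    Feature("Leucine Zipper Pattern", 1698, 1719),
    Feature("Transmembrane Region", 2588, 2610),
]


def features_at(position: int, features: list[Feature] = FEATURES) -> list[str]:
    """Names of all features whose range contains the given residue."""
    return [f.name for f in features if f.start <= position <= f.end]


print(features_at(199))   # ['EGF Domain']
print(features_at(250))   # ['D1k3ia Structural Domain', 'Kelch Repeat']
print(features_at(2600))  # ['Transmembrane Region']
```

Returning a list rather than a single name reflects that annotations from different tools can overlap, as the kelch repeats and the D1k3ia structural domain do.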
One of the key attributes of MEGF8's tertiary structure is its 7-bladed beta propeller, which is formed by the kelch motifs in its D1k3ia3 structural domain, as identified by SCOP. [15] SCOP also indicated that the beta propeller in MEGF8 is a member of the galactose oxidase superfamily. Each of the seven blades is made up of a four-stranded beta-sheet motif. It is also important to note that although many phosphorylation sites are predicted with high confidence, several other topological predictions (i.e., disulfide bonds, glycosylation, and other extracellular features) do not support them.
Feature | Number Predicted in MEGF8 | Amino Acid Location Range (1-2785) | Source |
Cysteine involved in Disulfide Bond | 99+ Possible Sites | - | DISULFIND [16] & UniProt |
SUMOylation | 3 (confidently) | K886; K1681; K1737 | SUMOplot [17] |
Phosphorylation | 116 | - | NetPhos [18] |
Internal Repeats | 1 | CQCNGH 1144-1149 & 2313-2318 | SAPS [19] |
N-linked Glycosylation | 20 | 56; 223; 267; 427; 699; 749; 968; 987; 1054; 1140; 1210; 1539; 1908; 1929; 2006; 2153; 2168; 2340; 2778 | NetNGlyc [20] |
Signal Peptide Cleavage | 1 | between amino acids 34 and 35 | SignalP [21] |
Hydrophobic Domain | 1 | 2588-2610 | SAPS |
Extracellular Domain | 1 | 1-2587 | Phobius [22] |
Transmembrane Region | 1 | 2588-2610 | Phobius, SAPS, SMART |
Intracellular Domain | 1 | 2611-2785 | Phobius, SMART |
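Candidate N-linked glycosylation sites are conventionally defined by the sequon N-X-S/T, where X is any residue except proline. The sketch below scans a sequence for that pattern. It is only a first-pass filter and not the NetNGlyc method, which scores each sequon with a trained predictor; the function name `sequon_positions` and the toy sequence are assumptions for illustration.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Hypothetical toy sequence, not taken from MEGF8.
toy = "MANGSAANPTCNNTS"
print(sequon_positions(toy))  # [3, 12, 13]; N8 is skipped because X is proline
```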
MEGF8 is expressed at high levels in cardiac myocytes and fetal brain tissue, according to GEO Profiles [23] from NCBI. The same profile indicated moderate to moderately low expression in all other tissues examined. NCBI GEO Profiles data also provided the tissue expression graph for MEGF8 in humans, which is displayed to the right and further illustrates specific sites and levels of expression. [24]
According to BioGPS [25] gene ontology information, MEGF8 participates in receptor activity, calcium ion binding, and protein binding.
Analysis of gene ontology information by BioGPS [25] was able to produce a list of biological processes in each of which MEGF8 plays a significant role:
In the table below, all predicted interactions except SMARCD3 are supported by two-hybrid screen experimental data. This information is supported by both the NextProt [26] and IntAct [27] databases. The two interactions with the highest confidence values are also supported by text-mining results in STRING. [28] Taken together, the proteins in red interact with MEGF8 with reasonably high confidence, and the proteins in green with moderate confidence. The confidence level for the proteins in blue is much lower, which may mean either that the two-hybrid assay produced a false positive or that they do in fact interact.
Predicted Interacting Protein | Confidence | Location | Description | Experimental/Text Support | Function | Source |
GFI1B | Conf: 0.866 | Found in Endothelial & Erythroid | GFI1B is a growth factor independent 1B transcription repressor | Two-Hybrid (IntAct), Text-mining (STRING/OMIM) | Essential proto-oncogenic transcriptional regulator; transcriptional repressor or activator depending on both promoter and cell type context; represses promoter activity of SOCS1 and SOCS3 and thus may regulate cytokine signaling pathways. | IntAct, STRING, NextProt |
ATN1 | Conf: 0.538 | Everywhere | Atrophin 1 (ATN1) | Two-Hybrid Assay | Transcriptional corepressor. Recruits NR2E1 to repress transcription. Promotes vascular smooth muscle cell (VSMC) migration and orientation. | IntAct, STRING |
ATXN7 | Conf: 0.510 | Mod-High Everywhere | Spinocerebellar ataxia type 7 protein (ATXN7) | Two-Hybrid, Pull-Down | Acts as a component of the STAGA transcription coactivator-HAT complex. Mediates the interaction of the STAGA complex with CRX and is involved in CRX-dependent gene activation. Necessary for microtubule cytoskeleton stabilization. | IntAct, NextProt |
CACNA1A | Conf: 0.510 | Certain Brain Tissues | Calcium Channel, Voltage-Dependent, P/Q Type, Alpha 1A Subunit (Cav2.1) | Two-Hybrid Assay, Pull-Down | Mediates the entry of calcium ions into excitable cells and is also involved in a variety of calcium-dependent processes, including muscle contraction, hormone or neurotransmitter release, gene expression, cell motility, cell division, and cell death. | IntAct, NextProt |
SMARCD3 | Conf: 0.778 | High Everywhere | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 3 (SMARCD3) | Text-mining (OMIM article for SMARCD3) | Plays a role in ATP-dependent nucleosome remodeling by SMARCA4-containing complexes. Stimulates nuclear receptor mediated transcription. | STRING |
FIHB1 | Conf: 0.370 | | | Two-Hybrid Pooling | Uncharacterized | IntAct, NextProt |
Y3542 | Conf: 0.370 | | (Q8CKF8 in UniProtKB) | Two-Hybrid Pooling | Uncharacterized | IntAct, NextProt |
ProW | Conf: 0.370 | | | Two-Hybrid Pooling | Uncharacterized | IntAct, NextProt |
The four primary splice variants and their distinctions are described below (labels correspond to those in image below):
A: Exon 13 has been spliced out. The attached working conceptual translation shows that exon 3 does not code for any feature, domain, motif, or other functional stretch of amino acids, so it is likely not key to the function of the MEGF8 protein. This is the variant that corresponds to the splice model of the MEGF8 analyzed here.
B: Exons 1-6 have been spliced out. These exons hold several key domains and motifs, including the CUB domain, two PSI domains, a D1k3ia3 structural domain, and a kelch repeat. Their loss may result in a misfolded protein lacking these structural segments and may inhibit participation in developmental events (loss of PSI and CUB). The variant retains the signal peptide and transmembrane (TMEM) region, so it may still be partially functional.
C: Part of the D1k3ia3 structural domain remains in exon 29, but the kelch repeat has been excised, which could lead to structural issues. This variant also contains almost three PSI domains and an area of low complexity in exons 32-35, which may allow it to function in the cell, but it has no signal peptide or TMEM region to place it in the membrane, so it would not function normally.
D: This variant consists of exons 36-40, with exon 41 excised and a shortened exon 42. It possesses EGF calcium domains and EGF/EGF-like domains. Loss of exon 41 would drastically alter function because it contains the TMEM segment; the effect depends on where exon 41 is lost and where exon 42 is cleaved.
Several SNPs found through NCBI GeneView [29] cause missense or silent mutations in MEGF8. Three of these were identified as causes of Carpenter Syndrome 2 by Twigg et al. [30]: Gly199 to Arg, Arg1499 to His, and Ser2367 to Gly. The article by Twigg et al. includes a supplementary data set showing a multiple sequence alignment of the regions surrounding each SNP and the domain in which it lies. The Gly199 to Arg mutation is located inside an EGF domain; the Arg1499 to His mutation is located within a kelch domain in the 7-bladed beta propeller; and the Ser2367 to Gly mutation is located within an EGF-laminin domain. These domains are important for maintaining a properly folded protein and its function.
See Carpenter syndrome for more extensive details on the disease. Genetic mutations in MEGF8 have been found to be a principal cause of this rare genetic syndrome.
Mutations in MEGF8 have been found to be linked to defective lateralization during development, as reported by Twigg et al. [30] Common features of individuals with Carpenter Syndrome Subtype II include the following:
There is no research being done currently to develop treatment or cures for Carpenter Syndrome 2. Researchers are still striving to understand the cause of the point mutations in MEGF8 that result in this extremely rare genetic disease.