Glycan-protein interactions are a class of biomolecular interactions that occur between free or protein-bound glycans and their cognate binding partners. Intramolecular glycan-protein (protein-glycan) interactions occur between glycans and the proteins to which they are covalently attached. Together with protein-protein interactions, they form a mechanistic basis for many essential cell processes, especially cell-cell and host-cell interactions. [2] For instance, SARS-CoV-2, the causative agent of COVID-19, employs its extensively glycosylated spike (S) protein to bind to the ACE2 receptor, allowing it to enter host cells. [3] The spike protein is a trimer, with each subunit containing 22 N-glycosylation sites, making it an attractive target for vaccine research. [3] [4]
Glycosylation, i.e., the addition of glycans (a generic name for monosaccharides and oligosaccharides) to a protein, is one of the major post-translational modifications of proteins and contributes to the enormous biological complexity of life. Indeed, three different hexoses could theoretically produce from 1056 to 27,648 unique trisaccharides, in contrast to only 6 peptides or oligonucleotides formed from 3 amino acids or 3 nucleotides, respectively. [2] In contrast to template-driven protein biosynthesis, the "language" of glycosylation is still unknown, making glycobiology a hot topic of current research given the prevalence of glycans in living organisms. [2]
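The figure of 6 for peptides or oligonucleotides is simply the number of ways to order three distinct residues in a linear chain. A minimal Python sketch of that count follows; the residue names are placeholders chosen for illustration. It does not attempt to reproduce the trisaccharide range, which is much larger because each glycosidic bond can also vary in linkage position and anomeric configuration, and because glycans can branch.

```python
from itertools import permutations


def linear_orderings(residues):
    """Return every distinct linear ordering that uses each residue exactly once."""
    return sorted(set(permutations(residues)))


# Placeholder residue names, used only for illustration.
tripeptides = linear_orderings(["Ala", "Gly", "Ser"])
print(len(tripeptides))  # 6
```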
The study of glycan-protein interactions provides insight into the mechanisms of cell signaling and enables the development of better diagnostic tools for many diseases, including cancer. Indeed, there are no known types of cancer that do not involve aberrant patterns of protein glycosylation. [5]
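The binding of glycan-binding proteins (GBPs) to glycans can be modeled with a simple equilibrium. Denoting glycans as $G$ and proteins as $P$:
$$P + G \rightleftharpoons PG$$
With an associated equilibrium constant of
$$K_a = \frac{[PG]}{[P][G]}$$
Which is rearranged to give the dissociation constant, following biochemical conventions:
$$K_d = \frac{1}{K_a} = \frac{[P][G]}{[PG]}$$
Given that many GBPs exhibit multivalency, this model may be expanded to account for multiple equilibria:
$$P + G \rightleftharpoons PG, \quad PG + G \rightleftharpoons PG_2, \quad \ldots, \quad PG_{n-1} + G \rightleftharpoons PG_n$$
Denoting the cumulative equilibrium of binding with $i$ ligands as
$$P + iG \rightleftharpoons PG_i$$
With corresponding equilibrium constant:
$$\beta_i = \frac{[PG_i]}{[P][G]^i}$$
And writing the material balance for protein ($[P]_T$ denotes the total concentration of protein):
$$[P]_T = [P] + \sum_{i=1}^{n} [PG_i]$$
Expressing the terms through the equilibrium constants, a final result is found:
$$[P]_T = [P]\left(1 + \sum_{i=1}^{n} \beta_i [G]^i\right)$$
The concentration of free protein is, thus:
$$[P] = \frac{[P]_T}{1 + \sum_{i=1}^{n} \beta_i [G]^i}$$
If $n = 1$, i.e. there is only one carbohydrate receptor domain, $\beta_1 = K_a$ and the equation reduces to
$$[P] = \frac{[P]_T}{1 + K_a [G]}$$
With increasing $n$ the concentration of free protein decreases; hence, the apparent $K_d$ decreases too.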
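The effect of valency on the free protein concentration can be checked numerically. The sketch below assumes the standard cumulative-constant form of the model, in which the free fraction is 1 / (1 + Σ β_i [G]^i), with [G] the free glycan concentration and β_i the cumulative association constant for binding i glycans. The function name and all numerical values are illustrative assumptions, not measured constants.

```python
def free_protein_fraction(glycan_conc, betas):
    """Fraction of protein with no glycan bound, [P]/[P]_T.

    glycan_conc -- free glycan concentration [G] in mol/L
    betas       -- cumulative association constants beta_1..beta_n,
                   where beta_i has units of (mol/L)^-i
    """
    denominator = 1.0 + sum(
        beta * glycan_conc ** i for i, beta in enumerate(betas, start=1)
    )
    return 1.0 / denominator


# Illustrative values only (assumptions, not experimental data).
G = 1e-3       # 1 mM free glycan
beta_1 = 1e3   # M^-1, i.e. a single-site K_d of 1 mM
beta_2 = 1e6   # M^-2, hypothetical cumulative constant for a second site

monovalent = free_protein_fraction(G, [beta_1])
bivalent = free_protein_fraction(G, [beta_1, beta_2])

print(f"monovalent: {monovalent:.3f}")  # monovalent: 0.500
print(f"bivalent:   {bivalent:.3f}")    # bivalent:   0.333
```

At the same glycan concentration, every additional binding term enlarges the denominator, so less protein remains free, which is what a lower apparent dissociation constant describes.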
Chemical intuition suggests that glycan-binding sites may be enriched in polar amino acid residues that form non-covalent interactions, such as hydrogen bonds, with polar carbohydrates. Indeed, a statistical analysis of carbohydrate-binding pockets shows that aspartic acid and asparagine residues are present twice as often as would be predicted by chance. [6] Surprisingly, there is an even stronger preference for aromatic amino acids: tryptophan shows a 9-fold increase in prevalence, tyrosine a 3-fold increase, and histidine a 2-fold increase. It has been shown that the underlying force is the interaction between the aromatic π system and the C-H bonds of the carbohydrate (a CH-π interaction), as shown in Figure 1. The interaction is identified by geometric criteria: an angular cutoff and a distance between the C-H group and the aromatic ring of less than 4.5 Å. [6]
This interaction strongly depends on the stereochemistry of the carbohydrate molecule. For example, consider the top and bottom faces of D-glucose and D-galactose. It has been shown that a single change in the stereochemistry at the C4 carbon shifts the preference for aromatic residues from one face (2.7-fold preference for glucose) to the opposite face (14-fold preference for galactose). [6]
The comparison of electrostatic surface potentials (ESPs) of aromatic rings in tryptophan, tyrosine, phenylalanine, and histidine suggests that electronic effects also play a role in the binding to glycans (see Figure 2). After normalizing the electron densities for surface area, tryptophan remains the most electron-rich acceptor of CH-π interactions, suggesting a possible reason for its 9-fold prevalence in carbohydrate-binding pockets. [6] Overall, the electrostatic potential maps follow the prevalence trend of Trp > Tyr > His.
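A geometric test of this kind can be sketched in a few lines of Python. Only the 4.5 Å distance cutoff comes from the text; everything else here is an assumption made for illustration: the distance is measured from the carbon atom to the ring centroid, the angle is taken between the C-H bond and the ring normal, and the angular cutoff is left as a required argument because its value is not given here. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def ch_pi_geometry(carbon, hydrogen, ring_atoms):
    """Return (distance, angle_deg) for a C-H group above an aromatic ring.

    distance  -- carbon to ring-centroid distance (same units as the input)
    angle_deg -- angle between the C->H bond and the ring normal that
                 points from the carbon side toward the ring
    Assumes a planar ring whose first two atoms are not collinear
    with the centroid.
    """
    n = len(ring_atoms)
    centroid = tuple(sum(atom[k] for atom in ring_atoms) / n for k in range(3))
    normal = _cross(_sub(ring_atoms[0], centroid), _sub(ring_atoms[1], centroid))
    to_carbon = _sub(carbon, centroid)
    # Orient the normal so that it points toward the side where the carbon sits.
    if _dot(normal, to_carbon) < 0:
        normal = tuple(-x for x in normal)
    toward_ring = tuple(-x for x in normal)
    ch_bond = _sub(hydrogen, carbon)
    cos_angle = _dot(ch_bond, toward_ring) / (_norm(ch_bond) * _norm(toward_ring))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return _norm(to_carbon), math.degrees(math.acos(cos_angle))


def is_ch_pi_contact(carbon, hydrogen, ring_atoms, max_angle_deg, max_distance=4.5):
    """Apply the distance cutoff from the text and a caller-supplied angle cutoff."""
    distance, angle = ch_pi_geometry(carbon, hydrogen, ring_atoms)
    return distance < max_distance and angle <= max_angle_deg


# Idealized six-membered ring in the xy-plane (coordinates in Å, made up).
ring = [(1.4 * math.cos(math.radians(60 * k)),
         1.4 * math.sin(math.radians(60 * k)),
         0.0) for k in range(6)]

# A C-H group sitting directly above the ring and pointing at it.
print(is_ch_pi_contact((0.0, 0.0, 3.8), (0.0, 0.0, 2.7), ring, max_angle_deg=30.0))
```

The angular cutoff of 30 degrees in the usage line is an arbitrary illustrative choice, not the value used in the cited analysis.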
There are many proteins capable of binding to glycans, including lectins, antibodies, microbial adhesins, viral agglutinins, etc.
Lectin is a generic name for proteins with carbohydrate-recognition domains (CRDs). Although the term has become almost synonymous with glycan-binding proteins, it does not include antibodies, which also belong to that class.
Lectins found in plant and fungal cells have been extensively used in research as tools to detect, purify, and analyze glycans. However, useful lectins usually have sub-optimal specificities. For instance, Ulex europaeus agglutinin-1 (UEA-1), a plant-extracted lectin capable of binding to human blood type O antigen, can also bind to unrelated glycans such as 2'-fucosyllactose, GalNAcα1-4(Fucα1-2)Galβ1-4GlcNAc, and Lewis-Y antigen. [7]
Although antibodies exhibit nanomolar affinities toward protein antigens, their specificity toward glycans is very limited. [8] In fact, available antibodies may bind only <4% of the 7000 mammalian glycan antigens; moreover, most of those antibodies have low affinity and exhibit cross-reactivity. [9] [7]
In contrast with jawed vertebrates, whose immunity is based on variable, diversity, and joining (VDJ) gene segments of immunoglobulins, jawless vertebrates such as lamprey and hagfish create receptor diversity by somatic DNA rearrangement of leucine-rich repeat (LRR) modules that are incorporated into *vlr* genes (variable lymphocyte receptors). [10] These LRR modules form 3D structures resembling curved solenoids that selectively bind specific glycans. [11]
A study from the University of Maryland has shown that lamprey antibodies (lambodies) can selectively bind to tumor-associated carbohydrate antigens (such as Tn and TF) at nanomolar affinities. [9] The T-nouvelle antigen (Tn) and TF are present on proteins in as many as 90% of different cancer cells after post-translational modification, whereas in healthy cells those antigens are much more complex. Selection of lambodies that could bind to aGPA, a human erythrocyte membrane glycoprotein that is covered with 16 TF moieties, through magnetic-activated cell sorting (MACS) and fluorescence-activated cell sorting (FACS) yielded a leucine-rich lambody, VLRB.aGPA.23. This lambody selectively stained (over healthy samples) cells from 14 different types of adenocarcinomas: bladder, esophagus, ovary, tongue, cheek, cervix, liver, nose, nasopharynx, greater omentum, colon, breast, larynx, and lung. [9] Moreover, patients whose tissues stained positive with VLRB.aGPA.23 had a significantly lower survival rate. [9]
A close look at the crystal structure of VLRB.aGPA.23 reveals a tryptophan residue at position 187 right over the carbohydrate binding pocket. [12]
Many glycan-binding proteins (GBPs) are oligomeric and typically contain multiple sites for glycan binding (also called carbohydrate-recognition domains). The ability to form multivalent protein-ligand interactions significantly enhances the strength of binding: while $K_d$ values for individual CRD-glycan interactions may be in the mM range, the overall affinity of a GBP towards glycans may reach nanomolar or even picomolar ranges. The overall strength of interactions is described as avidity (in contrast with affinity, which describes a single equilibrium). Sometimes the avidity is also called an apparent $K_d$ to emphasize the non-equilibrium nature of the interaction. [13]
Common oligomerization structures of lectins are shown below. For example, galectins are usually observed as dimers, while intelectins form trimers and pentraxins assemble into pentamers. Larger structures, like hexameric Reg proteins, may assemble into membrane-penetrating pores. Collectins may form even more elaborate complexes: bouquets of trimers or even cruciform-like structures (e.g. in SP-D). [14]
Given the importance of glycan-protein interactions, there is ongoing research dedicated to a) creating new tools to detect glycan-protein interactions and b) using those tools to decipher the so-called sugar code.
One of the most widely used tools for probing glycan-protein interactions is the glycan array. A glycan array is usually an NHS- or epoxy-activated glass slide on which various glycans have been printed robotically. [15] [16] These commercially available arrays may contain up to 600 different glycans, whose specificities have been extensively studied. [17]
Glycan-protein interactions may be detected by testing proteins of interest (or libraries of those) that bear fluorescent tags. The structure of the glycan-binding protein may be deciphered by several analytical methods, including mass-spectrometric techniques (MALDI-MS, LC-MS, tandem MS-MS) and/or 2D NMR. [18]
Computational methods have been applied to search for parameters (e.g. residue propensity, hydrophobicity, planarity) that could distinguish glycan-binding sites from other surface patches. For example, a model trained on 19 non-homologous carbohydrate-binding structures was able to predict carbohydrate-recognition domains (CRDs) with an accuracy of 65% for non-enzymatic structures and 87% for enzymatic ones. [19] Further studies have employed calculations of van der Waals energies of protein-probe interactions and amino acid propensities to identify CRDs with 98% specificity at 73% sensitivity. [20] More recent methods can predict CRDs even from protein sequences, by comparing the sequence with those for which structures are already known. [21]
In contrast with protein studies, where the primary protein structure is unambiguously defined by the sequence of nucleotides (the genetic code), glycobiology still cannot explain how a certain "message" is encoded using carbohydrates or how it is "read" and "translated" by other biological entities.
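For readers unfamiliar with the metrics quoted above, the following sketch shows how sensitivity, specificity, and accuracy are conventionally computed from a confusion matrix. The counts are hypothetical, chosen only so that the result matches the quoted 73% sensitivity and 98% specificity; they are not taken from the cited studies, whose exact evaluation protocols are not described here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp -- binding sites correctly predicted as binding
    fp -- non-binding patches wrongly predicted as binding
    tn -- non-binding patches correctly predicted as non-binding
    fn -- binding sites missed by the predictor
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true sites recovered
        "specificity": tn / (tn + fp),  # fraction of non-sites rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 100 true binding sites and 1000 non-binding patches.
metrics = classification_metrics(tp=73, fp=20, tn=980, fn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

High specificity with moderate sensitivity means such a predictor rarely flags a non-binding patch, but misses roughly a quarter of the true sites.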
An interdisciplinary effort, combining chemistry, biology, and biochemistry, studies glycan-protein interactions to see how different sequences of carbohydrates initiate different cellular responses. [22]