Glycoinformatics is a field of bioinformatics that pertains to the study of carbohydrates involved in protein post-translational modification. It broadly includes (but is not restricted to) database, software, and algorithm development for the study of carbohydrate structures, glycoconjugates, enzymatic carbohydrate synthesis and degradation, as well as carbohydrate interactions. Conventional usage of the term does not currently include the treatment of carbohydrates from the better-known nutritive aspect.
Although glycosylation is the most common form of protein modification and produces highly complex carbohydrate structures, bioinformatics resources for the glycome remain underdeveloped. [2] [3]
Unlike proteins and nucleic acids, which are linear, carbohydrates are often branched and extremely complex. [4] For instance, just four sugars can be strung together to form more than 5 million different types of carbohydrates, [5] and nine different sugars may be assembled into 15 million possible four-sugar chains. [6]
In addition, the number of simple sugars that make up glycans is greater than the number of nucleotides that make up DNA or RNA. Glycan structures are therefore more computationally expensive to evaluate. [7]
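The combinatorial growth described above can be illustrated with a deliberately simplified count. The sketch below is a toy model, not the calculation behind the cited figures: it considers only unbranched chains, and it assumes that each glycosidic linkage has two anomeric configurations and four possible attachment positions (typical of a hexopyranose). The function names and default parameters are hypothetical. Because branching and ring forms are ignored, the result is a lower bound and is not expected to reproduce the numbers quoted above.

```python
def count_linear_sequences(n_residues: int, alphabet_size: int) -> int:
    """Number of linear sequences when only residue identity varies
    (the situation for nucleic acids and peptides)."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return alphabet_size ** n_residues


def count_linear_glycans(
    n_residues: int,
    n_monosaccharides: int,
    n_anomers: int = 2,    # assumption: alpha or beta
    n_positions: int = 4,  # assumption: four free hydroxyl positions per residue
) -> int:
    """Toy count of unbranched glycans: residue identity varies at every
    position, and each of the (n_residues - 1) linkages additionally varies
    in anomeric configuration and attachment position."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    per_linkage = n_anomers * n_positions
    return n_monosaccharides ** n_residues * per_linkage ** (n_residues - 1)


if __name__ == "__main__":
    print("tetranucleotides:", count_linear_sequences(4, 4))
    print("tetrapeptides:", count_linear_sequences(4, 20))
    print("linear tetrasaccharides (9 sugars):", count_linear_glycans(4, 9))
```

Even without branching, the linkage term multiplies the count by a factor that grows exponentially with chain length, which is the effect that sets glycans apart from linear biopolymers.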
One of the main constraints in glycoinformatics is the difficulty of representing sugars in sequence form, especially because of their branching nature. [6] Owing to the lack of a genetic blueprint, carbohydrates do not have a "fixed" sequence. Instead, the sequence is largely determined by the presence of a variety of enzymes, their kinetic differences, and variations in the biosynthetic micro-environment of the cells. This increases the complexity of analysis and reduces the experimental reproducibility of the carbohydrate structure of interest. [8] It is for this reason that carbohydrates are often considered "information poor" molecules.
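The representation problem can be made concrete with a short sketch. The code below stores a glycan as a tree rooted at the reducing end and serializes it to a bracketed one-line string loosely modelled on IUPAC condensed notation. The `Residue` class, the `to_linear` function and the rule of ordering branches by linkage label are illustrative assumptions, not the format used by any particular database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """One monosaccharide in a glycan tree (hypothetical helper class)."""
    name: str                      # e.g. "Man", "GlcNAc"
    linkage: str = ""              # linkage to the parent, e.g. "a1-3"; empty for the root
    children: list[Residue] = field(default_factory=list)


def to_linear(node: Residue) -> str:
    """Serialize a glycan tree with the root (reducing end) written last.

    Children are sorted by linkage label so that the same tree always yields
    the same string; the first child continues the main chain and the
    remaining children are written as bracketed side branches.
    """
    label = node.name + (f"({node.linkage})" if node.linkage else "")
    if not node.children:
        return label
    main, *branches = sorted(node.children, key=lambda c: c.linkage)
    side = "".join(f"[{to_linear(b)}]" for b in branches)
    return to_linear(main) + side + label


# The trimannosyl core common to N-glycans, built from the reducing end outward.
core = Residue("GlcNAc", children=[
    Residue("GlcNAc", "b1-4", [
        Residue("Man", "b1-4", [
            Residue("Man", "a1-6"),
            Residue("Man", "a1-3"),
        ]),
    ]),
])

if __name__ == "__main__":
    print(to_linear(core))
```

Sorting the branches is what makes the string canonical: without an agreed ordering rule, the same branched structure could be written in several equivalent ways, which is one reason that comparing glycan records across databases is harder than comparing linear sequences.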
Table of major glyco-databases. [9] [10]
Database | Description | URL |
---|---|---|
GlycomeDB (outdated) | Portal for glycan structures that have been integrated from several of the major glycan-related databases. | http://www.glycome-db.org |
GLYCOSCIENCES.de | One of the earliest databases of glycan structure data, also includes NMR data and literature references. | https://web.archive.org/web/20180521104202/http://www.glycosciences.de/ |
Consortium for Functional Glycomics (CFG) | Glycan structures, glycan binding affinity data, glycan profiling data from MALDI-TOF analysis, knock-out mouse phenotype data and glyco-enzyme expression data. | http://www.functionalglycomics.org |
Japanese Consortium for Glycobiology and Glycotechnology Database (JCGGDB) | A comprehensive database portal for major glyco-related databases in Japan, including mass spectral data of glycan profiles, lectin array data, glycoprotein data, glycogene information including disease information, etc. | http://jcggdb.jp |
KEGG GLYCAN | Glycan structures and their pathway data, including glycogene information as organized by KEGG ORTHOLOGY. | http://www.genome.jp/kegg/glycan/ |
GlyConnect | Curated glycomic and glycoproteomic structural and site information based on data published in scientific journals. [11] | https://glyconnect.expasy.org |
UniCarb-DB | Curated tandem MS spectra with associated glycan structures. [12] | https://unicarb-db.expasy.org |
UniCarb-DR | Public repository for tandem MS spectra with associated glycan structures for MIRAGE compatible submission of glycomic data for supplementing glycomic publications. [13] | https://unicarb-dr.glycosmos.org |
GlyGen | Retrieves information from multiple international data sources and integrates and harmonizes data for glycoconjugates and carbohydrates. The web portal provides an easy starting point for users to search for information regarding protein glycosylation, glycan occurrence, glycosylation in diseases etc. | https://www.glygen.org/ |
Carbohydrate Structure Database (CSDB) | Curated structural, bibliographic, taxonomical, NMR and other data on carbohydrates from prokaryotes, plants, and fungi. | http://csdb.glycoscience.ru |
GlyTouCan | International repository that assigns unique accession numbers to glycan structures. [14] | https://glytoucan.org/ |
Glycomics is the comprehensive study of glycomes, including genetic, physiologic, pathologic, and other aspects. Glycomics "is the systematic study of all glycan structures of a given cell type or organism" and is a subset of glycobiology. The term glycomics is derived from the chemical prefix for sweetness or a sugar, "glyco-", and was formed to follow the omics naming convention established by genomics and proteomics.
Glycoproteins are proteins which contain oligosaccharide chains covalently attached to amino acid side-chains. The carbohydrate is attached to the protein in a cotranslational or posttranslational modification. This process is known as glycosylation. Secreted extracellular proteins are often glycosylated.
The glycome is the entire complement of sugars, whether free or present in more complex molecules, of an organism. An alternative definition is the entirety of carbohydrates in a cell. The glycome may in fact be one of the most complex entities in nature. "Glycomics, analogous to genomics and proteomics, is the systematic study of all glycan structures of a given cell type or organism" and is a subset of glycobiology.
The Consortium for Functional Glycomics (CFG) is a large research initiative funded in 2001 by a glue grant from the National Institute of General Medical Sciences (NIGMS) to “define paradigms by which protein-carbohydrate interactions mediate cell communication”. To achieve this goal, the CFG studies the functions of:
Defined in the narrowest sense, glycobiology is the study of the structure, biosynthesis, and biology of saccharides that are widely distributed in nature. Sugars or saccharides are essential components of all living things and aspects of the various roles they play in biology are researched in various medical, biochemical and biotechnological fields.
The terms glycans and polysaccharides are defined by IUPAC as synonyms meaning "compounds consisting of a large number of monosaccharides linked glycosidically". However, in practice the term glycan may also be used to refer to the carbohydrate portion of a glycoconjugate, such as a glycoprotein, glycolipid, or a proteoglycan, even if the carbohydrate is only an oligosaccharide. Glycans usually consist solely of O-glycosidic linkages of monosaccharides. For example, cellulose is a glycan composed of β-1,4-linked D-glucose, and chitin is a glycan composed of β-1,4-linked N-acetyl-D-glucosamine. Glycans can be homo- or heteropolymers of monosaccharide residues, and can be linear or branched.
Glycoproteomics is a branch of proteomics that identifies, catalogs, and characterizes proteins containing carbohydrates as a result of post-translational modifications. Glycosylation is the most common post-translational modification of proteins, but continues to be the least studied on the proteome level. Mass spectrometry (MS) is an analytical technique used to improve the study of these proteins on the proteome level. Glycosylation contributes to several concerted biological mechanisms essential to maintaining physiological function. The study of the glycosylation of proteins is important to understanding certain diseases, like cancer, because a connection between a change in glycosylation and these diseases has been discovered. To study this post-translational modification of proteins, advanced mass spectrometry techniques based on glycoproteomics have been developed to help in terms of therapeutic applications and the discovery of biomarkers.
Richard D. Cummings is an American biochemist who is the S. Daniel Abraham Professor of Surgery at Beth Israel Deaconess Medical Center (BIDMC) and Harvard Medical School (HMS) in Boston, Massachusetts. He is also the chief of the division of surgical sciences within the department of surgery. He is the director of the Harvard Medical School Center for Glycoscience, director of the National Center for Functional Glycomics, and founder of the Glycomics Core at BIDMC. As of 2018, Cummings is also the scientific director of the Feihi Nutrition Laboratory at BIDMC. Before moving to BIDMC/HMS, Cummings was the William Patterson Timmie Professor and chair of the department of biochemistry at Emory University School of Medicine in Atlanta, Georgia, from 2006 to 2015. At Emory, Cummings was a founder of the Emory Glycomics Center in 2007.
Anne Dell is an Australian biochemist specialising in the study of glycomics and the carbohydrate structures that modify proteins. Her work could be used to determine how pathogens such as HIV are able to evade termination by the immune system, which could be applied toward understanding how this occurs in fetuses. Her research has also led to the development of higher-sensitivity mass spectrometry techniques that have allowed the structure of carbohydrates to be studied more effectively. Dell also established GlycoTRIC at Imperial College London, a research center that allows glycobiology to be better understood in biomedical applications. She is currently Professor of Carbohydrate Biochemistry and Head of the Department of Life Sciences at Imperial College London. Her other contributions to glycobiology include additions to the textbook "Essentials of Glycobiology". Dell was appointed Commander of the Order of the British Empire (CBE) in the 2009 Birthday Honours.
Translational glycobiology or applied glycobiology is the branch of glycobiology and glycochemistry that focuses on developing new pharmaceuticals through glycomics and glycoengineering. Although research in this field presents many difficulties, translational glycobiology presents applications with therapeutic glycoconjugates, with treating various bone diseases, and developing therapeutic cancer vaccines and other targeted therapies. Some mechanisms of action include using the glycan for drug targeting, engineering protein glycosylation for better efficacy, and glycans as drugs themselves.
UniCarb-DB is a structural and mass spectrometric database used in glycomics. UniCarb-DB provides over 1000 manually annotated LC-MS/MS spectra for N- and O-linked glycans released from glycoproteins. Each entry contains a reference to published work, information about the structure, a GlyTouCan accession number, MS/MS fragmentation with complete peak lists, biological context, and experimental metadata. The database was created by a collaboration between the University of Gothenburg and Macquarie University and has been hosted by the Swiss Institute of Bioinformatics since November 2016. The database is the first to implement the Minimum Information standard MIRAGE for submission of glycomic MS/MS data.
Carbohydrate Structure Database (CSDB) is a free curated database and service platform in glycoinformatics, launched in 2005 by a group of Russian scientists from N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences. CSDB stores published structural, taxonomical, bibliographic and NMR-spectroscopic data on natural carbohydrates and carbohydrate-related molecules.
In biochemistry, paucimannosylation is an enzymatic post-translational modification involving the attachment of relatively simple mannose (Man) and N-Acetylglucosamine (GlcNAc) containing carbohydrates (glycans) to proteins. The paucimannosidic glycans may also be modified with other types of monosaccharides including fucose (Fuc) and xylose (Xyl) depending on the species, tissue and cell origin.
The Minimum Information Required About a Glycomics Experiment (MIRAGE) initiative is part of the Minimum Information Standards and specifically applies to guidelines for reporting on a glycomics experiment. The initiative is supported by the Beilstein Institute for the Advancement of Chemical Sciences. The MIRAGE project focuses on the development of publication guidelines for interaction and structural glycomics data as well as the development of data exchange formats. The project was launched in 2011 in Seattle and set off with the description of the aims of the MIRAGE project.
Glycan arrays, like those offered by the Consortium for Functional Glycomics (CFG), the National Center for Functional Glycomics (NCFG) and Z Biotech, LLC, contain carbohydrate compounds that can be screened with lectins, antibodies or cell receptors to define carbohydrate specificity and identify ligands. Glycan array screening works in much the same way as other microarrays, such as the DNA microarrays used to study gene expression or the protein microarrays used to study protein interactions.
The Symbol Nomenclature For Glycans (SNFG) is a community-curated standard for the depiction of simple monosaccharides and complex carbohydrates (glycans) using various color-coded geometric shapes, along with defined text additions. It is hosted by the National Center for Biotechnology Information at the NCBI-Glycans Page. It is curated by an international group of researchers in the field, collectively called the SNFG Discussion Group. The overall goal of the SNFG is to:
Charles E. Warren was an assistant professor of biochemistry and molecular biology at the University of New Hampshire.
Glycan-protein interactions represent a class of biomolecular interactions that occur between free or protein-bound glycans and their cognate binding partners. Intramolecular glycan-protein (protein-glycan) interactions occur between glycans and the proteins to which they are covalently attached. Together with protein-protein interactions, they form a mechanistic basis for many essential cell processes, especially for cell-cell interactions and host-cell interactions. For instance, SARS-CoV-2, the causative agent of COVID-19, employs its extensively glycosylated spike (S) protein to bind to the ACE2 receptor, allowing it to enter host cells. The spike protein is a trimeric structure, with each subunit containing 22 N-glycosylation sites, making it an attractive target for vaccine development.
Glycan nomenclature is the systematic naming of glycans, which are carbohydrate-based polymers made by all living organisms. In general, glycans can be represented in (i) text formats, which include the commonly used CarbBank format, IUPAC names, and several other types, and (ii) symbol formats, which consist of the Symbol Nomenclature For Glycans and Oxford notation.
Nicki Packer FRSC is a distinguished professor of glycoproteomics in the School of Natural Sciences at Macquarie University and principal research leader at Griffith University's Institute for Glycomics. Packer is a Fellow of the Royal Society of Chemistry and in 2021 received the Distinguished Achievement in Proteomic Sciences Award from the Human Proteome Organization. Her research focuses on the biological function of glycoconjugates by linking glycomics with proteomics and bioinformatics.