Metabolic gene cluster


Metabolic gene clusters, or biosynthetic gene clusters (BGCs), are tightly linked sets of mostly non-homologous genes that participate in a common, discrete metabolic pathway. The genes lie physically close to one another in the genome, and their expression is often coregulated. [1] [2] [3] Metabolic gene clusters are common features of bacterial [4] and most fungal [5] genomes, and are less often found in other organisms. [6] They are most widely known for producing secondary metabolites, which are the source or basis of most pharmaceutical compounds and natural toxins, and which mediate chemical communication and chemical warfare between organisms. Metabolic gene clusters are also involved in nutrient acquisition, toxin degradation, [7] antimicrobial resistance, and vitamin biosynthesis. [5] Through these functions, metabolic gene clusters play a key role in shaping microbial ecosystems, including microbiome-host interactions, and several computational genomics tools have therefore been developed to predict them.
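The defining property of a cluster, that its genes sit close together on the genome, suggests the simplest form of rule-based detection. The sketch below is a minimal illustration under stated assumptions: each gene is assumed to be already annotated as a biosynthetic "core" gene or not (the `is_core` flag), and the `find_clusters` helper with its `max_gap` and `min_core` parameters is hypothetical, not taken from any particular tool. It merges core genes separated by at most `max_gap` base pairs into candidate clusters. Real detectors typically rely on curated annotation rules and profile models, so this only demonstrates the proximity criterion.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int      # start coordinate on the contig, in base pairs
    end: int        # end coordinate on the contig, in base pairs
    is_core: bool   # hypothetical flag: annotated with a biosynthetic "core" function


def find_clusters(genes: list[Gene], max_gap: int = 10_000, min_core: int = 2) -> list[list[str]]:
    """Group core genes lying within `max_gap` bp of one another into candidate clusters.

    A run of nearby core genes is reported only if it contains at least `min_core` genes.
    """
    core = sorted((g for g in genes if g.is_core), key=lambda g: g.start)
    clusters: list[list[str]] = []
    current: list[Gene] = []
    current_end = 0
    for g in core:
        if current and g.start - current_end > max_gap:
            # Gap too large: close the current run and start a new one.
            if len(current) >= min_core:
                clusters.append([x.name for x in current])
            current = []
        if not current:
            current_end = g.end
        current.append(g)
        current_end = max(current_end, g.end)
    if len(current) >= min_core:
        clusters.append([x.name for x in current])
    return clusters


genes = [
    Gene("a", 0, 1_000, True),
    Gene("b", 1_500, 3_000, True),
    Gene("c", 3_500, 4_000, False),   # not a core gene, ignored
    Gene("d", 50_000, 51_000, True),  # isolated core gene
    Gene("e", 200_000, 201_000, True),
    Gene("f", 205_000, 206_000, True),
]
print(find_clusters(genes))  # [['a', 'b'], ['e', 'f']]
```

Raising `max_gap` merges more distant genes into the same candidate cluster, while raising `min_core` discards small runs; both thresholds here are illustrative values only.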


Databases

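Databases of metabolic gene clusters include MIBiG (Minimum Information about a Biosynthetic Gene cluster), a curated repository of experimentally characterized biosynthetic gene clusters, and BiG-FAM, a database of gene cluster families.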

Bioinformatic tools

Tools based on rules

Bioinformatic tools have been developed to predict this kind of gene cluster in microbiome samples from metagenomic data, and to determine its abundance and expression. [8] Because metagenomic datasets are large, filtering and clustering the data are important parts of these tools. These steps can use dimensionality-reduction (sketching) techniques such as MinHash [9] and clustering algorithms such as k-medoids and affinity propagation. Several distance metrics and similarity measures have also been developed to compare gene clusters.
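The MinHash idea mentioned above can be sketched in a few lines. This is a minimal illustration under assumptions: each gene cluster is reduced to a set of tokens (here nucleotide k-mers), BLAKE2b hashing with different seeds stands in for a family of independent hash functions, and the helper names `kmer_set`, `minhash_sketch`, `estimate_jaccard` and `mash_distance` are hypothetical. It uses the classic "one minimum per hash function" MinHash variant; Mash itself keeps a bottom-s sketch from a single hash function. The last helper applies the Mash distance formula D = -(1/k) ln(2j / (1 + j)), which turns an estimated Jaccard index j for k-mer size k into a distance.

```python
import hashlib
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k (k-mers) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(item: str, seed: int) -> int:
    """64-bit hash; each seed value plays the role of an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(items: set[str], num_hashes: int = 256) -> list[int]:
    """Keep only the minimum hash value of the set under each of num_hashes hash functions."""
    return [min(_seeded_hash(x, s) for x in items) for s in range(num_hashes)]


def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """The fraction of hash functions whose minima agree estimates |A & B| / |A | B|."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def mash_distance(jaccard: float, k: int) -> float:
    """Mash distance -(1/k) * ln(2j / (1 + j)); returns 1.0 when j == 0 (a convention assumed here)."""
    if jaccard == 0:
        return 1.0
    return math.log((1 + jaccard) / (2 * jaccard)) / k
```

For two cluster sequences, `estimate_jaccard(minhash_sketch(kmer_set(s1, 21)), minhash_sketch(kmer_set(s2, 21)))` gives an approximate Jaccard index that `mash_distance` converts into a distance for a downstream clustering step. Comparing fixed-size sketches costs time proportional to the sketch length rather than to the size of the original sequences, which is what makes many comparisons affordable.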

Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The more than 200,000 microbial genomes now publicly available hold information on a wealth of potentially novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, existing tools are limited by a bottleneck: the computationally expensive, network-based approach used to group BGCs into gene cluster families (GCFs), which relies on pairwise comparisons. BiG-SLiCE (Biosynthetic Genes Super-Linear Clustering Engine) is a tool designed to cluster massive numbers of BGCs. By representing them as points in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion.
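To make the contrast with pairwise, network-based grouping concrete, here is a simplified single-pass ("leader") clustering sketch in Euclidean space. It is not BiG-SLiCE's algorithm: the two-dimensional feature vectors are hypothetical toy values, and `leader_cluster` with its `threshold` radius is an assumed, illustrative helper. Each BGC vector is compared only with the existing family centroids, so the cost grows with (number of BGCs) x (number of families) rather than with the square of the number of BGCs.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def leader_cluster(vectors: list[list[float]], threshold: float):
    """Assign each vector to the nearest centroid within `threshold`, or open a new family.

    Returns (labels, centroids); each centroid is the running mean of its members.
    """
    centroids: list[list[float]] = []
    counts: list[int] = []
    labels: list[int] = []
    for v in vectors:
        best, best_d = None, math.inf
        for i, c in enumerate(centroids):
            d = euclidean(v, c)
            if d <= threshold and d < best_d:
                best, best_d = i, d
        if best is None:
            # No family is close enough: this BGC seeds a new family.
            centroids.append(list(v))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Join the nearest family and update its centroid as an incremental mean.
            n = counts[best] + 1
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], v)]
            counts[best] = n
            labels.append(best)
    return labels, centroids


# Hypothetical 2-D feature vectors for five BGCs.
bgcs = [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.5, 10.0], [0.0, 0.4]]
labels, centroids = leader_cluster(bgcs, threshold=2.0)
print(labels)  # [0, 0, 1, 1, 0]
```

A single pass like this depends on input order and on the chosen threshold, a tradeoff accepted here for simplicity; production tools use more robust schemes to obtain near-linear scaling.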

Kautsar et al. (2021) [10] demonstrated the utility of such analyses by using BiG-SLiCE to reconstruct a global map of secondary metabolic diversity across taxonomy and to identify uncharted biosynthetic potential. The approach opens up new possibilities for accelerating natural product discovery and offers a first step towards constructing a global, searchable, interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. [10]

Tools based on machine learning

Evolution

The origin and evolution of metabolic gene clusters have been debated since the 1990s. [11] [12] It has since been demonstrated that metabolic gene clusters can arise in a genome through genome rearrangement, gene duplication, or horizontal gene transfer, [13] and some metabolic clusters have evolved convergently in multiple species. [14] Horizontal transfer of gene clusters has been linked to ecological niches in which the encoded pathways are thought to provide a benefit. [15] It has been argued that the clustering of genes for ecological functions results from reproductive trends among organisms, and that it in turn accelerates adaptation by increasing the refinement of complex functions in the pangenome of a population. [16]


References

  1. Schläpfer P, Zhang P, Wang C, Kim T, Banf M, Chae L, et al. (April 2017). "Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants". Plant Physiology. 173 (4): 2041–2059. doi:10.1104/pp.16.01942. PMC 5373064. PMID 28228535.
  2. Miller BL, Miller KY, Roberti KA, Timberlake WE (January 1987). "Position-dependent and -independent mechanisms regulate cell-specific expression of the SpoC1 gene cluster of Aspergillus nidulans". Molecular and Cellular Biology. 7 (1): 427–434. doi:10.1128/MCB.7.1.427. PMC 365085. PMID 3550422.
  3. Banf M, Zhao K, Rhee SY (September 2019). "METACLUSTER-an R package for context-specific expression analysis of metabolic gene clusters". Bioinformatics. 35 (17): 3178–3180. doi:10.1093/bioinformatics/btz021. PMC 6735823. PMID 30657869.
  4. Cimermancic P, Medema MH, Claesen J, Kurita K, Wieland Brown LC, Mavrommatis K, et al. (July 2014). "Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters". Cell. 158 (2): 412–421. doi:10.1016/j.cell.2014.06.034. PMC 4123684. PMID 25036635.
  5. Slot JC (2017). "Fungal Gene Cluster Diversity and Evolution". Fungal Phylogenetics and Phylogenomics. Advances in Genetics. Vol. 100. pp. 141–178. doi:10.1016/bs.adgen.2017.09.005. ISBN 978-0-12-813261-6. PMID 29153399.
  6. Wisecaver JH, Borowsky AT, Tzin V, Jander G, Kliebenstein DJ, Rokas A (May 2017). "A Global Coexpression Network Approach for Connecting Genes to Specialized Metabolic Pathways in Plants". The Plant Cell. 29 (5): 944–959. doi:10.1105/tpc.17.00009. PMC 5466033. PMID 28408660.
  7. Gluck-Thaler E, Slot JC (June 2018). "Specialized plant biochemistry drives gene clustering in fungi". The ISME Journal. 12 (7): 1694–1705. Bibcode:2018ISMEJ..12.1694G. doi:10.1038/s41396-018-0075-3. PMC 6018750. PMID 29463891.
  8. Pascal-Andreu V, Augustijn H, van den Berg K, van der Hooft J, Fischbach M, Medema M (2020). "BiG-MAP: an automated pipeline to profile metabolic gene cluster abundance and expression in microbiomes". bioRxiv. 6 (5): e00937-21. doi:10.1101/2020.12.14.422671. PMC 8547482. PMID 34581602.
  9. Ondov B, Treangen T, Melsted P, Mallonee A, Bergman N, Koren S, et al. (2016). "Mash: fast genome and metagenome distance estimation using MinHash". Genome Biology. 17 (32): 14. doi:10.1186/s13059-016-0997-x. PMC 4915045. PMID 27323842.
  10. Kautsar SA, van der Hooft JJ, de Ridder D, Medema MH (January 2021). "BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters". GigaScience. 10 (1): giaa154. doi:10.1093/gigascience/giaa154. PMC 7804863. PMID 33438731.
  11. Lawrence JG, Roth JR (August 1996). "Selfish Operons: Horizontal Transfer May Drive the Evolution of Gene Clusters". Genetics. 143 (4): 1843–1860. doi:10.1093/genetics/143.4.1843. ISSN 0016-6731. PMC 1207444. PMID 8844169.
  12. Pál C, Hurst LD (June 2004). "Evidence against the selfish operon theory". Trends in Genetics. 20 (6): 232–234. doi:10.1016/j.tig.2004.04.001. PMID 15145575.
  13. Reynolds HT, Vijayakumar V, Gluck-Thaler E, Korotkin HB, Matheny PB, Slot JC (2018). "Horizontal gene cluster transfer increased hallucinogenic mushroom diversity". Evolution Letters. 2 (2): 88–101. doi:10.1002/evl3.42. ISSN 2056-3744. PMC 6121855. PMID 30283667.
  14. Slot JC, Rokas A (June 2010). "Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi". Proceedings of the National Academy of Sciences. 107 (22): 10136–10141. Bibcode:2010PNAS..10710136S. doi:10.1073/pnas.0914418107. PMC 2890473. PMID 20479238.
  15. Greene GH, McGary KL, Rokas A, Slot JC (January 2014). "Ecology drives the distribution of specialized tyrosine metabolism modules in fungi". Genome Biology and Evolution. 6 (1): 121–132. doi:10.1093/gbe/evt208. ISSN 1759-6653. PMC 3914699. PMID 24391152.
  16. Slot JC, Gluck-Thaler E (October 2019). "Metabolic gene clusters, fungal diversity, and the generation of accessory functions". Current Opinion in Genetics & Development. 58–59: 17–24. doi:10.1016/j.gde.2019.07.006. ISSN 0959-437X. PMID 31466036. S2CID 201674539.