Metabolic gene clusters or biosynthetic gene clusters are tightly linked sets of mostly non-homologous genes participating in a common, discrete metabolic pathway. The genes are in physical vicinity to each other on the genome, and their expression is often coregulated. [1] [2] [3] Metabolic gene clusters are common features of bacterial [4] and most fungal [5] genomes. They are less often found in other [6] organisms. They are most widely known for producing secondary metabolites, which are the source or basis of most pharmaceutical compounds and natural toxins and which mediate chemical communication and chemical warfare between organisms. Metabolic gene clusters are also involved in nutrient acquisition, toxin degradation, [7] antimicrobial resistance, and vitamin biosynthesis. [5] Because of these properties, metabolic gene clusters play a key role in shaping microbial ecosystems, including microbiome-host interactions. Several computational genomics tools have therefore been developed to predict them.
MIBiG, BiG-FAM
Bioinformatic tools have been developed to predict these gene clusters, and to determine their abundance and expression, in microbiome samples from metagenomic data. [8] Because metagenomic datasets are large, filtering and clustering are important parts of these tools. These steps can involve dimensionality-reduction techniques such as MinHash [9] and clustering algorithms such as k-medoids and affinity propagation. Several metrics and similarity measures have also been developed to compare gene clusters.
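As an illustration of the MinHash idea, the following minimal Python sketch compresses the k-mer set of a sequence into a short signature and estimates the Jaccard similarity of two sets from their signatures. It is a sketch under stated assumptions, not the implementation used by any particular tool: the function names, the k-mer length, the number of hash functions, and the use of salted BLAKE2b digests as the hash family are all choices made for this example.

```python
import hashlib


def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence (k is an assumed value)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(items, num_hashes=128):
    """Build a MinHash signature: for each salted hash function keep the minimum value.

    `items` must be a non-empty set of strings. The salted BLAKE2b digests stand in
    for a family of independent hash functions (an assumption of this sketch).
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Two made-up sequences that share most of their k-mers.
    a = kmers("ATGGCGTACGTTAGCCGATAGCTAGGCTAACGT")
    b = kmers("ATGGCGTACGTTAGCCGATAGCTAGGCTTTCGA")
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The signature length is fixed regardless of how many k-mers a sequence contains, which is why this kind of sketching is useful as a filtering step before a more expensive comparison.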
Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). BiG-SLiCE (Biosynthetic Genes Super-Linear Clustering Engine) is a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion.
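To show why a vector representation avoids the pairwise bottleneck, the following Python sketch groups feature vectors in a single pass by comparing each vector only to the current group centroids, rather than to every other vector. This is a simplified stand-in (a "leader"-style threshold clustering), not BiG-SLiCE's actual algorithm or feature encoding, neither of which is described here; the function names, the distance threshold, and the toy vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def leader_clustering(vectors, threshold):
    """Assign each vector to the nearest centroid within `threshold`, else start a new group.

    Each vector is compared against the k current centroids only, so the cost is
    roughly O(n * k) instead of the O(n^2) of all-against-all comparison.
    Returns (labels, centroids).
    """
    centroids = []  # running mean of each group
    counts = []     # number of members in each group
    labels = []
    for vec in vectors:
        best, best_dist = None, float("inf")
        for idx, centroid in enumerate(centroids):
            dist = euclidean(vec, centroid)
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist <= threshold:
            # Update the running mean of the chosen group.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D vectors standing in for BGC feature vectors (hypothetical data).
    toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
    labels, centroids = leader_clustering(toy, threshold=1.0)
    print(labels)
```

A one-pass scheme like this depends on input order and on the chosen threshold, so it trades some accuracy for speed; that trade-off is the general motivation for non-pairwise grouping when the number of BGCs is very large.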
Using BiG-SLiCE, Kautsar et al. (2021) [10] demonstrated the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. This opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. [10]
The origin and evolution of metabolic gene clusters have been debated since the 1990s. [11] [12] It has since been demonstrated that metabolic gene clusters can arise in a genome by genome rearrangement, gene duplication, or horizontal gene transfer, [13] and some metabolic clusters have evolved convergently in multiple species. [14] Horizontal gene cluster transfer has been linked to ecological niches in which the encoded pathways are thought to provide a benefit. [15] It has been argued that clustering of genes for ecological functions results from reproductive trends among organisms, and goes on to contribute to accelerated adaptation by increasing refinement of complex functions in the pangenome of a population. [16]
In genetics, an operon is a functioning unit of DNA containing a cluster of genes under the control of a single promoter. The genes are transcribed together into an mRNA strand and either translated together in the cytoplasm, or undergo splicing to create monocistronic mRNAs that are translated separately, i.e. several strands of mRNA that each encode a single gene product. The result of this is that the genes contained in the operon are either expressed together or not at all. Several genes must be co-transcribed to define an operon.
Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.
In genetics, a fusion gene is a hybrid gene formed from two previously independent genes. It can occur as a result of translocation, interstitial deletion, or chromosomal inversion. Fusion genes have been found to be prevalent in all main types of human neoplasia. The identification of these fusion genes plays a prominent role in diagnosis and prognosis, as they serve as diagnostic and prognostic markers.
Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. Because it combines these analyses with computational and statistical approaches to understanding the function of genes and with statistical association analysis, the field is also often referred to as computational and statistical genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.
MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
Cytochrome P450 3A5 is a protein that in humans is encoded by the CYP3A5 gene.
Rhodobacter sphaeroides is a purple bacterium, one of a group of bacteria that can obtain energy through photosynthesis. Its best growth conditions are anaerobic phototrophy and aerobic chemoheterotrophy in the absence of light. R. sphaeroides is also able to fix nitrogen. It is remarkably metabolically diverse, as it is able to grow heterotrophically via fermentation and aerobic and anaerobic respiration. This metabolic versatility has motivated the investigation of R. sphaeroides as a microbial cell factory for biotechnological applications.
The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP) launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding by the NIH Common Fund from 2007 to 2016.
Microbiota are the range of microorganisms that may be commensal, mutualistic, or pathogenic found in and on all multicellular organisms, including plants. Microbiota include bacteria, archaea, protists, fungi, and viruses, and have been found to be crucial for immunologic, hormonal, and metabolic homeostasis of their host.
Nannochloropsis is a genus of algae comprising six known species. The genus in the current taxonomic classification was first termed by Hibberd (1981). The species have mostly been known from the marine environment but also occur in fresh and brackish water. All of the species are small, nonmotile spheres which do not express any distinct morphological features that can be distinguished by either light or electron microscopy. The characterisation is mostly done by rbcL gene and 18S rRNA sequence analysis.
Podospora anserina is a filamentous ascomycete fungus from the order Sordariales. It is considered a model organism for the study of molecular biology of senescence (aging), prions, sexual reproduction, and meiotic drive. It has an obligate sexual and pseudohomothallic life cycle. It is a non-pathogenic coprophilous fungus that colonizes the dung of herbivorous animals such as horses, rabbits, cows and sheep.
Penicillium rubens is a species of fungus in the genus Penicillium and was the first species known to produce the antibiotic penicillin. It was first described by Philibert Melchior Joseph Ehi Biourge in 1923. For the discovery of penicillin from this species Alexander Fleming shared the Nobel Prize in Physiology or Medicine in 1945. The original penicillin-producing type has been variously identified as Penicillium rubrum, P. notatum, and P. chrysogenum among others, but genomic comparison and phylogenetic analysis in 2011 resolved that it is P. rubens. It is the best source of penicillins and produces benzylpenicillin (G), phenoxymethylpenicillin (V) and octanoylpenicillin (K). It also produces other important bioactive compounds such as andrastin, chrysogine, fungisporin, roquefortine, and sorbicillins.
The biosynthesis of benzoxazinone, a cyclic hydroxamate and a natural insecticide, has been well characterized in maize and related grass species. In maize, genes in the pathway are named using the symbol bx. Maize Bx genes are tightly linked, a feature that has been considered uncommon for plant genes of a biosynthetic pathway. Especially notable are the genes encoding the different enzymatic functions BX1, BX2 and BX8, which are found within about 50 kilobases. Results from wheat and rye indicate that the cluster is an ancient feature. In wheat the cluster is split into two parts. The wheat genes Bx1 and Bx2 are located in close proximity on chromosome 4, and wheat Bx3, Bx4 and Bx5 map to the short arm of chromosome 5; an additional Bx3 copy was detected on the long arm of chromosome 5B. Recently, additional biosynthetic clusters have been detected in other plants for other biosynthetic pathways, and this organization might be common in plants.
Lactocillin is a thiopeptide antibiotic that is encoded and produced by a biosynthetic gene cluster in the bacterium Lactobacillus gasseri. Lactocillin was discovered and purified in 2014. Lactobacillus gasseri is one of the four Lactobacillus species found to be most common in the human vaginal microbiome. Due to increasing levels of pathogen resistance to known antibiotics, novel antibiotics are increasingly valuable. Lactocillin could function as a new antibiotic that helps people fight off infections that are resistant to many other antibiotics.
Michael Andrew Fischbach is an American chemist, microbiologist, and geneticist. He is an associate professor of Bioengineering and ChEM-H Faculty Fellow at Stanford University and a Chan Zuckerberg Biohub Investigator.
Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.
Plant–fungus horizontal gene transfer is the movement of genetic material between individuals in the plant and fungus kingdoms. Horizontal gene transfer is universal in fungi, viruses, bacteria, and other eukaryotes. Horizontal gene transfer research often focuses on prokaryotes because of the abundant sequence data from diverse lineages, and because it is assumed not to play a significant role in eukaryotes.
SoyBase is a database created by the United States Department of Agriculture. It contains genetic information about soybeans. It includes genetic maps, information about Mendelian genetics and molecular data regarding genes and sequences. It was started in 1990 and is freely available to individuals and organizations worldwide.
Eriko Takano is a professor of synthetic biology and a director of the Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM) at the University of Manchester. She develops antibiotics and other high-value chemicals using microbial synthetic biology tools.
Genome mining describes the exploitation of genomic information for the discovery of biosynthetic pathways of natural products and their possible interactions. It depends on computational technology and bioinformatics tools. The mining process relies on a huge amount of data accessible in genomic databases. By applying data mining algorithms, the data can be used to generate new knowledge in several areas of medicinal chemistry, such as discovering novel natural products.