Genome mining

Genome mining is associated with bioinformatics investigations.

Genome mining is the use of genomic information to discover the biosynthetic pathways of natural products and their possible interactions. [1] It depends on computational technology and bioinformatics tools. The mining process relies on large volumes of data (DNA sequences and annotations) accessible in genomic databases. Applying data mining algorithms to these data can generate new knowledge in several areas of medicinal chemistry, [2] [3] such as the discovery of novel natural products. [4]

History

From the mid- to late 1980s, researchers increasingly focused on genetic studies as sequencing technologies advanced. [5] The GenBank database had been established in 1982 to collect, manage, store, and distribute DNA sequence data as DNA sequences became increasingly available. As the volume of genetic data grew, biotechnology companies were able, from 1992 onward, to use human DNA sequences to develop protein and antibody drugs through genome mining. [6] In the late 1990s, companies such as Amgen, Immunex, and Genentech used genome mining to develop drugs that progressed to the clinical stage. [7] Since the completion of the Human Genome Project in the early 2000s, researchers have sequenced the genomes of many microorganisms. [8] Many of these genomes have subsequently been studied in detail to identify new genes and biosynthetic pathways. [9]

Algorithms

As large quantities of genomic sequence data began to accumulate in public databases, genetic algorithms became important for deciphering them. Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection. [10] The following are commonly used genetic algorithms:
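
The bio-inspired operators named above (selection, crossover, and mutation) can be illustrated with a minimal Python sketch. It is a generic toy example, not the method of any particular genome mining tool: the target motif, the function names, and all parameter values are assumptions chosen for illustration. A population of random DNA strings is evolved toward a target sequence, with fitness defined as the number of matching positions.

```python
import random

ALPHABET = "ACGT"


def fitness(candidate, target):
    """Number of positions where the candidate matches the target motif."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(seq, rate, rng):
    """Replace each base by a randomly drawn base with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base for base in seq
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def select(population, target, rng, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=lambda s: fitness(s, target))


def evolve(target, pop_size=50, generations=100, rate=0.05, seed=0):
    """Evolve random sequences toward `target` (hypothetical toy objective).

    Returns the best sequence found and the best fitness seen at the
    start of each generation.
    """
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        best = max(population, key=lambda s: fitness(s, target))
        history.append(fitness(best, target))
        if best == target:
            break
        # Elitism: the current best is carried over unchanged, so the
        # best fitness can never decrease between generations.
        children = [best]
        while len(children) < pop_size:
            parent_a = select(population, target, rng)
            parent_b = select(population, target, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rate, rng))
        population = children
    best = max(population, key=lambda s: fitness(s, target))
    return best, history


if __name__ == "__main__":
    best, history = evolve("ATGGCGTACGTTAGC")
    print(best, history[-1])
```

Keeping the best individual in every generation (elitism) is one common design choice; without it, a good candidate can be lost to mutation or crossover.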


Applications

Genome mining is applied to the discovery of natural products, where it facilitates the characterization of novel molecules and biosynthetic pathways. [4] [18]

Natural product discovery

Natural products are produced by enzymes encoded in biosynthetic gene clusters (BGCs) in the genome of the microorganism. [19] Genome mining can be used to predict the BGCs that produce a target natural product. [20] Important classes of biosynthetic enzymes and products include polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS), ribosomally synthesized and post-translationally modified peptides (RiPPs), terpenoids, and many more. [21] By mining for these enzymes, researchers can determine which classes of compounds the BGCs encode and compare target gene clusters to known gene clusters. [22] To verify the relationship between a BGC and a natural product, the target BGC can be expressed in a suitable host through molecular cloning. [23]
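
The comparison of a target gene cluster with known gene clusters can be sketched in Python. This is a deliberately simplified illustration, assuming each cluster is reduced to the set of enzyme domains it contains and that similarity is measured with the Jaccard index; real tools use considerably richer models. The cluster names and domain sets below are hypothetical, and the domain labels are the usual abbreviations for PKS and NRPS domains (KS ketosynthase, AT acyltransferase, ACP acyl carrier protein, C condensation, A adenylation, PCP peptidyl carrier protein, TE thioesterase).

```python
def jaccard(a, b):
    """Jaccard index of two domain sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def rank_known_clusters(query_domains, known_clusters):
    """Rank known clusters by domain-set similarity to the query cluster."""
    scores = [
        (name, jaccard(query_domains, domains))
        for name, domains in known_clusters.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical reference clusters, for illustration only.
KNOWN_CLUSTERS = {
    "example_pks": {"KS", "AT", "ACP", "TE"},
    "example_nrps": {"C", "A", "PCP", "TE"},
}

if __name__ == "__main__":
    query = {"KS", "AT", "ACP"}  # hypothetical query cluster
    for name, score in rank_known_clusters(query, KNOWN_CLUSTERS):
        print(f"{name}: {score:.2f}")
```

In this toy case the query shares three of four domains with the PKS-like reference and none with the NRPS-like one, so it would be grouped with the polyketide class.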

Databases and tools

Genetic data have accumulated in public databases. Researchers can apply algorithms to these data to discover new processes, targets, and products. [10] The following are databases and tools:

References

  1. Albarano L, Esposito R, Ruocco N, Costantini M (April 2020). "Genome Mining as New Challenge in Natural Products Discovery". Marine Drugs. 18 (4): 199. doi:10.3390/md18040199. PMC 7230286. PMID 32283638.
  2. Hannigan GD, Prihoda D, Palicka A, Soukup J, Klempir O, Rampula L, et al. (October 2019). "A deep learning genome-mining strategy for biosynthetic gene cluster prediction". Nucleic Acids Research. 47 (18): e110. doi:10.1093/nar/gkz654. PMC 6765103. PMID 31400112.
  3. Lee N, Hwang S, Kim J, Cho S, Palsson B, Cho BK (2020-01-01). "Mini review: Genome mining approaches for the identification of secondary metabolite biosynthetic gene clusters in Streptomyces". Computational and Structural Biotechnology Journal. 18: 1548–1556. doi:10.1016/j.csbj.2020.06.024. PMC 7327026. PMID 32637051.
  4. Challis GL (May 2008). "Genome mining for novel natural product discovery". Journal of Medicinal Chemistry. 51 (9): 2618–2628. doi:10.1021/jm700948z. PMID 18393407.
  5. Bains W, Smith GC (December 1988). "A novel method for nucleic acid sequence determination". Journal of Theoretical Biology. 135 (3): 303–307. Bibcode:1988JThBi.135..303B. doi:10.1016/S0022-5193(88)80246-7. PMID 3256722.
  6. Cook-Deegan R, Heaney C (2010-09-01). "Patents in genomics and human genetics". Annual Review of Genomics and Human Genetics. 11 (1): 383–425. doi:10.1146/annurev-genom-082509-141811. PMC 2935940. PMID 20590431.
  7. Ziemert N, Alanjary M, Weber T (August 2016). "The evolution of genome mining in microbes - a review". Natural Product Reports. 33 (8): 988–1005. doi:10.1039/C6NP00025H. PMID 27272205.
  8. Omura S, Ikeda H, Ishikawa J, Hanamoto A, Takahashi C, Shinose M, et al. (October 2001). "Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites". Proceedings of the National Academy of Sciences of the United States of America. 98 (21): 12215–12220. Bibcode:2001PNAS...9812215O. doi:10.1073/pnas.211433198. PMC 59794. PMID 11572948.
  9. Tang X, Li J, Millán-Aguiñaga N, Zhang JJ, O'Neill EC, Ugalde JA, et al. (December 2015). "Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining". ACS Chemical Biology. 10 (12): 2841–2849. doi:10.1021/acschembio.5b00658. PMC 4758359. PMID 26458099.
  10. Brandon MC, Wallace DC, Baldi P (July 2009). "Data structures and compression algorithms for genomic sequence data". Bioinformatics. 25 (14): 1731–1738. doi:10.1093/bioinformatics/btp319. PMC 2705231. PMID 19447783.
  11. "AntiSMASH-DB".
  12. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, et al. (July 2011). "antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences". Nucleic Acids Research. 39 (Web Server issue): W339–W346. doi:10.1093/nar/gkr466. PMC 3125804. PMID 21672958.
  13. Navarro-Muñoz JC, Selem-Mojica N, Mullowney MW, Kautsar SA, Tryon JH, Parkinson EI, De Los Santos ELC, Yeong M, Cruz-Morales P, Abubucker S, Roeters A, Lokhorst W, Fernandez-Guerra A, Cappelini LTD, Goering AW (January 2020). "A computational framework to explore large-scale biosynthetic diversity". Nature Chemical Biology. 16 (1): 60–68. doi:10.1038/s41589-019-0400-9. ISSN 1552-4469. PMC 6917865. PMID 31768033.
  14. "PRISM". Adapsyn Bioscience.
  15. Skinnider MA, Johnston CW, Gunabalasingam M, Merwin NJ, Kieliszek AM, MacLellan RJ, et al. (November 2020). "Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences". Nature Communications. 11 (1): 6058. Bibcode:2020NatCo..11.6058S. doi:10.1038/s41467-020-19986-1. PMC 7699628. PMID 33247171.
  16. King RD, Wise PH, Clare A (May 2004). "Confirmation of data mining based predictions of protein function". Bioinformatics. 20 (7): 1110–1118. doi:10.1093/bioinformatics/bth047. PMID 14764546.
  17. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712.
  18. Medema MH, de Rond T, Moore BS (September 2021). "Mining genomes to illuminate the specialized chemistry of life". Nature Reviews. Genetics. 22 (9): 553–571. doi:10.1038/s41576-021-00363-7. PMC 8364890. PMID 34083778.
  19. Rutledge PJ, Challis GL (August 2015). "Discovery of microbial natural products by activation of silent biosynthetic gene clusters". Nature Reviews. Microbiology. 13 (8): 509–523. doi:10.1038/nrmicro3496. PMID 26119570. S2CID 6474118.
  20. Belknap KC, Park CJ, Barth BM, Andam CP (February 2020). "Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria". Scientific Reports. 10 (1): 2003. Bibcode:2020NatSR..10.2003B. doi:10.1038/s41598-020-58904-9. PMC 7005152. PMID 32029878.
  21. Hoffmeister D, Keller NP (April 2007). "Natural products of filamentous fungi: enzymes, genes, and their regulation". Natural Product Reports. 24 (2): 393–416. doi:10.1039/B603084J. PMID 17390002.
  22. Micallef ML, D'Agostino PM, Sharma D, Viswanathan R, Moffitt MC (September 2015). "Genome mining for natural product biosynthetic gene clusters in the Subsection V cyanobacteria". BMC Genomics. 16 (1): 669. doi:10.1186/s12864-015-1855-z. PMC 4558948. PMID 26335778.
  23. Gomez-Escribano JP, Bibb MJ (February 2014). "Heterologous expression of natural product biosynthetic gene clusters in Streptomyces coelicolor: from genome mining to manipulation of biosynthetic pathways". Journal of Industrial Microbiology & Biotechnology. 41 (2): 425–431. doi:10.1007/s10295-013-1348-5. PMID 24096958. S2CID 15215660.
  24. Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I (January 2021). "GenBank". Nucleic Acids Research. 49 (D1): D92–D96. doi:10.1093/nar/gkaa1023. PMC 7778897. PMID 33196830.
  25. "IMG-ABC".
  26. Palaniappan K, Chen IA, Chu K, Ratner A, Seshadri R, Kyrpides NC, et al. (January 2020). "IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase". Nucleic Acids Research. 48 (D1): D422–D430. doi:10.1093/nar/gkz932. PMC 7145673. PMID 31665416.
  27. "BIG-FAM".
  28. Kautsar SA, Blin K, Shaw S, Weber T, Medema MH (January 2021). "BiG-FAM: the biosynthetic gene cluster families database". Nucleic Acids Research. 49 (D1): D490–D497. doi:10.1093/nar/gkaa812. PMC 7778980. PMID 33010170.
  29. "DoBISCUIT".
  30. Ichikawa N, Sasagawa M, Yamamoto M, Komaki H, Yoshida Y, Yamazaki S, Fujita N (January 2013). "DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters". Nucleic Acids Research. 41 (Database issue): D408–D414. doi:10.1093/nar/gks1177. PMC 3531092. PMID 23185043.
  31. "MIBiG".
  32. Kautsar SA, Blin K, Shaw S, Navarro-Muñoz JC, Terlouw BR, van der Hooft JJ, et al. (January 2020). "MIBiG 2.0: a repository for biosynthetic gene clusters of known function". Nucleic Acids Research. 48 (D1): D454–D458. doi:10.1093/nar/gkz882. PMC 7145714. PMID 31612915.
  33. "iTOL".
  34. Letunic I, Bork P (July 2016). "Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees". Nucleic Acids Research. 44 (W1): W242–W245. doi:10.1093/nar/gkw290. PMC 4987883. PMID 27095192.