Chemistry Development Kit

Original author(s) Christoph Steinbeck, Egon Willighagen, Dan Gezelter
Developer(s) The CDK Project
Initial release 11 May 2001 [1]
Stable release 2.8 [2] (September 14, 2022)
Preview release 2.2 [3] (October 30, 2018)
Repository github.com/cdk/cdk
Written in Java
Operating system Windows, Linux, Unix, macOS
Platform IA-32, x86-64
Available in English
Type Chemoinformatics, molecular modelling, bioinformatics
License LGPL 2.0
Website cdk.github.io

The Chemistry Development Kit (CDK) is a software library written in the programming language Java for chemoinformatics and bioinformatics. [4] [5] It is available for Windows, Linux, Unix, and macOS. It is free and open-source software distributed under the GNU Lesser General Public License (LGPL) 2.0.

History

The CDK was created on 27–29 September 2000 at the University of Notre Dame by Christoph Steinbeck, Egon Willighagen and Dan Gezelter, then developers of Jmol and JChemPaint, to provide a common code base. The first source code release was made on 11 May 2001. [6] Since then, more than 100 people have contributed to the project, [7] leading to a rich set of functions, as given below. Between 2004 and 2007, CDK News was the project's newsletter; all of its articles are available from a public archive. [8] The newsletter was put on hold because of an unsteady rate of contributions.

Library

The CDK is a library rather than an end-user program. However, it has been integrated into various environments that make its functions available. These include the programming language R, [13] CDK-Taverna (a Taverna workbench plugin), [14] Bioclipse, PaDEL, [15] and Cinfony. [16] CDK extensions also exist for the Konstanz Information Miner (KNIME) [17] and for Excel, the latter called LICSS. [18]

In 2008, pieces of GPL-licensed code were removed from the library. Although that code was independent of the main CDK library and no copylefting was involved, the ChemoJava project was set up to reduce confusion among users. [19]

Major features

Chemoinformatics

Bioinformatics

General

See also

References

  1. "The Chemistry Development Kit - Browse /OldFiles at SourceForge.net".
  2. "cdk/cdk: CDK 2.8". Zenodo. 2022-09-14. doi:10.5281/zenodo.7079512.
  3. Mayfield, John; Willighagen, Egon; Ujihara, Kazuya; Rahman, Syed Asad; Alvarsson, Jonathan; Gražulis, Saulius; Szisz, Daniel; Williamson, Mark J.; Kochev, Nikolay; Jeliazkova, Nina; Bach, Eric; Berg, Arvid; Clark, Alex; Stephan, Ralf; Wenk, Michael; Stueker, Oliver; Jönsson, Klas; Burgoon, Lyle; Katsubo, Dmitry; Köhler, Uli; Harmon, Cyrus (30 October 2018). "cdk/cdk: CDK 2.2". Zenodo. doi:10.5281/zenodo.1474247.
  4. Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. L. (2003). "The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics". Journal of Chemical Information and Computer Sciences. 43 (2): 493–500. doi:10.1021/ci025584y. PMC 4901983. PMID 12653513.
  5. Willighagen, Egon L.; Mayfield, John W.; Alvarsson, Jonathan; Berg, Arvid; Carlsson, Lars; Jeliazkova, Nina; Kuhn, Stefan; Pluskal, Tomáš; Rojas-Chertó, Miquel (2017-06-06). "The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching". Journal of Cheminformatics. 9 (1): 33. doi:10.1186/s13321-017-0220-4. ISSN 1758-2946. PMC 5461230. PMID 29086040.
  6. "The Chemistry Development Kit - Browse /OldFiles at SourceForge.net".
  7. "The Chemistry Development Kit (CDK)". GitHub. 12 October 2021.
  8. "The Chemistry Development Kit - Browse /CDK News at SourceForge.net".
  9. "CDK 1.5.x Nightly Build - 2013-05-10 (21:21) [Commit 2abcb5d61304e58d55ea26a23ebd0d375deea36d]". Archived from the original on 2013-05-24. Retrieved 2013-08-05.
  10. "Home". jni-inchi.sourceforge.net.
  11. Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. L. (2013). "Applications of the InChI in cheminformatics with the CDK and Bioclipse". Journal of Cheminformatics. 5 (1): 14. doi:10.1186/1758-2946-5-14. PMC 3674901. PMID 23497723.
  12. "John May is now release manager of CDK 1.5.x".
  13. Guha, R. (2007). "Chemical informatics functionality in R". Journal of Statistical Software. 18 (5): 1–16. doi:10.18637/jss.v018.i05.
  14. Kuhn, T.; Willighagen, E. L.; Zielesny, A.; Steinbeck, C. (2010). "CDK-Taverna: an open workflow environment for cheminformatics". BMC Bioinformatics. 11: 159. doi:10.1186/1471-2105-11-159. PMC 2862046. PMID 20346188.
  15. Yap, C. W. (2011). "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints". Journal of Computational Chemistry. 32 (7): 1466–74. doi:10.1002/jcc.21707. PMID 21425294. S2CID 206032727.
  16. O'Boyle, Noel M (2008). "Cinfony – combining Open Source cheminformatics toolkits behind a common interface". Chemistry Central Journal. 2 (1): 24. doi:10.1186/1752-153X-2-24. PMC 2646723. PMID 19055766.
  17. Beisken, S.; Meinl, T.; Wiswedel, B.; De Figueiredo, L. F.; Berthold, M.; Steinbeck, C. (2013). "KNIME-CDK: Workflow-driven Cheminformatics". BMC Bioinformatics. 14: 257. doi:10.1186/1471-2105-14-257. PMC 3765822. PMID 24103053.
  18. Lawson, K. R.; Lawson, J. (2012). "LICSS - a chemical spreadsheet in microsoft excel". Journal of Cheminformatics. 4 (1): 3. doi:10.1186/1758-2946-4-3. PMC 3310842. PMID 22301088.
  19. ChemoJava
  20. Berger, Franziska; Flamm, Christoph; Gleiss, Petra M.; Leydold, Josef; Stadler, Peter F. (March 2004). "Counterexamples in Chemical Ring Perception". Journal of Chemical Information and Computer Sciences. 44 (2): 323–331. doi:10.1021/ci030405d. PMID 15032507.
  21. May, John W; Steinbeck, Christoph (2014). "Efficient ring perception for the Chemistry Development Kit". Journal of Cheminformatics. 6 (1): 3. doi:10.1186/1758-2946-6-3. PMC 3922685. PMID 24479757.
  22. Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. (2006). "Recent developments of the chemistry development kit (CDK) — an open-source java library for chemo- and bioinformatics". Curr. Pharm. Des. 12 (17): 2111–20. doi:10.2174/138161206777585274. hdl:2066/35445. PMID 16796559. Archived from the original on 2011-07-25.
    Guangli, M.; Yiyu, C. (2006). "Predicting Caco-2 permeability using support vector machine and chemistry development kit". J Pharm Pharm Sci. 9 (2): 210–21. PMID 16959190.
  23. Clark, Alex M; Sarker, Malabika; Ekins, Sean (2014). "New target prediction and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0". Journal of Cheminformatics. 6: 38. doi:10.1186/s13321-014-0038-2. PMC 4190048. PMID 25302078.
  24. Peironcely, J. E.; Rojas-Chertó, M.; Fichera, D.; Reijmers, T.; Coulier, L.; Faulon, J. L.; Hankemeier, T. (2012). "OMG: Open molecule generator". Journal of Cheminformatics. 4 (1): 21. doi:10.1186/1758-2946-4-21. PMC 3558358. PMID 22985496.
  25. Bashton, M.; Nobeli, I.; Thornton, J. M. (2006). "Cognate Ligand Domain Mapping for Enzymes". Journal of Molecular Biology. 364 (4): 836–52. doi:10.1016/j.jmb.2006.09.041. PMID 17034815.
  26. Rojas-Cherto, M.; Kasper, P. T.; Willighagen, E. L.; Vreeken, R. J.; Hankemeier, T.; Reijmers, T. H. (2011). "Elemental composition determination based on MSn". Bioinformatics. 27 (17): 2376–2383. doi:10.1093/bioinformatics/btr409. PMID 21757467.
  27. Ruiz-Blanco, Yasser B; Paz, Waldo; Green, James; Marrero-Ponce, Yovani (2015). "ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins". BMC Bioinformatics. 16: 162. doi:10.1186/s12859-015-0586-0. PMC 4432771. PMID 25982853.