Chemical database

Last updated January 26, 2025

A chemical database is a database specifically designed to store chemical information, such as chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data.

Types of chemical databases

Bioactivity database

Bioactivity databases correlate structures or other chemical information to bioactivity results taken from bioassays in literature, patents, and screening programs.

Name	Developer(s)	Initial release
ScrubChem	Jason Bret Harris	2016^[1]^[2]
ChEMBL	EMBL-EBI	2009^[3]
Reaxys bioactivity DB	Elsevier	2017
PubChem-BioAssay	NIH	2004^[4]^[5]

Chemical structures

Chemical structures are traditionally represented using lines indicating chemical bonds between atoms and drawn on paper (2D structural formulae). While these are ideal visual representations for the chemist, they are unsuitable for computational use and especially for search and storage. Small molecules (also called ligands in drug design applications) are usually represented using lists of atoms and their connections. Large molecules such as proteins are, however, more compactly represented using the sequences of their amino acid building blocks. Radioactive isotopes are also represented, which is an important attribute for some applications. Large chemical databases for structures are expected to handle the storage and searching of information on millions of molecules, taking terabytes of physical memory.^[6]^[7]

Literature database

Chemical literature databases correlate structures or other chemical information to relevant references such as academic papers or patents. Databases of this type include STN, SciFinder, and Reaxys. Links to literature are also included in many databases that focus on chemical characterization.

Crystallographic database

Crystallographic databases store X-ray crystal structure data. Common examples include the Protein Data Bank and the Cambridge Structural Database.

NMR spectra database

NMR spectra databases correlate chemical structure with NMR data. These databases often include other characterization data such as FTIR and mass spectrometry.

Reactions database

Most chemical databases store information on stable molecules, but reaction databases also store intermediates and transient, unstable molecules. Reaction databases contain information about products, educts, and reaction mechanisms.

A popular example that lists chemical reaction data, among other content, is Reaxys, which incorporates the Beilstein database.

Thermophysical database

Thermophysical data are information about:

phase equilibria, including vapor–liquid equilibrium, solubility of gases in liquids and of liquids in solids (SLE), and heats of mixing, vaporization, and fusion
caloric data, such as heat capacity and heats of formation and combustion
transport properties, such as viscosity and thermal conductivity

Chemical structure representation

There are two principal techniques for representing chemical structures in digital databases:

As connection tables, adjacency matrices, or lists with additional information on bonds (edges) and atom attributes (nodes), such as MDL Molfile, PDB, and CML
As a linear string notation based on depth-first or breadth-first traversal, such as SMILES/SMARTS, SLN, WLN, and InChI

These approaches have been refined to allow representation of stereochemical differences and charges as well as special kinds of bonding such as those seen in organometallic compounds. The principal advantage of a computer representation is that it allows greater storage capacity and fast, flexible search.
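As a rough illustration of the first technique, the sketch below stores ethanol as a small connection table and derives an adjacency list from it. The variable names and the dictionary layout are illustrative assumptions, not the layout of any particular file format such as Molfile or CML.

```python
from collections import defaultdict

# Ethanol (CH3-CH2-OH) with hydrogens left implicit.
# Atom table: atom index -> element symbol (node attributes).
atoms = {0: "C", 1: "C", 2: "O"}

# Bond table: (atom index, atom index, bond order) (edge attributes).
bonds = [(0, 1, 1), (1, 2, 1)]


def to_adjacency(bond_table):
    """Build an adjacency list {atom: [(neighbour, bond order), ...]}."""
    adjacency = defaultdict(list)
    for a, b, order in bond_table:
        adjacency[a].append((b, order))
        adjacency[b].append((a, order))
    return dict(adjacency)


adjacency = to_adjacency(bonds)

# The same molecule written as a linear string notation (SMILES).
smiles = "CCO"
```

The connection table makes neighbour lookups direct, while the string form is far more compact to store and index; many systems keep both.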

Search

Substructure

Chemists can search databases using parts of structures, parts of IUPAC names, or constraints on properties. Chemical databases differ from general-purpose databases in their support for substructure search, a method to retrieve chemicals matching a pattern of atoms and bonds which a user specifies. This kind of search is achieved by looking for subgraph isomorphism (sometimes also called a monomorphism) and is a widely studied application of graph theory.^[8]^[9]^[10]

Query structures may contain bonding patterns such as "single/aromatic" or "any" to provide flexibility. Similarly, the vertices which in an actual compound would be a specific atom may be replaced with an atom list in the query. Cis–trans isomerism at double bonds is catered for by giving a choice of retrieving only the E form, the Z form, or both.^[8]^[11]
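The matching step can be sketched as a plain backtracking search over atom assignments. This is a minimal illustration, not the algorithm used by any particular database; the graph layout, the use of `"*"` for any atom, a Python set for an atom list, and `None` for an "any" bond are all assumptions made for this example.

```python
def make_graph(atoms, bonds):
    """atoms: list of atom labels; bonds: list of (i, j, order) tuples."""
    neighbours = {i: {} for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i][j] = order
        neighbours[j][i] = order
    return atoms, neighbours


def atom_ok(query_atom, target_atom):
    # A query atom is "*" (any atom), a set of symbols (atom list), or one symbol.
    if query_atom == "*":
        return True
    if isinstance(query_atom, (set, frozenset)):
        return target_atom in query_atom
    return query_atom == target_atom


def bond_ok(query_order, target_order):
    # A query bond order of None stands for an "any" bond.
    return query_order is None or query_order == target_order


def has_substructure(query, target):
    """Return True if every query atom and bond can be mapped into the target."""
    q_atoms, q_nbrs = query
    t_atoms, t_nbrs = target
    mapping = {}  # query atom index -> target atom index

    def extend(q):
        if q == len(q_atoms):
            return True  # every query atom has been placed
        for t in range(len(t_atoms)):
            if t in mapping.values() or not atom_ok(q_atoms[q], t_atoms[t]):
                continue
            # Each already-mapped query neighbour must be bonded compatibly in the target.
            consistent = all(
                mapping[n] in t_nbrs[t] and bond_ok(order, t_nbrs[t][mapping[n]])
                for n, order in q_nbrs[q].items()
                if n in mapping
            )
            if consistent:
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend(0)


# Targets (hydrogens implicit): acetic acid CC(=O)O and ethanol CCO.
acetic_acid = make_graph(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
ethanol = make_graph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

carbonyl = make_graph(["C", "O"], [(0, 1, 2)])                 # C=O
c_any_n_or_o = make_graph(["C", {"N", "O"}], [(0, 1, None)])   # C~[N,O]
```

Here `has_substructure(carbonyl, acetic_acid)` succeeds and `has_substructure(carbonyl, ethanol)` fails, while the looser `c_any_n_or_o` query matches both. Production systems typically screen candidates with precomputed fingerprints before running a graph match of this kind, since the worst-case cost of the search grows quickly with molecule size.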

Conformation

Search by matching the 3D conformation of molecules or by specifying spatial constraints is another feature that is of particular use in drug design. Searches of this kind can be computationally very expensive. Many approximate methods have been proposed, including BCUTS,^[12]^[13]^[14] special function representations, moments of inertia, ray-tracing histograms, maximum distance histograms, and shape multipoles.^[15]^[16]^[17]^[18]^[19]

Examples

Large databases, such as PubChem^[11]^[20] and ChemSpider,^[21] have graphical interfaces for search. The Chemical Abstracts Service provides tools to search the chemical literature, and Reaxys, supplied by Elsevier, covers both chemicals and reaction information, including that originally held in the Beilstein database.^[22] PATENTSCOPE makes chemical patents accessible by substructure,^[23] and Wikipedia's articles describing individual chemicals can also be searched that way.^[24]

Suppliers of chemicals as synthesis intermediates or for high-throughput screening routinely provide search interfaces. Currently, the largest database that can be freely searched by the public is the ZINC database, which is claimed to contain over 37 billion commercially available molecules.^[25]^[26]

Descriptors

All properties of molecules beyond their structure can be divided into physico-chemical and pharmacological attributes, also called descriptors. In addition, there are various artificial and more or less standardized naming systems for molecules that supply more or less ambiguous names and synonyms. The IUPAC name is usually a good choice for representing a molecule's structure as a string that is both human-readable and unique, although it becomes unwieldy for larger molecules. Trivial names, on the other hand, abound with homonyms and synonyms and are therefore a poor choice as a defining database key. While physico-chemical descriptors such as molecular weight, (partial) charge, and solubility can mostly be computed directly from the molecule's structure, pharmacological descriptors can be derived only indirectly, using involved multivariate statistics or experimental (screening, bioassay) results. Because recomputing them takes effort, all of these descriptors can be stored along with the molecule's representation, and usually are.
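Molecular weight is about the simplest descriptor that follows directly from structure. The sketch below computes it from a molecular formula; the function name, the small table of rounded standard atomic weights, and the restriction to simple formulae without brackets or charges are assumptions made for this example.

```python
import re

# Rounded standard atomic weights in g/mol; only a few elements for illustration.
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C2H6O'."""
    total = 0.0
    matched = 0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
        matched += len(symbol) + len(count)
    if matched != len(formula):
        # Some characters were not part of any element/count pair.
        raise ValueError(f"cannot parse formula: {formula!r}")
    return total


ethanol_mw = molecular_weight("C2H6O")   # about 46.07 g/mol
benzene_mw = molecular_weight("C6H6")    # about 78.11 g/mol
```

A database would typically compute such a value once at load time and store it in an indexed column so that range queries on it are cheap.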

Similarity

There is no single definition of molecular similarity; however, the concept may be defined according to the application and is often described as an inverse of a measure of distance in descriptor space. Two molecules might be considered more similar, for instance, if the difference in their molecular weights is smaller than the difference between either and other molecules. A variety of other measures could be combined to produce a multivariate distance measure. Distance measures are often classified into Euclidean and non-Euclidean measures depending on whether the triangle inequality holds. Maximum common subgraph (MCS) based substructure search^[27] (as a similarity or distance measure) is also very common. MCS is also used for screening drug-like compounds by finding molecules that share a common subgraph (substructure).^[28]
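The idea of similarity as an inverse of distance in descriptor space can be sketched as follows. The choice of two descriptors (molecular weight and heavy-atom count), the Euclidean metric, and the mapping `1 / (1 + d)` are assumptions made for this illustration; real systems usually scale descriptors first so that no single one dominates the distance.

```python
import math

# Descriptor vectors: (molecular weight in g/mol, number of non-hydrogen atoms).
descriptors = {
    "ethanol": (46.069, 3),
    "propan-1-ol": (60.096, 4),
    "benzene": (78.114, 6),
}


def euclidean_distance(a, b):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map a distance in [0, inf) onto a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


sim_alcohols = similarity(descriptors["ethanol"], descriptors["propan-1-ol"])
sim_ethanol_benzene = similarity(descriptors["ethanol"], descriptors["benzene"])
```

With these descriptors ethanol comes out closer to propan-1-ol than to benzene, and a molecule compared with itself has similarity 1.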

Chemicals in the databases may be clustered into groups of 'similar' molecules based on similarities. Both hierarchical and non-hierarchical clustering approaches can be applied to chemical entities with multiple attributes. These attributes or molecular properties may be either empirically determined or computationally derived descriptors. One of the most popular clustering approaches is the Jarvis-Patrick algorithm.^[29]
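A minimal sketch of one common variant of Jarvis-Patrick clustering is given below: two items are joined when each appears in the other's list of `k` nearest neighbours and the two lists share at least `k_min` members, and clusters are the resulting connected groups. The function signature, the exclusion of each point from its own neighbour list, and the one-dimensional toy descriptor values are assumptions made for this example.

```python
def jarvis_patrick(points, distance, k, k_min):
    """Cluster points by shared nearest neighbours; returns lists of indices."""
    n = len(points)

    # k nearest neighbours of each point, excluding the point itself.
    neighbours = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: distance(points[i], points[j]),
        )
        neighbours.append(set(others[:k]))

    # Union-find structure to merge points into clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in neighbours[j] and j in neighbours[i]
            if mutual and len(neighbours[i] & neighbours[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical one-dimensional descriptor values forming two obvious groups.
values = [0, 1, 2, 10, 11, 12]
clusters = jarvis_patrick(values, lambda a, b: abs(a - b), k=2, k_min=1)
```

For the toy values this yields the two groups of indices `[0, 1, 2]` and `[3, 4, 5]`. The method is non-hierarchical and needs only neighbour lists, which is part of why it scales to large compound collections.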

In pharmacologically oriented chemical repositories, similarity is usually defined in terms of the biological effects of compounds (ADME/tox) that can in turn be semiautomatically inferred from similar combinations of physico-chemical descriptors using QSAR methods.

Registration systems

Database systems for maintaining unique records on chemical compounds are termed registration systems. These are often used for chemical indexing, patent systems, and industrial databases.

Registration systems usually enforce uniqueness of the chemical represented in the database through the use of unique representations. By applying rules of precedence for the generation of stringified notations, one can obtain unique/'canonical' string representations such as 'canonical SMILES'. Some registration systems such as the CAS system make use of algorithms to generate unique hash codes to achieve the same objective.

A key difference between a registration system and a simple chemical database is the ability to accurately represent that which is known, unknown, and partially known. For example, a chemical database might store a molecule with stereochemistry unspecified, whereas a chemical registry system requires the registrar to specify whether the stereo configuration is unknown, a specific (known) mixture, or racemic. Each of these would be considered a different record in a chemical registry system.
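The two ideas above, uniqueness through a canonical representation and an explicit statement of what is known about stereochemistry, can be sketched as a small in-memory registry. The class and enum names, the `REG-` identifier scheme, and the assumption that a canonical string is produced beforehand by an external toolkit are all hypothetical and are not taken from any real registration system.

```python
from enum import Enum


class Stereo(Enum):
    """What the registrar asserts about the stereo configuration."""
    SPECIFIED = "specified"
    UNKNOWN = "unknown"
    RACEMIC = "racemic"
    KNOWN_MIXTURE = "known mixture"


class Registry:
    def __init__(self):
        self._records = {}  # (canonical structure, stereo status) -> registry ID
        self._next_id = 1

    def register(self, canonical_structure, stereo):
        """Return the existing ID for a known record, or assign a new one."""
        if not isinstance(stereo, Stereo):
            raise TypeError("stereo status must be stated explicitly")
        key = (canonical_structure, stereo)
        if key not in self._records:
            self._records[key] = f"REG-{self._next_id:06d}"
            self._next_id += 1
        return self._records[key]


registry = Registry()
# The structure string is assumed to be canonical already (lactic acid, no stereo marks).
first = registry.register("CC(O)C(=O)O", Stereo.RACEMIC)
again = registry.register("CC(O)C(=O)O", Stereo.RACEMIC)
unknown = registry.register("CC(O)C(=O)O", Stereo.UNKNOWN)
```

Registering the same structure with the same stereo status returns the existing identifier, while the racemic and unknown forms receive separate records.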

Registration systems also preprocess molecules to avoid considering trivial differences such as differences in halogen ions in chemicals.

An example is the Chemical Abstracts Service (CAS) registration system. See also CAS registry number.

List of chemical cartridges

Accord
Direct^[30]
JChem^[31]
CambridgeSoft^[32]
Bingo^[33]
Pinpoint^[34]

List of chemical registration systems

ChemReg^[35]
Register^[36]
RegMol^[37]
Compound-Registration^[38]
Ensemble^[39]

Web-based

Name	Developer(s)	Initial release
CDD Vault	Collaborative Drug Discovery	2018^[40]^[41]^[42]
Adroit Repository^[43]	Adroit DI^[44]	2023^[45]^[46]
Reaxys	Elsevier	1989

Tools

The computational representations are usually made transparent to chemists by graphical display of the data. Data entry is also simplified through the use of chemical structure editors. These editors internally convert the graphical data into computational representations.

There are also numerous algorithms for the interconversion of various formats of representation. An open-source utility for conversion is Open Babel. These search and conversion algorithms are implemented either within the database system itself or, as is now the trend, as external components that fit into standard relational database systems. Both Oracle- and PostgreSQL-based systems make use of cartridge technology that allows user-defined datatypes. These allow the user to make SQL queries with chemical search conditions. For example, a query to search for records having a phenyl ring in their structure, represented as a SMILES string in a SMILESCOL column, could be

SELECT * FROM CHEMTABLE WHERE SMILESCOL.CONTAINS('c1ccccc1')
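The general mechanism of extending SQL with a chemistry-aware predicate can be imitated with Python's built-in `sqlite3` module, which lets a Python function be called from SQL. This is only an analogy to a cartridge: the table, the column names, and the function `smiles_contains_text` are hypothetical, and the function performs a plain text comparison as a stand-in where a real cartridge would run a graph-based substructure match.

```python
import sqlite3


def smiles_contains_text(smiles, fragment):
    # Stand-in only: plain text containment, NOT a chemical substructure match.
    return 1 if fragment in smiles else 0


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the same name.
conn.create_function("smiles_contains_text", 2, smiles_contains_text)

conn.execute("CREATE TABLE chemtable (name TEXT, smilescol TEXT)")
conn.executemany(
    "INSERT INTO chemtable VALUES (?, ?)",
    [("toluene", "Cc1ccccc1"), ("ethanol", "CCO"), ("phenol", "Oc1ccccc1")],
)

rows = conn.execute(
    "SELECT name FROM chemtable "
    "WHERE smiles_contains_text(smilescol, ?) ORDER BY name",
    ("c1ccccc1",),
).fetchall()
hits = [name for (name,) in rows]
conn.close()
```

The query returns phenol and toluene. The text comparison would miss phenol written as `c1ccc(O)cc1`, which describes the same molecule, and this is the reason real cartridges parse each structure and match on the molecular graph rather than on the string.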

Algorithms for the conversion of IUPAC names to structure representations and vice versa are also used for extracting structural information from text. However, there are difficulties due to the existence of multiple dialects of IUPAC nomenclature. Work is under way to establish a unique IUPAC standard (see InChI).


References

1. "Home Page - ScrubChem". scrubchem.org. Archived from the original on 26 May 2017.
2. Harris, JB (2019). "Post-processing of Large Bioactivity Data". Bioinformatics and Drug Discovery. Methods Mol Biol. Vol. 1939. pp. 37–47. doi:10.1007/978-1-4939-9089-4_3. ISBN 978-1-4939-9088-7. PMID 30848455. S2CID 73493315.
3. "ChEMBL Database".
4. "PubChem". pubchem.ncbi.nlm.nih.gov.
5. Wang, Y; Bryant, SH; Cheng, T; Wang, J; Gindulyte, A; Shoemaker, BA; Thiessen, PA; He, S; Zhang, J (2017). "PubChem BioAssay: 2017 update". Nucleic Acids Res. 45 (D1): D955–D963. doi:10.1093/nar/gkw1118. PMC 5210581. PMID 27899599.
6. Hoffmann, Torsten; Gastreich, Marcus (2019). "The next level in chemical space navigation: going far beyond enumerable compound libraries". Drug Discovery Today. 24 (5): 1148–1156. doi:10.1016/j.drudis.2019.02.013. PMID 30851414.
7. Sadybekov, Anastasiia V.; Katritch, Vsevolod (2023). "Computational approaches streamlining drug discovery". Nature. 616 (7958): 673–685. Bibcode:2023Natur.616..673S. doi:10.1038/s41586-023-05905-z. PMID 37100941.
8. Currano, Judith N. (2014). "Chapter 5. Searching by Structure and Substructure". Chemical Information for Chemists. pp. 109–145. doi:10.1039/9781782620655-00109. ISBN 978-1-84973-551-3.
9. Ullmann, J. R. (1976). "An Algorithm for Subgraph Isomorphism". Journal of the ACM. 23: 31–42. doi:10.1145/321921.321925.
10. Warr, Wendy A. (2011). "Representation of chemical structures". WIREs Computational Molecular Science. 1 (4): 557–579. doi:10.1002/wcms.36.
11. "PubChem Structure Search". pubchem.ncbi.nlm.nih.gov. Retrieved 2024-08-01.
12. Pearlman, R.S.; Smith, K.M. (1999). "Metric Validation and the Receptor-Relevant Subspace Concept". J. Chem. Inf. Comput. Sci. 39: 28–35. doi:10.1021/ci980137x.
13. "BCUTDescriptor (cdk 2.5 API)". CDK - Chemistry Development Kit. 2021-05-05. Retrieved 2024-06-04.
14. Burden, Frank R. (1 August 1989). "Molecular identification number for substructure searches". Journal of Chemical Information and Computer Sciences. 29 (3): 225–227. doi:10.1021/ci00063a011.
15. Pearlman, R.S.; Smith, K.M. (1999). "Metric Validation and the Receptor-Relevant Subspace Concept". J. Chem. Inf. Comput. Sci. 39: 28–35. doi:10.1021/ci980137x.
16. Lin, Jr., Hung; Clark, Timothy (2005). "An analytical, variable resolution, complete description of static molecules and their intermolecular binding properties". Journal of Chemical Information and Modeling. 45 (4): 1010–1016. doi:10.1021/ci050059v. PMID 16045295.
17. Meek, P. J.; Liu, Z.; Tian, L.; Wang, C. J; Welsh, W. J; Zauhar, R. J (2006). "Shape Signatures: speeding up computer aided drug discovery". Drug Discovery Today. 11 (19–20): 895–904. doi:10.1016/j.drudis.2006.08.014. PMID 16997139.
18. Grant, J. A; Gallardo, M. A.; Pickup, B. T. (1996). "A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape". Journal of Computational Chemistry. 17 (14): 1653–1666. doi:10.1002/(sici)1096-987x(19961115)17:14<1653::aid-jcc7>3.0.co;2-k. S2CID 96794688.
19. Ballester, P. J.; Richards, W. G. (2007). "Ultrafast shape recognition for similarity search in molecular databases". Proceedings of the Royal Society A. 463 (2081): 1307–1321. Bibcode:2007RSPSA.463.1307B. doi:10.1098/rspa.2007.1823. S2CID 12540483.
20. Kim, Sunghwan (2021). "Exploring Chemical Information in PubChem". Current Protocols. 1 (8): e217. doi:10.1002/cpz1.217. PMC 8363119. PMID 34370395.
21. Williams, Antony J. (2010). "ChemSpider: Integrating Structure-Based Resources Distributed across the Internet". Enhancing Learning with Online Resources, Social Networking, and Digital Libraries. ACS Symposium Series. Vol. 1060. pp. 23–39. doi:10.1021/bk-2010-1060.ch002. ISBN 978-0-8412-2600-5.
22. Jarabak, Charlotte; Mutton, Troy; Ridley, Damon D. (2020). "Property Information in Substance Records in Major Web-Based Chemical Information and Data Retrieval Tools: Understanding Content, Search Opportunities, and Application to Teaching". Journal of Chemical Education. 97 (5): 1345–1359. Bibcode:2020JChEd..97.1345J. doi:10.1021/acs.jchemed.9b00966.
23. "Substructure Search Now Available in PATENTSCOPE". www.wipo.int. 2019-02-11. Retrieved 2024-08-04.
24. Ertl, Peter; Patiny, Luc; Sander, Thomas; et al. (2015). "Wikipedia Chemical Structure Explorer: Substructure and similarity searching of molecules from Wikipedia". Journal of Cheminformatics. 7: 10. doi:10.1186/s13321-015-0061-y. PMC 4374119. PMID 25815062.
25. Tingle, Benjamin I.; Tang, Khanh G.; Castanon, Mar; Gutierrez, John J.; Khurelbaatar, Munkhzul; Dandarchuluun, Chinzorig; Moroz, Yurii S.; Irwin, John J. (2023). "ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery". Journal of Chemical Information and Modeling. 63 (4): 1166–1176. doi:10.1021/acs.jcim.2c01253. PMC 9976280. PMID 36790087.
26. Warr, Wendy A.; Nicklaus, Marc C.; Nicolaou, Christos A.; Rarey, Matthias (2022). "Exploration of Ultralarge Compound Collections for Drug Discovery". Journal of Chemical Information and Modeling. 62 (9): 2021–2034. doi:10.1021/acs.jcim.2c00224. PMID 35421301.
27. Rahman, S. A.; Bashton, M.; Holliday, G. L.; Schrader, R.; Thornton, J. M. (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics. 1 (1): 12. doi:10.1186/1758-2946-1-12. PMC 2820491. PMID 20298518.
28. Rahman, S. Asad; Bashton, M.; Holliday, G. L.; Schrader, R.; Thornton, J. M. (2009). "Small Molecule Subgraph Detector (SMSD) Toolkit". Journal of Cheminformatics. 1 (1): 12. doi:10.1186/1758-2946-1-12. PMC 2820491. PMID 20298518.
29. Butina, Darko (1999). "Unsupervised Data Base Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets". J. Chem. Inf. Comput. Sci. 39 (4): 747–750. doi:10.1021/ci9803381.
30. "BIOVIA Direct - BIOVIA - Dassault Systèmes®". 8 September 2023.
31. "JChem Engines | ChemAxon".
32. "Chemistry – Oracle Cartridge | Inside Informatics".
33. Pavlov, D.; Rybalkin, M.; Karulin, B. (2010). "Bingo from SciTouch LLC: Chemistry cartridge for Oracle database". Journal of Cheminformatics. 2 (Suppl 1): F1. doi:10.1186/1758-2946-2-S1-F1. PMC 2867114.
34. "Small Molecule Drug Discovery Software". Small Molecule Drug Discovery Software.
35. "BIOVIA Chemical Registration - BIOVIA - Dassault Systèmes®". www.3ds.com. 7 September 2023.
36. "Register". Archived from the original on 2021-12-10. Retrieved 2021-03-13.
37. "Scilligence RegMol | Scilligence". 6 June 2016. Archived from the original on September 29, 2018.
38. "Compound Registration". chemaxon.com.
39. "Signals Notebook - PerkinElmer Informatics". perkinelmerinformatics.com.
40. "CDD Vault Update: CDD Vault is Now an ELN". 16 February 2018.
41. "CDD Electronic Lab Notebook (ELN)". 14 August 2019.
42. "Electronic Lab Notebooks: What they are (And why you need one)". 4 August 2019.
43. "Review of SDF Pro from Adroit DI. June 2023 – Macs in Chemistry". 2023-11-05. Retrieved 2024-03-11.
44. "Adroit DI main page". adroitdi.com. Retrieved 2024-03-10.
45. "Adroit DI's SDF Pro: The Fast and Affordable Solution to Storing, Sorting and Wrangling 10 Million Molecules in Seconds". www.businesswire.com. 2023-05-16. Retrieved 2024-03-10.
46. "Best of the Best Entity Registration". 20Visioneers15. Retrieved 2024-03-10.

47. https://www.elsevier.com/en-in/products/reaxys

External links

Wikipedia Chemical Structure Explorer to search Wikipedia chemistry articles by substructure

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
