List of chemical databases

This is a list of websites that contain lists of chemicals, or databases of chemical information. There is further detail on the content of these and other resources in a Wikibook of information sources.

Abbreviation | Full name | Operator | Selects | Contains | ID prefix | Quality | Link | Entries
ACToR Environmental Protection Agency toxicology information; occurrence "ACToR". Archived from the original on November 2, 2016. 893,280
AtomWork Inorganic Material Database National Institute for Materials Science crystal structures "AtomWork". 82,000
Beilstein Beilstein database Elsevier organic compounds properties closed access
BIAdb Benzylisoquinoline Alkaloid Database "BIAdb". 846
BindingDB The Binding Database Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego noncovalent association of molecules in solution ChEMBL SMILES InChIKey targets "BindingDB".
BindingMOAD Binding Mother of All Databases protein ligand structures "BindingMOAD". 36047
BMDB Bovine Metabolome Database Collaborative Drug Discovery BMDB manually selected and checked "BMDB". 7859
BMRB Biological Magnetic Resonance Data Bank University of Wisconsin biological molecules including ligands, cofactors, peptides, saccharides NMR spectroscopy "BMRB".
BRENDA Technical University of Braunschweig enzymes ligands "BRENDA".
Carotenoids Database carotenoids CA "Carotenoids". 1195
CCCBDB Computational Chemistry Comparison and Benchmark DataBase National Institute of Standards and Technology gas phase molecules "CCCBDB". 2069
CCRIS Chemical Carcinogenesis Research Information System National Library of Medicine substances that affect tumors CCRIS from primary literature, reviewed by experts "CCRIS subset of PubChem". 9562 [1] [2]
CDD Public drug candidates limited access 3,000,000
ChEBI Chemical Entities of Biological Interest ELIXIR small chemical compounds from PDBeChem ChEMBL KEGG IntEnz "ChEBI". 60,000
Chematica Merck organic chemicals reaction pathway calculation; Beilstein CAS SMILES proprietary 7,000,000
ChEMBL Chemicals from European Molecular Biology Laboratory EMBL molecules with drug-like properties "ChEMBL". 1,961,000
cheML.io Departments of Computer Science and Chemistry at Nazarbayev University de novo molecules generated by ML models SMILES, computed properties artificially generated "cheML.io". [3] 2,800,000
ChemDB chemical database small molecules "ChemDB". 5,000,000
Chemical Book East West University commercially available compounds CASNo, suppliers, properties "Chemical Book". 200,000
Chemical Register from 20,000 vendors CASNo mainly from larger-scale suppliers "Chemical Register". 1,750,000
ChemIDplus National Library of Medicine other NLM databases; regulated substances CASNo UNII structure CMNPD 400,000
ChemSpider Royal Society of Chemistry from 275 data sources "ChemSpider". 88,000,000
ChemIndex chemical database substances CAS Search; suppliers "Chemindex".
Clival Database Clinical Trial Database Clinical Trail Data Solutions 50,000 molecules clinical trial data Phase 0 to IV indications "clival".
CMNPD Comprehensive Marine Natural Products Database Peking University from literature and other databases structural classification; species CMNPD curated https://www.cmnpd.org/ 31,561
COD Crystallography Open Database Vilnius University small molecules (open source) crystal structure atomic coordinates COD curated "COD". 478,715
Common Chemistry American Chemical Society structure CAS SMILES InChI https://commonchemistry.cas.org/ [4] ~500,000
Compendium of Pesticide Common Names British Crop Production Council Pesticides with ISO common names structure, CASNo, IUPAC name, SMILES, InChI curated "Compendium of Pesticide Common Names". 1,800
CompTox CompTox Chemicals Dashboard US Environmental Protection Agency chemicals evaluated for potential health risks "CompTox". Archived from the original on December 16, 2019.
CosIng Cosmetic Ingredients European Commission cosmetic ingredients "CosIng".
CrystalWorks Science and Technology Facilities Council "CrystalWorks".
CSD Cambridge Structural Database Cambridge Crystallographic Data Centre "CSD". 1,038,250
CSDB Carbohydrate Structure Database Zelinsky Institute of Organic Chemistry carbohydrates structures references CSDB ID "CSDB".
CTD Comparative Toxicogenomics Database Department of Biological Sciences at North Carolina State University MeSH CASNo ChEBI PubChem genes, pathways "CTD".
DDB Dortmund Data Bank pure compounds, mixtures, gas hydrates physical properties "DDB".
Dissociation Constants IUPAC Digitized pKa Dataset IUPAC dissociation constants "Dissociation Constants". GitHub.
DETHERM DECHEMA thermophysical properties "DETHERM". 75,000
DrugBank University of Alberta drugs "DrugBank".
DrugCentral University of New Mexico pharmaceuticals products containing substance "DrugCentral".
DTP/NCI DTP Open Compound collection National Cancer Institute Developmental Therapeutics Program Cancer therapeutics Cancer Chemotherapy National Service Center number "DTP/NCI". 250,000
ECHA REACH database European Chemicals Agency EINECS ELINCS NLP CASNo HPhrases pictograms tonnage "ECHA/REACH". 245,000
EAWAG-BBD Biocatalysis/Biodegradation Database Eawag: Swiss Federal Institute of Aquatic Science and Technology CAS SMILES PubChem pathways "EAWAG-BBD". 1396
eMolecules drug screening chemicals list of suppliers and catalog numbers "eMolecules". 8,000,000 [5]
ENCS Japanese Existing and New Chemical Substances Inventory regulated chemicals "ENCS (in Japanese)".
Evaluated Kinetic Data IUPAC rate constants curated "Evaluated Kinetic Data".
FDA SRS Food and Drug Administration Substance Registration System U.S. National Library of Medicine ingredients in FDA regulated products UNII InChIKey "FDA SRS". 781,000
FEMA Flavor Ingredient Library Flavor and Extract Manufacturers Association CAS CFR FEMA number "FEMA".
FooDB Food Database University of Alberta Food components and additives "FooDB". 70926
GlyTouCan international glycan structure repository Ministry of Education, Culture, Sports, Science & Technology glycans WURCS GlycoCT PubChem CID G "Glycan Repository". 122194
Gmelin Gmelin database Elsevier inorganic and organometallic compounds closed access 1,500,000
G-SRS Global Substance Registration System CAS PubChem ChEMBL INN UNII "G-SRS". 109,260
GMD Golm Metabolome Database GC/MS of metabolites "GMD".
Guide to PHARMACOLOGY IUPHAR drugs and targets INN CAS ChEBI ChEMBL DrugBank PubChem "Guide to PHARMACOLOGY".
Henry's law constants Max Planck Institute for Chemistry volatile compounds Henry's law constants from literature "Henry's law constants". 46434
HMDB Human Metabolome Database Genome Canada metabolites found in the human body biochemical data, clinical data HMDB "HMDB". 114,222 [6]
HugeMDB Huge Molecular Database Elegant Mathematics LLC small molecules (most entries have <100 atoms) major conformers with 3D structures and easy search over them M correlates well with PubChem for data available in PubChem "HugeMDB". 102 million
ICSC ILO International Chemical Safety Cards International Labour Organization CAS, EC number, UN number "ICSC". 1784
ICSD Inorganic Crystal Structure Database FIZ Karlsruhe GmbH "ICSD". 161,030
IEDB Immune Epitope Database National Institute of Allergy and Infectious Diseases Epitopes mainly peptides and carbohydrates "IEDB". 3,002 non-peptides
ILThermo Ionic liquids Database National Institute of Standards and Technology ionic liquids including their solutions and mixtures physical properties "ILThermo". 3041
IUPAC-NIST Solubility Database https://srdata.nist.gov/solubility/index.aspx 67,500
JECDB Japan Existing Chemical Database CAS EINECS RTECS SDBS TSCA graph of number of articles per year "JECDB".
J-GLOBAL Nikkaji Japan Science and Technology Agency "J-GLOBAL".
KEGG Kyoto Encyclopedia of Genes and Genomes Kyoto University Bioinformatics Center Compounds Glycans (also enzymes, reactions, pathways) CAS ChEBI ChEMBL MASSBANK NIKKAJI PubChem PDB-CCD "KEGG".
Ki Database PDSP ligand binding "Ki Database".
KNApSAcK Nara Institute of Science and Technology InChI CAS SMILES organisms C00 "KNApSAcK".
LINCS Library of Integrated Network-based Cellular Signatures small molecules PubChem ChEMBL SMILES InChI LSM "LINCS". 43,700
LipidBank Japanese Conference on the Biochemistry of Lipids lipids "LipidBank". 7,009
LMSD LIPID MAPS Structure Database Lipids HMDB ChEBI PubChem InChI LMFA "LMSD". 44701
LOLI List of Lists safety data sheets, regulation "LOLI".
Mcule supplied chemicals InChI, SMILES, SDF, physicochemical properties "Mcule". 45,000,000
MediaDB Institute for Systems Biology growth media "MediaDB". 288
Merck Index Royal Society of Chemistry drugs "Merck-Index". 11,500
MeSH Medical Subject Headings US National Library of Medicine biomedical thesaurus hierarchy of descriptors to literature with MeSH ID "MeSH".
MetaCyc SRI International metabolic pathways; metabolites "MetaCyc".
MetaboLights EMBL-EBI MTBL "MetaboLights".
MetaNetX SIB Swiss Institute of Bioinformatics metabolic networks, metabolites, biochemical reactions, cellular compartments metabolic models, SBML, InChI, InChIKey, SMILES MNXM unified namespace for metabolites and biochemical reactions in the context of metabolic models "MetaNetX". 240 metabolic models, 1292154 metabolites, 74613 reactions, 44 compartments
METLIN Metabolite and Chemical Entity Database tandem mass spectrometry of metabolites "METLIN". 960,000
MINAS Metal Ions in Nucleic AcidS University of Zurich https://www.minas.uzh.ch/
ModelSeed KEGG MetaCyc metabolic pathways CPD "ModelSeed".
MolPort catalog chemicals "MolPort".
MoNA Mass Bank of North America mass spectra splash legg chemspider pubchem chebi CAS "MoNA". 200,000
npatlas The Natural Products Atlas Simon Fraser University microbial and fungal products smiles, organism NPA npatlas [7] 33434
NIOSH pocket guide NIOSH Pocket Guide to Chemical Hazards National Institute for Occupational Safety and Health commonly used chemicals exposure limits "NIOSH". 2 August 2024. 677
NIST Webbook NIST Chemistry Webbook National Institute of Standards and Technology spectra CAS ionization energy mass spectrum, InChI C+CAS "NIST Webbook".
NMRShiftDB University of Cologne organic nuclear magnetic resonance spectra "NMRShiftDB". 43,581
NORMAN SLE NORMAN Suspect List Exchange environmental monitoring "NORMAN SLE". 110,000
OMG Open Macromolecular Genome Jackson group at University of Illinois at Urbana-Champaign synthetically accessible linear homopolymers SMILES of linear homopolymers GitHub / Zenodo 12,886,131
ORD Open Reaction Database ORD consortium Organic reactions machine-readable reaction schemes "ORD" [8] 2,000,000
OrgSyn Organic Syntheses Organic Syntheses, Inc. Reliable chemical reactions Searchable experimental procedures Peer reviewed "OrgSyn search".
PDB PDBe Protein Data Bank in Europe EMBL-EBI has some chemicals as well as proteins "PDBe".
PATENTSCOPE WIPO "PATENTSCOPE". 16,000,000
PDB RCSB Protein Data Bank "PDB". 166,891
PharmGKB Shriram Center for Bioengineering and Chemical Engineering drugs targets prescribing info curated "PharmGKB".
PHAROS Illuminating the Druggable Genome National Institutes of Health drug ligands; targets [9] https://pharos.nih.gov/ 355932 ligands, 20412 targets
Phenol-Explorer polyphenols found in food "Phenol-Explorer". 500
Phosida PHOsphorylation SIte DAtabase protein modifications "Phosida".
PoLyInfo Polymer Database National Institute for Materials Science physical properties "PoLyInfo". 26,000
PPDB Pesticide Properties Database Agriculture & Environment Research Unit, University of Hertfordshire Pesticides and their metabolites Chemical structure, physicochemical properties, human health and ecotoxicological data curated "PPDB". 2000 [10]
Probes and Drugs
ProCarDB Prokaryotic Bacterial Carotenoid DataBase IMTECH spectra references "ProCarDB". 1800
PubChem National Library of Medicine National Center for Biotechnology Information from 748 data sources Structures, Names and Identifiers, Chemical and Physical Properties, Spectral Information, Related Records, Chemical Vendors, Pharmacology and Biochemistry, Use and Manufacturing, Safety and Hazards, Toxicity, Literature, Patents, Biomolecular Interactions and Pathways, Biological Test Results "PubChem". 103,000,000
Reaxys Elsevier chemical compounds Searchable chemical reactions "About Reaxys". 118,000,000
Ref-DB Re-referenced Protein Chemical shift Database proteins from BioMagResBank Re-referenced NMR shift "Ref-DB". 2162
Rhea Swiss Institute of Bioinformatics biochemical reactions ChEBI curated "Rhea".
RÖMPP Thieme Gruppe "RÖMPP".
RTECS Registry of Toxic Effects of Chemical Substances Dassault Systèmes Toxicity, Literature "Biovia-RTECS". 8 September 2023. 160,000
RxNav U.S. National Library of Medicine drugs interactions "RxNav".
SaguaroChem De Novo Chem Chemical reactions from the patent literature Chemical reaction SMILES, annotated procedures, characterization data, reference metadata Curated from patent literature "SaguaroChem". 4 July 2024. 2,091,105
SciFinder Chemical Abstracts Service of American Chemical Society organic, inorganic chemicals, proteins CASNo paid access only 130,000,000
ScrubChem scraped from PubChem "ScrubChem". 2,282,992
SDBS Spectral Database for Organic Compounds National Institute of Advanced Industrial Science and Technology (AIST), Japan Organic compounds Spectra: IR Raman MASS ESR 1H NMR 13C NMR SDBS No curated "SDBS". 34,000
Serum Metabolome Database The Metabolomics Innovation Centre found in blood serum "Serum Metabolome DB". 4,651
Solvent Selection Tool ACS Green Chemistry Institute Solvents Principal components analysis of physical properties curated "Solvent Selection Tool". 272 [11]
SPRESIweb InfoChem Gesellschaft für chemische Information mbH organic molecules and reactions organic structures from literature "SPRESI". 5,800,000
SpringerMaterials Springer solid materials CAS InChI physical properties from literature "SpringerMaterials". 155,165 + 494,942
STITCH EMBL from Biocarta, BioCyc, GO, KEGG, and Reactome Chemical-Protein Interactions curated and predicted "STITCH". 500,000
SuperDRUG2 Structural Bioinformatics Group drugs targets targets, dose, side effects, Canonical SMILES, Standard InChI, Standard InChIKey, DrugBank, ChEMBL, DrugCentral, KEGG, PubChem, CASRN SD "SuperDRUG2". 4,600
Super Natural II natural product chemicals SMILES vendors SN00 "Super Natural II". 325,508
SureChEMBL European Molecular Biology Laboratory substances in patents patent text "SureChEMBL".
SwissLipids Swiss Institute of Bioinformatics lipids SLM: "SwissLipids".
TDR Targets Tropical Disease Research Trypanosomatics Laboratory drugs and targets "TDR Targets". 2,000,000
TTD Therapeutic Targets Database Zhejiang University drugs and targets SMILES InChI CAS PubChem "TTD". 37,316
T3DB Toxin and Toxin-Target Database / Toxic Exposome Database University of Alberta toxins and toxin targets T3D "T3DB". 3,678
UniChem EMBL-EBI pointers to existing chemicals; indexes 41 databases [12] Structure; StdInChI; links to databases automated loads "Compound Sources Search". >2000000
UniProt UniProt Knowledgebase proteins sequence, modifications, location, organism, similar "UniProt".
US DOT US Department of Transportation Emergency response guidebook DOT + others bulk transported chemicals UN number United Nations ID number, hazard response guide "Emergency response guidebook" (PDF). 3000
UV/VIS Spectral Atlas The MPI-Mainz UV/VIS spectral atlas of gaseous molecules of atmospheric interest Max Planck Institute for Chemistry gaseous molecules absorption cross sections from literature "UV/VIS Spectral Atlas". 7313
YMDB Yeast Metabolome Database The Metabolomics Innovation Centre metabolites of yeast 48 data fields YMDB "YMDB". 16042
ZINC ZINC is not commercial University of California, San Francisco purchasable substances EPA DSS TOX, ChEMBL, HMDB, KEGG, PDB, SMILES "ZINC". [13] 37 billion

References

  1. "Chemical Carcinogenesis Research Information System (CCRIS) - PubChem Data Source". pubchem.ncbi.nlm.nih.gov. Retrieved 2020-08-07.
  2. "Download CCRIS (Chemical Carcinogenesis Research Information System) Data". www.nlm.nih.gov. Retrieved 2020-08-07.
  3. Zhumagambetov, Rustam; Kazbek, Daniyar; Shakipov, Mansur; Maksut, Daulet; Peshkov, Vsevolod A.; Fazli, Siamac (2020-12-17). "cheML.io: an online database of ML-generated molecules". RSC Advances. 10 (73): 45189–45198. Bibcode:2020RSCAd..1045189Z. doi:10.1039/D0RA07820D. ISSN 2046-2069. PMC 9058596. PMID 35516285.
  4. Jacobs, Andrea; Williams, Dustin; Hickey, Katherine; Patrick, Nathan; Williams, Antony J.; Chalk, Stuart; McEwen, Leah; Willighagen, Egon; Walker, Martin; Bolton, Evan; Sinclair, Gabriel; Sanford, Adam (13 May 2022). "CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community". Journal of Chemical Information and Modeling. 62 (11): 2737–2743. doi:10.1021/acs.jcim.2c00268. PMC 9199008. PMID 35559614.
  5. "Vision - eMolecules". www.emolecules.com. Retrieved 2020-07-27.
  6. "Human Metabolome Database: About the Human Metabolome Database". hmdb.ca. Retrieved 2020-07-27.
  7. Van Santen, Jeffrey A.; Jacob, Grégoire; Singh, Amrit Leen; et al. (2019). "The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery". ACS Central Science. 5 (11): 1824–1833. doi:10.1021/acscentsci.9b00806. PMC 6891855. PMID 31807684.
  8. Kearnes, Steven M.; Maser, Michael R.; Wleklinski, Michael; et al. (2021). "The Open Reaction Database". Journal of the American Chemical Society. 143 (45): 18820–18826. doi:10.1021/jacs.1c09820.
  9. "Pharos: Illuminating the Druggable Genome". pharos.nih.gov. Retrieved 2024-10-02.
  10. Lewis, Kathleen A.; Tzilivakis, John; Warner, Douglas J.; Green, Andrew (2016). "An international database for pesticide risk assessments and management". Human and Ecological Risk Assessment. 22 (4): 1050–1064. Bibcode:2016HERA...22.1050L. doi:10.1080/10807039.2015.1133242. hdl:2299/17565. S2CID 87599872.
  11. Diorazio, Louis J.; Hose, David R. J.; Adlington, Neil K. (2016). "Toward a More Holistic Framework for Solvent Selection". Organic Process Research & Development. 20 (4): 760–773. doi:10.1021/acs.oprd.6b00015.
  12. "UniChem". www.ebi.ac.uk. Retrieved 2024-10-02.
  13. Tingle, Benjamin I.; Tang, Khanh G.; Castanon, Mar; Gutierrez, John J.; Khurelbaatar, Munkhzul; Dandarchuluun, Chinzorig; Moroz, Yurii S.; Irwin, John J. (2023). "ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery". Journal of Chemical Information and Modeling. 63 (4): 1166–1176. doi:10.1021/acs.jcim.2c01253. PMC 9976280. PMID 36790087.