Description: Biological database
Data types: Molecules with drug-like properties and biological activity
Research center: European Molecular Biology Laboratory
Laboratory: European Bioinformatics Institute
Authors: Andrew Leach, Team Leader 2016-present; John Overington, Team Leader 2008-2015
Primary citation: PMID 21948594
Release date: 2009
Website: ChEMBL
Download URL: Downloads
Web service URL: ChEMBL Webservices
SPARQL endpoint: ChEMBL EBI-RDF Platform
License: The ChEMBL data is made available on a Creative Commons Attribution-Share Alike 3.0 Unported Licence
Versioning: ChEMBL_25

ChEMBL or ChEMBLdb is a manually curated chemical database of bioactive molecules with drug-like properties. [1] It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.


The database, originally known as StARlite, was developed by the biotechnology company Inpharmatica Ltd., which was later acquired by Galapagos NV. The data was acquired for EMBL in 2008 with an award from The Wellcome Trust, [2] resulting in the creation of the ChEMBL chemogenomics group at EMBL-EBI, led by John Overington. [3] [4]

Scope and access

The ChEMBL database contains compound bioactivity data against drug targets. Bioactivity is reported as Ki, Kd, IC50, and EC50 values. [5] Data can be filtered and analyzed to develop compound screening libraries for lead identification during drug discovery. [6]
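
As an illustration of this kind of filtering, the following minimal Python sketch selects potent compounds from a handful of bioactivity records. The record layout, compound identifiers, values, and the potency cutoff are all hypothetical and are not taken from the ChEMBL schema; all values are assumed to be in nanomolar.

```python
import math

# Hypothetical records: (compound_id, activity_type, value_in_nM).
# Identifiers and values are invented for illustration only.
RECORDS = [
    ("CPD-1", "IC50", 12.0),
    ("CPD-2", "IC50", 4500.0),
    ("CPD-3", "Ki", 0.8),
    ("CPD-4", "EC50", 1000.0),
]


def p_activity(value_nm):
    """Negative log10 of a concentration expressed in molar, given the value in nM."""
    return 9.0 - math.log10(value_nm)


def select_potent(records, allowed_types, min_p):
    """Return ids of compounds with an allowed activity type and p-value >= min_p."""
    return [
        compound_id
        for compound_id, activity_type, value_nm in records
        if activity_type in allowed_types and p_activity(value_nm) >= min_p
    ]


# Keep IC50/Ki measurements at or below 1 micromolar (p-value of at least 6).
potent = select_potent(RECORDS, {"IC50", "Ki"}, min_p=6.0)
print(potent)
```

Working on the negative logarithmic scale puts the different measurement types on a common footing for thresholding, although values of different types are not strictly interchangeable.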

ChEMBL version 2 (ChEMBL_02) was launched in January 2010 with 2.4 million bioassay measurements covering 622,824 compounds, including 24,000 natural products. These data were obtained by curating over 34,000 publications across twelve medicinal chemistry journals. ChEMBL's coverage of available bioactivity data has grown to become "the most comprehensive ever seen in a public database". [3] In October 2010, ChEMBL version 8 (ChEMBL_08) was launched, with over 2.97 million bioassay measurements covering 636,269 compounds. [7]

ChEMBL_10 saw the addition of the PubChem confirmatory assays, in order to integrate data that is comparable to the type and class of data contained within ChEMBL. [8]

ChEMBLdb can be accessed via a web interface or downloaded by File Transfer Protocol. It is formatted in a manner amenable to computerized data mining, and attempts to standardize activities between different publications, to enable comparative analysis. [1] ChEMBL is also integrated into other large-scale chemistry resources, including PubChem and the ChemSpider system of the Royal Society of Chemistry.
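
To give a sense of what standardizing activities between publications can involve, the sketch below converts concentrations reported in different units to a single unit (nanomolar). This is an illustrative assumption about one step of such a process, not a description of ChEMBL's actual curation pipeline; the unit table, function name, and sample measurements are hypothetical.

```python
# Conversion factors from common concentration units to nanomolar.
TO_NANOMOLAR = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value, unit):
    """Convert a concentration to nM; reject units that are not in the table."""
    if unit not in TO_NANOMOLAR:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * TO_NANOMOLAR[unit]


# Hypothetical measurements of the same quantity as reported by three sources.
reported = [
    ("source A", 0.05, "uM"),
    ("source B", 50.0, "nM"),
    ("source C", 5e-8, "M"),
]

# After conversion the three values can be compared directly.
standardised = [(source, to_nanomolar(value, unit)) for source, value, unit in reported]
for source, value_nm in standardised:
    print(f"{source}: {value_nm:.1f} nM")
```

Rejecting unknown units, rather than guessing, keeps a malformed entry from silently distorting a comparison.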

Associated resources

In addition to the database, the ChEMBL group have developed tools and resources for data mining. [9] These include Kinase SARfari, an integrated chemogenomics workbench focussed on kinases. The system incorporates and links sequence, structure, compounds and screening data.

GPCR SARfari is a similar workbench focused on GPCRs, and ChEMBL-Neglected Tropical Diseases (ChEMBL-NTD) is a repository for Open Access primary screening and medicinal chemistry data directed at endemic tropical diseases of the developing regions of Africa, Asia, and the Americas. The primary purpose of ChEMBL-NTD is to provide a freely accessible and permanent archive and distribution centre for deposited data. [3]

July 2012 saw the release of a new malaria data service, sponsored by the Medicines for Malaria Venture (MMV), aimed at researchers around the globe. The data in this service includes compounds from the Malaria Box screening set, as well as the other donated malaria data found in ChEMBL-NTD.

myChEMBL, the ChEMBL virtual machine, was released in October 2013 to allow users to access a complete and free, easy-to-install cheminformatics infrastructure.

In December 2013, the operations of the SureChem patent informatics database were transferred to EMBL-EBI. SureChem was then renamed SureChEMBL, a portmanteau of SureChem and ChEMBL.

2014 saw the introduction of a new resource, ADME SARfari, a tool for predicting and comparing cross-species ADME targets. [10]


References

  1. Gaulton, A; et al. (2011). "ChEMBL: a large-scale bioactivity database for drug discovery". Nucleic Acids Research. 40 (Database issue): D1100–7. doi:10.1093/nar/gkr777. PMC 3245175. PMID 21948594.
  2. "Open access drug discovery database launches with half a million compounds | Wellcome". 18 January 2010. Retrieved 31 August 2019.
  3. Bender, A (2010). "Databases: Compound bioactivities go public". Nature Chemical Biology. 6 (5): 309. doi:10.1038/nchembio.354.
  4. Overington J (April 2009). "ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). Interview by Wendy A. Warr". J. Comput.-Aided Mol. Des. 23 (4): 195–8. Bibcode:2009JCAMD..23..195W. doi:10.1007/s10822-009-9260-9. PMID 19194660.
  5. Mok, N. Yi; Brenk, Ruth (Oct 24, 2011). "Mining the ChEMBL Database: An Efficient Chemoinformatics Workflow for Assembling an Ion Channel-Focused Screening Library". J. Chem. Inf. Model. 51 (10): 2449–2454. doi:10.1021/ci200260t. PMC 3200031. PMID 21978256.
  6. Brenk, R; Schinpani, A; James, D; Krasowski, A (Mar 2008). "Lessons learnt from assembling screening libraries for drug discovery for neglected diseases". ChemMedChem. 3 (3): 435–44. doi:10.1002/cmdc.200700139. PMC 2628535. PMID 18064617.
  7. ChEMBL-og (15 November 2010), ChEMBL_08 Released, retrieved 2010-11-15.
  8. ChEMBL-og (6 June 2011), ChEMBL_10 Released, retrieved 2011-06-09.
  9. Bellis, L J; et al. (2011). "Collation and data-mining of literature bioactivity data for drug discovery". Biochemical Society Transactions. 39 (5): 1365–1370. doi:10.1042/BST0391365. PMID 21936816.
  10. Davies, M; et al. (2015). "ADME SARfari: Comparative Genomics of Drug Metabolising Systems". Bioinformatics. 31 (10): 1695–1697. doi:10.1093/bioinformatics/btv010. PMC 4426839. PMID 25964657. Retrieved 2015-01-08.