Description: Chemicals and their bioassays
Organisms: Humans and other animals
Research center: NCBI
Primary citation: PMID 15879180
Download URL: FTP
Web service URL: PUG-View [1]
License: Public domain

PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine, which is part of the United States National Institutes of Health (NIH). PubChem can be accessed for free through a web user interface. Millions of compound structures and descriptive datasets can be freely downloaded via FTP. PubChem contains multiple substance descriptions and small molecules with fewer than 100 atoms and 1,000 bonds. More than 80 database vendors contribute to the growing PubChem database. [2]



PubChem was released in 2004 as a component of the Molecular Libraries Program (MLP) of the NIH. As of November 2015, PubChem contained more than 150 million depositor-provided substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results (from over 1 million assay experiments performed on more than 2 million small molecules covering almost 10,000 unique protein target sequences that correspond to more than 5,000 genes). It also contained RNA interference (RNAi) screening assays that target over 15,000 genes. [3]

As of August 2018, PubChem contained 247.3 million substance descriptions and 96.5 million unique chemical structures, contributed by 629 data sources from 40 countries. It also contained 237 million bioactivity test results from 1.25 million biological assays, covering more than 10,000 target protein sequences. [4]

As of 2020, with data integrated from over 100 new sources, PubChem contained more than 293 million depositor-provided substance descriptions, 111 million unique chemical structures, and 271 million bioactivity data points from 1.2 million biological assay experiments. [5]


PubChem consists of three dynamically growing primary databases: Compound, Substance, and BioAssay. As of 5 November 2020 (number of BioAssays is unchanged):


Searching the databases is possible for a broad range of properties including chemical structure, name fragments, chemical formula, molecular weight, XLogP, and hydrogen bond donor and acceptor count.

PubChem has its own online molecule editor with SMILES/SMARTS and InChI support. It can import and export all common chemical file formats and is used to search for structures and fragments.

Each hit provides information about synonyms, chemical properties, chemical structure including SMILES and InChI strings, bioactivity, and links to structurally related compounds and other NCBI databases like PubMed.
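The same per-record information can also be retrieved programmatically through PUG-View, the web service listed in the infobox. The sketch below only builds the URL of a compound's annotation record; the URL pattern is an assumption taken from PubChem's public PUG-View documentation rather than from this article, and CID 2244 (aspirin) is just an example identifier.

```python
from urllib.parse import quote

# Assumed PUG-View URL pattern for compound records; check it against the
# PUG-View documentation before use. No request is sent by this sketch.
PUG_VIEW_COMPOUND = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/{fmt}"


def pug_view_compound_url(cid: int, fmt: str = "JSON") -> str:
    """Build the URL of the annotation record for one compound ID (CID)."""
    if not isinstance(cid, int) or cid <= 0:
        raise ValueError("a CID is a positive integer")
    return PUG_VIEW_COMPOUND.format(cid=cid, fmt=quote(fmt))


if __name__ == "__main__":
    # CID 2244 (aspirin) is used purely as an example identifier.
    print(pug_view_compound_url(2244))
```

Fetching the record (for example with `urllib.request.urlopen`) is left out so the sketch stays offline; the JSON that comes back is nested by section heading.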

In the text search form, a search term can be restricted to one database field by appending the field name in square brackets. A numeric range is written as two numbers separated by a colon. Search terms and field names are case-insensitive. Parentheses and the logical operators AND, OR, and NOT can be used; AND is assumed if no operator is given.
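The syntax rules above are regular enough to generate queries from code. Below is a minimal sketch; the helper names are hypothetical, and sending the query through NCBI's E-utilities `esearch` endpoint with `db=pccompound` is an assumption about how one would submit it outside the web form (the URL is only built, never fetched).

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities esearch endpoint, with "pccompound" as the
# Entrez name of PubChem Compound. The URL is only built here, never fetched.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value: str, field: str) -> str:
    """Restrict a term to one field by appending the field name in brackets."""
    return f"{value}[{field}]"


def range_term(low: float, high: float, field: str) -> str:
    """Write a numeric range as two numbers separated by a colon."""
    if low > high:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{low}:{high}[{field}]"


def build_query(*terms: str, op: Optional[str] = None) -> str:
    """Combine terms; with no operator, plain whitespace means AND."""
    if op is None:
        return " ".join(terms)
    op = op.upper()  # emit operators in upper case, the form used in the text
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unknown operator: {op!r}")
    # Parenthesize so the group can be nested inside a larger query.
    return "(" + f" {op} ".join(terms) + ")"


def esearch_url(query: str, db: str = "pccompound") -> str:
    """URL-encode a query for an esearch request."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': query})}"


if __name__ == "__main__":
    query = build_query(field_term("aspirin", "csyn"), field_term("2244", "cid"), op="or")
    print(query)  # (aspirin[csyn] OR 2244[cid])
    print(esearch_url(query))
```

Keeping the implicit-AND case as a plain space mirrors how hand-written queries look, while explicit operators are always parenthesized so groups compose safely.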

Example (Lipinski's Rule of Five):

0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
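Because the four terms are joined by the implicit AND, the example query keeps only records that satisfy every range at once. The sketch below applies the same ranges to property values held locally; treating each colon range as inclusive at both ends is an assumption, and the property values are hypothetical. Many formulations of Lipinski's rule tolerate a single violation, which the `max_violations` parameter allows; the query as written does not.

```python
from typing import Dict, List, Tuple

# Inclusive bounds copied from the example query; whether PubChem treats the
# colon range as inclusive at both ends is an assumption here.
LIPINSKI_RANGES: Dict[str, Tuple[float, float]] = {
    "mw": (0, 500),    # molecular weight
    "hbdc": (0, 5),    # hydrogen bond donor count
    "hbac": (0, 10),   # hydrogen bond acceptor count
    "logp": (-5, 5),   # XLogP
}


def violations(props: Dict[str, float],
               ranges: Dict[str, Tuple[float, float]] = LIPINSKI_RANGES) -> List[str]:
    """Return the field tags whose values fall outside their range."""
    return [field for field, (low, high) in ranges.items()
            if not low <= props[field] <= high]


def passes(props: Dict[str, float], max_violations: int = 0) -> bool:
    """max_violations=0 mirrors the query (all ranges ANDed together)."""
    return len(violations(props)) <= max_violations


if __name__ == "__main__":
    # Hypothetical property values, not taken from a PubChem record.
    candidate = {"mw": 512.6, "hbdc": 2, "hbac": 7, "logp": 3.9}
    print(violations(candidate))                # ['mw']
    print(passes(candidate))                    # False
    print(passes(candidate, max_violations=1))  # True
```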

Database fields

Identification numbers
Identification number in current database [UID]
Substance identification number [SID]
Compound identification number [CID]
BioAssay identification number [BAID], [AID]

Any database field [ALL]
Deposition date [DDAT], [DEPDAT]
Depositor's external ID [SRID], [SRCID]
Source name [SRC], [SRCNAM], [SRCNAME]
Source release date [SRD], [SRDAT], [RLSDAT]
Medical Subject Heading (MeSH) term [MSHT], [MESHT]
MeSH tree node [MSHN], [MESHTN]
MeSH pharmacological actions [PHMA], [PHARMA]

Substance properties
Substance synonyms [SYNO]
International Chemical Identifier (InChI) [INCHI]
Molecular weight [MW], [MWT], [MOLWT]
Chemical elements [ELMT], [EL]
Non-hydrogen atom count [HAC], [HACNT]
Isotope count [IAC], [IACNT]
Total formal charge [TFC], [CHG], [CHRG]
Chiral atom count [ACC], [ACCNT]
Defined chiral atom count [ACDC], [ACDCNT]
Undefined chiral atom count [ACUC], [ACUCNT]
Hydrogen bond acceptor count [HBAC], [HBACNT]
Hydrogen bond donor count [HBDC], [HBDCNT]
Tautomer count [TC], [TCNT], [TTMC]
Rotatable bond count [RBC], [RBCNT]
XLogP [11] [XLGP], [LOGP]

Compound properties
Compound synonyms [CSYN], [CSYNO]
Component count [CC], [CCNT]
Covalent unit (molecule) count [CUC], [CUCNT]
Total bioactivity count [TAC]
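Since most fields accept several interchangeable tags and tags are case-insensitive, a small lookup table can catch typos in a hand-written query before it is submitted. The sketch covers only a subset of the tags listed above, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# A subset of the field tags listed above, grouped by the field they name.
FIELD_ALIASES: Dict[str, Tuple[str, ...]] = {
    "Compound identification number": ("CID",),
    "Molecular weight": ("MW", "MWT", "MOLWT"),
    "Total formal charge": ("TFC", "CHG", "CHRG"),
    "Hydrogen bond acceptor count": ("HBAC", "HBACNT"),
    "Hydrogen bond donor count": ("HBDC", "HBDCNT"),
    "Rotatable bond count": ("RBC", "RBCNT"),
    "XLogP": ("XLGP", "LOGP"),
}

# Invert to tag -> field name, keyed on upper case because tags are case-insensitive.
_BY_TAG = {tag: name for name, tags in FIELD_ALIASES.items() for tag in tags}


def describe_field(tag: str) -> Optional[str]:
    """Map a tag such as 'logp' or '[MolWt]' to its field name, or None."""
    return _BY_TAG.get(tag.strip().strip("[]").upper())


def unknown_tags(query: str) -> List[str]:
    """Return every bracketed tag in the query that the table does not know."""
    return [tag for tag in re.findall(r"\[([^\]]+)\]", query)
            if describe_field(tag) is None]


if __name__ == "__main__":
    print(describe_field("[MolWt]"))                # Molecular weight
    print(unknown_tags("1:3[foo] 0:1[chg] 2[rbc]"))  # ['foo']
```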

See also

Related Research Articles

Organic compound: Chemical compound with carbon-hydrogen bonds

In chemistry, many authors consider an organic compound to be any chemical compound that contains carbon-hydrogen or carbon-carbon bonds; however, some authors consider an organic compound to be any chemical compound that contains carbon. The definition of "organic" versus "inorganic" varies from author to author and is a topic of debate. For example, methane is considered organic, but authors disagree on whether some other carbon-containing compounds are organic or inorganic, such as halides of carbon without carbon-hydrogen and carbon-carbon bonds, and certain compounds of carbon with nitrogen and oxygen.

National Center for Biotechnology Information: Database branch of the US National Library of Medicine

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.

CAS Registry Number: Chemical identifier

A CAS Registry Number is a unique identification number assigned by the Chemical Abstracts Service (CAS) in the US to every chemical substance described in the open scientific literature. The CAS Registry covers all substances described from 1957 through the present, plus some substances from as far back as the early 1800s. It is a chemical database that includes organic and inorganic compounds, minerals, isotopes, alloys, mixtures, and nonstructurable materials. CAS RNs are generally serial numbers, so, unlike SMILES and InChI strings, they do not encode any information about the structure itself.

Undecane (also known as hendecane) is a liquid alkane hydrocarbon with the chemical formula CH3(CH2)9CH3. It is used as a mild sex attractant for various types of moths and cockroaches, and an alert signal for a variety of ants. It has 159 isomers.

An assay is an investigative (analytic) procedure in laboratory medicine, mining, pharmacology, environmental biology and molecular biology for qualitatively assessing or quantitatively measuring the presence, amount, or functional activity of a target entity. The measured entity is often called the analyte, the measurand, or the target of the assay. The analyte can be a drug, biochemical substance, chemical element or compound, or cell in an organism or organic sample. An assay usually aims to measure an analyte's intensive property and express it in the relevant measurement unit.

A chemical database is a database specifically designed to store chemical information. This information is about chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data.

Entrez: Cross-database search engine for health sciences

The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. The NCBI is a part of the National Library of Medicine (NLM), which is itself a department of the National Institutes of Health (NIH), which in turn is a part of the United States Department of Health and Human Services. The name "Entrez" was chosen to reflect the spirit of welcoming the public to search the content available from the NLM.

The International Chemical Identifier is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. Initially developed by the International Union of Pure and Applied Chemistry (IUPAC) and National Institute of Standards and Technology (NIST) from 2000 to 2005, the format and algorithms are non-proprietary. Since May 2009, it has been developed by the InChI Trust, a nonprofit charity from the United Kingdom which works to implement and promote the use of InChI.

Carbon tetrabromide: Chemical compound

Tetrabromomethane, CBr4, also known as carbon tetrabromide, is a bromide of carbon. Both names are acceptable under IUPAC nomenclature.

2,2-Dimethylbutane: Chemical compound

2,2-Dimethylbutane, trivially known as neohexane, is an organic compound with formula C6H14 or (H3C-)3-C-CH2-CH3. It is therefore an alkane; indeed, it is the most compact and branched of the hexane isomers, and the only one with a quaternary carbon and a butane (C4) backbone.

ChemSpider is a freely accessible online database of chemicals owned by the Royal Society of Chemistry. It contains information on more than 100 million molecules from over 270 data sources; each molecule receives a unique identifier called a ChemSpider Identifier.

para-Nitrophenylphosphate: Chemical compound

para-Nitrophenylphosphate (pNPP) is a non-proteinaceous chromogenic substrate for alkaline and acid phosphatases used in ELISA and conventional spectrophotometric assays. Phosphatases catalyze the hydrolysis of pNPP liberating inorganic phosphate and the conjugate base of para-nitrophenol (pNP). The resulting phenolate is yellow, with a maximal absorption at 405 nm. This property can be used to determine the activity of various phosphatases including alkaline phosphatase (AP) and protein tyrosine phosphatase (PTP).
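Turning the yellow colour into an activity figure is usually done with the Beer-Lambert law, c = A / (ε·l). The sketch below converts a blank-corrected absorbance change at 405 nm over a timed interval into a rate of p-nitrophenolate formation. The molar absorption coefficient is deliberately a parameter: the 18,000 M⁻¹ cm⁻¹ used in the example is a placeholder, and the right value depends on buffer and pH; the readings are hypothetical.

```python
def pnp_formation_rate(delta_a405: float, minutes: float,
                       epsilon_per_m_cm: float, path_length_cm: float = 1.0) -> float:
    """Rate of p-nitrophenolate formation in micromol/L per minute.

    Beer-Lambert law: concentration (mol/L) = absorbance / (epsilon * path length).
    delta_a405 is assumed to be already corrected against a no-enzyme blank.
    """
    if minutes <= 0 or epsilon_per_m_cm <= 0 or path_length_cm <= 0:
        raise ValueError("time, epsilon and path length must be positive")
    molar_per_min = delta_a405 / (epsilon_per_m_cm * path_length_cm) / minutes
    return molar_per_min * 1e6  # mol/L per minute -> micromol/L per minute


if __name__ == "__main__":
    # Hypothetical readings; epsilon is a placeholder to replace with a value
    # validated for the buffer and pH actually used.
    rate = pnp_formation_rate(delta_a405=0.36, minutes=2.0, epsilon_per_m_cm=18_000.0)
    print(f"{rate:.2f} uM/min")  # 10.00 uM/min
```

Dividing by the amount of enzyme in the well would turn this volumetric rate into a specific activity.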

Chemical similarity: Chemical term

Chemical similarity refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, i.e. the effect that the chemical compound has on reaction partners in inorganic or biological settings. Biological effects and thus also similarity of effects are usually quantified using the biological activity of a compound. In general terms, function can be related to the chemical activity of compounds.
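The paragraph leaves the structural measure open. One widely used choice, not named in the text, is the Tanimoto (Jaccard) coefficient between binary structural fingerprints. The sketch assumes each fingerprint is already available as the set of bit positions set to 1; how those bits are generated is outside its scope, and the bit sets shown are toy values.

```python
from typing import AbstractSet


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints.

    Each fingerprint is the set of bit positions that are set to 1; the score
    is |A and B| / |A or B|, from 0.0 (no shared bits) to 1.0 (identical).
    """
    union = len(a | b)
    if union == 0:
        raise ValueError("similarity is undefined for two empty fingerprints")
    return len(a & b) / union


if __name__ == "__main__":
    # Toy bit sets, not real fingerprints of any compound.
    fp1 = {1, 4, 9, 16, 25}
    fp2 = {1, 4, 9, 36}
    print(tanimoto(fp1, fp2))  # 3 shared bits / 6 bits in total = 0.5
```

Raising on two empty fingerprints, rather than returning an arbitrary score, is a design choice of this sketch.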

2-Acetylaminofluorene: Chemical compound

2-Acetylaminofluorene (2-AAF) is a carcinogenic and mutagenic derivative of fluorene. It is used as a biochemical tool in the study of carcinogenesis. It induces tumors of the liver, bladder, and kidney in a number of species. Its metabolism in the body through biotransformation reactions is the key to its carcinogenicity. 2-AAF is a substrate for cytochrome P450 (CYP) enzymes, members of a superfamily found in almost all organisms. This reaction forms N-hydroxyacetylaminofluorene, a proximate carcinogen that is more potent than the parent molecule. The N-hydroxy metabolite undergoes several enzymatic and non-enzymatic rearrangements. It can be O-acetylated by cytosolic N-acetyltransferase to yield N-acetyl-N-acetoxyaminofluorene. This intermediate can spontaneously rearrange to form the arylamidonium ion and a carbonium ion, which can interact directly with DNA to produce DNA adducts. In addition to esterification by acetylation, the N-hydroxy derivative can be O-sulfated by cytosolic sulfotransferase, giving rise to the N-acetyl-N-sulfoxy product.

Acetoguanamine: Chemical compound

Acetoguanamine is an organic compound with the chemical formula (CNH2)2CCH3N3. It is related to melamine, but with one amino group replaced by a methyl group. Acetoguanamine is used in the manufacture of melamine resins. Unlike melamine ((CNH2)3N3), acetoguanamine is not a crosslinker. The "aceto" prefix is historical; the compound does not contain an acetyl group. A related compound is benzoguanamine.

ChEMBL: Chemical database of bioactive molecules also having drug-like properties

ChEMBL or ChEMBLdb is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.

Triazane: Chemical compound

Triazane is an inorganic compound with the chemical formula NH2NHNH2 or N3H5. Triazane is the third simplest acyclic azane after ammonia and hydrazine. It can be synthesized from hydrazine but is unstable and cannot be isolated in the free base form, only as salt forms such as triazanium sulfate. Attempts to convert triazanium salts to the free base release only diazene and ammonia. Triazane was first synthesized as a ligand of the silver complex ion: tris(μ2-triazane-κ2N1,N3)disilver(2+). Triazane has also been synthesized in electron-irradiated ammonia ices and detected as a stable gas-phase product after sublimation.

Biurea: Chemical compound

Biurea is a chemical compound with the molecular formula C2H6N4O2. It is produced in food products containing azodicarbonamide, a common ingredient in bread flour, when they are cooked. Upon exposure, biurea is rapidly eliminated from the body through excretion.

CompTox Chemicals Dashboard: Chemical database

The CompTox Chemicals Dashboard is a freely accessible online database created and maintained by the U.S. Environmental Protection Agency (EPA). The database provides access to multiple types of data including physicochemical properties, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay. EPA and other scientists use the data and models contained within the dashboard to help identify chemicals that require further testing and reduce the use of animals in chemical testing. The Dashboard is also used to provide public access to information from EPA Action Plans, e.g. around perfluorinated alkylated substances.


  1. Kim, Sunghwan; Thiessen, Paul A.; Cheng, Tiejun; Zhang, Jian; Gindulyte, Asta; Bolton, Evan E. (9 August 2019). "PUG-View: programmatic access to chemical annotations integrated in PubChem". Journal of Cheminformatics. 11 (1): 56. doi:10.1186/s13321-019-0375-2. PMC 6688265. PMID 31399858.
  2. "PubChem Source Information". The PubChem Project. USA: National Center for Biotechnology Information.
  3. Kim, Sunghwan; Thiessen, Paul A.; Cheng, Tiejun; Yu, Bo; Shoemaker, Benjamin A.; Wang, Jiyao; Bolton, Evan E.; Wang, Yanli; Bryant, Stephen H. (2016). "Literature information in PubChem: associations between PubChem records and scientific articles". Journal of Cheminformatics. 8: Article 32. doi:10.1186/s13321-016-0142-6. PMC 4901473. PMID 27293485.
  4. "Search Results for all compounds". Retrieved 28 January 2016.
  5. Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A.; Thiessen, Paul A.; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E. (8 January 2021). "PubChem in 2021: new data content and improved web interfaces". Nucleic Acids Research. 49 (D1): D1388–D1395. doi:10.1093/nar/gkaa971. PMC 7778930. PMID 33151290.
  6. "all[filt] - PubChem Compound Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
  7. "all[filt] - PubChem Substance Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 28 January 2016.
  8. "all[filt] - PubChem Substance Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
  9. "all[filt] - PubChem BioAssay Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 28 January 2016.
  10. "all[filt] - PubChem BioAssay Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
  11. Cheng T (November 2007). "Computation of octanol-water partition coefficients by guiding an additive model with knowledge". Journal of Chemical Information and Modeling. 47 (6): 2140–2148. doi:10.1021/ci700257y. PMID 17985865.