Description: Chemicals and their bioassays
Organisms: Humans and other animals
Research center: NCBI
Primary citation: PMID 15879180
Download URL: FTP
Web service URL: PUG-View [1]
License: Public domain

PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine, which is part of the United States National Institutes of Health (NIH). PubChem can be accessed for free through a web user interface. Millions of compound structures and descriptive datasets can be freely downloaded via FTP. PubChem contains substance descriptions and small molecules with fewer than 1000 atoms and 1000 bonds. More than 80 database vendors contribute to the growing PubChem database. [2]



PubChem consists of three dynamically growing primary databases: PubChem Substance, PubChem Compound, and PubChem BioAssay.


Searching the databases is possible for a broad range of properties, including chemical structure, name fragments, chemical formula, molecular weight, XLogP, and hydrogen bond donor and acceptor counts.

PubChem includes its own online molecule editor with SMILES/SMARTS and InChI support. The editor can import and export all common chemical file formats, and the structures and fragments drawn in it can be used as search queries.

Each hit provides information about synonyms, chemical properties, chemical structure (including SMILES and InChI strings), and bioactivity, together with links to structurally related compounds and to other NCBI databases such as PubMed.
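To illustrate the InChI strings shown with each hit: a standard InChI begins with the prefix `InChI=` and a version (`1S` for standard InChI), followed by the molecular formula and further slash-separated layers, each introduced by a lowercase letter such as `c` (connectivity) or `h` (hydrogen atoms). The Python sketch below splits such a string into those parts. It is a simplified reading of the format rather than a validator, the function name `split_inchi` is hypothetical, and the input is the InChI of ethanol.

```python
def split_inchi(inchi: str):
    """Split an InChI into (version, formula, layers). Simplified sketch only."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    # Each later layer starts with a one-letter tag (c, h, q, p, b, t, m, s, i, ...).
    # A list of pairs is used because some tags can occur more than once.
    layers = [(layer[0], layer[1:]) for layer in rest if layer]
    return version, formula, layers


version, formula, layers = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(version)  # 1S
print(formula)  # C2H6O
print(layers)   # [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
```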

In the text search form, a database field can be searched by appending the field name in square brackets to the search term. A numeric range is written as two numbers separated by a colon. Search terms and field names are case-insensitive. Parentheses and the logical operators AND, OR, and NOT can be used; AND is assumed if no operator is given.
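The query rules above (field name in square brackets, `low:high` for a numeric range, implicit AND when no operator is given) are simple enough to generate programmatically. The Python sketch below uses hypothetical helpers, `range_term` and `build_query`, that are not part of any PubChem library; they only assemble a query string following those rules and do not contact PubChem.

```python
from typing import List, Optional


def _fmt(x: float) -> str:
    # Print whole numbers without a trailing ".0", so 500.0 becomes "500".
    return str(int(x)) if float(x).is_integer() else str(x)


def range_term(field: str, low: float, high: float) -> str:
    """Return a numeric range term such as '0:500[mw]' (hypothetical helper).

    Field names are case-insensitive in the search, so lowercasing is cosmetic.
    """
    return f"{_fmt(low)}:{_fmt(high)}[{field.lower()}]"


def build_query(terms: List[str], operator: Optional[str] = None) -> str:
    """Join search terms; with no operator, the search assumes AND."""
    if operator is None:
        return " ".join(terms)
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(terms)


lipinski = build_query([
    range_term("MW", 0, 500),
    range_term("hbdc", 0, 5),
    range_term("hbac", 0, 10),
    range_term("logp", -5, 5),
])
print(lipinski)  # 0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]

either = build_query(["aspirin[syno]", "ibuprofen[syno]"], "or")
print(either)  # aspirin[syno] OR ibuprofen[syno]
```

Leaving out the operator reproduces the space-separated form used in the example query, relying on the implicit AND.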

Example (Lipinski's Rule of Five):

0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
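To show which records the Lipinski query keeps, the same four ranges can be applied to property values held locally. This Python sketch assumes that both endpoints of a range are inclusive (the search description above does not say so explicitly), and the compound records are invented placeholders, not PubChem data.

```python
from dataclasses import dataclass


@dataclass
class Props:
    name: str
    mw: float   # molecular weight
    hbdc: int   # hydrogen bond donor count
    hbac: int   # hydrogen bond acceptor count
    logp: float  # XLogP


# Ranges copied from the query 0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
RULE_OF_FIVE = {"mw": (0, 500), "hbdc": (0, 5), "hbac": (0, 10), "logp": (-5, 5)}


def matches(p: Props, ranges=RULE_OF_FIVE) -> bool:
    """True if every property lies within its range (inclusive, by assumption)."""
    return all(lo <= getattr(p, field) <= hi for field, (lo, hi) in ranges.items())


# Hypothetical records for illustration only.
records = [
    Props("compound A", 320.4, 2, 5, 2.1),
    Props("compound B", 612.7, 4, 9, 3.8),  # fails: mw > 500
    Props("compound C", 250.0, 1, 3, 6.2),  # fails: logp > 5
]
hits = [p.name for p in records if matches(p)]
print(hits)  # ['compound A']
```

Because all terms are combined with the implicit AND, a record must satisfy every range to be kept.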


PubChem was released in 2004. [9]

ACS's concerns

The American Chemical Society (ACS) has raised concerns about the publicly supported PubChem database, arguing that it competes directly with the society's existing Chemical Abstracts Service (CAS). [10] The ACS has a strong interest in the issue because CAS generates a large percentage of the society's revenue. Soon after PubChem's creation, the ACS lobbied the U.S. Congress to restrict PubChem's operation. [11]

Database fields

Identification numbers
Identification number in current database: [UID]
Substance identification number: [SID]
Compound identification number: [CID]
BioAssay identification number: [BAID], [AID]

Any database field: [ALL]
Deposition date: [DDAT], [DEPDAT]
Depositor's external ID: [SRID], [SRCID]
Source name: [SRC], [SRCNAM], [SRCNAME]
Source release date: [SRD], [SRDAT], [RLSDAT]
Medical Subject Heading (MeSH) term: [MSHT], [MESHT]
MeSH tree node: [MSHN], [MESHTN]
MeSH pharmacological actions: [PHMA], [PHARMA]

Substance properties
Substance synonyms: [SYNO]
International Chemical Identifier (InChI): [INCHI]
Molecular weight: [MW], [MWT], [MOLWT]
Chemical elements: [ELMT], [EL]
Non-hydrogen (heavy) atom count: [HAC], [HACNT]
Isotope count: [IAC], [IACNT]
Total formal charge: [TFC], [CHG], [CHRG]
Chiral atom count: [ACC], [ACCNT]
Defined chiral atom count: [ACDC], [ACDCNT]
Undefined chiral atom count: [ACUC], [ACUCNT]
Hydrogen bond acceptor count: [HBAC], [HBACNT]
Hydrogen bond donor count: [HBDC], [HBDCNT]
Tautomer count: [TC], [TCNT], [TTMC]
Rotatable bond count: [RBC], [RBCNT]
XLogP [12]: [XLGP], [LOGP]

Compound properties
Compound synonyms: [CSYN], [CSYNO]
Component count: [CC], [CCNT]
Covalent unit (molecule) count: [CUC], [CUCNT]
Total bioactivity count: [TAC]


References

1. Kim, Sunghwan; Thiessen, Paul A.; Cheng, Tiejun; Zhang, Jian; Gindulyte, Asta; Bolton, Evan E. (9 August 2019). "PUG-View: programmatic access to chemical annotations integrated in PubChem". Journal of Cheminformatics. 11 (1). doi:10.1186/s13321-019-0375-2.
2. "PubChem Source Information". The PubChem Project. USA: National Center for Biotechnology Information.
3. "Search Results for all compounds". Retrieved 28 January 2016.
4. "all[filt] - PubChem Compound Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
5. "all[filt] - PubChem Substance Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 28 January 2016.
6. "all[filt] - PubChem Substance Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
7. "all[filt] - PubChem BioAssay Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 28 January 2016.
8. "all[filt] - PubChem BioAssay Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
9. "About PubChem". Retrieved 3 May 2014.
10. Kaiser J (May 2005). "Science resources. Chemists want NIH to curtail database". Science. 308 (5723): 774. doi:10.1126/science.308.5723.774a. PMID 15879180.
11. "PubChem and the American Chemical Society". Reshaping Scholarly Communication. USA: University of California. 31 May 2005. Retrieved 15 October 2018.
12. Cheng T (November 2007). "Computation of octanol-water partition coefficients by guiding an additive model with knowledge". Journal of Chemical Information and Modeling. 47 (6): 2140–2148. doi:10.1021/ci700257y. PMID 17985865.