Cambridge Structural Database

Contact
  Research center: Cambridge Crystallographic Data Centre
Access
  Data format: .cif
  Web service URL: www.ccdc.cam.ac.uk/structures
Tools
  Web: WebCSD
  Standalone:
  • CSD System
  • CSD (the database)
  • ConQuest
  • Mercury
  • IsoStar
  • Mogul
  • GOLD
  • CSD-CrossMiner

The Cambridge Structural Database (CSD) is both a repository and a validated and curated resource for the three-dimensional structural data of molecules generally containing at least carbon and hydrogen, comprising a wide range of organic, metal-organic and organometallic molecules. The specific entries are complementary to the other crystallographic databases such as the Protein Data Bank (PDB), Inorganic Crystal Structure Database and International Centre for Diffraction Data. The data, typically obtained by X-ray crystallography and less frequently by electron diffraction or neutron diffraction, and submitted by crystallographers and chemists from around the world, are freely accessible (as deposited by authors) on the Internet via the CSD's parent organization's website (CCDC, Repository [1] ). The CSD is overseen by the not-for-profit incorporated company called the Cambridge Crystallographic Data Centre, CCDC.

The inside of the CCDC headquarters, Cambridge, UK

The CSD is a repository of small-molecule organic and metal-organic crystal structures that is widely used by scientists. Structures deposited with the Cambridge Crystallographic Data Centre (CCDC) are publicly available for download at the point of publication or with the depositor's consent. They are also scientifically enriched and included in the database used by software offered by the centre. Targeted subsets of the CSD are also freely available to support teaching and other activities. [2]

History

The CCDC grew out of the activities of the crystallography group led by Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. From 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at this time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD).

The CSD was one of the first numerical scientific databases to begin operations anywhere in the world, and received academic grants from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The first releases of the CSD System to the United States, Italy and Japan occurred in the early 1970s. By the early 1980s the CSD System was being distributed in more than 30 countries. As of 2014, the CSD System was distributed to academics in 70 countries.

During the 1980s, interest in the CSD System from pharmaceutical and agrochemicals companies increased significantly. This led to the establishment of the Cambridge Crystallographic Data Centre (CCDC) as an independent company in 1987, with the legal status of a non-profit charitable institution, and with its operations overseen by an international board of governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992.

Kennard retired as Director in 1997 and was succeeded by David Hartley (1997-2002) and Frank Allen (2002-2008). Colin Groom served as executive director from 1 October 2008 [3] to September 2017. [4] Most recently, Jürgen Harter was appointed CEO in June 2018. [5]

CCDC software products diversified into applications of crystallographic data in the life sciences and crystallography. Much of this software development and marketing is carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary which covenants all of its profits back to the CCDC.

Although the CCDC is a self-administering organization, it retains close links with the University of Cambridge, and is a University Partner Institution that is qualified to train postgraduate students for higher degrees (PhD, MPhil).

The CCDC established applications and support operations in the US in October 2013, [6] [7] initially at Rutgers, the State University of New Jersey, where it is co-located with the RCSB Protein Data Bank.

Contents

One millionth structure added to the CSD, CSD ID: XOPCAJ

The CSD is updated with about 50,000 new structures each year, [8] and with improvements to existing entries. Entries (structures) in the repository are released for public access as soon as the corresponding entry has appeared in the peer-reviewed scientific literature. Data can also be deposited and published directly through the CSD, without an accompanying scientific article, as a CSD Communication.

Periodically, general statistics about the breadth of CSD holdings are reported, for example the January 2014 report. [9] As of January 2019, the summary statistics are as follows: [10]

Query                                       Structures    % of CSD
Total # of structures                       995,907       100.0
# of different compounds                    900,984       -
# of literature sources                     2,004         -
Organic structures                          431,037       43.5
Transition metal present                    478,138       48.2
Alkali or alkaline earth metal present      48,056        4.8
Main group metal present                    101,948       10.3
3D coordinates present                      937,809       94.6
Error-free coordinates                      926,422       98.8 (of structures with 3D coordinates)
Neutron studies                             2,142         0.2
Powder diffraction studies                  4,761         0.5
Low/high temp. studies                      503,368       50.8
Absolute configuration determined           28,834        2.9
Disorder present in structure               256,019       25.8
Polymorphic structures                      29,817        3.0
R-factor < 0.100                            935,419       94.4
R-factor < 0.075                            845,708       85.3
R-factor < 0.050                            553,042       55.8
R-factor < 0.030                            121,806       12.3
No. of atoms with 3D coordinates            85,791,623    -

As of January 2019, the top 25 scientific journals in terms of publication of structures in the CSD repository were: [11]

1. 73,070 structures were reported in Inorg. Chem.
2. 62,072 structures were reported in Dalton & J. Chem. Soc., Dalton Trans.
3. 54,160 structures were reported in Organometallics
4. 48,967 structures were reported in J. Am. Chem. Soc.
5. 42,422 structures were reported in Acta Crystallogr. Sect. E
6. 32,610 structures were reported in Chem. Eur. J.
7. 29,790 structures were reported in J. Organomet. Chem.
8. 29,640 structures were reported in Angew. Chem. Int. Ed.
9. 28,682 structures were reported in Inorg. Chim. Acta
10. 28,351 structures were reported in Chem. Commun. & J. Chem. Soc.
11. 27,328 structures were reported in CSD Communications
12. 26,774 structures were reported in Acta Crystallogr. Sect. C
13. 26,734 structures were reported in Polyhedron
14. 24,045 structures were reported in Eur. J. Inorg. Chem.
15. 23,483 structures were reported in J. Org. Chem.
16. 22,286 structures were reported in Cryst. Growth Des.
17. 22,011 structures were reported in CrystEngComm
18. 15,985 structures were reported in Organic Letters
19. 15,424 structures were reported in Z. Anorg. Allg. Chem.
20. 14,864 structures were reported in Acta Crystallogr. Sect. B
21. 13,909 structures were reported in Tetrahedron
    8,597 structures were reported as Private Communication to the CSD
22. 12,734 structures were reported in J. Mol. Struct.
23. 11,234 structures were reported in Tetrahedron Lett.
24. 9,150 structures were reported in Eur. J. Org. Chem.
25. 8,789 structures were reported in New Journal of Chemistry

These 25 journals account for 704,541 of the 996,193 or 70.7% of the structures in the CSD.
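The share quoted above can be checked against the listed counts. The following Python sketch sums the 25 numbered journal counts as they appear in this list and divides by the total of 996,193 given in the text; the variable names are illustrative only. As transcribed here, the counts sum to 704,514 rather than the quoted 704,541, which may reflect a small transcription difference in one of the figures; the share rounds to 70.7% in both cases.

```python
# Structure counts for the 25 numbered journals, in the order listed above.
journal_counts = [
    73070, 62072, 54160, 48967, 42422,
    32610, 29790, 29640, 28682, 28351,
    27328, 26774, 26734, 24045, 23483,
    22286, 22011, 15985, 15424, 14864,
    13909, 12734, 11234, 9150, 8789,
]
csd_total = 996193      # total used in the text for the journal statistics
quoted_sum = 704541     # sum as quoted in the text

listed_sum = sum(journal_counts)
share_percent = round(100 * listed_sum / csd_total, 1)
quoted_share_percent = round(100 * quoted_sum / csd_total, 1)

print(listed_sum, share_percent, quoted_share_percent)
```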

These data show that most structures are determined by X-ray diffraction, with less than 1% of structures being determined by neutron diffraction or powder diffraction. The percentage of error-free coordinates is calculated relative to the number of structures for which 3D coordinates are present in the CSD, not relative to the total number of structures.
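As a worked check of the percentages discussed above, the short Python sketch below recomputes them from the January 2019 counts in the summary table. The variable names are illustrative; the point is that the error-free figure only matches the table when the denominator is the number of structures with 3D coordinates.

```python
# Counts taken from the January 2019 summary statistics table.
structures_total = 995907
with_3d_coordinates = 937809
error_free_coordinates = 926422
neutron_studies = 2142
powder_studies = 4761

# Error-free coordinates relative to structures that have 3D coordinates.
pct_of_3d = round(100 * error_free_coordinates / with_3d_coordinates, 1)
# The same count relative to all structures, for comparison.
pct_of_all = round(100 * error_free_coordinates / structures_total, 1)
# Combined share of neutron and powder diffraction studies.
pct_neutron_or_powder = round(
    100 * (neutron_studies + powder_studies) / structures_total, 1
)

print(pct_of_3d, pct_of_all, pct_neutron_or_powder)
```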

The significance of structure factor files is that, for CSD structures determined by X-ray diffraction that have such a file, a crystallographer can verify the interpretation of the observed measurements.

Growth trend

Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing 25,000 structures in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001, 500,000 in 2009, [12] [13] [14] and 1,000,000 on June 8, 2019. [15] The one millionth structure added to the CSD is the crystal structure of 1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one.
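One way to make "approximately exponential" concrete is to estimate a doubling time from the milestones listed above. The Python sketch below fits a straight line to log2(count) against year by ordinary least squares and inverts the slope; the function name and the choice of fit are assumptions made for illustration, not a method described by the CCDC. With these six milestones the estimate comes out at roughly eight years per doubling.

```python
import math

# (year, milestone) pairs quoted in the text above.
milestones = [
    (1977, 25_000), (1983, 50_000), (1992, 125_000),
    (2001, 250_000), (2009, 500_000), (2019, 1_000_000),
]

def doubling_time_years(points):
    """Invert the least-squares slope of log2(count) vs. year to get years per doubling."""
    xs = [year for year, _ in points]
    ys = [math.log2(count) for _, count in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # slope = sxy / sxx doublings per year, so its inverse is years per doubling
    return sxx / sxy

estimate = doubling_time_years(milestones)
print(f"Estimated doubling time: {estimate:.1f} years")
```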

Growth trend of structures in the CSD, 1965-2018
Number of published structures per year

Year         # published    Total
2018         53,429         974,653
2017         55,031         921,224
2016         54,975         866,193
2015         53,610         811,218
2014         50,759         757,608
2013         48,025         706,849
2012         45,199         661,121
2011         43,882         615,922
2010         41,240         572,040
2009         40,627         530,800
2008         36,802         490,173
2007         36,569         453,371
2006         34,713         416,802
2005         31,733         382,089
2004         27,988         350,356
2003         26,287         322,368
2002         24,306         296,081
2001         21,781         271,775
2000         19,998         249,994
1999         18,780         229,996
1998         17,289         211,216
1997         15,896         193,927
1996         15,487         178,031
1995         13,001         162,544
1994         12,290         149,543
1993         12,032         137,253
1992         10,691         125,221
1991         9,941          114,530
1990         8,935          104,589
1989         7,750          95,654
1988         7,644          87,904
1987         7,472          80,260
1986         6,873          72,788
1985         6,911          65,915
1984         6,511          59,004
1983         5,250          52,493
1982         5,233          47,243
1981         4,666          42,010
1980         4,252          37,344
1979         3,876          33,092
1978         3,415          29,216
1977         3,092          25,801
1976         2,735          22,709
1975         2,171          19,974
1974         2,142          17,803
1973         1,991          15,661
1972         1,969          13,670
1971         1,548          11,701
1970         1,261          10,153
1969         1,130          8,892
1968         975            7,762
1967         936            6,787
1966         683            5,851
1965         656            5,168
1923-1964    4,512          4,512

Note: data for 1923-1964 are aggregated together in the last line of the table.
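Because the Total column is cumulative, each year's total should equal the previous year's total plus that year's published count. The Python sketch below applies this check to the 2012-2018 rows of the table; the function name is hypothetical. As transcribed in this table, the 2013 row is the only one in that range that does not satisfy the relation (661,121 + 48,025 = 709,146, whereas the table lists 706,849), so one of those figures may have been mis-transcribed.

```python
# (year, structures published that year, cumulative total), oldest first.
rows = [
    (2012, 45199, 661121),
    (2013, 48025, 706849),
    (2014, 50759, 757608),
    (2015, 53610, 811218),
    (2016, 54975, 866193),
    (2017, 55031, 921224),
    (2018, 53429, 974653),
]

def inconsistent_years(table):
    """Return years whose total differs from the previous total plus that year's count."""
    bad = []
    for (_, _, prev_total), (year, published, total) in zip(table, table[1:]):
        if prev_total + published != total:
            bad.append(year)
    return bad

print(inconsistent_years(rows))
```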

File format

3D printed model of benzoic acid, taken from a crystal structure determination, created using coordinates from the Cambridge Structural Database, and via the CCDC program Mercury. The top model shows a single molecule of benzoic acid. The bottom model shows a hydrogen-bonded dimer.

The primary file format for CSD structure deposition, adopted around 1991, is the Crystallographic Information File (CIF) format. [16]

The deposited CSD files can be downloaded in the CIF format. The validated and curated CSD files can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System.
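To give a feel for what a CIF file contains, the Python sketch below reads the simplest part of the format: a `data_` block header followed by single-line `_tag value` pairs. It is a minimal illustration, not a full CIF parser: `loop_` tables and multi-line text fields are deliberately skipped, the function names are hypothetical, and the sample values are made up for the example rather than taken from any CSD entry.

```python
import re

# Illustrative CIF fragment; the numbers are invented for this example.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C7 H6 O2'
_cell_length_a 5.5100(10)
_cell_length_b 5.1570(10)
_cell_length_c 21.973(4)
_symmetry_space_group_name_H-M 'P 21/c'
"""

def parse_simple_cif(text):
    """Collect single-line `_tag value` pairs per data block (loops are ignored)."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            if len(parts) == 2:  # a tag with a value on the same line
                current[parts[0]] = parts[1].strip().strip("'\"")
    return blocks

def strip_uncertainty(value):
    """Turn '5.5100(10)' into 5.51 by dropping the parenthesised standard uncertainty."""
    return float(re.sub(r"\(\d+\)$", "", value))

blocks = parse_simple_cif(SAMPLE_CIF)
cell_a = strip_uncertainty(blocks["example"]["_cell_length_a"])
print(blocks["example"]["_chemical_formula_sum"], cell_a)
```

For real deposited files, a dedicated CIF library or the export tools in the CSD System would be a more robust choice than a hand-written reader like this one.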

The CCDC uses two different codes to distinguish between the deposited dataset and the curated CSD entry. For example, one specific 'CSD Communication' of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327'. This allows free public access to the data as deposited. From the deposited data, selected information is extracted to prepare the validated and curated CSD entry, which was assigned the refcode 'MITGUT'. As part of the curation process, the CCDC also applies an algorithm, DeCIFer, to help the editors assign chemistry to structures when those representations (e.g. bond types and charge assignments) are missing from the original CIF files submitted. [8] The validated and curated entry is included in the CSD System and WebCSD distributions, with availability restricted to those making appropriate contributions.
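The two kinds of identifier can be told apart by their shape. The Python sketch below classifies a string as a deposition number or a refcode using patterns inferred from the examples in this article ('CCDC-991327', 'MITGUT', 'XOPCAJ', 'BENZAC12'): "CCDC" followed by digits, or six capital letters optionally followed by two digits. These patterns and the function name are assumptions for illustration, not an official CCDC specification.

```python
import re

# Patterns inferred from examples in the article; assumptions, not an official specification.
DEPOSITION_RE = re.compile(r"^CCDC[- ]?(\d+)$")
REFCODE_RE = re.compile(r"^[A-Z]{6}(\d{2})?$")

def classify_identifier(text):
    """Label a string as a deposition number, a refcode, or unknown."""
    value = text.strip()
    if DEPOSITION_RE.match(value):
        return "deposition number"
    if REFCODE_RE.match(value):
        return "refcode"
    return "unknown"

for example in ["CCDC-991327", "MITGUT", "BENZAC12", "benzoic acid"]:
    print(example, "->", classify_identifier(example))
```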

Viewing the data

3D printed model of the 1-methyl-2,3,4,5-tetrakis((trimethylsilyl)ethynyl)-1H-pyrrole structure. CSD Identifier: XURZAN

Each data set in the CSD can be openly viewed and retrieved using the free Access Structures service. Through this browser-based service, users can view the data set in 2D and 3D, obtain some basic information about the structure, and download the deposited data set. More advanced search functions and curated information are available through the subscription-based CSD System.

Besides using the CSD System, the structure files may be viewed using one of several open source computer programs such as Jmol. Other free, but not open source, programs include MDL Chime, PyMOL, UCSF Chimera, RasMol and WinGX. [17] The CCDC also provides a free version of its visualization program Mercury.

Since 2015, Mercury has also been able to generate 3D-print-ready files from structures in the CSD. [18]


References

  1. "CCDC CIF Depository Request Form". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16.
  2. "CCDC Homepage". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16.
  3. Groom C, Allen F (July 2009). "CCDC well groomed: an interview with Colin Groom, Executive Director, Cambridge Crystallographic Data Centre, and Frank Allen, Emeritus Fellow". Journal of Computer-Aided Molecular Design. 23 (7): 391–4. Bibcode:2009JCAMD..23..391W. doi:10.1007/s10822-009-9272-5. PMID 19421719.
  4. "Announcement from the Chair, on behalf of Trustees". The Cambridge Crystallographic Data Centre. September 11, 2017. Retrieved 2019-05-15.
  5. "The CCDC welcomes Jürgen Harter as CEO". The Cambridge Crystallographic Data Centre (CCDC). June 11, 2018. Retrieved 2019-05-15.
  6. "CCDC opens US operations". The Cambridge Crystallographic Data Centre (CCDC). October 30, 2013. Retrieved 2019-05-15.
  7. "The Cambridge Crystallographic Data Centre Establishes U.S. Operations in New Partnership with Rutgers' Center for Integrative Proteomics Research". Rutgers Office of Research and Economic Development. Retrieved May 15, 2019.
  8. Bruno IJ, Groom CR (October 2014). "A crystallographic perspective on sharing data and knowledge". Journal of Computer-Aided Molecular Design. 28 (10): 1015–22. Bibcode:2014JCAMD..28.1015B. doi:10.1007/s10822-014-9780-9. PMC 4196029. PMID 25091065.
  9. "CSD Entries: Summary Statistics" (PDF). Cambridge Crystallographic Data Centre. Archived from the original (PDF) on 2014-06-11. Retrieved 2014-09-16.
  10. "CSD Entries: Summary Statistics" (PDF). Cambridge Structural Database. January 1, 2019. Retrieved May 15, 2019.
  11. "CSD Journal Statistics" (PDF). Cambridge Structural Database. January 1, 2019. Retrieved May 16, 2019.
  12. Groom CR, Allen FH (January 2014). "The Cambridge Structural Database in retrospect and prospect". Angewandte Chemie. 53 (3): 662–71. doi:10.1002/anie.201306438. PMID 24382699.
  13. "Growth of the Cambridge Structural Database (CSD) since 1970". CCDC. Retrieved 2014-09-16.
  14. "CSD Statistics". The Cambridge Crystallographic Data Centre (CCDC). Retrieved 2019-05-17.
  15. Robinson, Philip; Withers, Neil; Pink, Chris; Valsler, Ben. "The Cambridge Structural Database hits one million structures". Chemistry World. Retrieved 2019-06-07.
  16. Hall SR, Allen FH, Brown ID (1991). "The Crystallographic Information File (CIF): a new standard archive file for crystallography". Acta Crystallographica. A47 (6): 655–685. doi:10.1107/S010876739101067X.
  17. Farrugia LJ (1 August 1999). "WinGX suite for small-molecule single-crystal crystallography". Journal of Applied Crystallography. 32 (4): 837–838. doi:10.1107/S0021889899006020.
  18. "3D Printing: Easy as 1, 2, 3!". The Cambridge Crystallographic Data Centre (CCDC). August 19, 2015. Retrieved 2019-05-18.