Field | Value |
---|---|
Research center | Cambridge Crystallographic Data Centre |
Data format | .cif |
Web service URL | www |
Web tool | WebCSD |
The Cambridge Structural Database (CSD) is both a repository and a validated and curated resource for the three-dimensional structural data of molecules generally containing at least carbon and hydrogen, comprising a wide range of organic, metal-organic and organometallic molecules. Its entries are complementary to those of other crystallographic databases such as the Protein Data Bank (PDB), the Inorganic Crystal Structure Database and the International Centre for Diffraction Data. The data, typically obtained by X-ray crystallography and less frequently by electron diffraction or neutron diffraction, are submitted by crystallographers and chemists from around the world and are freely accessible (as deposited by authors) on the Internet via the website of the CSD's parent organization (CCDC, Repository [1]). The CSD is overseen by the not-for-profit incorporated company called the Cambridge Crystallographic Data Centre (CCDC).
The CSD is a repository of small-molecule organic and metal-organic crystal structures that is widely used by scientists. Structures deposited with the Cambridge Crystallographic Data Centre (CCDC) are made publicly available for download at the point of publication or with the consent of the depositor. They are also scientifically enriched and included in the database used by software offered by the centre. Targeted subsets of the CSD are also freely available to support teaching and other activities. [2]
The CCDC grew out of the activities of the crystallography group led by Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. From 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at this time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD).
The CSD was one of the first numerical scientific databases to begin operations anywhere in the world, and received academic grants from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The first releases of the CSD System to the United States, Italy and Japan occurred in the early 1970s. By the early 1980s the CSD System was being distributed in more than 30 countries. As of 2014, the CSD System was distributed to academics in 70 countries.
During the 1980s, interest in the CSD System from pharmaceutical and agrochemical companies increased significantly. This led to the establishment of the Cambridge Crystallographic Data Centre (CCDC) as an independent company in 1987, with the legal status of a non-profit charitable institution and with its operations overseen by an international board of governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992.
Kennard retired as Director in 1997 and was succeeded by David Hartley (1997-2002) and Frank Allen (2002-2008). Colin Groom served as executive director from 1 October 2008 [3] to September 2017, [4] and Juergen Harter was appointed CEO in June 2018. [5]
The CCDC's software products have diversified to apply crystallographic data in the life sciences and in crystallography. Much of this software development and marketing is carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary which covenants all of its profits back to the CCDC.
Although the CCDC is a self-administering organization, it retains close links with the University of Cambridge, and is a University Partner Institution that is qualified to train postgraduate students for higher degrees (PhD, MPhil).
The CCDC established applications and support operations in the United States in October 2013, [6] [7] initially at Rutgers, the State University of New Jersey, where it is co-located with the RCSB Protein Data Bank.
The CSD is updated with about 50,000 new structures each year, [8] as well as with improvements to existing entries. Entries (structures) in the repository are released for public access as soon as the corresponding article has appeared in the peer-reviewed scientific literature. Data can also be deposited and published directly through the CSD, without an accompanying scientific article, as a so-called CSD Communication.
General statistics about the breadth of CSD holdings are reported periodically, for example in the January 2014 report. [9] As of January 2019, the summary statistics were as follows: [10]
Query | Structures | % of CSD |
---|---|---|
Total # of structures | 995,907 | 100.0 |
# of different compounds | 900,984 | - |
# of literature sources | 2,004 | - |
Organic structures | 431,037 | 43.5 |
Transition metal present | 478,138 | 48.2 |
Alkali or alkaline earth metal present | 48,056 | 4.8 |
Main group metal present | 101,948 | 10.3 |
3D coordinates present | 937,809 | 94.6 |
Error-free coordinates | 926,422 | 98.81 |
Neutron studies | 2,142 | 0.2 |
Powder diffraction studies | 4,761 | 0.5 |
Low/high temp. studies | 503,368 | 50.8 |
Absolute configuration determined | 28,834 | 2.9 |
Disorder present in structure | 256,019 | 25.8 |
Polymorphic structures | 29,817 | 3.0 |
R-factor < 0.100 | 935,419 | 94.4 |
R-factor < 0.075 | 845,708 | 85.3 |
R-factor < 0.050 | 553,042 | 55.8 |
R-factor < 0.030 | 121,806 | 12.3 |
No. of atoms with 3D coordinates | 85,791,623 | - |
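The R-factor rows above count structures whose crystallographic R-factor falls below each threshold; the thresholds are cumulative, so a structure with R < 0.030 is also counted in every higher row. The conventional R-factor is R = Σ||F_obs| - |F_calc|| / Σ|F_obs|, the summed disagreement between observed and calculated structure-factor amplitudes relative to the observed ones, so lower values indicate a better fit. The following Python sketch computes R for a list of amplitudes and reports which of the table's cut-offs it satisfies; the amplitudes are made-up illustrations, not CSD data, and real refinement programs apply further conventions (for example, which reflections are included).

```python
from typing import Sequence

# Cut-offs used in the CSD statistics table (cumulative: R < 0.100 includes R < 0.075, ...).
R_THRESHOLDS = (0.100, 0.075, 0.050, 0.030)


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def thresholds_met(r: float) -> list[float]:
    """Return every table cut-off that the R-factor falls below."""
    return [t for t in R_THRESHOLDS if r < t]


if __name__ == "__main__":
    # Hypothetical amplitudes, for illustration only.
    observed = [100.0, 80.0, 60.0, 40.0]
    calculated = [97.0, 82.0, 58.0, 41.0]
    r = r_factor(observed, calculated)
    print(f"R = {r:.4f}, below: {thresholds_met(r)}")
```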
As of January 2019, the top 25 scientific journals in terms of publication of structures in the CSD repository were: [11]
These 25 journals account for 704,541 of the 996,193 structures (70.7%) in the CSD.
These data show that most structures are determined by single-crystal X-ray diffraction, with less than 1% determined by neutron diffraction or powder diffraction. The percentage of error-free coordinates is calculated relative to the number of structures for which 3D coordinates are present in the CSD, rather than relative to the total number of structures.
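The two denominators used in the statistics table can be made explicit with a short calculation. This Python sketch takes the counts from the January 2019 table and computes the combined share of neutron and powder studies relative to all structures, and the error-free share relative to structures with 3D coordinates; it also shows the much lower figure obtained if the total were (incorrectly) used as the denominator.

```python
# Counts copied from the January 2019 CSD statistics table.
TOTAL_STRUCTURES = 995_907
WITH_3D_COORDINATES = 937_809
ERROR_FREE = 926_422
NEUTRON = 2_142
POWDER = 4_761


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Neutron and powder studies are expressed relative to the whole CSD.
neutron_or_powder = percent(NEUTRON + POWDER, TOTAL_STRUCTURES)

# Error-free coordinates are expressed relative to entries that have 3D coordinates.
error_free_share = percent(ERROR_FREE, WITH_3D_COORDINATES)

# For comparison only: the same count measured against every structure in the CSD.
error_free_of_total = percent(ERROR_FREE, TOTAL_STRUCTURES)

print(f"Neutron + powder: {neutron_or_powder:.2f}% of all structures")
print(f"Error-free: {error_free_share:.1f}% of structures with 3D coordinates")
print(f"Error-free: {error_free_of_total:.1f}% of all structures")
```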
Structure factor files are significant because, for CSD structures determined by X-ray diffraction that have an associated structure factor file, a crystallographer can verify the interpretation of the observed measurements.
Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing 25,000 structures in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001 and 500,000 in 2009, [12] [13] [14] and reaching 1,000,000 structures on June 8, 2019. [15] The one millionth structure added to the CSD is the crystal structure of 1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one.
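One way to make "approximately exponential" concrete is to compute the doubling time implied by each pair of successive milestones, t_d = Δt · ln 2 / ln(N2/N1). The Python sketch below uses the milestone years quoted in the text and treats each as a whole year, which is a simplification (the 2019 milestone, for instance, fell in June).

```python
import math

# (year, structures) milestones quoted in the text; years are treated as whole
# numbers, which is a simplification.
MILESTONES = [
    (1977, 25_000),
    (1983, 50_000),
    (1992, 125_000),
    (2001, 250_000),
    (2009, 500_000),
    (2019, 1_000_000),
]


def doubling_time(t1: float, n1: float, t2: float, n2: float) -> float:
    """Doubling time implied by exponential growth from n1 at t1 to n2 at t2."""
    return (t2 - t1) * math.log(2) / math.log(n2 / n1)


def doubling_times(milestones):
    """(start, end, doubling time) for each consecutive pair of milestones."""
    return [
        (t1, t2, doubling_time(t1, n1, t2, n2))
        for (t1, n1), (t2, n2) in zip(milestones, milestones[1:])
    ]


if __name__ == "__main__":
    for start, end, td in doubling_times(MILESTONES):
        print(f"{start}-{end}: doubling time ~ {td:.1f} years")
```

On these figures the implied doubling time lengthens from about 6 years (1977-1983) to about 10 years (2009-2019), so the growth is roughly, rather than strictly, exponential.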
Number of structures published per year
Year | # published | Cumulative total |
---|---|---|
2018 | 53429 | 974,653 |
2017 | 55031 | 921,224 |
2016 | 54975 | 866,193 |
2015 | 53610 | 811,218 |
2014 | 50759 | 757,608 |
2013 | 48025 | 706,849 |
2012 | 45199 | 661,121 |
2011 | 43882 | 615,922 |
2010 | 41240 | 572,040 |
2009 | 40627 | 530,800 |
2008 | 36802 | 490,173 |
2007 | 36569 | 453,371 |
2006 | 34713 | 416,802 |
2005 | 31733 | 382,089 |
2004 | 27988 | 350,356 |
2003 | 26287 | 322,368 |
2002 | 24306 | 296,081 |
2001 | 21781 | 271,775 |
2000 | 19998 | 249,994 |
1999 | 18780 | 229,996 |
1998 | 17289 | 211,216 |
1997 | 15896 | 193,927 |
1996 | 15487 | 178,031 |
1995 | 13001 | 162,544 |
1994 | 12290 | 149,543 |
1993 | 12032 | 137,253 |
1992 | 10691 | 125,221 |
1991 | 9941 | 114,530 |
1990 | 8935 | 104,589 |
1989 | 7750 | 95,654 |
1988 | 7644 | 87,904 |
1987 | 7472 | 80,260 |
1986 | 6873 | 72,788 |
1985 | 6911 | 65,915 |
1984 | 6511 | 59,004 |
1983 | 5250 | 52,493 |
1982 | 5233 | 47,243 |
1981 | 4666 | 42,010 |
1980 | 4252 | 37,344 |
1979 | 3876 | 33,092 |
1978 | 3415 | 29,216 |
1977 | 3092 | 25,801 |
1976 | 2735 | 22,709 |
1975 | 2171 | 19,974 |
1974 | 2142 | 17,803 |
1973 | 1991 | 15,661 |
1972 | 1969 | 13,670 |
1971 | 1548 | 11,701 |
1970 | 1261 | 10,153 |
1969 | 1130 | 8,892 |
1968 | 975 | 7,762 |
1967 | 936 | 6,787 |
1966 | 683 | 5,851 |
1965 | 656 | 5,168 |
1923-1964 | 4512 | 4,512 |
Note: data for 1923-1964 are aggregated together in the last line of the table.
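Because the cumulative column is a running sum of the annual counts, the table can be checked (or regenerated) with a cumulative sum. This Python sketch uses the first few rows of the table, starting from the aggregated 1923-1964 figure, and compares the computed running totals with the listed ones.

```python
from itertools import accumulate

# First rows of the table: (year label, structures published that year).
ANNUAL = [
    ("1923-1964", 4_512),  # aggregated pre-1965 figure
    ("1965", 656),
    ("1966", 683),
    ("1967", 936),
    ("1968", 975),
    ("1969", 1_130),
]

# Cumulative totals as listed in the table for the same rows.
LISTED_TOTALS = [4_512, 5_168, 5_851, 6_787, 7_762, 8_892]


def running_totals(rows):
    """Cumulative number of structures after each row."""
    return list(accumulate(count for _, count in rows))


computed = running_totals(ANNUAL)

if __name__ == "__main__":
    for (label, _), total, listed in zip(ANNUAL, computed, LISTED_TOTALS):
        status = "ok" if total == listed else "MISMATCH"
        print(f"{label}: {total:,} (table: {listed:,}) {status}")
```

The same function applied to all rows, in chronological order, reproduces the cumulative column for any later year.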
The primary file format for CSD structure deposition, adopted around 1991, is the Crystallographic Information File (CIF) format. [16]
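CIF files are plain text organised into data blocks (lines beginning with `data_`) that contain tag-value pairs such as `_cell_length_a 10.234(3)`, where a number in parentheses is the standard uncertainty in the last digits. The Python sketch below reads only simple single-line tag-value pairs and deliberately skips `loop_` tables, multi-line text fields and most quoting rules; it illustrates the layout and is not a complete CIF parser (dedicated libraries such as gemmi or PyCifRW exist for real use). The function names are hypothetical.

```python
import re

# Numbers with a standard uncertainty in parentheses, e.g. "10.234(3)".
_NUMBER_WITH_SU = re.compile(r"^([-+]?\d+(?:\.\d+)?)\((\d+)\)$")


def parse_value(text: str):
    """Return (value, su) for numbers with an uncertainty, a float for plain
    numbers, or a string (with simple surrounding quotes removed) otherwise."""
    text = text.strip()
    match = _NUMBER_WITH_SU.match(text)
    if match:
        mantissa, su_digits = match.groups()
        decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
        # The uncertainty applies to the last digits: 10.234(3) -> 0.003.
        return float(mantissa), int(su_digits) * 10.0 ** -decimals
    try:
        return float(text)
    except ValueError:
        if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
            return text[1:-1]
        return text


def parse_simple_cif(source: str) -> dict[str, dict[str, object]]:
    """Collect single-line tag-value pairs per data block; loops are skipped."""
    blocks: dict[str, dict[str, object]] = {}
    current = None
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("data_"):
            current = blocks.setdefault(line[len("data_"):], {})
        elif line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            if len(parts) == 2:  # tags in a loop_ header have no value on the line
                current[parts[0]] = parse_value(parts[1])
    return blocks


if __name__ == "__main__":
    example = """
data_example
_cell_length_a 10.234(3)
_cell_angle_beta 95.12(2)
_symmetry_space_group_name_H-M 'P 21/c'
"""
    print(parse_simple_cif(example))
```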
The deposited CSD files can be downloaded in the CIF format. The validated and curated CSD files can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System.
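Exporting to a Cartesian format such as XMol (XYZ) involves converting the fractional coordinates stored in a CIF into Cartesian coordinates using the unit-cell parameters. A common convention places the a axis along x and the b axis in the xy plane; the Python sketch below builds that orthogonalisation matrix and writes a minimal XYZ file (atom count, comment line, then one "element x y z" line per atom). It is a simplified illustration, not the CSD System's export routine, and it ignores symmetry expansion and disorder; the function names are hypothetical.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """3x3 matrix mapping fractional to Cartesian coordinates (angstroms).

    Convention: a along x, b in the xy plane. Angles are in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(matrix, frac):
    """Multiply the orthogonalization matrix by a fractional coordinate vector."""
    return tuple(sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3))


def to_xyz(cell, atoms, comment=""):
    """Return XYZ text for atoms given as (element, (u, v, w) fractional)."""
    m = orthogonalization_matrix(*cell)
    lines = [str(len(atoms)), comment]
    for element, frac in atoms:
        x, y, z = frac_to_cart(m, frac)
        lines.append(f"{element} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical monoclinic cell and two atoms, for illustration only.
    cell = (5.0, 6.0, 7.0, 90.0, 100.0, 90.0)
    atoms = [("C", (0.10, 0.20, 0.30)), ("O", (0.25, 0.20, 0.30))]
    print(to_xyz(cell, atoms, "example"), end="")
```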
The CCDC uses two different codes to distinguish between the deposited dataset and the curated CSD entry. For example, one specific 'CSD Communication' of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327', which allows free public access to the data as deposited. Selected information was then extracted from the deposited data to prepare the validated and curated CSD entry, which was assigned the refcode 'MITGUT'. As part of the curation process, the CCDC also applies an algorithm, DeCIFer, to help the editors assign chemistry to structures when those representations (e.g. bond types and charges) are missing from the submitted CIF files. [8] The validated and curated entry is included in the CSD System and WebCSD distributions, with availability restricted to those making appropriate contributions.
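The two identifiers in this example follow different patterns: a deposition number is an integer (quoted here as 'CCDC-991327'), whereas a CSD refcode such as 'MITGUT' consists of six uppercase letters, optionally followed by two digits for related entries of the same compound. The Python sketch below tells the two apart; the accepted prefix spellings ("CCDC" followed by a space, a hyphen or nothing) are assumptions for illustration, not a CCDC specification, and `classify_identifier` is a hypothetical helper.

```python
import re

# Assumed patterns, for illustration: refcodes are six uppercase letters with an
# optional two-digit suffix; deposition numbers are digits, optionally prefixed
# by "CCDC" and a space or hyphen.
REFCODE = re.compile(r"^[A-Z]{6}(\d{2})?$")
DEPOSITION_NUMBER = re.compile(r"^(?:CCDC[ -]?)?(\d+)$")


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return ("refcode", code) or ("deposition", digits) for a CCDC identifier."""
    text = identifier.strip().upper()
    if REFCODE.match(text):
        return "refcode", text
    match = DEPOSITION_NUMBER.match(text)
    if match:
        return "deposition", match.group(1)
    raise ValueError(f"unrecognised identifier: {identifier!r}")


if __name__ == "__main__":
    for ident in ("MITGUT", "CCDC-991327"):
        print(ident, "->", classify_identifier(ident))
```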
Each data set in the CSD can be openly viewed and retrieved using the free Access Structure service. Through this web-browser-based service, users can view the data set in 2D and 3D, obtain some basic information about the structure, and download the deposited data set. More advanced search functions and curated information are available through the subscription-based CSD System.
Besides the CSD System, structure files may be viewed using one of several open-source programs such as Jmol. Other free, but not open-source, programs include MDL Chime, PyMOL, UCSF Chimera, RasMol and WinGX, [17] and the CCDC provides a free version of its own visualization program, Mercury.
Since 2015, Mercury has also been able to generate 3D-print-ready files from structures in the CSD. [18]