COSMIC cancer database


COSMIC
Description: Catalogue Of Somatic Mutations In Cancer
Research center: Wellcome Trust Sanger Institute
Release date: 4 February 2004
Website: http://www.sanger.ac.uk/science/tools/cosmic

COSMIC is an online database of somatically acquired mutations found in human cancer. [1] Somatic mutations occur in non-germline cells and are therefore not passed on to offspring. COSMIC, an acronym of Catalogue Of Somatic Mutations In Cancer, curates data from papers in the scientific literature and from large-scale experimental screens by the Cancer Genome Project at the Sanger Institute. [2] [3] [4] The database is freely available to academic researchers and commercially licensed to others. [5]


Creation and history

The COSMIC (Catalogue of Somatic Mutations in Cancer) database was designed to collect and display information on somatic mutations in cancer. It was launched in 2004 with data from just four genes, HRAS, KRAS2, NRAS and BRAF, all of which are known to be somatically mutated in cancer. [6] Since then the database has expanded rapidly. By 2005 COSMIC contained 529 genes screened across 115,327 tumours, describing 20,981 mutations. [7] By August 2009 it held information from 1.5 million experiments covering 13,423 genes in almost 370,000 tumours and describing over 90,000 mutations. [8] COSMIC version 48, released in July 2010, incorporated p53 mutation data in collaboration with the International Agency for Research on Cancer and updated gene coordinates to the most recent human reference genome builds. [9] That release drew on over 2.76 million experiments on over half a million tumours and documented 141,212 mutations. [9]

The website presents complex, phenotype-specific mutation data graphically. Data are drawn from selected genes, initially those listed in the Cancer Gene Census, and from literature searches of PubMed.

Process

Data can be accessed by selecting a gene or a cancer tissue type (phenotype), either through the browse features or the search box. Results show summary information with mutation counts and frequencies. The gene summary page provides a mutation spectrum map and links to external resources; the phenotype (tissue) summary page lists the mutated genes.
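As a rough illustration of the arithmetic behind such summaries, the Python sketch below treats a gene's mutation frequency as the number of mutated samples divided by the number of samples screened for that gene, counting each sample once. This definition, the `(gene, sample_id, is_mutated)` input format, the names `GeneSummary` and `summarise`, and the example data are all assumptions for illustration; it is not COSMIC's own code or its exact method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneSummary:
    gene: str
    samples_tested: int
    samples_mutated: int

    @property
    def frequency(self) -> float:
        # Fraction of screened samples with at least one mutation in the gene
        # (assumed definition; returns 0.0 when nothing was screened).
        return self.samples_mutated / self.samples_tested if self.samples_tested else 0.0


def summarise(observations):
    """Build per-gene summaries from hypothetical (gene, sample_id, is_mutated) tuples.

    Each sample is counted once per gene, however many observations it has.
    """
    tested = defaultdict(set)
    mutated = defaultdict(set)
    for gene, sample_id, is_mutated in observations:
        tested[gene].add(sample_id)
        if is_mutated:
            mutated[gene].add(sample_id)
    summaries = [
        GeneSummary(gene, len(samples), len(mutated[gene]))
        for gene, samples in tested.items()
    ]
    # Most frequently mutated genes first, as in a tissue summary list.
    return sorted(summaries, key=lambda s: s.frequency, reverse=True)


if __name__ == "__main__":
    # Hypothetical screening results, not real COSMIC data.
    observations = [
        ("BRAF", "S1", True), ("BRAF", "S2", False), ("BRAF", "S3", True),
        ("NRAS", "S1", False), ("NRAS", "S2", True), ("NRAS", "S3", False),
        ("NRAS", "S4", False),
    ]
    for s in summarise(observations):
        print(f"{s.gene}: {s.samples_mutated}/{s.samples_tested} ({s.frequency:.1%})")
```

Counting distinct sample identifiers rather than raw observations keeps a sample that was screened in several experiments from inflating either the numerator or the denominator.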

Examples

Figure: A histogram showing the mutation range for the CDKN2A gene, as produced by the COSMIC database.

The figure shows mutations in CDKN2A, a tumour suppressor gene whose inactivation contributes to cancer.

Contents

The COSMIC database contains thousands of somatic mutations implicated in the development of cancer, collected from two major sources. First, mutations in known cancer genes are curated from the literature; the genes selected for manual curation are those listed in the Cancer Gene Census. [10] [11] Second, data are collected from whole-genome resequencing studies of cancer samples undertaken by the Cancer Genome Project. [8] For example, Campbell and colleagues used next-generation sequencing to examine samples from two individuals with lung cancer, identifying 103 somatic DNA rearrangements. [12]

COSMIC also catalogues mutational signatures in human cancer through the COSMIC Signatures group, a collaboration between COSMIC, the Wellcome Sanger Institute, and the University of California, San Diego. The COSMIC signatures database has been used to estimate the prevalence of specific mutational signatures in human cancer, such as the frequency of ultraviolet radiation-mediated mutagenesis in skin cancers. [13]
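Single-base-substitution (SBS) signature catalogues such as COSMIC's commonly describe each substitution by the pyrimidine base of the mutated base pair (six classes: C>A, C>G, C>T, T>A, T>C and T>G) together with its immediate 5' and 3' neighbours, giving 6 × 16 = 96 categories. The Python sketch below shows only that labelling and counting step, under stated assumptions: each substitution is supplied as a reference trinucleotide plus the alternate base on the same strand, and the names `sbs96_label` and `catalogue` are hypothetical. It does not extract or fit signatures.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_label(context: str, alt: str) -> str:
    """Label a single-base substitution in the 96-class style, e.g. 'A[C>T]G'.

    context: reference trinucleotide with the mutated base in the middle.
    alt: the substituted base, on the same strand as `context`.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a trinucleotide and one base drawn from A/C/G/T")
    if context[1] == alt:
        raise ValueError("reference and alternate bases are identical")
    # Express the change from the pyrimidine (C or T) of the base pair:
    # if the reference base is a purine, switch to the reverse-complement strand.
    if context[1] in "AG":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def catalogue(substitutions):
    """Count (context, alt) substitutions per 96-class label."""
    return Counter(sbs96_label(ctx, alt) for ctx, alt in substitutions)


if __name__ == "__main__":
    # Hypothetical substitutions, not taken from COSMIC.
    example = [("TCC", "T"), ("GGA", "A"), ("ACG", "T")]
    for label, count in catalogue(example).most_common():
        print(label, count)
```

Here the G>A change in `GGA` is reported as `T[C>T]C` on the opposite strand, so it is counted together with the first example. In a catalogue built this way, UV-associated mutagenesis typically appears as an excess of C>T substitutions next to another pyrimidine.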


References

  1. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. (January 2011). "COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer". Nucleic Acids Research. 39 (Database issue): D945–D950. doi:10.1093/nar/gkq929. PMC 3013785. PMID 20952405.
  2. "The COSMIC homepage". Retrieved 11 June 2012.
  3. Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, Menzies A, Teague JW, Futreal PA, Stratton MR (2008). "The Catalogue of Somatic Mutations in Cancer (COSMIC)". Current Protocols in Human Genetics. Chapter 10: Unit 10.11. doi:10.1002/0471142905.hg1011s57. ISBN 978-0471142904. PMC 2705836. PMID 18428421.
  4. Forbes S, Clements J, Dawson E, Bamford S, Webb T, Dogan A, et al. (January 2006). "COSMIC 2005". British Journal of Cancer. 94 (2): 318–322. doi:10.1038/sj.bjc.6602928. PMC 2361125. PMID 16421597.
  5. "The COSMIC licensing page". Retrieved 6 September 2017.
  6. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. (July 2004). "The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website". British Journal of Cancer. 91 (2): 355–358. doi:10.1038/sj.bjc.6601894. PMC 2409828. PMID 15188009.
  7. Hu H, et al. (2008). Biomedical informatics in translational research. Artech House. ISBN 978-1-59693-038-4.
  8. Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, et al. (January 2010). "COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer". Nucleic Acids Research. 38 (Database issue): D652–D657. doi:10.1093/nar/gkp995. PMC 2808858. PMID 19906727.
  9. "COSMIC v48 Release". Catalogue Of Somatic Mutations In Cancer. Wellcome Trust Sanger Institute. 27 July 2010. Retrieved 1 September 2010.
  10. "Cancer Gene Census". Retrieved 31 August 2010.
  11. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. (March 2004). "A census of human cancer genes". Nature Reviews Cancer. 4 (3): 177–183. doi:10.1038/nrc1299. PMC 2665285. PMID 14993899.
  12. Campbell PJ, Stephens PJ, Pleasance ED, O'Meara S, Li H, Santarius T, et al. (June 2008). "Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing". Nature Genetics. 40 (6): 722–729. doi:10.1038/ng.128. PMC 2705838. PMID 18438408.
  13. Mata DA, Williams EA, Sokol E, Oxnard GR, Fleischmann Z, Tse JY, Decker B (March 2022). "Prevalence of UV Mutational Signatures Among Cutaneous Primary Tumors". JAMA Network Open. 5 (3): e223833. doi:10.1001/jamanetworkopen.2022.3833. PMC 8943639. PMID 35319765.