Cancer systems biology applies systems biology approaches to cancer research in order to study the disease as a complex adaptive system with emergent properties at multiple biological scales. [1] [2] [3] It analyzes how the intracellular networks of normal cells are perturbed during carcinogenesis, with the aim of developing predictive models that can assist scientists and clinicians in validating new therapies and drugs. Tumours are characterized by genomic and epigenetic instability that alters the functions of many different molecules and networks within a single cell, as well as the cell's interactions with its local environment. Cancer systems biology approaches therefore rely on computational and mathematical methods to decipher the complexity of tumorigenesis and cancer heterogeneity. [4]
Cancer systems biology addresses concrete challenges in applying systems biology approaches to cancer research, notably (a) the need for better methods to distill insights from large-scale networks, (b) the importance of integrating multiple data types in constructing more realistic models, (c) the difficulty of translating insights about tumorigenic mechanisms into therapeutic interventions, and (d) the role of the tumor microenvironment at the physical, cellular, and molecular levels. [5] Cancer systems biology therefore adopts a holistic view of cancer [6] aimed at integrating its many biological scales, including genetics, signaling networks, [7] epigenetics, [8] cellular behavior, mechanical properties, [9] histology, clinical manifestations and epidemiology. Ultimately, cancer properties at one scale, e.g., histology, are explained by properties at the scale below, e.g., cell behavior.
Cancer systems biology merges traditional basic and clinical cancer research with “exact” sciences, such as applied mathematics, engineering, and physics. It incorporates a spectrum of “omics” technologies (genomics, proteomics, epigenomics, etc.) and molecular imaging to generate computational algorithms and quantitative models [10] that shed light on the mechanisms underlying the cancer process and predict response to intervention. Applications of cancer systems biology include, but are not limited to, elucidating the critical cellular and molecular networks underlying cancer risk, initiation and progression. This promotes an alternative viewpoint to the traditional reductionist approach, which has typically focused on characterizing single molecular aberrations.
Cancer systems biology finds its roots in a number of events and realizations in biomedical research, as well as in technological advances. Historically, cancer was identified, understood, and treated as a monolithic disease. It was seen as a “foreign” component that grew as a homogeneous mass and was best treated by excision. Although surgical intervention remains important, this simplistic view of cancer has changed drastically. In parallel with the advances of molecular biology, cancer research focused on identifying the oncogenes and tumor suppressor genes critical to the etiology of cancer. These breakthroughs revolutionized our understanding of the molecular events driving cancer progression. Targeted therapy may be considered the current pinnacle of the advances spawned by such insights.
Despite these advances, many unresolved challenges remain, including the dearth of new treatment avenues for many cancer types, and the unexplained treatment failures and inevitable relapse in cancer types where targeted treatment exists. [11] This mismatch between clinical results and the massive amounts of data acquired by omics technologies highlights basic gaps in our knowledge of cancer fundamentals. Cancer systems biology is steadily improving our ability to organize information on cancer in order to fill these gaps. Key developments include:
The practice of cancer systems biology requires close physical integration between scientists with diverse backgrounds. Critical large-scale efforts are also underway to train a new workforce fluent in the languages of both biology and applied mathematics. At the translational level, cancer systems biology should engender the application of precision medicine to cancer treatment.
High-throughput technologies enable comprehensive genomic analyses of mutations, rearrangements, copy number variations, and methylation at the cellular and tissue levels, as well as robust analysis of RNA and microRNA expression data, protein levels and metabolite levels. [17] [18] [19] [20] [21] [22]
List of high-throughput technologies and the data they generate, with representative databases and publications
Technology | Experimental data | Representative database |
---|---|---|
DNA-seq, NGS | DNA sequences, exome sequences, genomes, genes | TCGA, [23] GenBank, [24] DDBJ, [25] Ensembl [26] |
Microarray, RNA-seq | Gene expression levels, microRNA levels, transcripts | GEO, [27] Expression Atlas [28] |
MS, iTRAQ | Protein concentration, phosphorylations | GPMdb, [29] PRIDE, [30] Human Protein Atlas [31] |
LC-MS, GC-MS, NMR | Metabolite levels | HMDB [32] |
ChIP-chip, ChIP-seq | Protein-DNA interactions, transcription factor binding sites | GEO, [27] TRANSFAC, [33] JASPAR, [34] ENCODE [35] |
CLIP-seq, PAR-CLIP, iCLIP | MicroRNA-mRNA regulations | StarBase, [36] miRTarBase [37] |
Y2H, AP/MS, MaMTH, maPPIT | Protein-protein interactions | HPRD, [38] BioGRID [39] |
Protein microarray | Kinase–substrate interactions | TCGA, [23] PhosphoPOINT [40] |
SGA, E-MAP, RNAi | Genetic interactions | HPRD, [41] BioGRID [42] |
SNP genotyping array | GWAS loci, eQTL, aberrant SNPs | GWAS Catalog, [43] dbGAP, [44] dbSNP [45] |
LUMIER, data integration | Signaling pathways, metabolic pathways, molecular signatures | TCGA, [23] KEGG, [46] Reactome [47] |
The computational approaches used in cancer systems biology include new mathematical and computational algorithms that reflect the dynamic interplay between experimental biology and the quantitative sciences. [48] A cancer systems biology approach can be applied at different levels, from an individual cell to a tissue, to a patient with a primary tumour and possible metastases, or to any combination of these. This approach can integrate the molecular characteristics of tumours at different levels (DNA, RNA, protein, epigenetic, imaging) [49] and over different time intervals (seconds versus days) with multidisciplinary analysis. [50] One of the major challenges to its success, besides the heterogeneity of cancer itself, lies in acquiring high-quality data that describe clinical characteristics, pathology, treatment, and outcomes, and in integrating these data into robust predictive models. [51] [19] [20] [21] [22] [52] [53]
Mathematical modeling can provide useful context for the rational design, validation and prioritization of novel cancer drug targets and their combinations. Network-based modeling and multi-scale modeling have begun to show promise in facilitating effective cancer drug discovery. Using a systems network modeling approach, Schoeberl et al. [54] identified a previously unknown, complementary and potentially superior mechanism of inhibiting the ErbB receptor signaling network. ErbB3 was found to be the most sensitive node, leading to Akt activation; Akt regulates many biological processes, such as proliferation, apoptosis and growth, which are all relevant to tumor progression. [55] This target-driven modelling has paved the way for first-of-their-kind clinical trials. Bekkal et al. presented a nonlinear model of the dynamics of a cell population divided into proliferative and quiescent compartments. The proliferative phase represents the complete cell cycle (G1-S-G2-M) of a population committed to divide at its end. The asymptotic behavior of the solutions of the nonlinear model is analysed in two cases, exhibiting either tissue homeostasis or exponential tumor growth. The model is simulated and its analytic predictions are confirmed numerically. [56]
Furthermore, advances in hardware and software have enabled clinically feasible, quantitative multimodality imaging of tissue pathophysiology. Earlier efforts in multimodality imaging of cancer focused on the integration of anatomical and functional characteristics, such as PET-CT and single-photon emission CT (SPECT-CT), whereas more recent advances and applications have involved the integration of multiple quantitative, functional measurements (for example, multiple PET tracers, varied MRI contrast mechanisms, and PET-MRI), thereby providing a more comprehensive characterization of the tumour phenotype.
The enormous amount of complementary quantitative data generated by such studies is beginning to offer unique insights into opportunities to optimize care for individual patients. Although important technical optimization and improved biological interpretation of multimodality imaging findings are needed, this approach can already be applied informatively in clinical trials of cancer therapeutics using existing tools. [57]
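The contrast between tissue homeostasis and exponential tumor growth in a population split into proliferative and quiescent compartments can be illustrated with a deliberately simplified sketch. The code below is not the age-structured model of Bekkal et al.; it is a two-variable ordinary differential equation toy model in which quiescent cells re-enter the cycle at a rate that either falls as the total population grows (feedback intact) or stays constant (feedback lost). All rate constants, the Hill-type feedback term, and the names `simulate` and `CAPACITY` are assumptions chosen for illustration only.

```python
# Toy two-compartment model (illustrative assumption, not the published model):
#   dP/dt = (b - d_p - k_pq) * P + g(N) * Q
#   dQ/dt = k_pq * P - (g(N) + d_q) * Q,   N = P + Q
# P = proliferating cells, Q = quiescent cells.

B = 1.0           # net division rate of proliferating cells (assumed)
D_P = 0.1         # death rate of proliferating cells (assumed)
K_PQ = 1.2        # rate of entry into quiescence (assumed)
G0 = 1.0          # maximal re-entry rate from quiescence (assumed)
D_Q = 0.2         # death rate of quiescent cells (assumed)
CAPACITY = 1000.0 # population scale at which feedback halves re-entry (assumed)


def rhs(p, q, feedback):
    """Right-hand side of the toy model."""
    n = p + q
    # With feedback, re-entry into the cycle is suppressed as N grows.
    g = G0 / (1.0 + (n / CAPACITY) ** 2) if feedback else G0
    dp = (B - D_P - K_PQ) * p + g * q
    dq = K_PQ * p - (g + D_Q) * q
    return dp, dq


def simulate(feedback, t_end=50.0, dt=0.01, p0=50.0, q0=50.0):
    """Integrate the model with a classical fixed-step Runge-Kutta (RK4) scheme."""
    p, q = p0, q0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(p, q, feedback)
        k2 = rhs(p + 0.5 * dt * k1[0], q + 0.5 * dt * k1[1], feedback)
        k3 = rhs(p + 0.5 * dt * k2[0], q + 0.5 * dt * k2[1], feedback)
        k4 = rhs(p + dt * k3[0], q + dt * k3[1], feedback)
        p += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        q += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p, q


if __name__ == "__main__":
    for label, fb in (("feedback intact", True), ("feedback lost", False)):
        p, q = simulate(feedback=fb)
        print(f"{label}: P = {p:.3g}, Q = {q:.3g}, total = {p + q:.3g}")
```

With these assumed parameters the population with intact feedback stays bounded, whereas the population without feedback grows without limit, which mirrors qualitatively the two asymptotic regimes described above. The quantitative values carry no biological meaning.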
In 2004, the US National Cancer Institute launched a program on Integrative Cancer Systems Biology [58] to establish Centers for Cancer Systems Biology that focus on the analysis of cancer as a complex biological system. The integration of experimental biology with mathematical modeling is intended to yield new insights into the biology of cancer and new approaches to its management. The program brings clinical and basic cancer researchers together with researchers from mathematics, physics, engineering, information technology, imaging sciences, and computer science to work on unraveling fundamental questions in the biology of cancer. [59]