PageRank algorithm in biochemistry

The PageRank algorithm has several applications in biochemistry. PageRank is an algorithm used in Google Search for ranking websites in its results, and it has since been adopted for other purposes. According to Google, PageRank works by "counting the number and quality of links to a page to determine a rough estimate of how important the website is," the underlying assumption being that more important websites are likely to receive more links from other websites. [1]

Application in analyzing protein networks

The PageRank link analysis algorithm measures the relative importance of nodes, a property that could be used to identify proteins as possible new drug targets. [2] A PageRank-based algorithm could identify important protein targets in a pathogen organism better than a method that considers only the number of incoming edges (in-degree) of a node in the metabolic network. The reason is that some already known, important protein targets do not have a high degree (they are not hubs), and perturbing some hubs could cause unwanted physiological effects. [3]

Description

The clinical use of most antibiotics eventually results in mutations of the pathogen organism that make it resistant to the drug. New drugs are therefore continually needed. A potential first step in developing new drugs against currently threatening diseases (e.g. tuberculosis) is to find new drug targets in the causative agent of the disease, i.e. the pathogen microorganism, whether it is a bacterium or a protozoan parasite. After finding the target protein in the pathogen, one could design small-molecule drug compounds that bind to the protein and inhibit it.

Public availability of biological network data [4] [5] [6] [7] makes searching for new drug targets easier than before. Using the available metabolic networks, it is possible to find important nodes with link analysis algorithms such as PageRank. In a 2013 paper, [8] biochemical reactions are treated as the nodes of the metabolic network. In this directed network, reaction A has a directed edge towards reaction B if a product of A enters reaction B as a substrate or co-factor.
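The edge rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [8]: the function name `build_reaction_graph`, the input layout, and the toy reactions are assumptions made for this example, and self-loops are kept because the text does not say how they are handled.

```python
def build_reaction_graph(reactions):
    """Build a directed reaction graph.

    reactions: dict mapping a reaction name to a pair (inputs, products),
               where inputs holds its substrates and co-factors.
    Returns a dict mapping each reaction to the set of reactions it points to.
    """
    # Index: compound -> reactions that consume it as substrate or co-factor.
    consumers = {}
    for name, (inputs, _products) in reactions.items():
        for compound in inputs:
            consumers.setdefault(compound, set()).add(name)

    # Reaction A points to reaction B if a product of A is an input of B.
    edges = {name: set() for name in reactions}
    for name, (_inputs, products) in reactions.items():
        for compound in products:
            edges[name] |= consumers.get(compound, set())
    return edges


# Toy data for illustration only (not taken from the cited databases).
toy_reactions = {
    "R1": ({"glucose", "ATP"}, {"G6P", "ADP"}),
    "R2": ({"G6P"}, {"F6P"}),
    "R3": ({"F6P", "ATP"}, {"FBP", "ADP"}),
}
graph = build_reaction_graph(toy_reactions)
```

Here `graph` links R1 to R2 and R2 to R3, because each reaction's product is consumed by the next one.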

To select important nodes that could serve as drug targets, one might think of selecting high in-degree nodes (hubs, i.e. nodes with many incoming edges). It was shown, however, [2] that targeting hub proteins with many vital functions may unintentionally harm the living cell as well. A PageRank-based scoring method could detect important nodes that are not hubs and might therefore be better drug targets.

The PageRank of a node A is the stationary (limiting) probability that a random walker is at node A. [2] In its original application, the personalization vector w captured the personal interest of a web surfer: websites interesting to the surfer appeared with higher probability in the distribution given by w. [8] In the metabolic network, w is personalized to proteins: w is larger for proteins that appear in higher concentrations in the proteomics analysis of certain diseases. This personalized PageRank may identify other proteins related to the disease. [2] [8]
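To make the random-walk description concrete, the following sketch computes a personalized PageRank by power iteration. It rests on several assumptions that are not stated in the text: a damping factor of 0.85 (a common default), a walker that restarts from w when it reaches a node without outgoing edges, and hypothetical node names and weights.

```python
def personalized_pagerank(edges, w, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank with personalization vector w.

    edges: dict mapping each node to the set of nodes it points to
           (every node must appear as a key).
    w:     dict of non-negative personalization weights, one per node.
    """
    nodes = list(edges)
    total = sum(w[v] for v in nodes)
    w = {v: w[v] / total for v in nodes}   # normalize w to a distribution
    rank = dict(w)                         # start the walk from w
    for _ in range(max_iter):
        # Assumption: a walker on a node without out-edges restarts from w.
        dangling = sum(rank[u] for u in nodes if not edges[u])
        new = {v: (1 - damping + damping * dangling) * w[v] for v in nodes}
        for u in nodes:
            if edges[u]:
                share = damping * rank[u] / len(edges[u])
                for v in edges[u]:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy directed graph with hypothetical node names.
edges = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}
uniform = personalized_pagerank(edges, {v: 1.0 for v in edges})
# Bias the walk towards D, e.g. a node flagged by (made-up) proteomics data.
biased = personalized_pagerank(edges, {"A": 1.0, "B": 1.0, "C": 1.0, "D": 5.0})
```

In this toy graph, D has no incoming edges, so its score comes only from the restart step and grows with its weight in w.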

However, when only the personalized PageRank is used to identify important nodes, hubs still get a high score on average. [9] To find important non-hub nodes instead, nodes should be scored by their "relativized personalized PageRank", i.e. their personalized PageRank score divided by the number of edges pointing towards them (their in-degree). The relativized personalized PageRank rPPR(v) of a node v is given by:

$$\mathrm{rPPR}(v) = \frac{\mathrm{PPR}(v)}{d^{-}(v)}$$

where $\mathrm{PPR}(v)$ is the personalized PageRank score of node v, and $d^{-}(v)$ is its in-degree. It was shown that numerous already validated drug targets can be found with this method (e.g. in Mycobacterium tuberculosis); new, currently unknown targets might therefore be detected as well. [8]
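The relativized score is simple to compute once personalized PageRank scores are available. The sketch below is illustrative only: the function names, the toy graph, and the score values are made up, and skipping nodes with in-degree zero is an assumption, since the text does not say how the undefined division is handled.

```python
def in_degrees(edges):
    """Count incoming edges for every node of a directed graph."""
    deg = {v: 0 for v in edges}
    for targets in edges.values():
        for v in targets:
            deg[v] = deg.get(v, 0) + 1
    return deg


def relativized_ppr(ppr, edges):
    """Divide each personalized PageRank score by the node's in-degree.

    Assumption: nodes with in-degree zero are skipped.
    """
    deg = in_degrees(edges)
    return {v: ppr[v] / d for v, d in deg.items() if d > 0}


# Toy graph: H is a hub with three incoming edges, T has a single one.
edges = {"A": {"H"}, "B": {"H"}, "C": {"H"}, "H": {"T"}, "T": {"A"}}
# Hypothetical personalized PageRank scores, chosen by hand.
ppr = {"A": 0.10, "B": 0.05, "C": 0.05, "H": 0.45, "T": 0.35}
rppr = relativized_ppr(ppr, edges)

top_by_ppr = max(ppr, key=ppr.get)      # the hub H
top_by_rppr = max(rppr, key=rppr.get)   # the non-hub T
```

With these made-up numbers, the hub H has the highest raw score, but dividing by in-degree moves the low-degree node T to the top, which is the effect the relativized score is meant to achieve.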

References

  1. "Facts about Google and Competition". Archived from the original on 4 November 2011. Retrieved 12 July 2014.
  2. Iván G, Grolmusz V (2011). "When the Web meets the cell: using personalized PageRank for analyzing protein interaction networks". Bioinformatics 27 (3): 405–7.
  3. Russell RB, Aloy P (2008). "Targeting and tinkering with interaction networks". Nat Chem Biol 4: 666–673.
  4. Prasad TSK, Kandasamy K, Pandey A (2009). "Human protein reference database and human proteinpedia as discovery tools for systems biology". Methods Mol Biol 577: 67–79.
  5. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, et al. (2002). "Mint: a molecular interaction database". FEBS Lett 513: 135–140.
  6. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, et al. (2002). "Dip, the database of interacting proteins: a research tool for studying cellular networks of protein interactions". Nucleic Acids Res 30: 303–305.
  7. Farkas IJ, Korcsmaros T, Kovacs IA, Mihalik A, Palotai R, et al. (2011). "Network-based tools for the identification of novel drug targets". Sci Signal 4: pt3.
  8. Bánky D, Iván G, Grolmusz V (2013). "Equal Opportunity for Low-Degree Network Nodes: A PageRank-Based Method for Protein Target Identification in Metabolic Graphs". PLoS ONE 8 (1): e54204.
  9. Fortunato S, Boguna M, Flammini A, Menczer F (2008). "Approximating pagerank from in-degree". Lecture Notes in Computer Science 4936: 59–71.