Google matrix

Fig.1. Google matrix of the Wikipedia articles network, written in the basis of the PageRank index; a fragment of the top 200 × 200 matrix elements is shown, total size N = 3282257.

A Google matrix is a particular stochastic matrix that is used by Google's PageRank algorithm. The matrix represents a graph whose edges correspond to links between pages. The PageRank of each page can then be generated iteratively from the Google matrix using the power method. However, for the power method to converge, the matrix must be stochastic, irreducible and aperiodic.


Adjacency matrix A and Markov matrix S

In order to generate the Google matrix G, we must first generate an adjacency matrix A which represents the relations between pages or nodes.

Assuming there are N pages, we can fill out A by doing the following:

  1. A matrix element $A_{ij}$ is set to unity if node $j$ has a link to node $i$, and to zero otherwise; this is the adjacency matrix of links.
  2. A related matrix S, corresponding to the transitions in a Markov chain of the given network, is constructed from A by dividing the elements of column $j$ by $k_{out}(j)$, where $k_{out}(j)$ is the total number of outgoing links from node $j$ to all other nodes. The columns having only zero matrix elements, corresponding to dangling nodes, are replaced by the constant value $1/N$. Such a procedure adds a link from every sink (dangling) node to every other node.
  3. Now, by construction, the sum of all elements in any column of the matrix S is equal to unity. In this way the matrix S is mathematically well defined: it belongs to the class of Markov chains and to the class of Perron-Frobenius operators. That makes S suitable for the PageRank algorithm.
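The three steps above can be sketched numerically. The snippet below is a minimal illustration (not Google's implementation): it builds S from a small hand-made adjacency matrix, replacing the dangling-node column by $1/N$:

```python
import numpy as np

def markov_matrix(A):
    """Build the column-stochastic matrix S from an adjacency matrix A,
    using the convention from the text: A[i, j] = 1 if node j links to
    node i. Columns of dangling nodes are replaced by 1/N."""
    N = A.shape[0]
    S = A.astype(float)
    out_degree = S.sum(axis=0)          # total outgoing links of each node j
    for j in range(N):
        if out_degree[j] == 0:          # dangling node: uniform column
            S[:, j] = 1.0 / N
        else:
            S[:, j] /= out_degree[j]
    return S

# Tiny 3-node example: node 0 -> 1, node 1 -> 0 and 2, node 2 is dangling.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 1, 0]])
S = markov_matrix(A)
print(S.sum(axis=0))   # every column sums to 1
```

The dangling column becomes the uniform vector, so the random surfer leaving a sink node jumps to any node with equal probability.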

Construction of Google matrix G

Fig.2. Google matrix of the Cambridge University network (2006); coarse-grained matrix elements are written in the basis of the PageRank index, total size N = 212710 is shown.

Then the final Google matrix G can be expressed via S as:

$$G_{ij} = \alpha S_{ij} + (1-\alpha)\frac{1}{N} \qquad (1)$$

By construction the sum of all non-negative elements inside each matrix column is equal to unity. The numerical coefficient $\alpha$ is known as a damping factor.

Usually S is a sparse matrix; for modern directed networks it has only about ten nonzero elements per row or column, so only about 10N multiplications are needed to multiply a vector by the matrix G. [2] [3]
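Because the damping term in Eq.(1) acts uniformly, the product $Gv$ can be computed without ever forming the dense matrix G, which is what makes the power method cheap in practice. A minimal power-method sketch (the 3-node column-stochastic matrix is an assumed example, not a real network):

```python
import numpy as np

def pagerank(S, alpha=0.85, tol=1e-10, max_iter=200):
    """Power method for the PageRank vector. The dense G is never formed:
    for a probability vector v (elements summing to 1),
    G @ v = alpha * (S @ v) + (1 - alpha)/N, applied componentwise."""
    N = S.shape[0]
    v = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        v_new = alpha * (S @ v) + (1.0 - alpha) / N
        if np.abs(v_new - v).sum() < tol:
            break
        v = v_new
    return v_new

# Column-stochastic S for a small 3-node example (assumed for illustration).
S = np.array([[0.0, 0.5, 1/3],
              [1.0, 0.0, 1/3],
              [0.0, 0.5, 1/3]])
pr = pagerank(S)
print(pr.sum())   # ≈ 1.0: PageRank is a probability distribution
```

With a sparse S the matrix-vector product costs only about 10N operations per iteration, as stated above.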

Examples of Google matrix

An example of the matrix construction via Eq.(1) within a simple network is given in the article CheiRank.

For the actual matrix, Google uses a damping factor around $\alpha \approx 0.85$. [2] [3] [4] The term $(1-\alpha)$ gives a surfer the probability to jump randomly to any page. The matrix $G$ belongs to the class of Perron-Frobenius operators of Markov chains. [2] Examples of the Google matrix structure are shown in Fig.1 for the Wikipedia articles hyperlink network in 2009 at small scale and in Fig.2 for the University of Cambridge network in 2006 at large scale.

Spectrum and eigenstates of G matrix

Fig.3. The spectrum of eigenvalues of the Google matrix of the University of Cambridge from Fig.2 at $\alpha = 1$; blue points show eigenvalues of isolated subspaces, red points show eigenvalues of the core component.

For $0 < \alpha < 1$ there is only one maximal eigenvalue $\lambda = 1$, with the corresponding right eigenvector having non-negative elements, which can be viewed as a stationary probability distribution. [2] These probabilities, ordered by their decreasing values, give the PageRank vector $P_i$ with the PageRank index $K_i$ used by Google search to rank webpages. Usually one has for the World Wide Web that $P \propto 1/K^{\beta}$ with $\beta \approx 0.9$. The number of nodes with a given PageRank value scales as $N_P \propto 1/P^{\nu}$ with the exponent $\nu \approx 2.1$. [6] [7] The left eigenvector at $\lambda = 1$ has constant matrix elements. For $\alpha < 1$ all eigenvalues move as $\lambda_i \to \alpha \lambda_i$ except the maximal eigenvalue $\lambda = 1$, which remains unchanged. [2] The PageRank vector varies with $\alpha$, but the other eigenvectors with $\lambda_i < 1$ remain unchanged due to their orthogonality to the constant left vector at $\lambda = 1$. The gap between $\lambda = 1$ and the other eigenvalues, being $1 - \alpha \approx 0.15$, gives rapid convergence of a random initial vector to the PageRank: approximately 50 multiplications by the matrix $G$ suffice.
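The eigenvalue shift $\lambda_i \to \alpha \lambda_i$ can be checked numerically on a toy example. The random column-stochastic matrix below is an assumption for illustration, not one of the networks discussed in the text:

```python
import numpy as np

# Numerical check: damping rescales every eigenvalue of S by alpha
# while the leading eigenvalue stays exactly at 1.
rng = np.random.default_rng(0)
N, alpha = 6, 0.85
S = rng.random((N, N))
S /= S.sum(axis=0)                                  # columns sum to 1
G = alpha * S + (1.0 - alpha) / N * np.ones((N, N))

mags_S = np.sort(np.abs(np.linalg.eigvals(S)))[::-1]
mags_G = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
print(mags_S[0], mags_G[0])    # both ≈ 1
# the remaining magnitudes of G equal alpha times those of S
```

The rank-one damping term shifts only the component along the constant left eigenvector, which is why the subdominant spectrum is simply rescaled.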

Fig.4. Distribution of eigenvalues $\lambda_i$ of Google matrices in the complex plane at $\alpha = 1$ for dictionary networks: Roget (A, N=1022), ODLIS (B, N=2909) and FOLDOC (C, N=13356); UK university WWW networks: University of Wales (Cardiff) (D, N=2778), Birmingham City University (E, N=10631), Keele University (Staffordshire) (F, N=11437), Nottingham Trent University (G, N=12660), Liverpool John Moores University (H, N=13578) (data for universities are for 2002).

At $\alpha = 1$ the matrix $G$ generally has many degenerate eigenvalues (see e.g. [6] [8]). Examples of the eigenvalue spectrum of the Google matrix of various directed networks are shown in Fig.3 from [5] and Fig.4 from [8].

The Google matrix can also be constructed for the Ulam networks generated by the Ulam method [8] for dynamical maps. The spectral properties of such matrices are discussed in [5] [9]. In a number of cases the spectrum is described by the fractal Weyl law.
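As a rough illustration of the Ulam method, one can discretize a one-dimensional map into cells and count where sample points are mapped. This is a sketch under simplifying assumptions; the doubling map and the cell counts are chosen for illustration and are not taken from the cited works:

```python
import numpy as np

def ulam_matrix(f, n_cells, n_samples=100):
    """Ulam method sketch: partition [0, 1) into n_cells intervals;
    S[i, j] is the fraction of sample points from cell j that the map f
    sends into cell i, giving a column-stochastic transition matrix."""
    S = np.zeros((n_cells, n_cells))
    for j in range(n_cells):
        # evenly spaced sample points inside cell j
        x = (j + (np.arange(n_samples) + 0.5) / n_samples) / n_cells
        targets = np.minimum((f(x) * n_cells).astype(int), n_cells - 1)
        for i in targets:
            S[i, j] += 1.0 / n_samples
    return S

# Example map (assumed for illustration): the doubling map x -> 2x mod 1.
S = ulam_matrix(lambda x: (2.0 * x) % 1.0, n_cells=50)
print(S.sum(axis=0))   # each column sums to 1
```

The resulting matrix S can then be damped via Eq.(1) and analyzed like any other Google matrix.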

Fig.5. Distribution of eigenvalues $\lambda$ in the complex plane for the Google matrix $G$ of the Linux Kernel version 2.6.32 with matrix size $N = 285509$ at $\alpha = 0.85$; the unit circle is shown by the solid curve.

Fig.6. Coarse-grained probability distribution for the eigenstates of the Google matrix of the Linux Kernel version 2.6.32. The horizontal lines show the first 64 eigenvectors ordered vertically by $|\lambda_i|$.

The Google matrix can also be constructed for other directed networks, e.g. for the procedure call network of the Linux Kernel software. [9] In this case the spectrum of $G$ is described by the fractal Weyl law (see Fig.5 from [9]). Numerical analysis shows that the eigenstates of the matrix $G$ are localized (see Fig.6 from [9]). The Arnoldi iteration method allows one to compute many eigenvalues and eigenvectors for matrices of rather large size. [5] [9]
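A minimal Arnoldi iteration, the method underlying the large-matrix spectral computations mentioned above, can be sketched as follows. The random sparse stochastic matrix is an assumed stand-in for a real network matrix:

```python
import numpy as np

def arnoldi(matvec, v0, m):
    """Minimal Arnoldi iteration: builds an orthonormal Krylov basis Q and
    an upper-Hessenberg matrix H. The Ritz values (eigenvalues of
    H[:m, :m]) approximate the largest eigenvalues of the operator."""
    n = v0.size
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = matvec(Q[:, j])
        for i in range(j + 1):           # modified Gram-Schmidt step
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:          # invariant subspace found
            break
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

# Random sparse column-stochastic matrix, roughly 10 links per column
# (an assumed example, not the Linux Kernel network itself).
rng = np.random.default_rng(1)
N, m = 200, 40
S = rng.random((N, N)) * (rng.random((N, N)) < 0.05)
S[:, S.sum(axis=0) == 0] = 1.0           # fix any dangling columns
S /= S.sum(axis=0)
Q, H = arnoldi(lambda v: S @ v, np.ones(N), m)
ritz = np.linalg.eigvals(H[:m, :m])
print(np.max(np.abs(ritz)))   # close to the leading eigenvalue 1
```

Only matrix-vector products are needed, so the method exploits the sparsity of S just as the power method does, while recovering many eigenvalues at once.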

Other examples of the matrix $G$ include the Google matrix of the brain [17] and of business process management [18]; see also [1]. The application of Google matrix analysis to DNA sequences is described in [20]. Such a Google matrix approach also allows one to analyze the entanglement of cultures via the ranking of multilingual Wikipedia articles about persons. [21]

Historical notes

The Google matrix with damping factor $\alpha$ was described by Sergey Brin and Larry Page in 1998 [22]; see also the articles on PageRank history [23] [24].


References

  1. Ermann, L.; Chepelianskii, A. D.; Shepelyansky, D. L. (2011). "Towards two-dimensional search engines". Journal of Physics A. 45 (27): 275101. arXiv:1106.6215. doi:10.1088/1751-8113/45/27/275101.
  2. Langville, Amy N.; Meyer, Carl (2006). Google's PageRank and Beyond. Princeton University Press. ISBN 978-0-691-12202-1.
  3. Austin, David (2008). "How Google Finds Your Needle in the Web's Haystack". AMS Feature Columns.
  4. Law, Edith (2008-10-09). "PageRank Lecture 12" (PDF).
  5. Frahm, K. M.; Georgeot, B.; Shepelyansky, D. L. (2011). "Universal emergence of PageRank". Journal of Physics A. 44 (46): 465101. arXiv:1105.1062. doi:10.1088/1751-8113/44/46/465101.
  6. Donato, Debora; Laura, Luigi; Leonardi, Stefano; Millozzi, Stefano (2004). "Large scale properties of the Webgraph" (PDF). European Physical Journal B. 38 (2): 239–243. doi:10.1140/epjb/e2004-00056-6.
  7. Pandurangan, Gopal; Raghavan, Prabhakar; Upfal, Eli (2005). "Using PageRank to Characterize Web Structure" (PDF). Internet Mathematics. 3 (1): 1–20. doi:10.1080/15427951.2006.10129114.
  8. Georgeot, Bertrand; Giraud, Olivier; Shepelyansky, Dima L. (2010). "Spectral properties of the Google matrix of the World Wide Web and other directed networks". Physical Review E. 81 (5): 056109. arXiv:1002.3342. doi:10.1103/PhysRevE.81.056109.
  9. Ermann, L.; Chepelianskii, A. D.; Shepelyansky, D. L. (2011). "Fractal Weyl law for Linux Kernel Architecture". European Physical Journal B. 79 (1): 115–120. arXiv:1005.1395. doi:10.1140/epjb/e2010-10774-7.