Ranking SVM

Last updated

In machine learning, a ranking SVM is a variant of the support vector machine algorithm, which is used to solve certain ranking problems (via learning to rank). The ranking SVM algorithm was published by Thorsten Joachims in 2002. [1] The original purpose of the algorithm was to improve the performance of an internet search engine. However, it was found that ranking SVM also can be used to solve other problems such as Rank SIFT. [2]

Contents

Description

The ranking SVM algorithm is a learning retrieval function that employs pairwise ranking methods to adaptively sort results based on how 'relevant' they are for a specific query. The ranking SVM function uses a mapping function to describe the match between a search query and the features of each of the possible results. This mapping function projects each data pair (such as a search query and clicked web-page, for example) onto a feature space. These features are combined with the corresponding click-through data (which can act as a proxy for how relevant a page is for a specific query) and can then be used as the training data for the ranking SVM algorithm.

Generally, ranking SVM includes three steps in the training period:

  1. It maps the similarities between queries and the clicked pages onto a certain feature space.
  2. It calculates the distances between any two of the vectors obtained in step 1.
  3. It forms an optimization problem which is similar to a standard SVM classification and solves this problem with the regular SVM solver.

Background

Ranking method

Suppose is a data set containing elements . is a ranking method applied to . Then the in can be represented as a binary matrix. If the rank of is higher than the rank of , i.e. , the corresponding position of this matrix is set to value of "1". Otherwise the element in that position will be set as the value "0".

Kendall's tau [3] [4]

Kendall's Tau also refers to Kendall tau rank correlation coefficient, which is commonly used to compare two ranking methods for the same data set.

Suppose and are two ranking method applied to data set , the Kendall's Tau between and can be represented as follows:

where is the number of concordant pairs and is the number of discordant pairs (inversions). A pair and is concordant if both and agree in how they order and . It is discordant if they disagree.

Information retrieval quality [5] [6] [7]

Information retrieval quality is usually evaluated by the following three measurements:

  1. Precision
  2. Recall
  3. Average precision

For a specific query to a database, let be the set of relevant information elements in the database and be the set of the retrieved information elements. Then the above three measurements can be represented as follows:

where is the of .

Let and be the expected and proposed ranking methods of a database respectively, the lower bound of Average Precision of method can be represented as follows:

where is the number of different elements in the upper triangular parts of matrices of and and is the number of relevant elements in the data set.

SVM classifier [8]

Suppose is the element of a training data set, where is the feature vector and is the label (which classifies the category of ). A typical SVM classifier for such data set can be defined as the solution of the following optimization problem.

The solution of the above optimization problem can be represented as a linear combination of the feature vectors s.

where is the coefficients to be determined.

Ranking SVM algorithm

Loss function

Let be the Kendall's tau between expected ranking method and proposed method , it can be proved that maximizing helps to minimize the lower bound of the Average Precision of .

The negative can be selected as the loss function to minimize the lower bound of average precision of

where is the statistical distribution of to certain query .

Since the expected loss function is not applicable, the following empirical loss function is selected for the training data in practice.

Collecting training data

i.i.d. queries are applied to a database and each query corresponds to a ranking method. The training data set has elements. Each element contains a query and the corresponding ranking method.

Feature space

Labelled points in feature space Labelled points in feature space.jpg
Labelled points in feature space

A mapping function [10] [11] is required to map each query and the element of database to a feature space. Then each point in the feature space is labelled with certain rank by ranking method.

Optimization problem

The points generated by the training data are in the feature space, which also carry the rank information (the labels). These labeled points can be used to find the boundary (classifier) that specifies the order of them. In the linear case, such boundary (classifier) is a vector.

Suppose and are two elements in the database and denote if the rank of is higher than in certain ranking method . Let vector be the linear classifier candidate in the feature space. Then the ranking problem can be translated to the following SVM classification problem. Note that one ranking method corresponds to one query.

The above optimization problem is identical to the classical SVM classification problem, which is the reason why this algorithm is called Ranking-SVM.

W candidate W candidate.jpg
W candidate
Not a w candidate Not a w candidate.jpg
Not a w candidate

Retrieval function

The optimal vector obtained by the training sample is

So the retrieval function could be formed based on such optimal classifier.
For new query , the retrieval function first projects all elements of the database to the feature space. Then it orders these feature points by the values of their inner products with the optimal vector. And the rank of each feature point is the rank of the corresponding element of database for the query .

Application of ranking SVM

Ranking SVM can be applied to rank the pages according to the query. The algorithm can be trained using click-through data, where consists of the following three parts:

  1. Query.
  2. Present ranking of search results
  3. Search results clicked on by user

The combination of 2 and 3 cannot provide full training data order which is needed to apply the full SVM algorithm. Instead, it provides a part of the ranking information of the training data. So the algorithm can be slightly revised as follows.

The method does not provide ranking information of the whole dataset, it's a subset of the full ranking method. So the condition of optimization problem becomes more relax compared with the original Ranking-SVM.

Related Research Articles

In signal processing, time–frequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various time–frequency representations. Rather than viewing a 1-dimensional signal and some transform, time–frequency analysis studies a two-dimensional signal – a function whose domain is the two-dimensional real plane, obtained from the signal via a time–frequency transform.

<span class="mw-page-title-main">Hamilton–Jacobi equation</span> A reformulation of Newtons laws of motion using the calculus of variations

In physics, the Hamilton–Jacobi equation, named after William Rowan Hamilton and Carl Gustav Jacob Jacobi, is an alternative formulation of classical mechanics, equivalent to other formulations such as Newton's laws of motion, Lagrangian mechanics and Hamiltonian mechanics.

<span class="mw-page-title-main">Linear time-invariant system</span> Mathematical model which is both linear and time-invariant

In system analysis, among other fields of study, a linear time-invariant (LTI) system is a system that produces an output signal from any input signal subject to the constraints of linearity and time-invariance; these terms are briefly defined below. These properties apply (exactly or approximately) to many important physical systems, in which case the response y(t) of the system to an arbitrary input x(t) can be found directly using convolution: y(t) = (xh)(t) where h(t) is called the system's impulse response and ∗ represents convolution (not to be confused with multiplication). What's more, there are systematic methods for solving any such system (determining h(t)), whereas systems not meeting both properties are generally more difficult (or impossible) to solve analytically. A good example of an LTI system is any electrical circuit consisting of resistors, capacitors, inductors and linear amplifiers.

<span class="mw-page-title-main">Bring radical</span> Real root of the polynomial x^5+x+a

In algebra, the Bring radical or ultraradical of a real number a is the unique real root of the polynomial

In control theory, a distributed-parameter system is a system whose state space is infinite-dimensional. Such systems are therefore also known as infinite-dimensional systems. Typical examples are systems described by partial differential equations or by delay differential equations.

In physics and fluid mechanics, a Blasius boundary layer describes the steady two-dimensional laminar boundary layer that forms on a semi-infinite plate which is held parallel to a constant unidirectional flow. Falkner and Skan later generalized Blasius' solution to wedge flow, i.e. flows in which the plate is not parallel to the flow.

<span class="mw-page-title-main">Toroidal coordinates</span>

Toroidal coordinates are a three-dimensional orthogonal coordinate system that results from rotating the two-dimensional bipolar coordinate system about the axis that separates its two foci. Thus, the two foci and in bipolar coordinates become a ring of radius in the plane of the toroidal coordinate system; the -axis is the axis of rotation. The focal ring is also known as the reference circle.

<span class="mw-page-title-main">Oblate spheroidal coordinates</span> Three-dimensional orthogonal coordinate system

Oblate spheroidal coordinates are a three-dimensional orthogonal coordinate system that results from rotating the two-dimensional elliptic coordinate system about the non-focal axis of the ellipse, i.e., the symmetry axis that separates the foci. Thus, the two foci are transformed into a ring of radius in the x-y plane. Oblate spheroidal coordinates can also be considered as a limiting case of ellipsoidal coordinates in which the two largest semi-axes are equal in length.

In spectroscopy, the Autler–Townes effect, is a dynamical Stark effect corresponding to the case when an oscillating electric field is tuned in resonance to the transition frequency of a given spectral line, and resulting in a change of the shape of the absorption/emission spectra of that spectral line. The AC Stark effect was discovered in 1955 by American physicists Stanley Autler and Charles Townes.

Expected shortfall (ES) is a risk measure—a concept used in the field of financial risk measurement to evaluate the market risk or credit risk of a portfolio. The "expected shortfall at q% level" is the expected return on the portfolio in the worst of cases. ES is an alternative to value at risk that is more sensitive to the shape of the tail of the loss distribution.

The prolate spheroidal wave functions are eigenfunctions of the Laplacian in prolate spheroidal coordinates, adapted to boundary conditions on certain ellipsoids of revolution. Related are the oblate spheroidal wave functions.

Bilinear time–frequency distributions, or quadratic time–frequency distributions, arise in a sub-field of signal analysis and signal processing called time–frequency signal processing, and, in the statistical analysis of time series data. Such methods are used where one needs to deal with a situation where the frequency composition of a signal may be changing over time; this sub-field used to be called time–frequency signal analysis, and is now more often called time–frequency signal processing due to the progress in using these methods to a wide range of signal-processing problems.

<span class="mw-page-title-main">Gravitational lensing formalism</span>

In general relativity, a point mass deflects a light ray with impact parameter by an angle approximately equal to

<span class="mw-page-title-main">Rogers–Ramanujan continued fraction</span> Continued fraction closely related to the Rogers–Ramanujan identities

The Rogers–Ramanujan continued fraction is a continued fraction discovered by Rogers (1894) and independently by Srinivasa Ramanujan, and closely related to the Rogers–Ramanujan identities. It can be evaluated explicitly for a broad class of values of its argument.

Uncertainty theory is a branch of mathematics based on normality, monotonicity, self-duality, countable subadditivity, and product measure axioms.

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods.

Quantum characteristics are phase-space trajectories that arise in the phase space formulation of quantum mechanics through the Wigner transform of Heisenberg operators of canonical coordinates and momenta. These trajectories obey the Hamilton equations in quantum form and play the role of characteristics in terms of which time-dependent Weyl's symbols of quantum operators can be expressed. In the classical limit, quantum characteristics reduce to classical trajectories. The knowledge of quantum characteristics is equivalent to the knowledge of quantum dynamics.

Contact mechanics is the study of the deformation of solids that touch each other at one or more points. This can be divided into compressive and adhesive forces in the direction perpendicular to the interface, and frictional forces in the tangential direction. Frictional contact mechanics is the study of the deformation of bodies in the presence of frictional effects, whereas frictionless contact mechanics assumes the absence of such effects.

In numerical analysis, the local linearization (LL) method is a general strategy for designing numerical integrators for differential equations based on a local (piecewise) linearization of the given equation on consecutive time intervals. The numerical integrators are then iteratively defined as the solution of the resulting piecewise linear equation at the end of each consecutive interval. The LL method has been developed for a variety of equations such as the ordinary, delayed, random and stochastic differential equations. The LL integrators are key component in the implementation of inference methods for the estimation of unknown parameters and unobserved variables of differential equations given time series of observations. The LL schemes are ideals to deal with complex models in a variety of fields as neuroscience, finance, forestry management, control engineering, mathematical statistics, etc.

The Bueno-Orovio–Cherry–Fenton model, also simply called Bueno-Orovio model, is a minimal ionic model for human ventricular cells. It belongs to the category of phenomenological models, because of its characteristic of describing the electrophysiological behaviour of cardiac muscle cells without taking into account in a detailed way the underlying physiology and the specific mechanisms occurring inside the cells.

References

  1. Joachims, T. (2002), "Optimizing Search Engines using Clickthrough Data", Proceedings of the ACM Conference on Knowledge Discovery and Data Mining
  2. Bing Li; Rong Xiao; Zhiwei Li; Rui Cai; Bao-Liang Lu; Lei Zhang; "Rank-SIFT: Learning to rank repeatable local interest points", Computer Vision and Pattern Recognition (CVPR), 2011
  3. M. Kemeny. Rank Correlation Methods, Hafner, 1955
  4. A. Mood, F. Graybill, and D. Boes. Introduction to the Theory of Statistics. McGraw-Hill, 3rd edition, 1974
  5. J. Kemeny and L. Snell. Mathematical Models in THE Social Sciences. Ginn & Co. 1962
  6. Y. Yao. "Measuring retrieval effectiveness based on user preference of documents." Journal of the American Society for Information Science, 46(2): 133–145, 1995.
  7. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, Harlow, UK, May 1999
  8. C. Cortes and V.N. Vapnik. "Support-vector networks." Machine Learning Journal, 20: 273–297,1995
  9. V. Vapnik. Statistical Learning Theory. WILEY, Chichester, GB, 1998
  10. N. Fuhr. "Optimum polynomial retrieval functions based on the probability ranking principle." ACM TRANSACTIONS on Information Systems, 7(3): 183–204
  11. N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, K. Tzeras, and G. Knorz. "Air/x – a rule-based multistage indexing system for large subject fields." In RIAO, 1991