In computational linguistics, second-order co-occurrence pointwise mutual information (SOC-PMI) is a semantic similarity measure. To assess the degree of association between two given words, it uses pointwise mutual information (PMI) to sort lists of important neighbor words of the two target words from a large corpus.
The PMI-IR method used AltaVista's Advanced Search query syntax to calculate probabilities. The "NEAR" search operator of AltaVista is essential to the PMI-IR method. However, AltaVista no longer supports this operator, so the PMI-IR method cannot be implemented in the same form in new systems. From the algorithmic point of view, the advantage of SOC-PMI is that it can calculate the similarity between two words that do not co-occur frequently, because they co-occur with the same neighboring words. For example, the British National Corpus (BNC) has been used as a source of frequencies and contexts.
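The method considers the words that are common to both lists and aggregates their PMI values (from the opposite list) to calculate the relative semantic similarity. We define the pointwise mutual information function only for those words having $f^b(t_i, w) > 0$,
$$f^\text{pmi}(t_i, w) = \log_2 \frac{f^b(t_i, w) \times m}{f^t(t_i)\, f^t(w)},$$
where $f^t(t_i)$ tells us how many times the type $t_i$ appeared in the entire corpus, $f^b(t_i, w)$ tells us how many times word $t_i$ appeared with word $w$ in a context window, and $m$ is the total number of tokens in the corpus. Now, for word $w$, we define a set of words, $X^w$, sorted in descending order by their PMI values with $w$, and take the top-most $\beta$ words having $f^\text{pmi}(t_i, w) > 0$.
The set $X^w$ contains words $X_i^w$,
$$X^w = \{X_i^w\}, \quad i = 1, 2, \ldots, \beta, \quad \text{where } f^\text{pmi}(t_1, w) \geq f^\text{pmi}(t_2, w) \geq \cdots \geq f^\text{pmi}(t_{\beta-1}, w) \geq f^\text{pmi}(t_\beta, w).$$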
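The PMI definition and the sorted neighbor list can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the symmetric context window, and the tie-breaking rule (alphabetical) are assumptions made here, since the text does not fix them.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Return (f_t, f_b): type frequencies and windowed co-occurrence counts.

    f_b[(t, w)] counts how often t occurs within `window` positions of w.
    The symmetric window is an assumption of this sketch.
    """
    f_t = Counter(tokens)
    f_b = Counter()
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                f_b[(tokens[j], w)] += 1
    return f_t, f_b


def pmi(t, w, f_t, f_b, m):
    """PMI of t with w, defined only when the pair co-occurs (else None)."""
    fb = f_b.get((t, w), 0)
    if fb == 0:
        return None
    return math.log2(fb * m / (f_t[t] * f_t[w]))


def top_neighbors(w, f_t, f_b, m, beta):
    """Top-beta words with positive PMI with w, sorted by descending PMI."""
    scored = []
    for t in f_t:
        if t == w:
            continue
        value = pmi(t, w, f_t, f_b, m)
        if value is not None and value > 0:
            scored.append((t, value))
    # Ties are broken alphabetically (an arbitrary choice for determinism).
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:beta]
```

Here `m` is the total token count, so a caller would typically pass `len(tokens)`.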
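A rule of thumb is used to choose the value of $\beta$. The $\beta$-PMI summation function of a word is defined with respect to another word. For word $w_1$ with respect to word $w_2$ it is:
$$f(w_1, w_2, \beta) = \sum_{i=1}^{\beta} \left(f^\text{pmi}(X_i^{w_1}, w_2)\right)^\gamma,$$
where $f^\text{pmi}(X_i^{w_1}, w_2) > 0$, which sums all the positive PMI values of words in the set $X^{w_2}$ also common to the words in the set $X^{w_1}$. In other words, this function aggregates the positive PMI values of all the semantically close words of $w_2$ which are also common in $w_1$'s list. $\gamma$ should have a value greater than 1. So, the $\beta$-PMI summation function for word $w_1$ with respect to word $w_2$ having $\beta = \beta_1$ and the $\beta$-PMI summation function for word $w_2$ with respect to word $w_1$ having $\beta = \beta_2$ are
$$f(w_1, w_2, \beta_1) = \sum_{i=1}^{\beta_1} \left(f^\text{pmi}(X_i^{w_1}, w_2)\right)^\gamma$$
and
$$f(w_2, w_1, \beta_2) = \sum_{i=1}^{\beta_2} \left(f^\text{pmi}(X_i^{w_2}, w_1)\right)^\gamma$$
respectively.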
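The summation step can be illustrated with a short Python sketch. The function name, the dictionary-based input, and the default exponent are hypothetical choices for illustration; the text only requires the exponent to be greater than 1.

```python
def beta_pmi_sum(neighbors_w1, pmi_with_w2, gamma=3.0):
    """Sum the positive PMI values (raised to gamma) that the top neighbors
    of one word have with the other word.

    neighbors_w1: the top-beta neighbor words of the first word.
    pmi_with_w2:  dict mapping a word to its PMI with the second word;
                  words that never co-occur with it are simply absent.
    gamma:        exponent, assumed here to default to 3.0.
    """
    if gamma <= 1:
        raise ValueError("gamma should be greater than 1")
    total = 0.0
    for word in neighbors_w1:
        value = pmi_with_w2.get(word)
        # Only positive PMI values contribute to the sum.
        if value is not None and value > 0:
            total += value ** gamma
    return total
```

Passing only the entries of the second word's own top-neighbor list as `pmi_with_w2` restricts the sum to words common to both lists, which matches the prose description above.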
Finally, the semantic PMI similarity function between the two words, $w_1$ and $w_2$, is defined as
$$\mathrm{Sim}(w_1, w_2) = \frac{f(w_1, w_2, \beta_1)}{\beta_1} + \frac{f(w_2, w_1, \beta_2)}{\beta_2}.$$
The semantic word similarity is normalized so that it provides a similarity score between $0$ and $1$ inclusive. The normalization algorithm takes as arguments the two words, $r_i$ and $s_j$, and a maximum value, $\lambda$, that is returned by the semantic similarity function, Sim(). For example, the algorithm returns 0.986 for the words cemetery and graveyard with $\lambda = 20$ (for the SOC-PMI method).
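As a rough sketch, the final combination and the normalization might look like the following in Python. The function names are hypothetical, and the normalization rule (divide by the maximum value and cap at 1) is an assumption of this sketch; the text states only that the score lies between 0 and 1.

```python
def soc_pmi_similarity(sum_12, beta1, sum_21, beta2):
    """Combine the two directional summation values into one similarity.

    sum_12: summation value for the first word with respect to the second.
    sum_21: summation value for the second word with respect to the first.
    beta1, beta2: sizes of the two neighbor lists (must be positive).
    """
    if beta1 <= 0 or beta2 <= 0:
        raise ValueError("beta values must be positive")
    return sum_12 / beta1 + sum_21 / beta2


def normalize_similarity(sim, lam):
    """Map a raw similarity into [0, 1].

    Assumed rule: scale by the maximum value lam and cap the result at 1.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return min(max(sim, 0.0) / lam, 1.0)
```

For instance, `normalize_similarity(soc_pmi_similarity(8.0, 4, 6.0, 3), 20.0)` scales a raw score of 4.0 to 0.2 under these assumptions.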