Lotka's law

Lotka's law for the 15 most populated categories on arXiv (2023-07), shown as a log-log plot. The x-axis is the number of publications; the y-axis is the number of authors with at least that many publications.

Lotka's law,[1] named after Alfred J. Lotka, is one of a variety of special applications of Zipf's law. It describes the frequency of publication by authors in any given field. Let $X$ be the number of publications, $Y$ be the number of authors with $X$ publications, and $C$ and $n$ be constants depending on the specific field. Lotka's law states that $Y = C X^{-n}$, that is, $Y \propto X^{-n}$.


In Lotka's original publication, he claimed $n = 2$. Subsequent research showed that $n$ varies depending on the discipline.
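Because the exponent varies between disciplines, in practice it has to be estimated from data. The sketch below assumes the data arrive as a plain list of per-author publication counts (the function name `estimate_lotka_exponent` and that input format are illustrative, not from the source). It counts how many authors have exactly X publications and fits a straight line to log Y against log X by ordinary least squares; the slope is −n. This log-log regression is the simplest approach but is often criticized as biased, and maximum-likelihood fitting is usually preferred for serious analyses.

```python
import math
from collections import Counter


def estimate_lotka_exponent(pubs_per_author):
    """Fit Y = C * X**(-n) by least squares on log Y versus log X.

    pubs_per_author: one publication count per author (assumed input format).
    Returns the estimated (n, C).
    """
    # Y(X): number of authors with exactly X publications.
    counts = Counter(p for p in pubs_per_author if p > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct publication counts")
    xs = [math.log(x) for x in counts]
    ys = [math.log(counts[x]) for x in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept of the log-log line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic data that follows Y = 400 * X**-2 exactly.
authors = [1] * 400 + [2] * 100 + [4] * 25
n, C = estimate_lotka_exponent(authors)
print(f"n = {n:.3f}, C = {C:.1f}")  # n = 2.000, C = 400.0
```

On real bibliographic data the points at large X are sparse and noisy, so the fitted n depends noticeably on which range of X is included.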

Equivalently, Lotka's law can be stated as $Y' \propto X^{-(n-1)}$, where $Y'$ is the number of authors with at least $X$ publications. Treating $X$ as a continuous variable, their equivalence can be proved by taking the derivative.
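Written out, treating $X$ as a real variable and assuming $n > 1$ so that the tail integral converges, the derivative argument runs as follows:

```latex
Y'(X) = \int_X^{\infty} C\,x^{-n}\,dx = \frac{C}{n-1}\,X^{-(n-1)} \;\propto\; X^{-(n-1)},
\qquad
-\frac{dY'}{dX} = C\,X^{-n} = Y(X).
```

For integer publication counts, $Y'(X) = \sum_{x \ge X} C x^{-n}$ only approximates this integral, so the two forms agree asymptotically for large $X$ rather than exactly.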

Graphical plot of the Lotka function described in the text, with C=1, n=2

Example

Assume that $n = 2$ in a discipline. Then, as the number of articles published increases, authors producing that many publications become less frequent: there are 1/4 as many authors publishing two articles within a specified time period as there are single-publication authors, 1/9 as many publishing three articles, 1/16 as many publishing four articles, and so on.

If 100 authors wrote exactly one article each over a specific period in the discipline, then:

| Number of articles written | Number of authors writing that number of articles |
|---|---|
| 10 | 100/10² = 1 |
| 9 | 100/9² ≈ 1 (1.23) |
| 8 | 100/8² ≈ 2 (1.56) |
| 7 | 100/7² ≈ 2 (2.04) |
| 6 | 100/6² ≈ 3 (2.78) |
| 5 | 100/5² = 4 |
| 4 | 100/4² ≈ 6 (6.25) |
| 3 | 100/3² ≈ 11 (11.11) |
| 2 | 100/2² = 25 |
| 1 | 100 |

Using the rounded author counts, that would be a total of 294 articles by 155 authors, an average of about 1.9 articles per author.
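The table and totals can be reproduced with a short script. This is a sketch assuming, as in the example, $n = 2$, 100 single-article authors, articles counted up to 10, and author counts rounded to the nearest whole author; the function name `lotka_table` is illustrative.

```python
def lotka_table(single_authors=100, n=2, max_articles=10):
    """Rows of (articles per author, exact author count, rounded author count)
    under Lotka's law, scaled so that `single_authors` authors wrote one article."""
    rows = []
    for x in range(max_articles, 0, -1):
        exact = single_authors / x ** n
        # None of these values is an exact .5, so Python's
        # round-half-to-even rule does not affect the result here.
        rows.append((x, exact, round(exact)))
    return rows


rows = lotka_table()
authors = sum(rounded for _, _, rounded in rows)
articles = sum(x * rounded for x, _, rounded in rows)
for x, exact, rounded in rows:
    print(f"{x:>2} articles: {rounded:>3} authors ({exact:.2f})")
print(f"{articles} articles by {authors} authors, "
      f"{articles / authors:.1f} articles per author")
# Last line printed: 294 articles by 155 authors, 1.9 articles per author
```

Because the totals are computed from rounded counts, they differ slightly from what the unrounded values 100/X² would give.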


Related Research Articles

Boltzmann distribution: probability distribution of energy states of a system

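In statistical mechanics and mathematics, a Boltzmann distribution is a probability distribution or probability measure that gives the probability that a system will be in a certain state as a function of that state's energy and the temperature of the system. The distribution is expressed in the form $p_i \propto \exp\left(-\frac{\varepsilon_i}{kT}\right)$, where $p_i$ is the probability of the system being in state $i$, $\varepsilon_i$ is the energy of that state, $k$ is the Boltzmann constant, and $T$ is the absolute temperature.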
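A minimal numerical sketch of this distribution, assuming a finite set of non-degenerate energy levels given in joules (the function name `boltzmann_probabilities` is illustrative). The weights are normalized by their sum, the partition function; subtracting the lowest energy first keeps the exponentials in floating-point range without changing the result.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI).
K_B = 1.380649e-23


def boltzmann_probabilities(energies, temperature):
    """Return p_i proportional to exp(-E_i / (k_B * T)), normalized to sum to 1.

    energies: state energies in joules (one entry per state, no degeneracy).
    temperature: absolute temperature in kelvin.
    """
    beta = 1.0 / (K_B * temperature)
    # Shift by the lowest energy before exponentiating to avoid overflow or
    # underflow; the common factor cancels when normalizing.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function, up to the same common factor
    return [w / z for w in weights]


# Two states separated by k_B * T * ln 2: the lower one is twice as likely.
T = 300.0
probs = boltzmann_probabilities([0.0, K_B * T * math.log(2)], T)
print(probs)  # approximately [0.667, 0.333]
```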

Normal distribution: probability distribution

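In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mean and $\sigma$ is the standard deviation.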

Power law: functional relationship between two quantities

In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a relative change in the other quantity proportional to a power of the change, independent of the initial size of those quantities: one quantity varies as a power of another. For instance, considering the area of a square in terms of the length of its side, if the length is doubled, the area is multiplied by a factor of four.

Zipf's law: probability distribution

Zipf's law is an empirical law that often holds, approximately, when a list of measured values is sorted in decreasing order. It states that the value of the nth entry is inversely proportional to n.

Pink noise: type of signal whose power spectral density is inversely proportional to its frequency

Pink noise, 1/f noise, or fractal noise is a signal or process with a frequency spectrum such that the power spectral density is inversely proportional to the frequency of the signal. In pink noise, each octave interval carries an equal amount of noise energy.

Bradford's law is a pattern first described by Samuel C. Bradford in 1934 that estimates the exponentially diminishing returns of searching for references in science journals. One formulation is that if journals in a field are sorted by number of articles into three groups, each with about one-third of all articles, then the number of journals in each group will be proportional to 1:n:n². There are a number of related formulations of the principle.

Debye model: method in physics

In thermodynamics and solid-state physics, the Debye model is a method developed by Peter Debye in 1912 for estimating the phonon contribution to the specific heat in a solid. It treats the vibrations of the atomic lattice (heat) as phonons in a box, in contrast to the Einstein model, which treats the solid as many individual, non-interacting quantum harmonic oscillators. The Debye model correctly predicts the low-temperature dependence of the heat capacity of solids, which is proportional to $T^3$ (the Debye $T^3$ law). Like the Einstein model, it recovers the Dulong–Petit law at high temperatures. Due to simplifying assumptions, its accuracy suffers at intermediate temperatures.

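The Lotka–Volterra equations, also known as the Lotka–Volterra predator–prey model, are a pair of first-order nonlinear differential equations, frequently used to describe the dynamics of biological systems in which two species interact, one as a predator and the other as prey. The populations change through time according to the pair of equations $\frac{dx}{dt} = \alpha x - \beta x y$ and $\frac{dy}{dt} = \delta x y - \gamma y$, where $x$ is the prey population, $y$ is the predator population, $t$ is time, and $\alpha$, $\beta$, $\gamma$, $\delta$ are positive real parameters describing the interaction of the two species.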
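A minimal sketch of integrating the predator–prey system numerically, assuming the common parameterization dx/dt = αx − βxy for the prey and dy/dt = δxy − γy for the predators, and a hand-written classical fourth-order Runge–Kutta step. The function names and the parameter values in the usage lines are arbitrary illustrations. The quantity V = δx − γ ln x + βy − α ln y is constant along exact solutions, which gives a quick check on the integrator.

```python
import math


def lotka_volterra_rhs(x, y, alpha, beta, gamma, delta):
    """dx/dt = alpha*x - beta*x*y (prey), dy/dt = delta*x*y - gamma*y (predators)."""
    return alpha * x - beta * x * y, delta * x * y - gamma * y


def simulate(x0, y0, alpha, beta, gamma, delta, dt=0.01, steps=1000):
    """Integrate the system with classical fourth-order Runge-Kutta steps.

    Returns a list of (t, prey, predators) tuples, including the initial state.
    """
    def f(x, y):
        return lotka_volterra_rhs(x, y, alpha, beta, gamma, delta)

    x, y = x0, y0
    trajectory = [(0.0, x, y)]
    for i in range(1, steps + 1):
        k1 = f(x, y)
        k2 = f(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = f(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        trajectory.append((i * dt, x, y))
    return trajectory


def conserved_quantity(x, y, alpha, beta, gamma, delta):
    """V = delta*x - gamma*ln(x) + beta*y - alpha*ln(y), constant along exact solutions."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


params = (1.0, 0.1, 1.5, 0.075)  # alpha, beta, gamma, delta (arbitrary)
traj = simulate(10.0, 5.0, *params)
v0 = conserved_quantity(10.0, 5.0, *params)
drift = max(abs(conserved_quantity(x, y, *params) - v0) for _, x, y in traj)
print(f"final state: {traj[-1]}, max drift in V: {drift:.2e}")
```

A library ODE solver would normally be used instead of a hand-rolled stepper; the explicit loop is kept here only to show each step of the method.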

Alfred J. Lotka: American mathematician (1880–1949)

Alfred James Lotka was an American mathematician, physical chemist, and statistician, famous for his work in population dynamics and energetics. A biophysicist, Lotka is best known for his proposal of the predator–prey model, developed simultaneously but independently of Vito Volterra. The Lotka–Volterra model is still the basis of many models used in the analysis of population dynamics in ecology.

Bibliometrics: statistical analysis of written publications

Bibliometrics is the use of statistical methods to analyse books, articles and other publications, especially in scientific contexts. Bibliometric methods are frequently used in the field of library and information science. Bibliometrics is closely associated with scientometrics, the analysis of scientific metrics and indicators, to the point that both fields largely overlap.

Derek J. de Solla Price: historian of science

Derek John de Solla Price was a British physicist, historian of science, and information scientist. He was known for his investigation of the Antikythera mechanism, an ancient Greek planetary computer, and for quantitative studies on scientific publications, which led to his being described as the "Herald of scientometrics".

Informetrics: study of the quantitative aspects of information

Informetrics is the study of quantitative aspects of information; it is an extension and evolution of traditional bibliometrics and scientometrics. Informetrics uses bibliometric and scientometric methods mainly to study problems of literature information management and the evaluation of science and technology. It is an independent discipline that uses quantitative methods from mathematics and statistics to study the processes, phenomena, and laws of information. Informetrics has gained attention as a common scientific method for academic evaluation, for identifying research hotspots in a discipline, and for trend analysis.

Pearson distribution: family of continuous probability distributions

The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.

Citation impact or citation rate is a measure of how many times an academic journal article or book or author is cited by other articles, books or authors. Citation counts are interpreted as measures of the impact or influence of academic work and have given rise to the field of bibliometrics or scientometrics, specializing in the study of patterns of academic impact through citation analysis. The importance of journals can be measured by the average citation rate, the ratio of the number of citations to the number of articles published within a given time period and in a given index, such as the journal impact factor or the CiteScore. It is used by academic institutions in decisions about academic tenure, promotion and hiring, and hence also used by authors in deciding which journal to publish in. Citation-like measures are also used in other fields that do ranking, such as Google's PageRank algorithm, software metrics, college and university rankings, and business performance indicators.

The h-index is an author-level metric that measures both the productivity and citation impact of the publications, initially used for an individual scientist or scholar. The h-index correlates with success indicators such as winning the Nobel Prize, being accepted for research fellowships and holding positions at top universities. The index is based on the set of the scientist's most cited papers and the number of citations that they have received in other publications. The index has more recently been applied to the productivity and impact of a scholarly journal as well as a group of scientists, such as a department or university or country. The index was suggested in 2005 by Jorge E. Hirsch, a physicist at UC San Diego, as a tool for determining theoretical physicists' relative quality and is sometimes called the Hirsch index or Hirsch number.
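The paragraph does not spell out the definition: a scholar has index h if h of their papers have each been cited at least h times. A minimal sketch, assuming the input is a plain list of per-paper citation counts (the function name `h_index` is illustrative):

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    citations: one citation count per paper (input format assumed).
    """
    h = 0
    # Rank papers from most to least cited; the paper at rank i counts
    # toward h as long as it has at least i citations.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations each
```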

"The rich get richer and the poor get poorer" is an aphorism attributed to Percy Bysshe Shelley. In A Defence of Poetry Shelley remarked that the promoters of utility had exemplified the saying, "To him that hath, more shall be given; and from him that hath not, the little that he hath shall be taken away. The rich have become richer, and the poor have become poorer; and the vessel of the State is driven between the Scylla and Charybdis of anarchy and despotism." It describes a positive feedback loop.

Preferential attachment: stochastic process formalizing cumulative advantage

A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. "Preferential attachment" is only the most recent of many names that have been given to such processes. They are also referred to under the names Yule process, cumulative advantage, the rich get richer, and the Matthew effect. They are also related to Gibrat's law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. If preferential attachment is non-linear, measured distributions may deviate from a power law. These mechanisms may generate distributions which are approximately power law over transient periods.
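One classic linear instance is Herbert Simon's model: at each step one new unit (for example, a publication or a citation) is added; with a fixed probability it goes to a brand-new individual, and otherwise it goes to an existing individual chosen with probability proportional to what they already hold. The sketch below uses Simon's variant purely for illustration; the function and parameter names are hypothetical. Keeping one list entry per unit makes a uniform random choice over that list equivalent to a choice proportional to holdings.

```python
import random


def simon_process(steps, p_new=0.1, seed=None):
    """Simon's cumulative-advantage model (linear preferential attachment).

    Each step adds one unit. With probability p_new it goes to a new
    individual; otherwise it goes to an existing individual chosen with
    probability proportional to the number of units they hold.
    Returns the list of holdings per individual.
    """
    rng = random.Random(seed)
    holdings = [1]  # start with one individual holding one unit
    owners = [0]    # owners[k] = index of the individual holding unit k
    for _ in range(steps):
        if rng.random() < p_new:
            holdings.append(1)
            owners.append(len(holdings) - 1)
        else:
            # A uniform choice over units is a choice proportional to holdings.
            i = rng.choice(owners)
            holdings[i] += 1
            owners.append(i)
    return holdings


holdings = simon_process(10_000, p_new=0.1, seed=42)
print(len(holdings), max(holdings))
```

Tabulating how many individuals end up with each holding size, and plotting that on log-log axes, is one way to see the heavy-tailed distribution the paragraph describes.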

In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous. Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.

Author-level metrics are citation metrics that measure the bibliometric impact of individual authors, researchers, academics, and scholars. Many metrics have been developed that take into account varying numbers of factors.

Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method. It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo methods for computing posterior marginal distributions. Due to its relative speed even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular spatial statistics, ecology, and epidemiology. It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study, for example, spatial point processes and species distribution models. The INLA method is implemented in the R-INLA R package.

References

  1. Lotka, Alfred J. (1926). "The frequency distribution of scientific productivity". Journal of the Washington Academy of Sciences. 16 (12): 317–324.

Further reading