Lincoln index

The Lincoln index is a statistical measure used in several fields to estimate the total number of cases in a population, and hence the number that have not yet been observed, based on two independent sets of observed cases. Described by Frederick Charles Lincoln in 1930, it is also sometimes known as the Lincoln-Petersen method, after C. G. Johannes Petersen, who was the first to use the related mark and recapture method. [1]

Applications

Consider two observers who separately count the different species of plants or animals in a given area. If they each come back having found 100 species but only 5 particular species are found by both observers, then each observer clearly missed at least 95 species (that is, the 95 that only the other observer found). Thus, we know that both observers miss a lot. On the other hand, if 99 of the 100 species each observer found had been found by both, it is fair to expect that they have found a far higher percentage of the total species that are there to find.
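The arithmetic behind the two scenarios above can be checked directly. The following is a minimal Python sketch; the function name `overlap_summary` and its dictionary keys are illustrative choices, not part of any established library. It only computes what the overlap implies with certainty (lower bounds on what each observer missed), not the index itself.

```python
def overlap_summary(e1: int, e2: int, s: int) -> dict:
    """Summarize what two species lists with `s` shared entries imply directly.

    e1, e2: number of species found by the first and second observer.
    s:      number of species found by both.
    """
    if s < 0 or s > min(e1, e2):
        raise ValueError("shared count must be between 0 and min(e1, e2)")
    return {
        # Species seen only by the second observer were missed by the first.
        "missed_by_first_at_least": e2 - s,
        # Species seen only by the first observer were missed by the second.
        "missed_by_second_at_least": e1 - s,
        # Size of the union of both lists: a lower bound on the true total.
        "distinct_species_seen": e1 + e2 - s,
    }


print(overlap_summary(100, 100, 5))   # low overlap: each missed at least 95
print(overlap_summary(100, 100, 99))  # high overlap: each missed at least 1
```

With an overlap of 5 the observers have jointly seen 195 distinct species, while with an overlap of 99 they have jointly seen only 101, which is why the second case suggests that few species remain undiscovered.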

The same reasoning applies to mark and recapture. If some animals in a given area are caught and marked, and a second round of captures is done later, the number of marked animals found in the second round can be used to generate an estimate of the total population. [2]
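As a rough illustration of the mark and recapture idea, the sketch below simulates two rounds of captures in Python. It assumes that every animal is equally likely to be caught in each round and that the population does not change between rounds. The population size, catch sizes, and function names are hypothetical values chosen for the example. The estimate rests on the proportion argument: the marked fraction of the second catch should be close to the marked fraction of the whole population.

```python
import random


def estimate_population(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Estimate population size from the share of marked animals in the second catch."""
    if recaptured == 0:
        raise ValueError("no marked animals recaptured; the estimate is undefined")
    # recaptured / caught_second is roughly marked_first / population, so solve for population.
    return marked_first * caught_second / recaptured


def simulate_recaptures(true_size: int, first_catch: int, second_catch: int, seed: int = 0) -> int:
    """Simulate two equal-chance capture rounds and return the number of recaptures."""
    rng = random.Random(seed)
    animals = range(true_size)
    marked = set(rng.sample(animals, first_catch))  # round 1: catch and mark
    second = rng.sample(animals, second_catch)      # round 2: catch again
    return sum(1 for animal in second if animal in marked)


# Hypothetical population of 1000 animals, 200 caught in each round.
recaptured = simulate_recaptures(true_size=1000, first_catch=200, second_catch=200)
if recaptured > 0:
    print(recaptured, estimate_population(200, 200, recaptured))
```

Running the simulation with different seeds gives a feel for how much the estimate varies around the true size when the number of recaptures is small.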

Another example arises in computational linguistics for estimating the total vocabulary of a language. Given two independent samples, the overlap between their vocabularies enables a useful estimate of how many more vocabulary items exist but did not happen to show up in either sample. A similar example involves estimating the number of typographical errors remaining in a text, from two proofreaders' counts.
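The proofreading case can be sketched with sets. In the Python example below, the two sets of typo positions are made-up data and `estimate_remaining` is a hypothetical helper. It assumes the estimate of the total is the product of the two counts divided by the overlap, and that the proofreaders work independently.

```python
from typing import Optional


def estimate_remaining(found_a: set, found_b: set) -> Optional[float]:
    """Estimate how many items neither sample has found yet.

    Returns None when the samples share nothing, since the estimate is then undefined.
    """
    shared = found_a & found_b
    if not shared:
        return None
    estimated_total = len(found_a) * len(found_b) / len(shared)
    already_found = len(found_a | found_b)
    # The estimate can fall below the number already found; clamp at zero.
    return max(0.0, estimated_total - already_found)


# Made-up positions of typos spotted by two proofreaders.
reader_a = {3, 17, 42, 58, 77, 91, 104, 120}
reader_b = {17, 42, 63, 77, 91, 133}

print(estimate_remaining(reader_a, reader_b))  # 8 * 6 / 4 = 12 estimated, 10 found → 2.0
```

The same function applies unchanged to two sets of word types drawn from independent text samples, in which case the result estimates how many vocabulary items appeared in neither sample.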

Formulation

The Lincoln Index formalizes this phenomenon. If E1 and E2 are the number of species (or words, or other phenomena) observed by two independent methods, and S is the number of observations in common, then the Lincoln Index is simply

$$L = \frac{E_1 E_2}{S}$$

For values of S < 10, this estimate is rough, and becomes extremely rough for values of S < 5. In the case where S = 0 (that is, there is no overlap at all) the Lincoln Index is formally undefined. This can arise if the observers only find a small percentage of the actual species (perhaps by not looking hard enough or long enough), if the observers are using methods that are not statistically independent (for example if one looks only for large creatures and the other only for small), or in other circumstances.
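The caveats about small overlaps can be built into a small helper. The Python sketch below assumes the index is the product of the two counts divided by the overlap, E1·E2/S. The reliability labels simply mirror the thresholds mentioned above (S < 5 and S < 10); the label "usable" and the function name are illustrative, not standard terminology.

```python
from typing import Optional, Tuple


def lincoln_index(e1: int, e2: int, s: int) -> Tuple[Optional[float], str]:
    """Return (estimate, reliability label) for counts e1, e2 with overlap s."""
    if s < 0 or s > min(e1, e2):
        raise ValueError("overlap must be between 0 and min(e1, e2)")
    if s == 0:
        # No overlap at all: the index is formally undefined.
        return None, "undefined"
    estimate = e1 * e2 / s
    if s < 5:
        label = "extremely rough"
    elif s < 10:
        label = "rough"
    else:
        label = "usable"
    return estimate, label


print(lincoln_index(100, 100, 5))   # (2000.0, 'rough')
print(lincoln_index(100, 100, 99))  # estimate just above 101, labelled 'usable'
print(lincoln_index(100, 100, 0))   # (None, 'undefined')
```

Returning `None` for S = 0 rather than raising an error is a design choice for this sketch; it lets a caller process many observer pairs and filter out the undefined cases afterwards.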

Limitations

The Lincoln Index is merely an estimate, and it assumes that every case is equally likely to be observed. For example, the species in a given area could tend to be either very common or very rare, or tend to be either very hard or very easy to see. [3] Then it would be likely that both observers would find a large share of the common species, and that both observers would miss a large share of the rare ones. Such distributions would throw off the consequent estimate. However, such distributions are unusual for natural phenomena, as suggested by Zipf's law.
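The effect of unequal detectability can be illustrated with a deterministic calculation. The Python sketch below is an approximation under stated assumptions: each species i is found by each observer independently with probability p_i, and the random counts are replaced by their expected values (sum of p_i for each observer's count, sum of p_i² for the overlap) before applying E1·E2/S. The probabilities and the function name are hypothetical.

```python
from typing import Sequence


def expected_lincoln_estimate(detect_probs: Sequence[float]) -> float:
    """Approximate the index by plugging expected counts into E1 * E2 / S."""
    expected_count = sum(detect_probs)                   # expected species per observer
    expected_overlap = sum(p * p for p in detect_probs)  # expected species found by both
    return expected_count * expected_count / expected_overlap


# 1000 species in both cases; only the spread of detectability differs.
uniform = [0.5] * 1000               # every species equally easy to find
mixed = [0.9] * 500 + [0.1] * 500    # half very easy, half very hard

print(expected_lincoln_estimate(uniform))  # 1000.0, matching the true total
print(expected_lincoln_estimate(mixed))    # about 610, well below the true total
```

In the mixed case both observers mostly find the same easy species, so the overlap is large relative to the counts and the estimate undershoots the true total of 1000.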

T. J. Gaskell and B. J. George propose an enhancement of the Lincoln Index that claims to reduce bias. [4]

Notes

  1. Southwood, T.R.E. & Henderson, P. (2000) Ecological Methods, 3rd edn. Blackwell Science, Oxford.
  2. "Estimating Population Sizes by Mark-recapture and Removal Sampling Methods". University of Texas.
  3. T. Bohlin; B. Sundstrom (1977). "Influence of unequal catchability on population estimates using the Lincoln and the removal method applied to electro-fishing". Oikos (28): 123–129. JSTOR 3543331.
  4. Gaskell and George (1972)
