Lexical diversity

Lexical diversity is one aspect of 'lexical richness' and refers to the ratio of unique word stems (types) to the total number of words (tokens). The term is used in applied linguistics and is quantified using a variety of measures, including the Type-Token Ratio (TTR), vocd, [1] and the measure of textual lexical diversity (MTLD). [2]

A common problem with lexical diversity measures, especially TTR, is that text samples containing a large number of tokens yield lower TTR values, since the writer or speaker must re-use many function words. One consequence is that lexical diversity measures are best used to compare texts of equal length. [3] Newer measures of lexical diversity attempt to correct for this sensitivity to text length. A survey of such measures is provided in a 2024 article by Yves Bestgen. [4]
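
To make the length effect concrete, the following sketch (plain Python with a naive whitespace tokenizer and an invented toy text, both simplifying assumptions) computes TTR over successively longer prefixes of the same text; the value typically falls as the sample grows, because every repeated function word lowers the ratio:

    def type_token_ratio(tokens):
        """Ratio of unique word forms (types) to total words (tokens)."""
        return len(set(tokens)) / len(tokens)

    text = ("the cat sat on the mat and the dog sat on the rug "
            "while the cat watched the dog and the dog watched the cat")
    tokens = text.split()  # naive whitespace tokenization

    # TTR over longer and longer prefixes of the same text:
    for n in (6, 12, 18, 24):
        print(n, round(type_token_ratio(tokens[:n]), 2))  # 0.83, 0.58, 0.56, 0.42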

Definitions

In a 2013 article, Scott Jarvis proposed that lexical diversity, like diversity in ecology, is a perceptual phenomenon: lexical diversity is the positive counterpart of lexical redundancy, in the same way that lexical variability is the mirror image of repetition. According to Jarvis's model, lexical diversity comprises variability, volume, evenness, rarity, dispersion and disparity. [5]

According to Jarvis, the six properties of lexical diversity should be measured by the following indices.

Property     Measure
Variability  Measure of Textual Lexical Diversity (MTLD)
Volume       Total number of words in the text
Evenness     Standard deviation of tokens per type
Rarity       Mean BNC rank
Dispersion   Mean distance between tokens of the same type
Disparity    Mean number of words per sense, or Latent Semantic Analysis
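
A rough sketch of the MTLD index used for the variability property may help. Following McCarthy's description, the text is read token by token while a running TTR is maintained; each time the TTR falls to a threshold (conventionally 0.72), a 'factor' is counted and the running TTR restarts, a partial factor is credited for the leftover stretch, and the score is the token count divided by the factor count, averaged over a forward and a backward pass. This is an illustrative simplification, not a reference implementation:

    def mtld_pass(tokens, threshold=0.72):
        """One directional MTLD pass: tokens per TTR 'factor'."""
        factors, types, count = 0.0, set(), 0
        for tok in tokens:
            types.add(tok)
            count += 1
            if len(types) / count <= threshold:
                factors += 1              # a full factor is complete
                types, count = set(), 0   # restart the running TTR
        if count:                         # credit the leftover partial factor
            ttr = len(types) / count
            factors += (1 - ttr) / (1 - threshold)
        return len(tokens) / factors if factors else float("inf")

    def mtld(tokens):
        # average the forward and backward passes
        return (mtld_pass(tokens) + mtld_pass(tokens[::-1])) / 2

Unlike TTR, the score is expressed in tokens per factor, so texts of different lengths can be compared more directly.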

Related Research Articles

Natural language processing (NLP) is an interdisciplinary subfield of computer science and artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language, and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically, the data are collected in text corpora and processed with rule-based, statistical or neural approaches from machine learning and deep learning.

Corpus linguistics is an empirical method for the study of language by way of a text corpus. Corpora are balanced, often stratified collections of authentic, "real-world" texts of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.

Word-sense disambiguation is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious.

Readability is the ease with which a reader can understand a written text. The concept exists in both natural language and programming languages, though in different forms. In natural language, the readability of text depends on its content and its presentation. In programming, things such as programmer comments, choice of loop structure, and choice of names can determine the ease with which humans can read computer program code.

Automatic summarization is the process of shortening a set of data computationally, to create a subset that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data.

In information theory, perplexity is a measure of uncertainty in the value of a sample from a discrete probability distribution. The larger the perplexity, the less likely it is that an observer can guess the value which will be drawn from the distribution. Perplexity was originally introduced in 1977 in the context of speech recognition by Frederick Jelinek, Robert Leroy Mercer, Lalit R. Bahl, and James K. Baker.
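
Concretely, perplexity is the exponentiated entropy of the distribution, so a uniform distribution over k outcomes has perplexity k. A minimal sketch, with invented toy distributions:

    import math

    def perplexity(probs):
        """Perplexity = 2 ** H(p), with the entropy H measured in bits."""
        entropy = -sum(p * math.log2(p) for p in probs if p > 0)
        return 2 ** entropy

    print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0: like guessing among 4 equally likely values
    print(perplexity([0.7, 0.1, 0.1, 0.1]))      # ~2.56: the outcome is easier to guess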

Language attrition is the process of decreasing proficiency in or losing a language. For first or native language attrition, this process is generally caused by both isolation from speakers of the first language ("L1") and the acquisition and use of a second language ("L2"), which interferes with the correct production and comprehension of the first. Such interference from a second language is probably experienced to some extent by all bilinguals, but is most evident among speakers for whom a language other than their first has started to play an important, if not dominant, role in everyday life; these speakers are more likely to experience language attrition. It is common among immigrants who move to countries where languages foreign to them are used. Second language attrition can also occur through poor learning, practice, and retention of the language after time has passed since learning; this often happens with bilingual speakers who do not frequently engage with their L2.

A lexical chain is a sequence of semantically related words in a text, spanning a narrow or wide context window. A lexical chain is independent of the grammatical structure of the text; in effect, it is a list of words that captures a portion of the cohesive structure of the text. A lexical chain can provide a context for the resolution of an ambiguous term and enable disambiguation of the concepts that the term represents.

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.
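
As a toy illustration of the distributional hypothesis, words can be compared through the contexts in which they occur. The sketch below (with an invented three-sentence corpus and a one-word context window, both assumptions for illustration) builds co-occurrence vectors and compares two words by cosine similarity:

    import math
    from collections import Counter

    SENTENCES = [
        "the cat drinks milk",
        "the dog drinks water",
        "the cat chases the dog",
    ]

    def context_vector(word, window=1):
        """Count the words appearing within `window` positions of `word`."""
        counts = Counter()
        for sentence in SENTENCES:
            toks = sentence.split()
            for i, tok in enumerate(toks):
                if tok == word:
                    lo, hi = max(0, i - window), min(len(toks), i + window + 1)
                    for j in range(lo, hi):
                        if j != i:
                            counts[toks[j]] += 1
        return counts

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u)
        norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
        return dot / (norm(u) * norm(v))

    # "cat" and "dog" occur in similar contexts ("the", "drinks"), so they score high:
    print(round(cosine(context_vector("cat"), context_vector("dog")), 2))  # ~0.91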

Text simplification is an operation used in natural language processing to change, enhance, classify, or otherwise process an existing body of human-readable text so that its grammar and structure are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because of communication needs in an increasingly complex and interconnected world dominated by science, technology, and new media. Natural human languages, however, pose huge problems because they ordinarily contain large vocabularies and complex constructions that machines, no matter how fast and well-programmed, cannot easily process. Researchers have found that methods of semantic compression can reduce this linguistic diversity by limiting and simplifying the set of words used in given texts.

Lexical density is a concept in computational linguistics that measures the structure and complexity of human communication in a language. It estimates the linguistic complexity of a written or spoken composition from its function words and content words. One method of calculating lexical density is to compute the ratio of lexical items to the total number of words. Another is to compute the ratio of lexical items to the number of higher structural items in a composition, such as the total number of clauses in its sentences.
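
A minimal sketch of the first method, assuming a small hand-written stop list of function words in place of a real part-of-speech tagger:

    FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but",
                      "of", "in", "on", "to", "is", "was"}

    def lexical_density(tokens):
        """Share of content (lexical) words among all tokens."""
        content = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
        return len(content) / len(tokens)

    print(lexical_density("the cat sat on the mat".split()))  # 0.5: cat, sat, mat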

A word list is a list of a language's lexicon within some given text corpus, serving the purpose of vocabulary acquisition. A lexicon sorted by frequency "provides a rational basis for making sure that learners get the best return for their vocabulary learning effort", but is mainly intended for course writers, not directly for learners. Frequency lists are also made for lexicographical purposes, serving as a sort of checklist to ensure that common words are not left out. Major pitfalls are the corpus content, the corpus register, and the definition of "word". While word counting is a thousand years old, and gigantic analyses were still done by hand in the mid-20th century, electronic natural language processing of large corpora, such as movie subtitles, has accelerated the research field.

Paul Nation is an internationally recognized scholar in the field of linguistics and teaching methodology. As a professor of applied linguistics with a specialization in pedagogical methodology, he has created a language teaching framework that identifies key areas of language teaching focus. Nation is best known for this framework, which has been labelled The Four Strands. He has also made notable contributions through his research on language acquisition, focusing on the benefits of extensive reading and repetition as well as intensive reading. Nation's numerous contributions to the linguistics research community through his published work have allowed him to share his knowledge and experience so that others may adopt and adapt it. He is credited with bringing "legitimization to second language vocabulary research" in 1990.

Michael Hoey (1948–2021) was a British linguist and Baines Professor of English Language. He lectured on applied linguistics in over 40 countries.

In natural language processing, textual entailment (TE), also known as natural language inference (NLI), is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from the other.

Paraphrase or paraphrasing in computational linguistics is the natural language processing task of detecting and generating paraphrases. Applications of paraphrasing are varied, including information retrieval, question answering, text summarization, and plagiarism detection. Paraphrasing is also useful in the evaluation of machine translation, as well as in semantic parsing and the generation of new samples to expand existing corpora.

Complex dynamic systems theory in the field of linguistics is a perspective and approach to the study of second, third and additional language acquisition. The general term complex dynamic systems theory was recommended by Kees de Bot to refer to both complexity theory and dynamic systems theory.

Marjolijn Verspoor is a Dutch linguist. She is a professor of English language and English as a second language at the University of Groningen, Netherlands. She is known for her work on Complex Dynamic Systems Theory and the application of dynamical systems theory to the study of second language development. Her interests also include second language writing.

Scott Andrew Crossley is an American linguist. He is a professor of applied linguistics at Vanderbilt University, United States. His research focuses on natural language processing and the application of computational tools and machine learning algorithms in learning analytics, including second language acquisition, second language writing, and readability. His main interest area is the development and use of natural language processing tools in assessing writing quality and text difficulty.

Scott Jarvis is an American linguist. He is a Professor of Applied Linguistics at Northern Arizona University, United States. His research addresses second language acquisition broadly, with a special focus on lexical diversity.

References

  1. McCarthy, Phillip; Jarvis, Scott (2007). "vocd: A theoretical and empirical evaluation". Language Testing. 24 (4): 459–488. doi:10.1177/0265532207080767.
  2. McCarthy, Phillip (2005). "An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual lexical diversity (MTLD)". Doctoral dissertation, ProQuest Dissertations and Theses (UMI No. 3199485).
  3. Johansson, V. (2009). "Lexical diversity and lexical density in speech and writing: A developmental perspective". Working Papers in Linguistics.
  4. Bestgen, Yves (2024). "Measuring Lexical Diversity in Texts: The Twofold Length Problem". Language Learning. 74: 638–671. doi:10.1111/lang.12630.
  5. Jarvis, Scott (2013). "Capturing the Diversity in Lexical Diversity". Language Learning. 63: 87–106. doi:10.1111/j.1467-9922.2012.00739.x.