Text corpora (singular: text corpus) are large, structured sets of texts that have been systematically collected. AI developers use them to train large language models, while corpus linguists and researchers in other branches of linguistics use them for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and variation, and teaching language proficiency. [1]
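As a rough illustration of the kind of statistical analysis a corpus supports, the following minimal sketch counts word frequencies across a small set of documents. The three-sentence corpus, the `tokenize` helper, and the `word_frequencies` function are hypothetical and chosen only for illustration; real corpora are far larger and usually rely on more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical three-document corpus; a real corpus would be far larger.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

freqs = word_frequencies(corpus)
print(freqs.most_common(3))
```

Frequency tables of this kind are a common starting point for studying patterns of language use, though the simple regular-expression tokenizer assumed here ignores punctuation, numbers, and non-ASCII letters.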