Below are two estimates of the most common words in Modern Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language that has been carefully prepared for linguistic analysis. To determine which words are the most common, researchers create a database of all the words found in the corpus and categorise them based on the context in which they are used.
The first table lists the 100 most common word forms from the Corpus de Referencia del Español Actual (CREA), a text corpus compiled by the Real Academia Española (RAE). The RAE is Spain's official institution for documenting, planning, and standardising the Spanish language. A word form is any of the grammatical variations of a word.
The second table lists the 100 most common lemmas found in a text corpus compiled by Mark Davies and other language researchers at Brigham Young University in the United States. A lemma is the primary form of a word, the one that would appear in a dictionary. The Spanish infinitive tener ("to have") is a lemma, while tiene ("has"), which is a conjugation of tener, is a word form.
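
The difference between counting word forms and counting lemmas can be illustrated with a minimal Python sketch. The `LEMMA_OF` lookup and the sample sentence are hypothetical and hand-made for illustration; they are not taken from either corpus, and a real study would rely on a full morphological lexicon or tagger.

```python
from collections import Counter

# Hypothetical, hand-made lemma lookup for illustration only; a real
# corpus study would use a full morphological lexicon or tagger.
LEMMA_OF = {
    "tiene": "tener",
    "tienen": "tener",
    "tener": "tener",
    "años": "año",
    "año": "año",
}

def count_word_forms(tokens):
    """Count each surface form separately (word-form ranking)."""
    return Counter(token.lower() for token in tokens)

def count_lemmas(tokens):
    """Group every form under its lemma before counting (lemma ranking)."""
    return Counter(LEMMA_OF.get(token.lower(), token.lower()) for token in tokens)

sample = "Ella tiene dos años y ellos tienen un año".split()
form_counts = count_word_forms(sample)
lemma_counts = count_lemmas(sample)

print(form_counts["tiene"], form_counts["tienen"])  # 1 1
print(lemma_counts["tener"], lemma_counts["año"])   # 2 2
```

In the word-form count, tiene and tienen are separate entries with one occurrence each; in the lemma count, both are credited to tener.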
The list below comes from "1000 formas más frecuentes" (transl. "1000 most frequent word forms"), a list published by the Real Academia Española (RAE) from an analysis of more than 160 million word forms found in the Corpus de Referencia del Español Actual (transl. Reference Corpus of Current Spanish), or CREA. CREA is a computerised corpus of texts written in Spanish and of transcripts of spoken Spanish. It includes books, magazines, and newspapers with a wide variety of content, as well as transcripts of spoken language from radio and television broadcasts and other sources. All the works in the collection date from 1975 to 2004. CREA includes samples from all Spanish-speaking countries. [1]
The list of "1000 most frequent word forms" comes from an analysis of CREA version 3.2. [2] Plurals, verb conjugations, and other inflections are ranked separately. Homonyms, however, are not distinguished from one another. CREA 3.2 was published in June 2008. [1]
Rank | Word form | Occurrences | Part of speech | Translation |
---|---|---|---|---|
1 | de | 9,999,518 | preposition | of; from |
2 | la | 6,277,560 | article, pronoun | the; third person feminine singular pronoun |
3 | que | 4,681,839 | conjunction | that, which |
4 | el | 4,569,652 | article | the |
5 | en | 4,234,281 | preposition | in, on |
6 | y | 4,180,279 | conjunction | and |
7 | a | 3,260,939 | preposition | to, at |
8 | los | 2,618,657 | article, pronoun | the; third person masculine direct object |
9 | se | 2,022,514 | pronoun | -self, oneself (reflexive) |
10 | del | 1,857,225 | preposition | from the |
11 | las | 1,686,741 | article, pronoun | the; third person feminine direct object |
12 | un | 1,659,827 | article | a, an |
13 | por | 1,561,904 | preposition | by, for, through |
14 | con | 1,481,607 | preposition | with |
15 | no | 1,465,503 | adverb | no; not |
16 | una | 1,347,603 | article | a, an, one |
17 | su | 1,103,617 | possessive | his/her/its/your |
18 | para | 1,062,152 | preposition | for, to, in order to |
19 | es | 1,019,669 | verb | is |
20 | al | 951,054 | preposition | to the |
21 | lo | 866,955 | article, pronoun | the; third person masculine direct object |
22 | como | 773,465 | conjunction | like, as |
23 | más | 661,696 | adjective | more |
24 | o | 542,284 | conjunction | or |
25 | pero | 450,512 | conjunction | but |
26 | sus | 449,870 | possessive | his/her/its/your |
27 | le | 413,241 | pronoun | third person indirect object |
28 | ha | 380,339 | verb | he/she/it has [done something]; you (formal) have [done something] |
29 | me | 374,368 | pronoun | me |
30 | si | 327,480 | conjunction | if, whether |
31 | sin | 298,383 | preposition | without |
32 | sobre | 289,704 | preposition | on top of, over, about |
33 | este | 285,461 | adjective | this |
34 | ya | 274,177 | adverb | already; still |
35 | entre | 267,493 | preposition | between |
36 | cuando | 257,272 | conjunction | when |
37 | todo | 247,340 | adjective | all, every |
38 | esta | 238,841 | adjective | this |
39 | ser | 232,924 | verb | to be |
40 | son | 232,415 | verb | they are, you (pl.) are |
41 | dos | 228,439 | number | two |
42 | también | 227,411 | adverb | too, also, as well |
43 | fue | 223,791 | verb | was |
44 | había | 223,430 | verb | I/he/she/it/there was (or used to be) |
45 | era | 219,933 | verb | was |
46 | muy | 208,540 | adverb | very |
47 | años | 203,027 | noun (masculine) | years |
48 | hasta | 202,935 | preposition | until |
49 | desde | 198,647 | preposition | from; since |
50 | está | 194,168 | verb | is |
51 | mi | 186,360 | possessive | my |
52 | porque | 185,700 | conjunction | because |
53 | qué | 184,956 | pronoun | what?; which?; how [adjective] |
54 | sólo | 170,552 | adverb | only, solely |
55 | han | 169,718 | verb | they/you (pl.) have [done something] |
56 | yo | 167,684 | pronoun | I |
57 | hay | 164,940 | verb | there is/are |
58 | vez | 163,538 | noun (feminine) | time, instance |
59 | puede | 161,219 | verb | can |
60 | todos | 158,168 | adjective | all; every |
61 | así | 155,645 | adverb | like that |
62 | nos | 154,412 | pronoun | us |
63 | ni | 153,451 | conjunction, adverb | neither; nor; not even |
64 | parte | 148,750 | noun (masculine / feminine) | part; message |
65 | tiene | 147,274 | verb | has |
66 | él | 139,080 | pronoun (masculine) | he, it |
67 | uno | 136,020 | number | one |
68 | donde | 132,077 | conjunction | where |
69 | bien | 130,957 | adjective | fine, well |
70 | tiempo | 130,896 | noun (masculine) | time; weather |
71 | mismo | 130,746 | adjective | same |
72 | ese | 127,976 | pronoun | that |
73 | ahora | 125,661 | adverb | now |
74 | cada | 124,558 | determiner | each; every |
75 | e | 123,729 | conjunction | and |
76 | vida | 123,491 | noun (feminine) | life |
77 | otro | 121,983 | adjective | other, another |
78 | después | 121,746 | adverb | after |
79 | te | 120,052 | pronoun | to you, for you; yourself |
80 | otros | 119,500 | pronoun | others |
81 | aunque | 115,556 | conjunction | though, although, even though |
82 | esa | 115,377 | adjective | that |
83 | eso | 114,523 | pronoun | that |
84 | hace | 114,507 | verb | he/she/it does/makes |
85 | otra | 113,982 | adjective, pronoun | other; another |
86 | gobierno | 113,011 | noun (masculine) | government |
87 | tan | 112,471 | adverb | so |
88 | durante | 112,020 | preposition | during |
89 | siempre | 111,557 | adverb | always |
90 | día | 110,921 | noun (masculine) | day |
91 | tanto | 110,679 | adjective, adverb | so much |
92 | ella | 110,620 | pronoun | she, her; it |
93 | tres | 109,542 | number | three |
94 | sí | 108,631 | noun, pronoun | yes; reflexive pronoun |
95 | dijo | 108,471 | verb | said; told |
96 | sido | 107,352 | past participle | been |
97 | gran | 106,991 | adjective | large, great, big |
98 | país | 104,568 | noun (masculine) | country |
99 | según | 104,204 | preposition | as; according to |
100 | menos | 103,498 | adjective | less; fewer |
In 2006, Mark Davies, an associate professor of linguistics at Brigham Young University, published his estimate of the 5000 most common words in Modern Spanish. To make this list, he compiled samples only from 20th-century sources, especially from the years 1970 to 2000. Most of the sources are from the 1990s. Of the 20 million words in the corpus, about one-third (~6,750,000 words) come from transcripts of spoken Spanish: conversations, interviews, lectures, sermons, press conferences, sports broadcasts, and so on. Among the written sources are novels, plays, short stories, letters, essays, newspapers, and the encyclopedia Encarta. The samples, written and spoken, come from Spain and at least 10 Latin American countries. Most of the samples were previously compiled for the Corpus del Español (2001), a 100 million-word corpus that includes works from the 13th century through the 20th. [3] [4]
The 5000 words in Davies' list are lemmas. [5] A lemma is the form of the word as it would appear in a dictionary. [6] Singular nouns and plurals, for example, are treated as the same word, as are infinitives and verb conjugations. The table below includes the top 100 words from Davies' list of 5000. [7] [8] This list distinguishes between the definite articles lo and la and the pronouns lo and la; all are ranked individually. The adjectives ese and esa are ranked together (as are este and esta), but the pronoun eso is separate. All conjugations of a verb are ranked together.
A highlighted row indicates that the word was found to occur especially frequently in samples of spoken Spanish. [9]
Rank | Lemma | Occurrences | Part of speech | Translation |
---|---|---|---|---|
1 | el / la | 2,037,803 | article | the |
2 | de | 1,319,834 | preposition | of, from |
3 | que | 662,653 | conjunction | that, which |
4 | y | 562,162 | conjunction | and |
5 | a | 529,899 | preposition | to, at |
6 | en | 507,233 | preposition | in, on |
7 | un | 434,022 | article | a, an |
8 | ser | 374,194 | verb | to be |
9 | se | 329,012 | pronoun | -self, oneself (reflexive) |
10 | no | 257,365 | adverb | no |
11 | haber | 196,962 | verb | to have |
12 | por | 190,975 | preposition | by, for, through |
13 | con | 184,597 | preposition | with |
14 | su | 187,810 | adjective | his, her, their, your |
15 | para | 126,061 | preposition | for, to, in order to |
16 | como | 106,840 | conjunction | like, as |
17 | estar | 106,429 | verb | to be |
18 | tener | 106,642 | verb | to have |
19 | le | 98,211 | pronoun | third person indirect object |
20 | lo | 91,035 | article | the |
21 | lo | 92,519 | pronoun | third person masculine direct object |
22 | todo | 88,057 | adjective | all, every |
23 | pero | 82,435 | conjunction | but, yet, except |
24 | más | 92,352 | adjective | more |
25 | hacer | 81,619 | verb | to do; to make |
26 | o | 82,444 | conjunction | or |
27 | poder | 76,738 | verb | to be able to, can |
28 | decir | 79,343 | verb | to tell, say |
29 | este / esta | 80,544 | adjective | this |
30 | ir | 70,352 | verb | to go |
31 | otro | 61,726 | adjective | other, another |
32 | ese / esa | 60,989 | adjective | that |
33 | la | 55,523 | pronoun | third person feminine direct object |
34 | si | 53,608 | conjunction | if, whether |
35 | me | 95,577 | pronoun | me |
36 | ya | 46,778 | adverb | already, still |
37 | ver | 45,854 | verb | to see |
38 | porque | 44,500 | conjunction | because |
39 | dar | 40,233 | verb | to give |
40 | cuando | 39,726 | conjunction | when |
41 | él | 38,597 | pronoun | he |
42 | muy | 39,558 | adverb | very, really |
43 | sin | 40,432 | preposition | without |
44 | vez | 35,286 | noun (feminine) | time, occurrence |
45 | mucho | 36,391 | adjective | much, many, a lot |
46 | saber | 37,092 | verb | to know |
47 | qué | 42,000 | pronoun | what?; which?; how [adjective] |
48 | sobre | 35,038 | preposition | on top of, over, about |
49 | mi | 45,636 | adjective | my |
50 | alguno | 30,485 | adjective / pronoun | some; someone |
51 | mismo | 29,569 | adjective | same |
52 | yo | 54,635 | pronoun | I |
53 | también | 33,348 | adverb | also |
54 | hasta | 29,506 | preposition / adverb | until, up to; even |
55 | año | 33,053 | noun (masculine) | year |
56 | dos | 27,733 | number | two |
57 | querer | 28,696 | verb | to want, love |
58 | entre | 30,756 | preposition | between |
59 | así | 24,832 | adverb | like that |
60 | primero | 26,553 | adjective | first |
61 | desde | 25,288 | preposition | from, since |
62 | grande | 25,963 | adjective | large, great, big |
63 | eso | 31,636 | pronoun (neuter gender) | that |
64 | ni | 24,261 | conjunction | not even, neither, nor |
65 | nos | 26,349 | pronoun | us |
66 | llegar | 22,878 | verb | to arrive |
67 | pasar | 22,466 | verb | to pass; to happen; to spend time |
68 | tiempo | 22,432 | noun (masculine) | time, weather |
69 | ella(s) | 24,770 | pronoun | she; (plural) them |
70 | sí | 33,828 | adverb | yes |
71 | día | 24,715 | noun (masculine) | day |
72 | uno | 21,407 | number | one |
73 | bien | 21,589 | adverb | well |
74 | poco | 20,986 | adjective / adverb | little, few; a little bit |
75 | deber | 22,232 | verb | should, ought to; to owe |
76 | entonces | 23,548 | adverb | so, then |
77 | poner | 20,330 | verb | to put (on); to get [adjective] |
78 | cosa | 23,943 | noun (feminine) | thing |
79 | tanto | 20,531 | adjective | much |
80 | hombre | 20,292 | noun (masculine) | man, mankind, husband |
81 | parecer | 19,964 | verb | to seem, to look like |
82 | nuestro | 20,666 | adjective | our |
83 | tan | 19,002 | adverb | such a, too, so |
84 | donde | 18,852 | conjunction | where |
85 | ahora | 21,030 | adverb | now |
86 | parte | 20,319 | noun (feminine) | part, portion |
87 | después | 20,229 | adverb | after |
88 | vida | 18,045 | noun (feminine) | life |
89 | quedar | 18,152 | verb | to remain, to stay |
90 | siempre | 17,689 | adverb | always |
91 | creer | 21,257 | verb | to believe |
92 | hablar | 19,006 | verb | to speak, to talk |
93 | llevar | 17,062 | verb | to take, to carry |
94 | dejar | 18,185 | verb | to let, to leave |
95 | nada | 19,365 | pronoun | nothing |
96 | cada | 17,155 | adjective | each, every |
97 | seguir | 16,104 | verb | to follow |
98 | menos | 15,527 | adjective | less, fewer |
99 | nuevo | 17,381 | adjective | new |
100 | encontrar | 15,556 | verb | to find |
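
Because the two corpora differ greatly in size, raw occurrence counts from the two tables cannot be compared directly. One common way to put them on the same scale is to convert each count to occurrences per million words. The sketch below is only an illustration: it assumes a round 160 million word forms for CREA (the text says "more than 160 million", so the CREA figure is an upper bound) and 20 million words for Davies' corpus, and the helper name `per_million` is hypothetical.

```python
def per_million(occurrences, corpus_size):
    """Convert a raw count into occurrences per million words."""
    return occurrences / corpus_size * 1_000_000

# Assumed corpus sizes: the text gives "more than 160 million" word forms
# for CREA and 20 million words for Davies' corpus, so these are approximations.
CREA_SIZE = 160_000_000
DAVIES_SIZE = 20_000_000

crea_de = per_million(9_999_518, CREA_SIZE)      # word form "de" in the CREA table
davies_de = per_million(1_319_834, DAVIES_SIZE)  # lemma "de" in Davies' table

print(round(crea_de), round(davies_de))  # 62497 65992
```

Under these assumptions, de occurs roughly 62,000 to 66,000 times per million words in both corpora, even though its raw counts differ by a factor of more than seven.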