How Many Words Do You Know?

From Neolithic Homo sapiens to the great contemporary societies, spoken languages are ever-evolving entities that lay the foundations of human relations and identity construction. More than mere vehicles for expression, words and grammatical structures offer a unique window into the ways different cultures perceive the world. Surprisingly, even the accentuation of phonemes can influence the very morphology of the mouth.

Do you happen to take note of your own vernacular?

Before we proceed, I have prepared a tool to gauge the extent of your lexical repertoire; for the moment, it is available only in Portuguese.

Migration dynamics, military conquests, and cultural exchanges are hybridizing forces that transform languages within regional spheres, forming dialects that eventually give rise to entirely new languages. For instance, Vulgar Latin, as it spread across the various Roman provinces, gave birth to variants that individually evolved into Portuguese, its elder sibling Spanish, the firstborn French, and even Romanian, a geographically disconnected Latin sibling. Under the umbrella of a single language, some dialects are so distinct from each other that they may seem like entirely different languages, such as Sicilian and Bergamasco, both brought together under the Italian language unified by the Florentine Dante Alighieri. The same distance shows in the jarring subtitles we encounter when watching a television interview with a fellow speaker of our own language from a remote region or an overseas country.

Linguistic colonisation over time

The historical formation of the Luso-Brazilian language has its roots in Latin, the language spoken in the small Latin settlement immediately south of Rome and later carried beyond the confines of the Roman Empire. On the Iberian Peninsula it was shaped by the Sephardim, by the long Moorish incursions (specifically, those of Arab and Berber peoples during the Islamic expansion), and by the formation of kingdoms upon kingdoms. Then came the great navigations that bridged African, Asian, and overseas cultures; cultural exchanges with a myriad of local indigenous and West African tribes; European, Lebanese, and Japanese immigrants; and, finally, the modern age of globalization, with its overwhelming American cultural projection intertwined with technological advancements.

Here, we present the hierarchy of major living Indo-European languages based on their evolution and number of speakers.

Hierarchy of Indo-European Languages

When we refer to a language, we mean its formal variety, that snapshot of erudite speech regarded as appropriate, even though it differs from the everyday vernacular.

Indeed, measuring an individual's lexical repertoire is a task that, pragmatically, can only be done approximately. Even if someone were patient enough to work exhaustively through the tens of thousands of words, terms, and expressions they know, fatigue and mental confusion would distort the results, leaving the final outcome no better than a good approximation obtained by statistical sampling.
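To make the sampling idea concrete, here is a minimal sketch in Python. It is not the tool mentioned earlier, only an illustration under stated assumptions: the words are drawn at random from a reference word list whose size (`lexicon_size`) is known, each answer is a plain known/unknown flag, and the interval uses the normal approximation for a proportion. The function name and the 100,000-word list in the example are hypothetical.

```python
import math


def estimate_vocabulary(known_flags, lexicon_size, z=1.96):
    """Estimate vocabulary size from a simple random sample of words.

    known_flags  -- one boolean per sampled word (True = the word is known)
    lexicon_size -- number of words in the reference list (an assumption)
    z            -- normal quantile; 1.96 corresponds to a 95% confidence level
    Returns (point_estimate, lower_bound, upper_bound) in words.
    """
    n = len(known_flags)
    if n == 0:
        raise ValueError("the sample must contain at least one word")

    # Share of sampled words that the person knows.
    p = sum(known_flags) / n

    # Standard error of a proportion (normal approximation).
    se = math.sqrt(p * (1 - p) / n)

    # Clamp the interval so that it stays a valid proportion.
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)

    return (
        round(p * lexicon_size),
        round(low * lexicon_size),
        round(high * lexicon_size),
    )


if __name__ == "__main__":
    # Hypothetical session: 100 sampled words, 23 of them known.
    sample = [True] * 23 + [False] * 77
    print(estimate_vocabulary(sample, lexicon_size=100_000))
```

With these hypothetical numbers the point estimate is 23,000 words, with an interval of roughly 8,000 words on either side; a larger sample narrows that interval.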

Even though a proficient native speaker demonstrates fluency and mastery of basic grammar, the lexical repertoire of an adult native speaker with secondary education typically falls within a range of 18,000 to 35,000 words, most often around 23,000. This represents only a fraction of the total available words. It is worth highlighting that this figure is influenced by numerous factors, including educational level, cultural context, exposure to reading, interest in disciplinary niches rich in jargon, writing as a means to solidify passive vocabulary, learning new languages (especially those with a rich historical exchange), and, to cut our list short, a resource I use daily: self-oratory.

Before diving deeper into lexical assessment, we must define what "knowing a word" means. We categorize lexical knowledge into four levels:

  1. Unawareness or mere deduction that the word is part of the language;

  2. Tangential recognition, without intuitively understanding its meaning;

  3. Passive knowledge, the ability to associate the word with its semantic value, but without actively using it;

  4. Active knowledge, fluent use of the word, with the ability to provide synonyms and antonyms, as well as articulate its etymology on par with one’s educational level.

With this in mind, we have stratified a reasonably large set of words into four broad categories, clearly distinct from one another:

Elementary level – includes frequently used words in daily life, with a limited, rudimentary ability to express ideas;

Intermediate level – allows ideas to be articulated, though with some errors, imprecisions, and, especially, a deficiency in providing synonyms and antonyms;

Advanced level – encompasses the collection of a typical adult native speaker’s vocabulary;

Proficient level – mastery of the cultured form and clear familiarity with erudite, literary, archaic, and/or various jargon forms (natural, human, and exact sciences, technology, etc.).

~ ~

To do:

  • Diversify the words in the upper strata

  • Distribution and data collection with a 95% CI and a 5% sampling error

  • Hierarchical factor analysis of the test, to build a robust construct
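
As a rough guide for the data-collection item above, the sketch below computes how many responses a 95% confidence level and a 5% sampling error would call for. It assumes the 5% is read as the margin of error of a proportion, that respondents form a simple random sample of a large population, and that the worst-case proportion p = 0.5 applies; the function name is hypothetical.

```python
import math


def required_sample_size(margin=0.05, z=1.96, p=0.5):
    """Minimum sample size to estimate a proportion within +/- margin.

    margin -- desired margin of error (0.05 = 5%)
    z      -- normal quantile; 1.96 corresponds to a 95% confidence level
    p      -- assumed proportion; 0.5 is the most conservative choice
    """
    # Classic formula n = z^2 * p * (1 - p) / margin^2, rounded up.
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    print(required_sample_size())  # 385
```

Under these assumptions, about 385 responses would be enough; a stratified or otherwise non-random collection would need a different calculation.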