Origin and evolution of languages

We don't ask ourselves where languages come from because they just seem to be there: French in France, English in England, Chinese in China, Japanese in Japan, and so forth. Yet if we go back only a few thousand years, none of these languages were spoken in their respective countries and indeed none of these languages existed anywhere in the world. Where did they all come from?

In some cases, the answer is clear and well-known. We know that Spanish is simply a later version of the Latin language that was spoken in Rome two thousand years ago. Latin spread with the Roman conquest of Europe and, following the breakup of the Roman Empire, the regional dialects of Latin gradually evolved into the modern Romance languages: Sardinian, Rumanian, Italian, French, Catalan, Spanish, and Portuguese. A language family, such as the Romance family, is a group of languages that have all evolved from a single earlier language, in this case Latin.

But while the Romance family illustrates well the concept of a language family, it is also highly unusual, in that the ancestral language, Latin, was a written language that has left us copious records. The usual situation is that the ancestral language was not a written language and the only evidence we have is in its modern descendants. Yet even without written records, it is not difficult to distinguish language families, as can be seen in Table 1.

Here similarities among certain languages in the word for "hand" allow us to readily identify not only the Romance family (Spanish, Italian, Rumanian), but also the Slavic family (Russian, Polish, Serbo-Croatian) and the Germanic family (English, Danish, German). There are, however, no written records of the languages ancestral to the Germanic or Slavic languages, so these two languages — which must have existed no less than Latin — are called Proto-Germanic and Proto-Slavic, respectively.
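The grouping described above can be imitated, very roughly, by a short program. The sketch below is only an illustration under stated assumptions: the word list is not a copy of Table 1 but a set of ASCII-simplified spellings of the word for "hand", the helper names (`similarity`, `group_languages`) and the threshold of 0.7 are arbitrary choices, and similarity of spelling stands in for similarity of sound.

```python
from difflib import SequenceMatcher

# Illustrative spellings of "hand" (ASCII-simplified; an assumption,
# not the contents of Table 1).
HAND = {
    "Spanish": "mano",
    "Italian": "mano",
    "Rumanian": "mana",
    "Russian": "ruka",
    "Polish": "reka",
    "Serbo-Croatian": "ruka",
    "English": "hand",
    "Danish": "hand",
    "German": "hand",
}


def similarity(a, b):
    """Return a spelling-similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def group_languages(words, threshold=0.7):
    """Put a language in a group if its word resembles any member's word."""
    groups = []
    for lang, word in words.items():
        # Find every existing group containing a sufficiently similar word.
        linked = [
            g for g in groups
            if any(similarity(word, words[other]) >= threshold for other in g)
        ]
        # Merge those groups and add the new language to the result.
        for g in linked:
            groups.remove(g)
        groups.append(sum(linked, []) + [lang])
    return groups


for group in group_languages(HAND):
    print(", ".join(group))
```

With this toy data the three groups correspond to the Romance, Slavic, and Germanic families. Real classification rests on many words and on regular sound correspondences, not on the spelling of a single word.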

If we examine words other than "hand," we find many additional instances where each of these three families is characterized by different word roots (phonetically), just as in the case of "hand, ruka, mano". But we also find, from time to time, roots that seem to be shared by these three families; that is, the same root is found in all three families. What is the meaning of such word roots?

In fact, similarities among language families such as Romance, Germanic, and Slavic have the same significance as similarities among languages in any one family, for example the Romance languages. These similarities imply that the three families are branches of a more ancient family of languages. In other words, a language that existed long before Latin, Proto-Germanic, or Proto-Slavic first differentiated into these three languages, and they in turn diversified into the modern languages of each family. This larger, more ancient family of languages is known as the Indo-European family and it includes almost all European languages (except Basque, Hungarian and Finnish), and many other languages of Iran, Afghanistan, Pakistan, and India. The Indo-European family of languages has, in fact, thirteen branches; in addition to Romance, Germanic, and Slavic, there are also Baltic, Celtic, Iranian, Indic, Tocharian, Anatolian, and three single languages that are by themselves separate branches of the family: Armenian, Greek, and Albanian.

The thirteen branches of Indo-European are connected to one another by numerous common words and grammatical endings. One example is the word for "mouse," which exhibits striking similarities among languages from different branches of the family: Greek "muus", Latin "muus", Old English "muus", Russian "mysh", and Sanskrit (Indic) "muu-". Not surprisingly, scholars believe that the original Proto-Indo-European word was *muus- (the * indicates a hypothetical reconstructed form, rather than an actually attested written form). Another root shared by different branches is the word for "nose": Latin "naas-", Old English "nosu", Lithuanian (Baltic) "nos-", Russian "nos", and Sanskrit "naas-". All of these words are thought to have evolved from Proto-Indo-European "*naas-". The precise time and place at which Proto-Indo-European was spoken remain a matter of some dispute even today. The two most popular hypotheses postulate that it was spoken in Ukraine around six thousand years ago, or in Anatolia (modern Turkey) around eight thousand years ago.

The story does not end here, for Indo-European is but one branch of an even larger (and more ancient) family known as Eurasiatic. In addition to Indo-European, this family also includes the Uralic family (Finnish, Hungarian, Samoyed); the Altaic family (Turkic, Mongolian, Tungus, Korean, Japanese); the Chukchi-Kamchatkan family just across the Bering Strait from Alaska; and the Eskimo-Aleut family that extends along the northern perimeter of North America from Alaska to Greenland. One of the words found in all five branches has a general meaning of "tongue", "speak" or "call": Proto-Indo-European "*gal" "call," Proto-Uralic "*keele" "tongue," Proto-Altaic * "tongue, speak," Kamchadal (Chukchi-Kamchatkan) "kel" "shout", Proto-Eskimo *- "inform." The Eurasiatic family is also characterized by distinctive first- and second-person pronouns, the first based on M, the second on T. Within the Indo-European family, almost every language exhibits such forms: English me and thee, Spanish me and te, Russian menya and tebya, and so forth. This pattern is, however, characteristic of the entire Eurasiatic family, not just the Indo-European branch. In other parts of the world, different pronominal systems are found. For example, in the Amerind family, which includes most Native American languages, the most common pattern is first-person N and second-person M.

If we apply this method of classification to languages elsewhere in the world, we can, in similar fashion, distinguish about twelve other large and ancient families comparable to Eurasiatic.

Even among these dozen families, there are certain distinctive roots indicating that all twelve of these families have evolved from a single earlier language. Two of the most widespread roots are TIK 'finger', 'one' and PAL 'two'. Both of these roots are extremely common around the world. Table 2 provides just one example of each from the world's major geographical areas, but many additional examples could be cited.

Table 2:

There is, however, indirect circumstantial evidence from other areas of science that may provide an answer to these two questions. Both the archaeological record (in terms of bones and artifacts) and human genes (in terms of gene frequencies and mitochondrial DNA) indicate that all modern humans share a recent common ancestry in Africa. What is surprising, and difficult to explain, is that people who look just like us — modern humans — first appear in the archaeological record one hundred thousand years ago. But these people did not behave like us; they are indistinguishable from Neanderthals in both their toolkit and their behavior. It was only around fifty thousand years ago that — quite suddenly — both toolkits and behavior started to change with amazing rapidity. Toolkits that had remained unchanged over hundreds of thousands of years began to change with the rapidity of tennis-shoe styles today. And styles that had been uniform over huge geographical distances began to differentiate in neighboring villages. People began to fashion tools from other materials. Whereas previously only stone had been used, now bone, shells, ivory, and other natural materials were employed. Art appeared for the first time, burials became more complex, and people seem to have spread out of Africa to inhabit the entire world, replacing earlier inhabitants (Neanderthals) or occupying territories hitherto uninhabited, such as Australia, Oceania, and the Americas.

Given that interaction and borrowing are possible reasons for the similarities between languages, the original African language would likely have been influenced by the languages of the cultures it encountered and theoretically replaced. Merritt Ruhlen explains how that African language developed and why it is considered to be the original fully modern language.

We arrive at the final question in our story. What advantage could have allowed a small African population to leave Africa fairly recently and, in a short time, occupy the entire world and replace all previous human inhabitants? A growing number of scholars — linguists, archaeologists, and geneticists — believe that it was the appearance of fully modern human language around fifty thousand years ago that bestowed this enormous selective advantage on a small African population. If this scenario is correct, then the similarities among the world's extant languages not only support the idea of a recent African origin for all modern humans, they also explain it. The invention of modern human language fifty thousand years ago led to the explosive expansion of modern humans around the globe. And even today traces of this sudden expansion persist in languages around the world.

Traces of the sudden expansion of humans, at the time when the original fully modern language developed, persist in languages around the world. Merritt Ruhlen discusses how these traces can be seen in certain widespread roots as a result of their common origins.

To further understand the origin and evolution of languages, we may first consider the evolution of modern languages and forms of speech, i.e. the development of argot, slang, cant, jargon, lingo, patois, vernaculars and regional dialects.

Argot, slang, cant, jargon, lingo, patois, vernaculars and regional dialects are "regional" or "social" varieties of a language distinguished by pronunciation, grammar, and vocabulary. For example, "cockney" is a variety of English spoken by some Londoners, and "Marseillais" is a variety of French spoken in the south-east of France. More specifically, a dialect is a variety of speech that differs from the "standard speech" or the "speech of the common individual" within the culture in which it exists. Jargon or cant is a special terminology understood among the members of a profession, discipline, group or class, but obscure to the general population, which has no use for it.

Argot, slang, cant, jargon, lingo, patois, vernaculars and regional dialects develop because languages change continuously in adaptation to the evolution of social behaviours, ideas, technologies and science. Regional and social dialects develop within a community speaking the same language when there is little or no communication between the different groups or areas where the common language is spoken, owing to geography and/or culture, e.g. science or technology.

Dialects that are separated geographically or culturally from the main language will, given enough time, become individual languages. The process applies today to the French of France, of Canada, of Belgium, and of Switzerland, even though exchanges between the different French-speaking communities are numerous, because each community is separate in its cultural, social, political and institutional structures.

Earlier, the same process applied to Latin all over Western Europe. Latin was the language of the Romans and of the Roman Empire, which reached its apogee in the first and second centuries AD.

See map of the Roman empire with major road links:

After the decline of the Roman Empire in the middle of the first millennium AD, Latin evolved into the modern languages French, Spanish, Catalan, Italian, Portuguese, Rumanian, Corsican, Provençal, and Sardinian. The process must have taken hundreds of years, however. We can follow it because Latin was a written language, highly structured in lexicon, grammar, and syntax, which has left numerous literary records by famous authors such as Cicero, Seneca, Tacitus, Virgil, and Caesar; these were studied and copied by scholars up to the 17th century. In fact, Latin was the official language of clergy and scholars, of law and contracts, throughout the first millennium and the Middle Ages, until the end of the 17th century. Isaac Newton's famous treatise on universal gravitation was written in Latin. Latin is still taught in European schools today.

How did Latin evolve into its many regional dialects and into the modern Romance languages spoken and written today? We may imagine that Latin, the language of the hegemonic power of the last centuries BC, spread across the sphere of influence of the Romans and of the Roman Empire (see map), among small populations that spoke their own inherited languages. The replacement of these vernaculars by Latin took place only in areas where Romance languages are found today, i.e. France, Spain, Italy, and Rumania; Rumanian later borrowed heavily from neighbouring Slavonic languages (Russian and Serbo-Croatian). It seems that Latin completely replaced the vernaculars in France, Spain and Italy; an explanation for this may be the smallness of the populations and their degree of integration into the Roman Empire, i.e. their adoption of its social, cultural, political and institutional structures. For example, French law is considered to be based on the Roman judicial system known from written records (Cicero, Seneca, and others), even after being modernized under Napoleon's rule at the beginning of the 19th century.

Despite this degree of integration within the structures of the Roman Empire, the Latin that was spoken in the region of Lyon, or in the region of Lutetia (Paris), was not exactly the same language as that spoken in Rome by Cicero, Seneca, or Virgil. The same applies to Canadian French today. When the Roman Empire declined and eventually broke up, these dialects continued their own evolution and became languages of their own. But the process took more than a thousand years, because Latin continued to be used as the language of clergy, scholars and contract makers. For example, the first legal contract to be written in "French" dates from the 13th century, but Latin was still used in the 17th century.

Evidence that the Romance languages derive from Latin comes from the study of Latin, which offers many written records. The vocabulary, grammar and syntax of these languages resemble those of Latin, and the resemblance is clearest in words: word roots can be traced to Latin in French, Spanish, Portuguese, Italian, and regional languages such as Provençal, Catalan, Corsican, and Sardinian. For example, the Latin word for hand, "manus", or a derivative of it, is found in all of these languages. Almost all of the word roots found in the major Romance languages are found in Latin; the exceptions are borrowings from other languages, for example Arabic, English, or Russian, made at much later periods and well known to linguists.

In contrast with the regions where Romance languages are now spoken, Latin did not totally replace the vernaculars spoken in Northern Europe, in present-day Turkey, or in the Levant. In the languages of Northern Europe, including modern Great Britain, most of the word roots are not found in Latin and are not shared with the Romance languages. English, German, Dutch and Flemish, Danish, Swedish, and Norwegian all share common word roots, so we may formulate the hypothesis that they derive from a common ancestor language, just as the Romance languages derive from Latin. These languages are grouped in a family designated as Germanic, and their common ancestor is designated as Proto-Germanic because, unlike Latin, it has left no written record.

However, Latin words were probably adopted in these languages at the time of the Roman Empire, and borrowing of words from Romance languages took place at much later periods, notably in England after the Norman Conquest of 1066 led by William the Conqueror. It must be said that the Normans were of Northern European origin, having settled in north-west France (today called Normandy) in the 9th and 10th centuries. By the time of the Conquest, however, they appear to have adopted French customs and language, because the words that English shares with modern French are considered to date back to this period.

In Turkey, Latin did not replace the local vernacular either; the modern Turkish language shares words with the so-called Turkic family of languages spoken in Turkmenistan, Uzbekistan, Kazakhstan, Kyrgyzstan, and elsewhere.

In the Levant, the vernaculars were all Semitic languages, a family that includes modern Hebrew and Arabic. Arabic spread all over the Levant and West Asia in the second half of the first millennium.

Language roots

As explained above, word roots are identified by considering the modern languages that exist in the world today. Working backwards in time from these modern languages, we can work out which roots, recognized as common to several languages by their phonetic similarity, may have existed at an earlier time in an ancestor language, even without the need for writing. Writing is the coding of language sounds and it reflects the grammatical and syntactic structure of the language. However, in many instances, writing hides the similarities that may exist between languages, either because of the script (for example Cyrillic script, Devanagari script, or, nearer to us, Gothic script in German) or because of the signs used to code given phonemes (sounds). In Portuguese, for example, a nasal vowel like the "on" of French "bonjour" is written with a final m or a tilde, as in "bom dia". Seeing through written forms to identify common word roots is a specialized task of linguistics.

If different modern languages have the same word root (almost the same phonetically) for designating a percept (that is, a visual or sensory perception), an abstraction, or a concept, then we may formulate the hypothesis that this word existed in a language ancestral to these languages. This applies to the Romance languages with Latin, and to the Germanic languages with Proto-Germanic, which is hypothetical because it has left no records.
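The hypothesis just stated can be sketched as a small test: reduce each word to a crude consonant "skeleton" and ask whether the skeleton is the same in every family. This is a minimal illustration, not a linguist's method. The forms are those quoted earlier in the text for "hand", "mouse", and "nose", with one representative per family and the Russian word for "mouse" romanized as "mysh" (an assumption). The function names and the choice of the first two consonants are likewise assumptions made for the example.

```python
# Letters treated as vowels in this sketch (an assumption; "y" is included
# so that a romanized Russian vowel does not count as a consonant).
VOWELS = set("aeiouy")

# One representative form per family for each meaning.
WORDS = {
    "hand": {"Romance": "mano", "Germanic": "hand", "Slavic": "ruka"},
    "mouse": {"Romance": "muus", "Germanic": "muus", "Slavic": "mysh"},
    "nose": {"Romance": "naas", "Germanic": "nosu", "Slavic": "nos"},
}


def skeleton(word, length=2):
    """Return the first `length` consonant letters of a word."""
    consonants = [
        ch for ch in word.lower() if ch.isalpha() and ch not in VOWELS
    ]
    return "".join(consonants[:length])


def shared_roots(words):
    """Return the meanings whose skeleton is identical in every family."""
    shared = []
    for meaning, forms in words.items():
        skeletons = {skeleton(form) for form in forms.values()}
        if len(skeletons) == 1:
            shared.append(meaning)
    return shared


print(shared_roots(WORDS))
```

On this toy data "mouse" and "nose" pass the test while "hand" does not, which matches the pattern described in the text: some roots are particular to one family, others are shared across families and so point to a common ancestor. A serious comparison would use many meanings and allow for regular sound changes rather than demanding identical letters.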

The foregoing theories have been developed for modern West European languages, but similar processes are most likely to have taken place in other parts of the world for other languages.