The most solidly argued theories about the cradle of the Indo-European (IE) proto-language are 1) the Pontic and 2) the Anatolian. The first proposes the steppes north of the Black Sea, in what are today southern Ukraine and southern Russia; the second proposes some region in central or eastern Anatolia. Both theories allow for contacts with the Caucasus and the indigenous Caucasian languages: to the north for the first, to the south for the second.

As we will see, the Pontic theory is weakened by the total absence of any trace of contact between IE and the North-West Caucasian (NWC) languages, Cherkess, Kabard and Abkhaz, which should have been its most immediate neighbours. NWC languages like Cherkess, which in historical times have occupied the Black Sea coast between Crimea and the Caucasus range, diverge from IE in all respects: typologically, lexically, phonologically.

The Anatolian theory, on the other hand, explains the profound affinities (typological, lexical, phonological) of IE with proto-Semitic, proto-Kartvelian (South Caucasian: SC) and proto-Nakho-Daghestani (the North-East Caucasian languages: NEC).

My arguments will thus be purely linguistic, and I will deliberately leave aside any discussion of archaeological continuity, or of the spread of certain kinds of pottery, weapons, burial types or decoration. In recent times, such arguments have been thoroughly discredited, and it is now established that a) a culture that is uniform in its external aspects can be shared by different ethnic groups speaking widely divergent languages, and, conversely, b) speakers of a single language can belong to distinct material cultures.

For the first case, a), the Middle East, India, the Balkans or the Caucasus show us how different ethnic groups speaking different languages can partake of the same culture, wearing the same clothes, eating the same food and sharing the same customs. The overall Caucasian culture is very uniform and is shared by Christians and Muslims alike, although they speak languages belonging to three indigenous families: SC (Kartvelian), NWC (Abkhaz-Cherkess-Kabardian) and NEC (Nakho-Daghestani, including Chechen). The same culture is also shared by Indo-European ethnic groups (Armenians and Ossetians) and by Turkic newcomers (Kumyk, Nogai, Balkars). There is, and there was, very little linguistic mingling in the Caucasus: the languages were always kept apart, while seen from the outside the culture of the region presents a striking image of uniformity. On the basis of the material culture alone, an archaeologist of the future might be tempted to reconstruct a single ethnic group speaking mutually comprehensible languages.

For the second case, b), we have an example in the historically attested Etruscan language. It was the language of the powerful founders and first masters of Rome, spoken in what is modern-day Tuscany. Although it is poorly understood (it is not an Indo-European language), the surviving texts show that it was rather uniform across its dialects. Nevertheless, without the testimony of the written texts, we might conclude that we are confronted with two different civilisations, for some Etruscans cremated their dead, burying the ashes in what we call “urn fields”, while others buried the embalmed corpses in elaborate, and often very rich, underground constructions imitating earthly dwellings.

I will therefore rely strictly on linguistic arguments in discussing the question of the cradle of the Indo-European languages.

First, a very important tool that has, to my knowledge, never been thoroughly taken into account is what linguists call “areal features”. Languages tend to influence each other locally and regionally. Tonal languages are thus spoken in sharply delimited areas: South-East Asia and the Gulf of Guinea in Africa. The monosyllabic structure and the (sometimes) delicate and complicated system of tones in these languages are clearly areal features, both in South-East Asia and in the Gulf of Guinea: they are shared, for instance, by languages that are not genetically related, such as Chinese and Vietnamese.

In the same way, languages possessing grammatical genders, or nominal classes, on the IE model, tend also to be locally defined. Such are the Bantu languages in Africa, the Pama-Nyungan languages in Australia, and some language families in the Middle East and the Caucasus: Semitic, Indo-European and Nakho-Daghestani. These are the only regions on the planet where languages display the classificatory system of the grammatical genders (or nominal classes).

In Semitic, the number of nominal classes (grammatical genders) is, in the historically attested languages, only two, those that we conventionally call masculine and feminine. In the Nakho-Daghestani languages, the number can go up to six (Chechen, Ingush and the related Batsbi have six genders), although the most common figure is three or four. Avar, the main language of Dagestan, has three genders (or nominal classes), which we can conveniently call “masculine”, “feminine” and “neuter”. In Avar (and the related languages of Dagestan) they function, syntactically and morphologically, exactly like the three corresponding classes (grammatical genders) in the Indo-European languages.

In Old Europe, none of the attested non-IE languages had grammatical genders. Neither Basque, which stems from one of the old Iberian languages, nor the northern Finnic languages, nor the extinct and rather well attested Etruscan, had grammatical genders or classes. Nor do the languages of Asia have them, be they Altaic, Uralic, or the tonal languages of South-East Asia. That the delicate mechanism of grammatical genders, combined with internal flexion, should have appeared in a proto-language or a group of closely related languages such as proto-IE, in the vicinity of (proto-)Basque, Etruscan and Finnic, is highly implausible.

All this takes us very far away from the proposed northern homelands, such as the steppes of today’s Ukraine and Southern Russia, where proto-Indo-European would have been in contact with Finno-Ugrian languages and languages from the north-west Caucasian branch (Cherkess, Kabard, Abkhaz). None of these has grammatical genders.

The Anatolian hypothesis has been proposed and convincingly argued by linguists and archaeologists such as Gamkrelidze and Ivanov, Trubetzkoy and Colin Renfrew. The starting point is the aforesaid “areal features”. In the Balkans, for instance, the languages form a Sprachbund, a shared linguistic typology into which they converged regardless of their original language family. We saw that the languages of old, pre-IE Europe do not have genders or nominal classes. Those of Anatolia, the ancient Middle East and parts of the Caucasus do have them.

Then comes the flexion. Semitic (two grammatical genders), Kartvelian (no grammatical genders) and North-East Caucasian (NEC, with nominal classes) languages present a morphology and an internal flexion similar to those of Indo-European. Kartvelian even uses the same mechanism of Ablaut as IE, as well as a personal flexion of the verb (which NEC languages don’t have).

In the same way, the mechanism of flexion in the Semitic languages is built, as in IE, around the Ablaut, the internal vowel shift from one grammatical category to another. In the Semitic languages, the play of vowels inside the mechanism of flexion has a decisive morpho-semantic role, as in IE, SC (Kartvelian) or NEC (Chechen, Avar etc.), whereas in Basque, Finnish, Estonian or Lapp a root never modifies its vowel, but functions grammatically through agglutination. The two areal and typological models are widely dissimilar, even opposed.

The numerals

Kartvelian and IE languages borrowed, in prehistoric times, a series of numerals from proto-Semitic, especially the numerals 6 and 7. We thus have shesh and sheva in Hebrew, sitta and sab’a in Arabic, shetta and shub’a in Aramaic, etc.

In the Indo-European family, there is a close parallel: sex and septem in Latin, sechs and sieben in German, sześć and siedem in Polish, šeši and septyni in Lithuanian.

And in Kartvelian, 6 is ekvsi in Georgian, usgwa in Svan; 7 is švidi in Georgian, išgwid in Svan.

What is interesting and revealing is that a crossover (chassé-croisé) took place in the designations of the numerals. Thus, in Georgian the Semitic 4 (arb’a in Hebrew) became 8 (rva), while the Georgian 4 (oti) is identical with the IE 8: octo, ahtau, etc.

Moreover, 8 in Indo-European was a dual, something which is visible in Sanskrit, Avestan and Gothic: ahtau. A dual means that 8 designated “twice 4”, which sends us immediately to the Georgian oti = 4. Oti, if we reconstruct it as *okt– (-i is simply the nominative ending in Georgian), explains why the IE octo, ahtau is a dual. The same mechanism would explain why the Semitic 4 (arb’a in Hebrew) became 8 (rva) in Kartvelian (Georgian).

The prehistoric existence in the area of a counting system based on 4 would also explain why in Chechen (an NEC, Nakho-Daghestani language), the numeral 4 is the only one that receives nominal-class prefixes, thus taking different forms according to the gender of the noun it modifies.

A similar counting system based on 4 is attested in other language families, in Africa, or in the isolated Burushaski, in the Pamir mountains in the north of today’s Pakistan, where:

2 = alto
4 = walto
8 = altambo
20 = altar

It is thus perfectly coherent that the Georgian oti = 4, while the IE 8 (octo, ahtau etc.) is a dual, that is, 4 × 2. In the same way, the Semitic 4 (arb’a in Hebrew) became the Georgian 8 = rva. This also supports Gamkrelidze’s theory that the formal identity, in IE languages, of the numeral 9 with the adjective “new” is not a mere coincidence: novum-novem, neu-neun, new-nine etc. The numeral 9 simply opened a new series.
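The arithmetic behind this argument can be checked with a short sketch. It assumes, as the hypothesis above does, that counting proceeded in runs of four; the function names are hypothetical and serve only as illustration. The sketch shows where each number falls within its run: 8 closes the second run of four (“twice 4”) and 9 opens the third.

```python
def series_and_position(n, base=4):
    """Return (series, position) for counting number n grouped in runs of `base`.

    Both values are 1-indexed: series 1 holds 1..base, series 2 the next run, etc.
    The base of 4 is the assumption made by the hypothesis discussed in the text.
    """
    if n < 1:
        raise ValueError("n must be a positive counting number")
    series, offset = divmod(n - 1, base)
    return series + 1, offset + 1


def closes_series(n, base=4):
    """True when n is the last member of its run (4, 8, 12, ... for base 4)."""
    return series_and_position(n, base)[1] == base


def opens_new_series(n, base=4):
    """True when n is the first member of a run (1, 5, 9, ... for base 4)."""
    return series_and_position(n, base)[1] == 1


if __name__ == "__main__":
    for n in range(1, 10):
        series, position = series_and_position(n)
        print(f"{n}: series {series}, position {position}")
```

Under this grouping 5 also opens a run, so the sketch only verifies that the positions of 8 and 9 are consistent with the argument; it says nothing about why a particular numeral received the name “new”.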

All this indicates that Indo-European must have been formed in the vicinity of Semitic and Kartvelian and possibly other Caucasian languages. This excludes the possibility of a cradle north of the Black Sea, and totally excludes the Danube area, the Balkans, or any part of Eastern Europe. Those regions are too far from the Caucasus and from the Semitic languages, and we have seen that the languages spoken in Neolithic Europe may well have had a typology similar to that of today’s Basque or of the Finnic languages, which are agglutinative.

Only the Anatolian hypothesis explains the borrowings and the many shared lexical items between IE, Kartvelian and Semitic. The borrowings from Semitic into IE and Kartvelian are too numerous to be listed here. Between IE and Kartvelian there are surprising correspondences, such as the verbal root *sed– “to sit, to stay, to remain” (identical in IE and Kartvelian), and ordinal numerals such as the Georgian pirveli (“first”). Pirveli resembles Slavic forms such as the Russian pervyj, but it cannot be a Slavic loan, since Georgian had no contact with Slavic by the time of its first written texts in the 5th century.

Archaic lexical correspondences between IE and the North-East Caucasian languages (Chechen, Avar etc.) are also numerous, while Indo-European borrowings into Basque or Finnic are all recent and can be easily traced historically.

All this shows that proto-Indo-European was formed in Eastern Anatolia, in the vicinity of the Semites and Caucasians.

A shorter Romanian version of this text can be found in the links below.

GAMKRELIDZE, T.V. and IVANOV, V. Индоевропейский язык и индоевропейцы (The Indo-European Language and the Indo-Europeans). Tbilisi, 1984.

TRUBETZKOY, Nikolai Sergeyevich. “Mysli ob indoevropeiskoi probleme.” In Izbrannye trudy po filologii, ed. T.V. Gamkrelidze. Moscow: Progress, 1987.

RENFREW, Colin. Archaeology and Language. The Puzzle of Indo-European Origins. London: Penguin Books, 1989.

Cf. also:

A structural comparison of Etruscan with the Kartvelian languages

Sucking the victim’s mother’s teats – the Etruscans and the Caucasian vendetta…


Yoga: the Chechen language and its prehistoric contacts with Indo-European…


Anatolia și Caucazul : leagănul primitiv al indo-europenilor – demonstrația lingvistică…