Linguistic aspects of the Aryan non-invasion theory

Dr. Koenraad ELST




It is widely assumed that linguistics has provided the clinching evidence for the Aryan invasion theory (AIT) and for a non-Indian homeland of the Indo-European (IE) language family. Defenders of an "Out of India" theory (OIT) of IE expansion unwittingly confirm this impression by rejecting linguistics itself or its basic paradigms, such as the concept of an IE language family. However, old linguistic props of the AIT, such as linguistic paleontology or glottochronology, have lost their credibility. On closer inspection, currently dominant theories turn out to be compatible with an out-of-India scenario for IE expansion. In particular, substratum data are not in conflict with an IE homeland in Haryana-Panjab. It would however be rash to claim positive linguistic proof for the OIT. As a fairly soft type of evidence, linguistic data are presently compatible with a variety of scenarios.

1. Preliminary remarks

1.1. Invasion vs. immigration

The theory whose linguistic evidence we are about to discuss is widely known as the "Aryan invasion theory" (AIT). I will retain this term even though some scholars object to it, preferring the term "immigration" to "invasion". They argue that the latter term represents a long-abandoned theory of Aryan warrior bands attacking and subjugating the peaceful Indus civilization. This dramatic scenario, popularized by Sir Mortimer Wheeler, had white marauders from the northwest enslave the black aboriginals, so that "Indra stands accused" of destroying the Harappan civilization. Only the extremist fringe of the Indian Dalit (ex-Untouchable) movement and its Afrocentric allies in the US now insist on this black-and-white narrative (vide Rajshekar 1987, Biswas 1995).

But for this once, I believe the extremists have a point. North India's linguistic landscape leaves open only two possible explanations: either Indo-Aryan was native, or it was imported in an invasion. In fact, scratch any of these emphatic "immigration" theorists and you'll find an old-school invasionist, for they never fail to connect Aryan immigration with horses and spoked-wheel chariots, i.e. factors of military superiority.

Immigration means a movement from one country to another, without the connotation of conquest; invasion, by contrast, implies conquest or at least the intention of conquest. To be sure, invasion is not synonymous with military conquest; it may be that, but it may also be demographic Unterwanderung. What makes an immigration into an invasion is not the means used but the end achieved: after an invasion, the former outsiders are not merely in, as in an immigration, but they are also in charge. If the newcomers end up imposing their (cultural, religious, linguistic) identity rather than adopting the native identity, the result is the same as it would have been in the case of a military conquest, viz. that outsiders have made the country their own, and that natives who remain true to their identity (such as Native Americans in the US) become strangers or second-class citizens in their own country.

In the case of the hypothetical Aryan invasion, the end result clearly is that North India got aryanized. The language of the Aryans marginalized or replaced all others. In a popular variant of the theory, they even reduced the natives to permanent subjugation through the caste system. So, whether or not there was a destructive Aryan conquest, the result was at any rate the humiliation of native culture and the elimination of the native language in the larger part of India. It is entirely reasonable to call this development an "invasion" and to speak of the prevalent paradigm as the "Aryan invasion theory".

As far as I can see, the supposedly invading Aryans could only initiate a process of language replacement by a scenario of elite dominance (that much is accepted by most invasionists), which means that they first had to become the ruling class. Could they have peacefully immigrated and then worked their way up in society, somewhat like the Jews in pre-War Vienna or in New York? The example given illustrates a necessary ingredient of peaceful immigration, viz. linguistic adaptation: in spite of earning many positions of honour and influence in society, the Jews never imposed their language like the Aryans supposedly did, but became proficient in the native languages instead. So how could these Aryan immigrants first peacefully integrate into Harappan or post-Harappan society yet preserve their language and later even impose it on their host society? Neither their numbers, relative to the very numerous natives, nor their cultural level, as illiterate cowherds relative to a literate civilization, gave them much of an edge over the natives.

Therefore, the only plausible way for them to wrest power from the natives must have been by their military superiority, tried and tested in the process of an actual conquest. Possibly there were some twists to the conquest scenario, making it more complicated than a simple attack, e.g. some Harappan faction in a civil war may have invited an Aryan mercenary army which, after doing its job, overstayed its welcome and dethroned its employers. But at least some kind of military showdown should necessarily have taken place. As things now stand, the Aryan "immigration" theory necessarily implies the hypothesis of military conquest.

1.2. The archaeological argument from silence

In this paper, I will give a sympathizing account of the prima facie arguments in favour of the "Out of India" theory (OIT) of IE expansion. I am not sure that this theory is correct, indeed I will argue that the linguistic body of evidence is inconclusive, but I do believe that the theory deserves a proper hearing. In the past, it didn't get one because the academic establishment simply hadn't taken serious notice. Now that this has changed for the better, it becomes clear that the all-important linguistic aspect of the question has never been properly articulated by "Out of India" theorists. The OIT invokes archaeological and textual evidence, but doesn't speak the language of the IE linguists who thought up the AIT in the first place. So now, I take it upon myself to show that the OIT need not be linguistic nonsense.

But first, a glimpse of the archaeological debate. In a recent paper, two prominent archaeologists, Jim Shaffer and Diane Lichtenstein (1999), argue that there is absolutely no archaeological indication of an Aryan immigration into northwestern India during or after the decline of the Harappan city culture. It is odd that the other participants in this debate pay so little attention to this categorical finding, so at odds with the expectations of the AIT orthodoxy, but so in line with majority opinion among Indian archaeologists (e.g. Rao 1992, Lal 1998).

The absence of archaeological evidence for the AIT is also admitted, with erudite reference to numerous recent excavations and handy explanations of the types of evidence recognized in archaeology, by outspoken invasionist Shereen Ratnagar (1999). It then becomes her job to explain why the absence of material testimony of such a momentous invasion need not rule out the possibility that the invasion took place nonetheless. Thus, she mentions parallel cases of known yet archaeologically unidentifiable invasions, e.g. the Goths in late-imperial Rome or the Akkadians in southern Mesopotamia (Ratnagar 1999:222-223). So, in archaeology even more than elsewhere, we should not make too much of an argumentum e silentio. To quote her own conclusion: "We have found that the nature of material residues and the units of analysis in archaeology do not match or fit the phenomenon we wish to investigate, viz. Aryan migrations. The problem is exacerbated by the strong possibility that simultaneous with migrations out of Eurasia there were expansions out of established centres by metallurgists/prospectors. Last, when we investigate pastoral land use in the Eurasian steppe, we can make informed inferences about the nature of Aryan emigration thence, which is a kind of movement very unlikely to have had artefactual correlates." (1999:234)

It's against the stereotype of overbearing macho invaders, but the Aryans secretively stole their way into India, careful not to leave any traces.

1.3. Paradigmatic expectation as a distortive factor

If the Aryan invasion does not stand disproven by the absence of definite archaeological pointers, then neither does an Aryan emigration from India. However, there is one difference. Because several generations of archaeologists have been taught the AIT, they have in their evaluation of new evidence tried to match it with the AIT; in this, they have failed so far. However, it is unlikely that they have explored the possibility of matching the new findings with the reverse migration scenario. Psychologically, they must have been much less predisposed to noticing possible connections between the data and an out-of-India migration than the reverse.

This predisposition is also in evidence in the debates over other types of evidence. Thus, in a recent internet discussion about the genetic data, someone claimed that one study (unlike many others) indicated an immigration of Caucasians into India for the 2nd millennium BC. To be sure, archaeo-genetics is not sufficiently fine-tuned yet to make that kind of chronological assertion, but even if we accept this claim, it would only prove the AIT in the eyes of those who are already conditioned by the AIT perspective. After all, a northwestern influx into India in the 2nd millennium, while not in conflict with the AIT, is not in conflict with the OIT either: the latter posits a northwestern emigration in perhaps the 5th millennium BC, and has no problem with occasional northwestern invasions in later centuries, such as those of the Shakas, Hunas and Turks in the historic period.

Likewise, linguistic evidence cited in favour of the AIT often turns out to be quite compatible with the OIT scenario as well (as we shall see), but is never studied in that light because so few people in the 20th century even thought of that possibility. And today, even those who are aware of the OIT haven't thought it through sufficiently to notice how known data may verify it.

1.4. The horse, argument from silence

In a recent paper, Hans Hock gives the two arguments which have, all through the 1990s, kept me from giving my unqualified support to the OIT. These are the dialectal distribution of the branches of the IE language family, to be discussed below, and the sparse presence of horses in Harappan culture. About the horse, he summarizes the problem very well: "no archaeological evidence from Harappan India has been presented that would indicate anything comparable to the cultural and religious significance of the horse (...) which can be observed in the traditions of the early IE peoples, including the Vedic Aryas. On balance, then, the 'equine' evidence at this point is more compatible with migration into India than with outward migration." (1999:13)

B.B. Lal (1998:111) mentions finds of true horse in Surkotada, Rupnagar, Kalibangan, Lothal, Mohenjo-Daro, and terracotta images of the horse from Mohenjo-Daro and Nausharo. Many bones of the related onager or half-ass have also been found, and one should not discount the possibility that in some contexts, the term ashva could refer to either species. Nevertheless, all this is still a bit meagre to fulfil the expectation of a prominent place for the horse in an "Aryan" culture. I agree with the OIT school that such paucity of horse testimony may be explainable (cfr. the absence of camel and cow depictions, animals well-known to the Harappans, in contrast with the popularity of the bull motif, though cows must abound when bulls are around), but their case would be better served by more positive evidence.

On the other hand, the evidence is not absolutely damaging to an Aryan Harappa hypothesis. Both outcomes remain possible because other, reputedly Aryan sites are likewise poor in horses. This is the case with the Bactria-Margiana Archaeological Complex, surprisingly for those who interpret the BMAC as the culture of the Indo-Aryans poised to invade India (Sergent 1997:161 ff.). It is also the case for Hastinapura, a city dated by archaeologists at ca. 8th century BC, when that part of India was very definitely Aryan (Thapar 1996:21). So, the argument from near-silence regarding horse bones need not prove absence of Aryans nor be fatal to the OIT, though it remains a weak point in the OIT argumentation.

1.5. Evidence sweeping all before it

When evidence from archaeology and Sanskrit text studies seems to contradict the AIT, we are usually reassured that "there is of course the linguistic evidence" for this invasion, or at least for the non-Indian origin of the IE family. Thus, F.E. Pargiter (1962:302) had shown how the Puranas locate Aryan origins in the Ganga basin and found "the earliest connexion of the Vedas to be with the eastern region and not with the Panjab", but then he allowed the unnamed linguistic evidence to overrule his own findings (1962:1): "We know from the evidence of language that the Aryans entered India very early". His solution is to relocate the point of entry of the Aryans from the western Khyber pass to the eastern Himalaya: Kathmandu or thereabouts.

A common reaction among Indians against this state of affairs is to dismiss linguistics altogether, calling it a "pseudo‑science". Thus, N.S. Rajaram describes 19th-century comparative and historical linguistics, which generated the AIT, as "a scholarly discipline that had none of the checks and balances of a real science" (1995:144), in which "a conjecture is turned into a hypothesis to be later treated as a fact in support of a new theory" (1995:217).

Along the same lines, N.R. Waradpande (1989:19-21) questions the very existence of an Indo-European language family and rejects the genetic kinship model, arguing very briefly that similarities between Greek and Sanskrit must be due to very early borrowing. He argues that "the linguists have not been able to establish that the similarities in the Aryan or Indo-European languages are genetic, i.e. due to their having a common ancestry". Conversely, he also (1993:14-15) rejects the separation of Indo-Aryan and Dravidian into distinct language families, and alleges that "the view that the South-Indian languages have an origin different from that of the North-Indian languages is based on irresponsible, ignorant and motivated utterances of a missionary" (meaning the 19th-century pioneer of Dravidology, Bishop Robert Caldwell).

This rejection of linguistics by critics of the AIT creates the impression that their own pet theory is not resistant to the test of linguistics. Indeed, nothing has damaged their credibility as much as this sweeping dismissal of a science praised in the following terms by archaeologist David W. Anthony (1991:201-202): "It is true that we can only work with relatively late IE daughter languages, that we cannot hope to capture the full variability of PIE, and that reconstructed semantic fields are more reliable than single terms. It is also true that both the reconstructed terms and their meanings are theories derived from systematic correspondences observed among the daughter IE languages; no PIE term is known with absolute certainty. Nevertheless, the rules that guide phonetic (and to a lesser extent, semantic) reconstruction are more rigorous, have been more intensely tested, and rest upon a more secure theoretical foundation than most of the rules that guide interpretation in my own field of prehistoric archaeology. Well-documented linguistic reconstructions of PIE are in many cases more reliable than well-documented archaeological interpretations of Copper Age material remains."

However, the fact that people fail to address the linguistic evidence, preferring simply to excommunicate it from the debate, does not by itself validate the prevalent interpretation of this body of evidence. Rajaram's remark that scholars often treat mere hypotheses (esp. those proposed by famous colleagues) as facts, as solid data capable of overruling other hypotheses and even inconvenient new data, is definitely valid for much of the humanities.

But then, while some linguists have sometimes fallen short of the scientific standard by thus relying on authority, it doesn't follow that linguistics is a pseudo-science. Nobody can observe the Proto-Indo-Europeans live to verify hypotheses, yet comparative IE linguistics does sometimes satisfy the requirement of having predictions implicit in the theory verified by empirical discoveries. Thus, some word forms reconstructed as the etyma of terms in the Romance languages failed to show up in the classical Latin vocabulary, but were finally discovered in the vulgar-Latin graffiti of Pompeii. The most impressive example of this kind is probably the identification of laryngeals, whose existence had been predicted in abstracto decades earlier by Ferdinand de Saussure, in newly-discovered texts in the Hittite language. We will get to see an important sequel to the laryngeal verification below.

At the same time, some linguists are aware that the AIT is just a successful theory, not a proven fact. One of them told me that he had never bothered about a linguistic justification for the AIT framework, because there was, after all, "the well‑known archaeological evidence"! But for the rest, "the linguistic evidence" is still the magic mantra to silence all doubts about the AIT. It is time we take a look at it for ourselves.

2. The Indo-European landscape

2.1. Intuitive deductions from geography

There is, pace Misra 1992, absolutely no reason to doubt the established refutation of the Indian (and turn-of-the-19th-century European) belief that Sanskrit is the mother of all IE languages, though Sanskrit remains in many respects closest to PIE, as a standard textbook of IE testifies: "The distribution [of the two stems as/s for "to be"] in Sanskrit is the oldest one" (Beekes 1990:37); "PIE had 8 cases, which Sanskrit still has" (Beekes 1990:122); "PIE had no definite article. That is also true for Sanskrit and Latin, and still for Russian. Other languages developed one" (Beekes 1990:125); "[For the declensions] we ought to reconstruct the Proto-Indo-Iranian first,... But we will do with the Sanskrit because we know that it has preserved the essential information of the Proto-Indo-Iranian" (Beekes 1990:148); "While the accentuation systems of the other languages indicate a total rupture, Sanskrit, and to a lesser extent Greek, seem to continue the original IE situation" (Beekes 1990:187); "The root aorist... is still frequent in Indo-Iranian, appears sporadically in Greek and Armenian, and has disappeared elsewhere" (Beekes 1990:279).

All the same, Sanskrit has moved away from PIE and the path can be mapped. Thus, you can explain Skt. jagāma from PIE *gegoma as a palatalization of the initial velar (before e/i) followed by the conflation of a/e/o to a, but the reverse is not indicated and is close to impossible: palatalization is a one-way process, attested in numerous languages on all continents (including English, e.g. wicca > witch), while the opposite shift is practically unknown. The kentum forms and the forms with differentiated vowels as attested in Greek represent the original situation, while the Sanskrit forms represent an innovation. This means that Sanskrit is not PIE, that it has considerably evolved after separating from the ancestor-languages of the other branches of IE.
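The ordering argument in this derivation can be made concrete with a toy sketch. The two rules below are deliberately crude stand-ins for the changes just described (velar to palatal before e/i, then e/o falling together with a); the function names are hypothetical, vowel length and accent are ignored, and the output is not meant to reproduce the attested Sanskrit form exactly.

```python
import re

def palatalize(form: str) -> str:
    # Toy rule: velars g/k become palatals j/c before a front vowel (e or i).
    form = re.sub(r"g(?=[ei])", "j", form)
    return re.sub(r"k(?=[ei])", "c", form)

def merge_vowels(form: str) -> str:
    # Toy rule: e and o fall together with a.
    return re.sub(r"[eo]", "a", form)

def derive(form: str, rules) -> str:
    # Apply the sound changes one after another, in the order given.
    for rule in rules:
        form = rule(form)
    return form

pie = "gegoma"  # simplified spelling of the reconstructed form cited above

# Palatalization first, vowel merger second: the conditioning e is still there.
print(derive(pie, [palatalize, merge_vowels]))  # jagama

# Reverse order: the e has already become a, so nothing triggers palatalization.
print(derive(pie, [merge_vowels, palatalize]))  # gagama
```

Under these assumptions only the first ordering yields the palatal initial, and once the vowels have merged the environment that caused the palatalization can no longer be read off the output. That loss of information is one way of seeing why the changes are treated as running from the Greek-type forms to the Sanskrit-type forms and not the other way round.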

However, accepting the conventional genealogical tree of the IE languages does not imply acceptance of their conventional geography. When Sanskrit was dethroned in the 19th century and the putative linguistic distance between PIE and Sanskrit progressively increased, there was a parallel movement of the PIE homeland away from India. Apart from linguistic considerations (chiefly linguistic paleontology) and the political background (increased Eurocentrism at the height of the colonial period), this was certainly also due to a more or less conscious tendency to equate linguistic distance from PIE with geographical distance from the Urheimat. That tendency has persisted here and there all through the 20th century, e.g. Witold Manczak (1992) deduces that the Urheimat must be in or near Poland from his estimate that lexically, Polish is closest to PIE in that it is the IE language with the fewest substratal borrowings.

Obviously, that type of reasoning must be abandoned. It is perfectly possible for the most conservative language to be spoken by a group of emigrants rather than by those who stayed behind in the homeland. Indeed, according to the so-called Lateral Theory, it is precisely in outlying settlement areas that the most conservative forms will be found, while in the metropolis the language evolves faster. That exactly is what the OIT posits regarding palatalization.

2.2. Kentum/satem

The first innovation acknowledged as creating a distance between PIE and Sanskrit was the kentum > satem shift. It was assumed, in my view correctly (pace Misra 1992), that palatalization is a one‑way process transforming velars (k,g) into palatals (c,j) but never the reverse; so that the velar or "kentum" forms had to be the original and the palatal or "satem" forms the evolved variants.

However, it would be erroneous to infer from this that the homeland was in the kentum area. On the contrary, it is altogether more likely that it was in what became satem territory, e.g. as follows: India originally had the kentum form, the dialects which emigrated first retained the kentum form and took it to the geographical borderlands of the IE expanse (Europe, Anatolia, China), while the last-emigrated dialects (Armenian, Iranian) plus the stay-behind Indo-Aryan languages had meanwhile adopted the satem form.

Moreover, the discovery of a small and extinct kentum language inside India (Proto‑Bangani, with koto as its word for "hundred"), surviving as a sizable substratum in the Himalayan language Bangani, tends to support the hypothesis that the older kentum form was originally present in India as well. This discovery was made by the German linguist Claus Peter Zoller (1987, 1988, 1989). The attempt by George van Driem and Suhnu R. Sharma (1996) to discredit Zoller has been overruled by the findings made on the spot by Anvita Abbi (1998) and her students. She has almost entirely confirmed Zoller's list of kentum substratum words in Bangani. But as the trite phrase goes: this calls for more research.

Zoller does not explain the presence of a kentum language in India through an Indian Homeland Theory but as a left-over of a pre-Vedic Indo-European immigration into India. He claims that the local people have a tradition of their immigration from Afghanistan. If they really lived in Afghanistan originally, their case (and their nuisance value for the AIT) isn't too different from that of the Tocharians, another kentum people showing up in unexpected quarters. But if even the Vedic poets could not recall the invasion of their grandfathers into India (Vedic literature doesn't mention it anywhere, vide Elst 1999:164-171), what value should we attach to a tradition of this mountain tribe about its own immigration many centuries ago? Could it not rather be that they have interiorized what the school-going ones among them picked up in standard textbooks of history, viz. the AIT model? Their presence in Afghanistan or in Garhwal itself is at any rate highly compatible with the OIT.

2.3. Indo-Hittite

Another element which increased the distance between reconstructed PIE and Sanskrit dramatically was the discovery of Hittite. Though Hittite displayed a very large intake of lexical and other elements from non-IE languages, some of its features were deemed to be older than their Sanskrit counterparts, e.g. the Hittite genus commune as opposed to Sanskrit's contrast between masculine and feminine genders, or the much-discussed laryngeal consonants. Outside Hittite, some phonetic side-effects are the only trace of these supposed laryngeals, e.g. Greek odont-, "tooth", shows a trace of an initial H-, which Latin lost to yield dent-. Greek anēr, "man", would come from *Hnēr, whereas Sanskrit has nr/nara, only preserving the laryngeal in the form of vowel-lengthening in a prefix, as in sū-nara from su + *Hnara. In metre, we find traces of an original laryngeal consonant marking a second syllable which was later contracted with the preceding syllable: "In Indo-Iranian such forms are often still disyllabic in the oldest poetry: bhās, 'light', = /bhaas/ < /bheH-os/." (Beekes 1990:180)

This fact has gone unnoticed in all pro-OIT writing so far. The laryngeal came in three varieties, which later yielded the three vowels a/e/o, whose representatives in the Greek alphabet happen to be derived from the three more or less laryngeal consonants in Northwest-Semitic: aleph, he and ayn.

The laryngeal theory has been attacked by both OIT and mainstream circles. Misra (1992:21) claims to have "refuted" it; Décsy (1991:17) calls it "the infamous laryngeal theory". When scholars claim proof of the laryngeals in Caucasian loan-words from IE, Décsy (1992:14, w.ref. to Wagner 1984) counters that it is the other way around: "Hittite lost its Indo-European character and acquired a large number of Caucasian areal features in Anatolia. These Caucasian-type features can not be regarded as ancient characteristics of the entire PIE". Likewise Jonsson (1978:86), though accepting that the laryngeals may offer a "more elegant explanation of certain cases of hiatus in Vedic, of certain suffixal 's, 's", presents as "an acceptable alternative" the scenario that the laryngeal in IE-inherited Anatolian words "comes from the unknown non-IE language or languages that are responsible for the major part of the [Anatolian] vocabulary".

But we need no dissident hypotheses here: even in the dominant theory, there is no reason why the Urheimat should be in the historical location of Hittite or at least outside India. As the first emigrant dialect, Hittite could have taken from India some linguistic features (genus commune, laryngeals) which were about to disappear in the dialects emigrating only later or staying behind.

As for the shift from genus commune to a differentiation of the "animate" category in masculine and feminine, this has been used to illustrate a theory of fast-increasing complexity of post-PIE grammar, which Zimmer (1990/2) interprets as a typical phenomenon of Creole languages. He sees early IE as the language of a colluvies gentium, a synthetic tribe of people from divergent ethnic backgrounds, which developed its makeshift link language into a complex language, with Hittite splitting off in an early stage of this evolution. This is an interesting hypothesis, but so far the evidence for it is lacking. Thus, there is no proof that the simpler verbal tense system of Germanic and Hittite came first while the more elaborate tense system of Aryan or Greek was a later evolution; more likely, the aorist which exists in the latter two but not in the former two is a PIE tense which some retained and some lost. The theories that PIE grammar was Hittite-like simple and that PIE was a Creole developed by a colluvies gentium are mutually supportive, but there is no outside proof for either. And if there were, it would still not preclude northwestern India as the habitat of this colluvies gentium.

2.4. Dialect distribution

One consideration which has always kept me from simply declaring the AIT wrong concerns the geographical distribution of the branches of the IE family. This argument has been developed in some detail by Hans Hock, who explains (1999:13) that "the early Indo-European languages exhibit linguistic alignments which cannot be captured by a tree diagram, but which require a dialectological approach that maps out a set of intersecting 'isoglosses' which define areas with shared features (...) While there may be disagreements on some of the details, Indo-Europeanists agree that these relationships reflect a stage at which the different Indo-European languages were still just dialects of the ancestral language and as such interacted with each other in the same way as the dialects of modern languages."

Isoglosses, linguistic changes which are common to several languages, indicate either that the change was imparted by one language to its sisters, or that the languages have jointly inherited or adopted it from a common source. Within the IE family, we find isoglosses in languages which take or took geographically neighbouring positions, e.g. in a straight Greece-to-India belt, the Greek, Armenian, Iranian and some Dardic and western Indo-Aryan languages, we see the shift s > h, e.g. Latin septem corresponding to Greek hepta, Iranian hafta. In the same group, plus the remaining Indo-Aryan languages, we see the "preterital augment": Greek e-phere, Sanskrit a-bharat, "he/she/it carried". Does this mean that the said languages formed a single branch for some time after the disintegration of PIE unity, before fragmenting into the presently distinct languages?

Not necessarily, for this group is itself divided by separate developments which the member languages have in common with non-member languages. Best known is the kentum/satem divide: Greek belongs to the kentum group, while Armenian and Indo-Iranian share with Baltic and Slavic the satem isogloss (as well as the related "ruki rule", changing s to sh after r, u, k, i). So, like between the dialects of any modern language, the IE languages share one isogloss with this neighbour, another isogloss with another neighbour, who in turn shares isoglosses with yet other neighbours.
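The point that such isoglosses cut across one another, and so cannot all be read as branches of a single family tree, can be checked mechanically. The sketch below is only an illustration under simplifying assumptions: each isogloss is reduced to a set of whole branches taken from the examples just given (the partial spread of s > h into Dardic and western Indo-Aryan is left out), and the helper name `crosses` is hypothetical. Two groupings fit the same tree only if one contains the other or they do not overlap at all.

```python
from itertools import combinations

# Simplified membership of three isoglosses, branch by branch (an assumption:
# partial participation, e.g. of some western Indo-Aryan languages, is ignored).
isoglosses = {
    "s > h": {"Greek", "Armenian", "Iranian"},
    "augment": {"Greek", "Armenian", "Iranian", "Indo-Aryan"},
    "satem": {"Armenian", "Iranian", "Indo-Aryan", "Baltic", "Slavic"},
}

def crosses(a: set, b: set) -> bool:
    # Two groupings "cross" when they overlap but neither contains the other;
    # such a pair cannot both be subgroups in one genealogical tree.
    return bool(a & b) and bool(a - b) and bool(b - a)

crossing_pairs = [
    (x, y)
    for x, y in combinations(isoglosses, 2)
    if crosses(isoglosses[x], isoglosses[y])
]
print(crossing_pairs)  # [('s > h', 'satem'), ('augment', 'satem')]
```

With these memberships the s > h group sits neatly inside the augment group, but the satem group overlaps both without containing or being contained by either, which is the pattern that calls for a dialectological rather than a purely genealogical description.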

The key factor in Hock's argument seems to be neighbour: the remarkable phenomenon which should ultimately support the AIT is that isoglosses are shared by neighbouring branches of IE. Thus, the kentum languages form a continuous belt from Anatolia through southern to western and northern Europe (with serious exceptions, viz. Tocharian and proto-Bangani), and the satem isogloss likewise covers a continuous territory from central Europe to India, only later fragmented by the intrusion of Turkic. Hock provides (1999:15) a map showing ten isoglosses in their distribution over the geographically placed IE language groups, and we do note the geographical contiguity of languages sharing an isogloss. Why is this important? "What is interesting, and significant for present purposes, is the close correspondence between the dialectological arrangement in Figure 2 (based on the evidence of shared innovations) and the actual geographical arrangement of the Indo-European languages in their earliest attested stages. (...) the relative positions of the dialects can be mapped straightforwardly into the actual geographical arrangement if (...) the relative positions were generally maintained as the languages fanned out over larger territory." (Hock 1999:16) In other words: the geographical distribution of IE languages which actually exists happens to be the one which would, at the stage when the proto-languages were dialects of PIE, be best able to produce the actual distribution of isoglosses over the languages.

So, the relative location of the ancestor-languages in the PIE homeland was about the same as their location at the dawn of history. This, Hock proposes, is best compatible with a non-Indian homeland. And indeed, if the Homeland was in the Pontic region, the dialect communities could spread out radially, with the northwestern proto-Germanic tribe moving further northwest through what is now Poland, the western proto-Celtic tribe moving further west, the southwestern proto-Greek and proto-Albanian tribes moving further southwest through the Balkans, the southeastern proto-Indo-Iranians moving southeast, etc. (One reason given by the early Indo-Europeanists for assuming such radial expansion is that they found little inter-borrowing between IE language groups, indicating little mutual contact, this in spite of plenty of Iranian loans found in Slavic, some Celtic loans in Germanic, etc.) This way, while the distances grew bigger, the relative location of the daughters of PIE vis-à-vis one another remained the same.

If this is a bit too neat to match the usual twists and turns of history, it is at least more likely than an Indocentric variant of Hock's scenario would be: "To be able to account for these dialectological relationships, the 'Out-of-India' approach would have to assume, first, that these relationships reflect a stage of dialectal diversity in a Proto-Indo-European ancestor language located within India. While this assumption is not in itself improbable, it has consequences which, to put it mildly, border on the improbable and certainly would violate basic principles of simplicity. What would have to be assumed is that the various Indo-European languages moved out of India in such a manner that they maintained their relative position to each other during and after the migration. However, given the bottle-neck nature of the route(s) out of India, it would be immensely difficult to do so." (Hock 1999:16-17, emphasis Hock's)

I believe there is a plausible and entirely logical alternative. The geographical distribution of PIE dialects in the PIE homeland is unrelated to the location of their daughter languages; the isoglosses are the result of a twofold scenario, part areal effect and part genealogical tree, as follows. In part, they reflect successive migrations from the heartland where new linguistic trends developed and affected only the dialects staying behind. Gamkrelidze and Ivanov (1995:348-350) have built an impressive reconstruction of such successive migrations on a thorough survey of the linguistic material. To summarize:

1) Initially, there was a single PIE language.

2) The first division of PIE yielded two dialect groups, which will be called A and B. Originally they co-existed in the same area, and influenced each other, but geographical separation put an end to this interaction.

3) In zone A, one dialect split off, probably by geographical separation (whether it was its own speakers or those of the other dialects who emigrated from the Urheimat, is not yet at issue), and went on to develop separately and become Anatolian.

4) The remainder of the A group acquired the distinctive characteristics of the Tocharo-Italo-Celtic subgroup.

5) While the A remainder differentiated into Italo-Celtic and Tocharian, the B group differentiated into a "northern" or Balto-Slavic-Germanic and a "southern" or Greek-Armenian-Aryan group; note that the kentum/satem divide only affects the B group, and does not come in the way of other and more important isoglosses distinguishing the northern group (with kentum Germanic and predominantly satem Baltic and Slavic) from the southern group (with kentum Greek and satem Armenian and Aryan).
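The branching order in steps 1 to 5 can be written out as a small tree, which makes it easy to check the remark in step 5 that the kentum/satem divide cuts across both subgroups of B instead of separating them. This is only a sketch of the summary given here: the nested-dictionary encoding, the function names, and the simplification of "predominantly satem" Baltic to plain "satem" are assumptions made for illustration, not part of Gamkrelidze and Ivanov's own presentation.

```python
# Branching order as summarized in steps 1-5 above.
# The nested-dict encoding and all identifiers are illustrative assumptions.
TREE = {
    "PIE": {
        "A": {
            "Anatolian": {},
            "Tocharo-Italo-Celtic": {"Italo-Celtic": {}, "Tocharian": {}},
        },
        "B": {
            "northern": {"Germanic": {}, "Baltic": {}, "Slavic": {}},
            "southern": {"Greek": {}, "Armenian": {}, "Aryan": {}},
        },
    }
}

# Kentum/satem labels for the B group as stated in step 5
# ("predominantly satem" Baltic is simplified to "satem").
SIBILANT_CLASS = {
    "Germanic": "kentum", "Baltic": "satem", "Slavic": "satem",
    "Greek": "kentum", "Armenian": "satem", "Aryan": "satem",
}


def leaves(node):
    """Return the terminal branch names below a node, in insertion order."""
    out = []
    for name, child in node.items():
        out.extend(leaves(child) if child else [name])
    return out


def classes_in(subgroup):
    """Set of kentum/satem labels found among the leaves of a B subgroup."""
    return {SIBILANT_CLASS[leaf] for leaf in leaves(TREE["PIE"]["B"][subgroup])}


def path_to(target, node=TREE, trail=()):
    """Return the chain of splits leading to a branch, or None if absent."""
    for name, child in node.items():
        if name == target:
            return trail + (name,)
        found = path_to(target, child, trail + (name,))
        if found:
            return found
    return None
```

Calling `classes_in("northern")` and `classes_in("southern")` both yield a set containing "kentum" as well as "satem", which is the point of step 5: on this reconstruction the kentum/satem isogloss does not coincide with any node of the tree.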

The second part is that the isoglosses not explainable by the former scenario are post-PIE areal effects, which is why they affect historically neighbouring languages, regardless of whether these had been neighbours when they were dialects of PIE. Archaeologists (mostly assuming a North-Caspian homeland) have said that the North-Central-European Corded Ware culture of ca. 3000 BC was a kind of secondary homeland from which the Western branches of PIE spread, again more or less radially, to their respective historical locations; the OIT would allot that role of secondary western-IE homeland to the Kurgan culture. In such a secondary homeland, IE-speaking communities would, before their further dispersal, be close enough to allow for the transmission of lexical innovations or common substratal borrowings (e.g. beech, cfr. Latin fagus; or fish, cfr. Latin piscis, unattested in eastern IE languages). Communities in truly close interaction, at whichever stage of the development of IE, would also develop grammatical isoglosses.

Hock (1999:14) himself unwittingly gives at least one example which doesn't easily admit of a different explanation: "The same group of dialects [Germanic, Baltic, Slavic] also has merged the genitive and ablative cases into a single 'genitive' case. But within the group, Germanic and Old Prussian agree on generalizing the old genitive form (...) while Lithu-Latvian and Slavic favor the old ablative".

But clearly, Old Prussian and Lithu-Latvian lived in close proximity and separate from Germanic and Slavic for centuries, as dialects of proto-Baltic, else they wouldn't have jointly developed into the Baltic group, distinct in many lexical and grammatical features from its neighbours. So, if the Baltic language bordering on the Germanic territory happens to share the Germanic form, while the languages bordering on Slavic happen to share the Slavic form, we are clearly faced with a recent areal effect and not an heirloom from PIE days. The conflation of cases has continued to take place in many IE languages in the historical period, so the example under consideration may well date to long after the fragmentation of PIE.

A second example mentioned by Hock may be the split within the Anatolian group, with Luwian retaining a distinction between velar and palatal but Hittite merging the two, just like its Greek neighbour. Positing an areal influence at the stage of PIE dialectal differentiation on top of an obviously existing areal influence in the post-PIE period seems, in this context, like a "multiplication of entities beyond necessity": neighbouring languages need not also have been neighbours at the dialectal PIE stage in order to transmit innovations, because their present or recent neighbourliness already allows for such transmissions.

As far as I can see from Hock's presentation, the twofold scenario outlined above is compatible with all the linguistic developments mentioned by him. For now, I must confess that after reading Hock's presentation, the linguistic problem which I have always considered the most damaging to an Indocentric hypothesis, doesn't look all that threatening anymore. The isoglosses discussed by him do not necessitate the near-identity of the directional distribution pattern of the PIE dialects with that of their present-day daughter languages, which would indeed be hard to reconcile with an out-of-India hypothesis. But I cannot as yet exclude that Hock's line of argument could be sharpened, viz. by proving that certain isoglosses must date back to PIE times, making it tougher to reconcile the distribution of isoglosses with an Indian homeland hypothesis.

2.5. Distribution of large and small territories

Another aspect of geographical distribution is the allocation of larger and smaller stretches of territory to the different branches of the IE family. We find the Iranian (covering the whole of Central Asia before 1000 AD) and Indo‑Aryan branches each covering a territory as large as all the European branches (at least in the pre‑colonial era) combined. We also find the Indo‑Aryan branch by itself having, from antiquity till today, more speakers on the Eurasian continent (now nearing 900 million) than all other branches combined. This state of affairs could help us to see the Indo-Aryan branch as the centre and the other branches as wayward satellites; but so far, philologists have made exactly the opposite inference.

It is said that this is the typical contrast between a homeland and its colony: a fragmented homeland where languages have small territories, and a large but linguistically more homogeneous colony (cfr. English, which shares its little home island with some Celtic languages, but has much larger stretches of land in North America and Australia all to itself, and with less dialect variation than in Britain). By that criterion, it may be remarked at once, the Pontic region too would soon be dismissed as an IE homeland candidate, for it has been homogeneously Slavic for centuries, though it was more diverse in the Greco-Roman period.

It is also argued that Indo‑Aryan must be a late‑comer to India, for otherwise it would have been divided by now in several subfamilies as distinct from each other as, say, Celtic from Slavic.

To this last point, we must remark first of all that the linguistic unity of Indo‑Aryan should not be exaggerated. The difference between Bengali and Sindhi may well be bigger than that between, say, any two of the Romance languages, especially if you consider their colloquial rather than their high-brow (sanskritized) register. Further, to the extent that Indo‑Aryan has preserved its unity, this may be attributed to the following factors, which have played to a larger extent and for longer periods in India than in Europe: a geographical unity from Sindh to Bengal (a continuous riverine plain) facilitating interaction between the regions, unlike the much more fragmented geography of Europe; long‑time inclusion in common political units (e.g. Maurya, Gupta and Moghul empires); and continuous inclusion in a common cultural space with the common stabilizing influence of Sanskrit.

As for the high fragmentation of IE in Europe when compared to its relative homogeneity in North India: from the viewpoint of an Indian homeland hypothesis, the most important factor explaining it is the way in which an emigration from India to Europe must have taken place. Tribes left India and mixed with the non‑IE‑speaking tribes of their respective corners of Central Asia and Europe. This happens to be the fastest way of making two dialects of a single language grow apart and develop distinctive new characteristics: make them mingle with different foreign languages.

Thus, in the Romance family, we find little difference between Catalan, Occitan and Italian, three languages which have organically grown without much outside influence except for a short period of Germanic influence which was common to them; by contrast, Spanish and Rumanian have grown far apart (lexically, phonetically and grammatically), and this is largely due to the fact that the former has been influenced by Germanic and Arabic, while the latter was influenced by Greek and Slavic. Similarly, under the impact of languages they encountered (now mostly extinct and beyond the reach of our searchlight), and whose speakers they took over, the dialects of the IE emigrants from India differentiated much faster from each other than the dialects of Indo‑Aryan.

To be sure, expanding Indo-Aryan communities have likewise merged with communities speaking now-extinct non-IE languages, but they remained continually in touch with neighbouring speakers of "pure" Indo-Aryan, so that they maintained the original standards of their language better. It is widely assumed that the Bhil tribals of Gujarat and Madhya Pradesh originally spoke a non-IE language, probably Nahali, yet: "No group of Bhils speak any but an Aryan tongue. (...) it is unlikely that traces of a common non-Aryan substratum will ever be uncovered in present-day Bhili dialects." (von Fürer-Haimendorf 1956:x, quoted in Kuiper 1962:50).

One can still witness this process today: when tribals in Eastern and Central India switch over to Hindi, they retain at most only a handful of words from their Austro-Asiatic or Dravidian mother-tongues, because the influence of standard Hindi is continually impressed upon them by the numerous native Hindi speakers surrounding them (not to mention the media).

By contrast, upon arrival on the North-European coasts, the speakers of proto-Germanic merged completely with the at least equally numerous natives. Having covered greater distances and in smaller numbers than the gradually expanding Indo-Aryan agriculturalists in India, they lost touch with the language standards of their fathers because they were not surrounded by a compact and numerically overwhelming environment of fellow IE-speakers. This allowed a far deeper impact of the native language upon their own, differentiating it decisively from IE languages not influenced by the same substratum.

2.6. Go West

A seemingly common-sense objection to an Indian homeland is that it implies an IE expansion almost entirely in one direction: east to west, with the homeland lying in the far corner of the ultimate IE settlement area rather than in the centre. Isn't this odd?

Well, no: it is the rule rather than the exception. Chinese spread from the Yellow River basin southward, first assimilating Central and then South China. Arabic spread from Arabia a little northward and mostly westward. The circumstances in north and south, or in east and west, are usually very different, making the prospects of expansion very attractive on one side but quite uninteresting on the other. Spanish and English could expand westward, in the Americas, because of their steep technological-military edge over the natives; this did not apply in the equation of forces to their east, in Europe.

Assuming the OIT with Panjab-Haryana as the centre, we can safely surmise that a similar number of migrants went southeast and northwest respectively, yet their destinies were quite different. The first didn't have far to go: they colonized the rain forests of India's interior, where soil and climate allowed for the settlement of large populations on a relatively small surface. It was always easier to chop down another stretch of forest and expand locally than to leave the material security of interior India for a dangerous and probably pointless mountain trek into China or a sea voyage to Indonesia. By contrast, the second group going to Central Asia found itself challenged by more uncomfortable conditions: a variable climate, large stretches of relatively useless land, a crossroads location with hostile nomads or migrating populations passing through. They had to cross far larger distances in order to settle comfortably, mixing with many more people along the way, thus losing their physical Indianness and linguistically growing away from PIE fast and in different directions.

As an economic and demographic outpost of India, Bactria was, along with Sogdia, a launching-pad for the most ambitious migration in premodern history; the first Amerindians and Austronesians covered even larger distances but settled empty lands, while the Indo-Europeans assimilated large populations in a whole continent. This followed (or rather, set) a pattern: recall how the Mongols conquered this region, thence to conquer the Western half of Asia and Eastern Europe; in the preceding centuries, the Turks; before that, the Iranians or (pars pro toto) Scythians; and first of all, the Indo-Europeans. Nichols 1997 (cfr. below) adds Kartvelian to this list, as one case of a language spread westward through the Central-Asian "spread zone" but entirely losing its foothold there, only to survive in a South-Caucasian backwater; and points to the parallel westward movement of the Finno-Ugrians from Siberia to Northeastern Europe. Until the eastward expansion of Russia, Central Asia was subject to an over-arching dynamic of east-to-west migration. This may have started as early as the end of the Ice Age, when a depopulated Europe became hospitable again, and lasted until the reversal of the demographic equation, when European population pressures forced an eastward expansion.

3. Loans and substratum features

3.1. How to decide on the foreign origin of a word?

One widely accepted criterion for deciding whether a word attested in ancient Sanskrit is IE or not, is the presence of sound combinations which do not follow the standard pattern. It is argued that a word in a given language cannot take just any shape, e.g. a true English word cannot start with shl-, shm-, sht-. Consequently, when a word does contain such irregular sounds, it must be of foreign origin, in this case German or Yiddish loans like schnitzel, schmuck, schlemiel. Likewise, a Sanskrit word cannot contain certain sound combinations, which would mark it as a foreign loan.

However, there are problems with this rule. Firstly, and invasionists should welcome this one, if a sound is too strange, chances are that people will "domesticate" it into something more manageable. This will result in a loan which differs in pronunciation from its original form, but which is no longer recognizable as a loan by the present criterion. Thus, in Sino-English, a boss or upper-class person is called a taiban, Chinese for "big boss"; there is nothing decisively un-English about this string of consonants and vowels. The one feature of this Chinese word which could have marked it as un-English, is its tones (tai fourth tone, ban third tone), but precisely that typically foreign feature has been eliminated from the English usage of the word. The same is true in Japanese, which has adopted hundreds of Chinese words after stripping them of tones and other distinctively Chinese phonetic characteristics. Likewise, Arabic has a number of sounds and phonemic distinctions unknown in European languages, which are systematically eliminated in the Arabic loans in these languages, e.g. tariff from ta'rîfa with laryngeal 'ayn, or cheque from sakk with emphatic saad.
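The weakness of the criterion can be made concrete with a toy filter. The sketch below flags a word as foreign only when it begins with one of a few "un-English" onsets; the onset list is an orthographic stand-in for the sh + consonant clusters mentioned above, and both the list and the function name are assumptions chosen for illustration, not an actual linguistic tool.

```python
# Onsets treated as "un-English". These spellings are orthographic stand-ins
# for the sh + consonant clusters named in the text (an assumption).
FOREIGN_ONSETS = ("schl", "schm", "schn", "shl", "shm", "sht")


def looks_foreign(word):
    """Flag a word as a loan purely on the basis of its initial cluster."""
    return word.lower().startswith(FOREIGN_ONSETS)


# Loans mentioned in the text: three with a tell-tale onset,
# three that were phonetically "domesticated" on borrowing.
words = ["schnitzel", "schmuck", "schlemiel", "taiban", "tariff", "cheque"]

flagged = [w for w in words if looks_foreign(w)]
missed = [w for w in words if not looks_foreign(w)]
```

Here `flagged` holds the three German or Yiddish words, while `missed` holds taiban, tariff and cheque: all six are loans, but the shape-based test only catches those that kept their foreign shape.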

Another point is: how do you decide what the standard shape of a word in a given language should be? Witzel (1999/1:364) calls bekanTa "certainly a non-IA name" citing as reason the retroflex T and the initial b-. It may be conceded that the suffix -Ta is common in seemingly non-IA ethnonyms (kîkaTa etc.), but the phonetic exceptionalism, by contrast, cannot be accepted as a valid ground for excluding an IA etymology. The dental/retroflex distinction must initially have been merely allophonic, representing a single but phonetically unstable phoneme; and at any rate, numerous purely IE words have acquired the retroflex pronunciation, e.g. SaD, "six", or aSTa, "eight". While b- may be rare in Old IA, there is no good reason to exclude it altogether from the acceptable native sounds of the language. It is also attested in bala, "strength", related to Greek bel-tiôn, "better", and Latin de-bil-is, "off-strength", "weak", a connection which Kuiper (1990:90) admits to be "attractive" though he would prefer to "accept the absence of /b/ in the PIE consonant system", it being otherwise only attested in the Celtic-Germanic-Slavic (hence probably Euro-substratal) root *kob, "to fall".

What threatens to happen here, is that the minority gets elbowed out by the majority, that the majoritarian forms are imposed as the normative and only permissible forms. Compare with the argument by Alexander Lehrman (1997:151) about accepting or excluding the rare sequence "e + consonant" as a possibly legitimate root in Hittite: "There is absolutely no reason why a lexical root of Proto-Indo-European (or Proto-Indo-Hittite) cannot have the shape *eC-, except the wilful imposition by the researching scholar of the inferred structure of a majority of lexical roots on a minority of them." (emphasis mine) The same openness to exceptions to the statistical rule is verifiable in other languages, e.g. Chinese family names are, as a rule, monosyllabic (the Mao in Mao Zedong), yet two-syllable names have also existed, though now fallen in disuse (the Sima in Sima Qian). As a rule, Semitic verbal roots have a "skeleton" of three consonants, yet a few with two or four consonants also exist. Admittedly, both examples also illustrate a tendency of the exception to disappear in favour of (or to conform itself to) the majoritarian form; but their very existence still provides an analogy for the existence of atypical minoritarian forms in IE, such as the b- phoneme.

Another point is that there may be a covert petitio principii at work here. Many assertions on what can or cannot be done in Indo-Aryan are based on the assumption that Vedic Sanskrit is more or less the mother of the whole IA group, it being the language of the entry point whence the Aryan tribes populated a large part of India. In an OIT scenario (e.g. Talageri 1993:145) of ancient Indian history, Sanskrit need not be the mother of IA at all, there being IA dialects developing alongside Vedic Sanskrit. Just as Vedic religion was but one among several Indo-Aryan religious traditions, the traces of which are found in the Puranas and Tantras, Vedic Sanskrit is but one among a number of OIA dialects. The eastward expansion of Vedic culture attested in the Atharva Veda, Shatapatha Brâhmana etc. may have vedicized regions which were already IA-speaking though religiously and linguistically non-Vedic.

Thus, the sh/S > s shift in eastern Hindi and Bengali, e.g. subhâSa > subhâs, ghoSa > ghos, may be due to substratum influence (cfr. the case of Kosala in the next section), but then again, what is more ordinary than this inter-sibilant shift in dialectal variation? Remember Semitic salâm/shalom, or the Biblical test of pronouncing sibboleth/shibboleth. This could be a substratum influence, but it could also simply be a spontaneous variation in a non-Vedic dialect of IA. More generally, one should not jump to conclusions of foreign origins without a positive indication. Mere oddities may come into being without adstratal or substratal influence (cfr. French phonetic oddities like nasalization or uvular [r]); they are not proof enough that IA was an intruding language replacing a native one.

3.2. River names in Panjab

If a word looks Sanskritic, it may still be of foreign origin, but thoroughly assimilated. With historical languages, the assimilation into Sanskrit sound patterns is well-attested, e.g. Greek dekanos becoming drekkâNa, Altaic turuk becoming turuSka, Arabic sultan becoming suratrâNa, etc. Sometimes this phonetic adaptation gives rise to folk-etymological reinterpretation, often with hypercorrect modification of the word, e.g. the râNa, "king", in suratrâNa. Such adaptation can also take place even without etymological interpretation, just for reasons of "sounding right". Thus, it is often said (e.g. Witzel 1999/1:358) that Yavana, vaguely "West-Asian", is a hypersanskritic back-formation on Yona, Ionia, i.e. the name of the Asian part of Greece. This principle underlies the Sanskrit looks of many foreign loans in Sanskrit.

Witzel uses this phenomenon to explain the Sanskrit looks of no less than 35 North-Indian river names: "Even a brief look at this list indicates that in northern India, by and large, only Sanskritic river names seem to survive". (1999/1:370) He quotes Pinnow 1953 as observing that over 90% don't just look IA but "are etymologically clear and generally have a meaning" in IA. He attributes this unexpectedly large etymological transparency to "the ever-increasing process of changing older names by popular etymology". This hypothesis of a very thorough assimilation of foreign names with pseudo-etymology is a possibility but quite unsubstantiated, a complicated explanation satisfying AIT presumptions but not Occam's razor. It has no counterpart in any other region of IE settlement, e.g. in Belgium most river names are Celtic or pre-Celtic and make no sense at all in Dutch or French; yet in their present forms no attempt is in evidence of semantically romanizing or germanicizing them. In the US, there are plainly native river names like Potomac, and plainly European ones like Hudson, but no anglicized native names. So, most likely, the Sanskrit-looking river names are simply Sanskrit.

This may be contrasted with the situation farther east in the Ganga plain, where we do find many Sanskrit-sounding names of rivers and regions which however do not have a transparent etymology, e.g. kaushik or koshala, apparently linked to Tibeto-Burmese kosi, "water", and the name of the river separating Koshala from Videha. In that case, we also see the ongoing sanskritization: kaushik evolved from kosik (attested in Pali), and koshala from kosala, which Witzel (1999/1:382) considers as necessarily foreign loans because the sequence -os- is "not allowed in Sanskrit". But while the phonetic assimilation can be caught in the act, we can see no semantic domestication through folk etymology at work. The name koshala doesn't mean anything in Sanskrit, and that is a decisive difference with the Western hydronyms gomatî, "the cow-rich one", or asiknî, "the dark one". While the occurrence of some folk-etymological adaptation among the Panjabi river names can in principle be conceded, it is highly unlikely to be the explanation of all 35 names. Until proof of the contrary, the evidence of the Northwest-Indian hydronyms goes in favour of the absence of a non-IE substratum, hence of the OIT.

3.3. Exit Dravidian Harappa

The European branches of IE are all full of substratum elements, mostly from extinct Old European languages. For Germanic, this includes some 30% of the acknowledged "Germanic" vocabulary, including such core lexical items as sheep and drink; for Greek, it amounts to some 40% of the vocabulary. In both cases, extinct branches of the IE family may have played a role along with non‑IE languages (vide Jones-Bley and Huld 1996:109-180 for the Germanic case). The branch least affected by foreign elements is Slavic, but this need not be taken as proof of a South‑Russian homeland: in an Indian Urheimat scenario, the way for Slavic would have been cleared by other IE forerunners, and though these languages would absorb many Old‑European elements as substratum features, they also eliminated the Old‑European languages as such and prevented them from further influencing Slavic.

Even if we accept as non‑IE all the elements in Sanskrit described as such by various scholars, the non‑IE contribution is still smaller than in some of the European branches of IE, which bear the undeniable marks of "Aryan" invasions followed by linguistic assimilation of large native populations. Among the highest estimates is the 5% to 9% of loans in Vedic Sanskrit proposed by Kuiper 1991:90-93, in his list of 383 "foreign words in the Rigvedic language". A number of these words are certainly misplaced: some have no counterpart in Dravidian or Munda, or when they do, there is often no reason to assume that the direction of borrowing was into rather than out of Indo-Aryan.

To take up one example, the name of the seer Agastya is a normal Sanskritic derivation of the tree name agasti, "Agasti grandiflora" (Kuiper 1991:7 sees the derivation as a case of totemism). This word is proposed to be a loanword, related to Tamil akatti, acci, as if the invaders borrowed the name from Dravidian natives. That non-Indian branches of IE do not have this word, says nothing about its possible IE origins: they didn't need a word for a tree that only exists in India, so they may have lost it after emigrating. It is perfectly possible that the Tamil word was derived from Sanskrit agasti, and by looking harder we just might discern an IE etymon for it, e.g. Pirart (1998:542) links Agastya with Iranian gasta, "foul-smelling, sin".

But let us accept that some 300 words in Kuiper's list are indeed of non-IE origin. Even then, the old tendency to impute Dravidian origins to IA words of unclear etymology must be abandoned because the underlying assumption of a Dravidian-speaking Harappan civilization has failed to get substantiated. Likewise, the relative convergence of Indo-Aryan and Dravidian (as well as Munda and to an extent Burushaski) in phonetic, lexical and grammatical features, forming a pan-Indian linguistic zone (vide e.g. Abbi 1994), is no longer explained as the substratal effect of an India-dominating Dravidian culture.

That the Dravidians are not native to their present habitat, had already been accepted: "Arguments in favour of the South Indian peninsula being the original home of the Dravidian language family, very popular with Tamil scholars at one time, cannot resist the weight of the evidence, both archaeological and linguistic." (Basham 1979:2)

Now, even Harappa is being lifted out of their claimed heritage. Bernard Sergent (1997:129) and Michael Witzel (1999/1:385) are among the latest experts to bid goodbye to the popular assumption that Harappa was Dravidian-speaking. Indeed, the most important shift in scholarly opinion in recent years is the realization that, when all is said and done, there is really not a shred of evidence for the identification of the Harappans as Dravidian, even though several elaborate attempts at decipherment of the Indus script (Fairservis 1992, Parpola 1994) have been based on it.

Some of the arguments classically used against Vedic Harappa equally stand in the way of Dravidian Harappa, e.g. like Vedic culture, the oldest attested Dravidian culture was not urban: according to McAlpin (1979:181-182), the Dravidians "were almost certainly transhumants practising both herding and agriculture, with herding the more unbroken tradition".

Of course, in both cases, a chronological shift placing them in the pre-urban pre-Harappan period could solve this problem. More importantly, the Dravidian contribution to the Indo-Aryan languages is not such as one would expect if Indo-Aryan newcomers had incorporated a prestigious Dravidian-speaking city culture. Even linguists eager to discover Dravidian words in IA are surprised to find how small their harvest is: "Dravidian influence is less than has been expected by specialists." (Wojtilla 1986:34)

Judging from the substratum of place-names, Dravidians once were located along the northwestern coast (Sindh, Gujarat, Maharashtra) in the southern reaches of the Harappan civilization. Parpola (1994:170) points out the presence of a Dravidian substratum, starting with the place-names: "palli, 'village' (whence valli and modern -oli, -ol in Gujarat), corresponding to South-Dravidian paLLi; and pâTa(ka) or pâTi (whence vâTa, vâTi, etc., modern -vâD, vâDî etc. in Gujarat) as well as paTTana (Gujarati paTTan), all originally 'pastoral village' from the Dravidian root paTu, 'to lie down to sleep'. In addition to place-names, other linguistic evidence suggests that Dravidian was formerly spoken in Maharashtra, Gujarat and, less evidently, Sindh, all of which belonged to the Harappan realm. It includes Dravidian structural features in the local Indo-Aryan languages Marathi, Gujarati and Sindhi, such as the distinction between two forms of the personal pronoun of the first person plural, indicating whether the speaker includes the addressee(s) in the concept 'we' or not. Dravidian loanwords are conspicuously numerous in the lower-class dialects of Marathi." Add to this the cultural influence, e.g. the Dravidian system of kinship (Witzel 1999/1:385).

So, that is how a Dravidian past perpetuates itself along the presently IA-speaking coastline, but it is conspicuous by its absence in the language and culture of Panjab and the Hindi belt. The latter has far fewer Dravidian elements than the link language Sanskrit, e.g. the Dravidian loan mîna, "fish", caught on in Sanskrit but never in Hindi. There is no reason to assume a Dravidian presence in North India at any time. The main part of the Harappan civilization was definitely not Dravidian if we may judge by the substratum evidence there, e.g. the lack of Dravidian hydronyms. There are also no indications that South-Indian Dravidian culture is a continuation of Harappan culture.

The Dravidians may have entered Sindh through the Bolan Pass from Afghanistan (Samuel 1990:45), possibly as late as the 3rd millennium BC (McAlpin 1979), though I am not aware of any firm proof against their indigenous origins. Vedic culture was established in the Panjab for quite some time before encountering Dravidian, considering that the oldest layers of Vedic literature do not contain loans from Dravidian: according to Witzel (1999/2:§1.1), "RV level 1 has no Dravidian loans at all". Dravidian loans appear only gradually in the next stages (i.e. when Indo-Aryan culture penetrates Dravidian territory) and are typically terms used in commercial exchanges, indicating adstratum rather than substratum influence. With that, Dravidian seems now to have been eliminated from the shortlist of pretenders to the status of Harappan high language.

3.4. Pre-IE substratum in Indo-Aryan: para-Munda

Unlike Dravidian, other languages seem to have exerted an influence on Sanskrit since the earliest Vedic times: chiefly a language exhibiting Austro-Asiatic features, hence provisionally called para-Munda, not the mother but at least an aunt of the Munda languages still spoken in Chhotanagpur. Where IA-Dravidian likenesses in words without apparent IE etymology were hitherto often explained as Dravidian substratum in IA, the favourite explanation now is that Dravidian borrowed from IA what IA itself had borrowed from para-Munda, e.g. mayûra, "peacock" was derived from Munda *mara and in its turn yielded Tamil mayil. A second influence is attributed to an unknown language, nonetheless discernible through consistent features, and provisionally called Language X.

Indian non-invasionists strongly dislike the alleged fondness of Western linguists for "ghost languages", e.g. Talageri (1993:160) dismisses "purely hypothetical extinct languages" thus: "We cannot proceed with these scholars into the twilight zone of non-existent languages." But the simple fact remains that numerous languages have died out, and that the ghost of some of them can be seen at work in anomalous elements in existing languages. Thus, the first Sumerologists noticed an un-Sumerian presence of remnants of an older language typified by reduplicated final syllables, hence baptized "banana language". Today, much more is known about a pre-Sumerian Ubaidic culture, which has become considerably less ghostly.

In the para-Munda thesis, the hypothetical para-Munda language seems to be the main influence, reaching far northwest to and even beyond the entry point of the Vedic Aryans in India, and definitely predominant in the whole Ganga basin. The word gaGgā itself has long been given an Austro-Asiatic etymology, esp. linking it with southern Chinese kang/kiang/jiang, supposedly also an Austro-Asiatic loan. The latter etymology has recently been abandoned, with the pertinent proto-Austro-Asiatic root being reconstructed as *krang and the Chinese word having a separate Sino-Tibetan origin (Zhang 1998). Witzel (1999/1:388) now proposes to explain Ganga as "a folk etymology for Munda *gand", meaning "river", a general meaning it still has in some IA languages. The folk etymology would be a reduplication of the root *gam/ga, "moving-moving", "swiftly flowing", which only applies meaningfully to the river's upper course, nearest to the Harappan population centres. But there is no decisive reason why the folk etymology could not be the real one, nor why some other IE etymology could not apply. (Experimentally: what about a phonetically impeccable kinship with Middle Dutch konk-elen, "twist and turn", related to English kink, "torsion"?)

In some cases, a Munda etymology is supported by archaeological evidence. Rice cultivation was developed in Southeast Asia (including South China), land of origin of the Austro-Asiatic people, who brought it to the Indus region by the late-Harappan age at the latest. Therefore, it is not far-fetched to derive Sanskrit vrihi from Austro-Asiatic *vari, which exists in practically the same form in Austronesian languages like Malagasy and Dayak, and reappears even in Japanese (uru-chi), again pointing to Southeast-Asia as the origin and propagator in all directions of both the cultivation of rice and its name *vari.

All this goes to confirm that at least linguistically, the Munda tribals are not "aboriginals" (with a pseudo-native modern term, ādivāsīs), but carriers and importers of Southeast-Asian culture. Witzel himself acknowledges that "Munda speakers immigrated", which should explain why in Colin Masica's list of agricultural loans in Hindi (1979), which in conformity with the invasionist paradigm is very generous in allotting non-IE origins to Indo-Aryan words, Austro-Asiatic etymologies account for only 5.7%. In borrowing so few Munda words, the Vedic Aryans clearly did not behave like immigrants into Munda-speaking territory.

This paucity of Munda influence in the agricultural vocabulary, soil-related par excellence, should also caution us against reading Munda etymologies into the equally soil-bound hydronyms, which are overwhelmingly Indo-Aryan from the kubhā to the yamunā. Witzel (1999/1:374) diagnoses the usual Sanskritic interpretations as artificial "popular etymology", but in most cases does not produce convincing Munda alternatives. The one plausible Munda etymology is for shutudrī (prefix plus *tu-, "to drift", plus *da, "water", Witzel 1999/2:§1.4), if only because the Vedic Aryans themselves showed their unfamiliarity with it by devising folk etymologies like shata-druk, "hundred streams"; even there, the step from -da to -dr, though possible, does not impress itself as compelling.

Numerous words have wrongly or at least prematurely been classified as foreign loans. Talageri (1993:169-170) gives the examples of animal-names like khaDgin ("breaker", rhinoceros), mātaMga ("roaming at will", elephant), gaja ("trumpeter", elephant), which Suniti Kumar Chatterji had cited as loans from Dravidian or Munda but which easily admit of an IE etymology. Likewise, there may well be an IA explanation for terms commonly given non-IE etyma, e.g. exotic-sounding ulūkhala, "mortar (for soma)", may well be analysed, following Paul Thieme, into IA uru, "broad", plus khala, "threshing-floor", or even khara, "rectangular piece of earth for sacrifices" (with Greek cognate, eschara), albeit with vulgar -l- pronunciation. The word mayūra, "peacock", is often given a Dravidian or (by Witzel 1999/1:350) Munda etymon, but Monier Monier-Williams (1899:789) already derived it from an onomatopoeic IA root *mā, "bleat", and the related words in non-IA languages may very well be derived from IA forms (but in this case, the suffix -r-, unknown in Indo-Aryan, pleads in favour of a foreign origin).

As a rule, one should not allot Dravidian or Munda origins to an IA word unless the etymon can actually be pointed out (at least indirectly) in the purported source language. It is therefore with great reservation that we should consider the list of para-Munda words "in the RV, even if we cannot yet find etymologies" (Witzel 1999/2:§1.2). However, many hypothetical etyma which do not exist in Munda in full, and which should at first sight be rejected, may be analysed as composites with components which do exist in Munda.

The main pointer to a Munda connection seems to be a list of prefixes, now no longer productive in the Munda languages, and not recognized or used as prefixes by Vedic Sanskrit speakers. Thus, the initial syllable of the ethnonym kī-kaTa seems to be one in a series of non-IA and probably para-Munda prefixes ka/ke/ki etc. (Witzel 1999/1:365), some of which look like the declension forms of the definite article in Khasi, an Austro-Asiatic language in the Northeast. On this basis, very common words become suspected loans from "para-Munda", e.g. ku-māra, "young man", a term not explainable in IE, but plausibly related to a Munda word mar, "man" (Witzel 1999/2:§1.2).

Between Sanskrit karpāsa, "cotton", and Munda ka-pas (cfr. Sumerian kapazum), it may now be decided that the latter was first while the former, with its typical cluster -rp-, is but a hypersanskritized loan. This also fits in with the archaeological indications of textile-manufacturing processes pioneered by the Southeast-Asians, and with an already-established Austro-Asiatic etymon *pas (without the prefix) for Chinese bu, "cotton cloth". Incidentally, this does not affect the argument by Sethna that the appearance of this word in late-Vedic, regardless of its provenance, should be synchronous with the appearance of actual cotton cloth in the Panjab region, viz. in the mature Harappan phase (implying that early Vedic predated the mature Harappan phase); indeed, Sethna (1982:5) himself accepts the Austro-Asiatic etymology.

An interesting little idea suggested by Witzel concerns an alleged alternation k/zero, e.g. in the Greek rendering of the place-name and ethnonym Kamboja (eastern Afghanistan) as Ambautai, apparently based on a native pronunciation without k-. Citing Kuiper and others, Witzel (1999/1:362) asserts that "an interchange k : zero 'points in the direction of Munda'" though this "would be rather surprising at this extreme western location". Indeed, it would mean that not just Indo-Aryan but also other branches of Indo-Iranian have been influenced by Munda, for Kam-boja seems to be an Iranian word, the latter part being the de-aspirated Iranian equivalent of Skt. bhoja, "king" (Pirart 1998:542). At any rate, if the Mundas could penetrate India as far as the Indus, they could reach Kamboja too.

But the interesting point here is that the "interchange k : zero" is attested in IE vocabulary far to the west of India and Afghanistan, e.g. English ape corresponding to Greek kepos, Sanskrit kapi, "monkey", or Latin aper, "boar", corresponding to Greek kapros. Gamkrelidze and Ivanov (1995:113, 435) have tried to explain this through a Semitic connection, with the phonetic and physiological closeness, somewhere in the throat, of qof and 'ayn. But if the origin of this alternation must be sought in an Afghano-Munda connection, what does that say about the geographical origin of English, Latin and Greek?

Given the location of the different language groups in India, it is entirely reasonable that Munda influence should appear in the easternmost branch of IE, viz. Indo-Aryan. If both IE and Munda were native to India, we might expect Munda influence in the whole IE family (though India is a big place with room for non-neighbouring languages), but since Munda is an immigrant language, we should not be surprised to find it influencing only the stay-behind IA branch of IE. This merely indicates a relative chronology: first Indo-Aryan separated from the other branches of IE when these left India, and then it came in contact with para-Munda. So, if we accept the presence of para-Munda loans in Vedic Sanskrit, we still need not accept that this is a native substratum influence in a superimposed foreign language.

3.5. Pre-IE substratum in Indo-Aryan: language X

The mysterious language X has possibly not left this earth without a trace, for it is tentatively claimed to be connected with the nearly-vanished but known Kusunda language of Nepal (Witzel 1999/1:346). Masica (1979) had found no known etymologies for 31% of agricultural and flora terms in Hindi, and Witzel credits these to language X (1999/1:339). I would caution, with Talageri (1993:165 ff.), against prematurely deciding on the non-IE origin of a word not having parallels in other IE languages, especially in the case of terms for indigenous flora and fauna. Though Sanskrit kukkura or Hindi kuttā, both "dog", have no IE cognates outside India, we cannot expect the Aryans to have been ignorant of this animal and to have learned about it from the Indian natives upon invading. Onomatopoeic or otherwise slang formations just come into being and sometimes replace the original standard terms, without implying foreign origin or a substratum effect.

The OIT has no objection to the impression that Vedic Sanskrit has absorbed some foreign words, e.g. from immigrants into their metropolis, just like the Romance languages borrowed many Germanic words from the Gothic invaders. All that the OIT requires is merely that this absorption should have taken place after the emigration of the other branches of IE from India. Also, it is accepted that substratal effects may have taken place during the Aryan "colonization" of the non-Aryan lower Ganga plain, in which the western IE languages took no part.

One discernible trait of this ghost language X is claimed to be the "typical gemination of certain consonants" (Witzel 1999/2:§1.1), e.g. in the name of the malla tribe/caste. Often these geminates are visible upon first borrowing but are later masked by hypersanskritic dissimilation, e.g. pippala becoming pishpala, or guggulu becoming gulgulu (Witzel 1999/2:§2.4). However, the geminated -kk- in kukkura or the -tt- in kuttā, though atypical of the IE word pattern, can perfectly come into being as onomatopoeic formations within a purely IE milieu: in imitating the sound of a dog, even IE-speakers need not have assumed that barking sounds follow the IE pattern.

The assumption of a language X in North India will be welcomed by many as the solution to the vexing question of the origin of retroflexion in the Indian languages. Weak in Burushaski and Munda, strong yet defective (never in initial position) in Dravidian, strong in Indo-Aryan but unattested among its non-Indian sister-languages, retroflexion in its origins is a puzzling phenomenon. So, language X as the putative language of the influential Harappan metropolis, or as the native substratum of the later metropolitan region, viz. Eastern Uttar Pradesh and Bihar, might neatly fit an invasionist scenario for the genesis of retroflexion in Indo-Aryan as well as its spread to all corners of India.

Still, there is no positive reason yet for locating the origin of retroflexion in this elusive language X. An entirely internal origination of retroflexion within early Indo-Aryan, which then imparted it to its neighbours, has always had its defenders even among linguists working within the invasionist paradigm (e.g. Hamp 1996). And consider the following possibility.

The Vedic hymns may well be somewhat older than the language in which they have come down to us. We need not exclude a phonetic evolution between the time of composition and the time when the Veda was given its definitive shape, traditionally by vyāsa, "compiler". Strictly speaking, it is not even impossible that a hymn composed in a language phonetically close to PIE, pre-proto-Indo-Iranian, subsequently underwent the kentum/satem shift and the vowel shift from IE /a/e/o/ to Sanskrit /a/, somewhat like the continuity of living Latin across centuries of phonetic change: Caesar evolving from [kaisar] to [cezar] or [sezar], agnus (lamb) from [agnus] to [anyus], cyclus from [küklus] to [ciklus] or [siklus], descendere from [deskendere] satemized to [deshendere], the vowels ae/oe/e coinciding as [e], etc. In the Middle Ages, Virgil's verses were still recited, but with a different pronunciation, just as in China, children memorized the Confucian Classics in the pronunciation of their own day, without knowing what the ancient masters' own pronunciation must have sounded like. Similarly, the Vedic hymns may well be older than the language form in which they have been preserved till today.

A very modest application of this line of thought is the hypothesis that the differentiation between dental and retroflex or cerebral consonants was not yet present in the original Vedic, and only developed by the time Sanskrit reached its classical form. Deshpande (1979) argues that the cerebral sounds crept in when the centre of Brahminical learning had shifted from Sapta-Sindhu to the Ganga basin, where the Indo-Aryan dialects had developed the dental-cerebral distinction. In that case, the Veda recension which we have today (the māNDūkeya and shākalya recensions, which Deshpande dates to 700 BC), was established in Videha-Magadha (Bihar), where native speakers imposed their pronunciation on the Veda.

Deshpande also mentions a Magadhan king Shishunaga (5th century BC?) who prohibited the use of the retroflex sounds T/Th/D/Dh/S/kS in his harem. But this seems to indicate that retroflexion was an intrusive new trend in Magadha, not at all a native tendency which was so strong and ingrained that it could impose itself on the liturgical language. Something may be said for Kuiper's (1991:11-14) rebuttal to Deshpande's thesis, viz. that māNDūkeya's insistence on retroflex pronunciation was a case of upholding ancient standards against a new and degenerative trend, implying that retroflexion was well-established by the time the Vedas were composed, and was being neglected in the new, eastern metropolis. That puts us back at square one: Munda (probably the main influence in Bihar) is clearly not the source of retroflexion, and that elusive language X didn't have much lexical impact on Vedic yet, making phonological influence even less likely. So if retroflexion was already present in Vedic, and otherwise too, the search for its origin continues.


3.6. The peculiar case of "Sindhu"

Among IA-looking river names, a case can be made for surprising IE etymologies of names usually explained as loans. In particular, sindhu might be an "Indo-Iranian coinage with the meaning 'border river, ocean' and fits Paul Thieme's etymology from the IE root *sidh, 'to divide'" (Witzel 1999/1:387). Now, if the Vedic Aryans only entered India in the 2nd millennium BC, the name Sindhu cannot be older than that.

According to Oleg Trubachov (1999), elaborating on a thesis by Kretschmer (1944), Indo-Aryan was spoken in Ukraine as late as the Hellenistic period, by two tribes known as the Maiotes and the Sindoi, the latter also known by its Scythian/Iranian-derived name Indoi and explicitly described by Hesychius as "an Indian people". They seem to have used a word sinu, from sindhu, for "river", a general meaning which it also has in some Vedic verses. Trubachov lists a number of personal and place names recorded by Greek authors (e.g. Kouphes for the Kuban river, apparently a re-use of kubhā, the Kabul river, Greek Kophes), and concludes that the Maiotes and Sindoi spoke an Indo-Aryan dialect, though often with -l- instead of -r-, as in king Saulios, cfr. sūrya (just the opposite from Mitannic, where palita, "grey", and pingala, "reddish", appear as parita and pinkara) and with -pt- simplified to -tt- (so that, just like in Mitannic, sapta appears as satta, a feature described by Misra 1992 as "Middle IA").

Working within the AIT framework, Kretschmer saw these Sindoi as a left-over of the Indo-Aryans in their original homeland, and even as a splendid proof of the Pontic homeland theory (Trubachov is less committed to any particular homeland hypothesis). In that case, again, the name sindhu (and likewise kubhā) would be an Indo-Aryan word brought into India by the Vedic-Aryan invaders.

However, Witzel himself (1999/2:§1.9) notes that the Sumerians (who recorded a handful of words from "Meluhha"/Sindh, which incidentally seem neither IA nor Dravidian) in the 3rd millennium already knew the name sindhu as referring to the lower basin of the Indus river, then the most accessible part of the Harappan civilization, whence they imported "sinda" wood. If this is not a coincidental look-alike, then either sindhu is a word of non-IE origin already used by the non-IE Harappans, in which case the Pontic Sindoi were migrants from India (demonstrating how earlier the Kurganites might have migrated from India?); or sindhu was an IE word, and proves that the Harappan civilization down to its coastline was already IA-speaking.

