Souk (en. /suːk/, natively ƪთ:ɕ'oˀơơ°:ʚ" [pʰe.əˀra.o.sɑ sʉ.ək̚], romanized phe:raaosa su:k) is a Kai-Souk language of the Song language family, and the native language of the Kai people in Indochina. Souk is one of the oldest languages in Southeast Asia, and has more speakers than the rest of the Kai-Souk languages combined. The language has a complex system of social registers and honorifics, often reflected in syntax and morphology. Due to the overwhelming influence of the Indian dynasties in Southeast Asia, Souk contains thousands of loanwords from Sanskrit and Pali; indeed, the royal register and especially the liturgical register consist almost entirely of such loanwords.

Souk is a pitch-accent and mora-timed language. The language is primarily isolating; however, it employs many particles to express grammatical relationships, and some infixes and suffixes in derivational morphology.

For the many Kai people who do not speak Souk as their native language, and for closely related ethnic groups speaking different native languages, Souk is a de facto lingua franca in the region, alongside French.



With 10 million native speakers, Souk is the most widely spoken of the Song languages. There are competing theories for the classification of Song. The family bears many resemblances to the Austroasiatic languages, notably the existence of sesquisyllabic patterns and isolating morphology. However, linguists have been unable to establish a genetic relationship to Mon-Khmer (synonymous with Austroasiatic) or its ancestors, due to many seemingly unrelated elements such as mora timing and an uncommon morphosyntactic alignment. The most likely case seems to be that Proto-Song originally developed as a creole between Proto-Mon-Khmer and an unknown native language.


The Souk language has a long history; indeed, it is believed the language has been spoken for about a thousand years, and its ancestors for many centuries before that. However, no surviving written record of the language is more than five centuries old, as the language lacked a writing system until the mid-sixteenth century. Most accounts of the language and its distribution before that time come from Khmer, Indian and Chinese sources and records.


According to the creole theory, circa 100 AD speakers of the Proto-Mon-Khmer language made contact with the early Kai people on the Samut Peninsula, south of the Mekong River Delta, and began to exert overwhelming influence upon them. The Proto-Mon-Khmer speakers were the more modern empire; they colonized the land of the Kais and set up their language as the de facto language of trade and education. Lacking any formal institutions, the native Kai people began to use Proto-Mon-Khmer more and more, and by circa 200 AD a creole had developed which became by far the primary means of communication among the Kai people. At some point around 450 AD, the speakers of Proto-Mon-Khmer, being outnumbered by the Kais on the peninsula, either assimilated or were driven out, and once again the Kai people found themselves with a strong national identity.

By 700 AD, the language of the Kai people had spread widely and matured, and any traces of its pidgin-like nature had long been lost; few if any Kais were aware of the creole origins of their language, which we refer to as Proto-Song because most of its dialects were spoken around the foothills of the Song mountains. It was at this point that the many dialects began to diverge, eventually branching off into separate languages.

Old Souk

The million or so speakers of Proto-Song had separated into distinct dialects, many of which were spoken on isolated islands in the Gulf of Thailand and which thus lacked influence from the mainland. These dialects soon became languages which were often mutually unintelligible. The most widely-spoken of these languages were the Kai-Souk languages; the other branches of Song were spoken in only very remote communities on small atolls and in the hills, places in which Kai-Souk languages remained the languages of trade. Eventually, as the area became more connected, these non-Kai-Souk languages all but disappeared.

The most widely spoken language of the Kai-Souk family was Old Souk, which was spoken in the most urban areas of the Kai Kingdom and which had developed a writing system, first taking after Sanskrit and Pali and later taking the form of an abugida developed from early Khmer writings. By 1100 AD, Old Souk easily had more speakers than the rest of the Kai-Souk languages, and was the only one with a proper writing system.

Old Souk began to evolve into distinct registers, often to the point that one could easily tell the class or even occupation of a speaker, if not by his choice of words then by his accent. The massive influence of Hinduism and later Buddhism on the Kai people saw many loanwords entering Old Souk from the Indic languages; soon, members of the royal court and especially the liturgical communities spoke Old Souk using almost exclusively Indic vocabulary.

In the later stages of Old Souk (circa 1600), most of the dialects and registers began to lose their phonation-contrast and pitch-register, in favor of a simple pitch-accent system. However, only by the late 1700s were such features almost entirely extinct.

Modern era

Modern Souk has not changed much since the early 1800s, save for many loanwords entering the language from French and later English, as well as a few spelling reforms. Souk is still the de facto language of trade among the Kai people, many of whom speak other Kai-Souk languages as their mother tongue, especially on the islands. There is no official 'standard' dialect of Souk, and no government body responsible for instruction. Most young Kais, even in the cities, are educated by Buddhist monks who teach them the liturgical register; over 90% of Kai people are literate because of this system. Kai people use a watered-down version of the liturgical register as formal and written language, and learn the colloquial register amongst their families and communities.

Sound System

Souk phonology is more complex than that of Old Souk and its ancestors, especially concerning the vowels. The many tones which existed in Old Souk have transformed into new vowel phonemes. The phonetic system here best represents the phonemes as they are spoken around the Mekong River Delta, which is the dialect with the most speakers and which has been recognized by some linguists as a standard for the language. Some of the phonemes below have merged or diverged in other, especially rural, dialects.


Height | Front | Central | Back
Near-close | i | ɨ • ʉ | u
Close-mid | e | |
Mid | ø | ə (1) |
Open-mid | ɛ | |
Near-open | | a~ä (2) | ɑ~ɒ (3)
  1. Schwa only exists in sesquisyllables and as reduction in long vowels
  2. Usually very centralized, but some utterances have been analyzed as containing pure [a]
  3. A semi-rounded vowel, somewhere between [ɑ] and [ɒ]

Long vowels occupy two morae. Any vowel other than 'ə' may be long, and length is phonemic. Vowel length was originally pure, with the long vowel remaining at the same place of articulation throughout; indeed this is preserved in rural dialects. In the so-called 'standard' dialect, however, the second half of a long vowel undergoes reduction, causing the long vowel to glide from its normal realization toward a more central position (nearer to 'ə'). Long rounded vowels are almost entirely unrounded by their end.

Thus /aː/ sounds like [a.ɐ], /iː/ sounds like [i.ɪ], and /ʉː/ like [ʉ.ə].

In Old Souk, consonant clusters would exist within a single syllable (along with a following vowel), such that a word like kmoo would be only one syllable in length. As the language began to become more mora-timed, the initial consonant in a cluster would be somewhat geminated. In modern Souk, which is entirely mora-timed, all consonant clusters are spread out over two morae. This is the role of the schwa [ə] in Souk: for stop consonants which cannot be properly geminated, [ə] is pronounced between the initial stop (plosive) consonant and the following cluster-forming consonant, producing an even moraic timing. The schwa sound in clusters has no pitch distinction and is never stressed.

Thus /kmu/ (kmoo) is realized [kə.ˈmu], with far more emphasis on the second mora. No schwa is needed for non-plosive consonants, such that /m.ra/ is [m.ra], with the same duration on [m] as on [ra].
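The cluster rule described above can be sketched in a few lines of Python. This is only an illustration, not an established tool: the function name realize_cluster is hypothetical, the plosive set is taken from the consonant chart, and implosives are left out because the text does not say how they behave in clusters. Stress marks are omitted.

```python
# Illustrative sketch; names are hypothetical.
# Plosive inventory as listed in the consonant chart of this article.
PLOSIVES = {"p", "pʰ", "t", "tʰ", "k", "kʰ", "ʔ"}

def realize_cluster(first, rest):
    """Spread an initial cluster over two morae.

    first: the initial consonant of the cluster
    rest:  the remainder of the syllable (second consonant plus vowel, etc.)
    """
    if first in PLOSIVES:
        # Plosives cannot be held for a full mora, so a schwa fills it.
        return f"{first}ə.{rest}"
    # Other consonants simply occupy a mora on their own.
    return f"{first}.{rest}"

print(realize_cluster("k", "mu"))  # kmoo
print(realize_cluster("m", "ra"))
```

For kmoo this yields a schwa-supported first mora, while the nasal-initial cluster in /m.ra/ is left without a schwa.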


Manner | Bilabial | Alveolar | Velar | Glottal
Nasal | m | n (1) | ŋ~ɴ (2) |
Plosive | p • pʰ | t • tʰ~cʰ (3) | k • kʰ | ʔ
Implosive | ɓ (4) | ɗ (5) | |
Fricative | | s | | h (6)
Approximant | w~β̞ (7) | | j~ɰ (8) |
Liquid | | r~ɾ | |
  1. /n/ is realized as palatal [ɲ] before a front vowel.
  2. Coda /ŋ/ remains velar in many dialects, but has become uvular among younger speakers, especially in more densely-populated areas. Initial /ŋ/ is always velar.
  3. Aspirated /tʰ/ sounds more like [cʰ] in the 'standard' dialect(s).
  4. Coda /ɓ/ becomes [b], usually unreleased but distinct from [p̚].
  5. Coda /ɗ/ is not implosive, but an interdental approximant [ð̞].
  6. /h/ is closer to [ɸ] before rounded vowels or labial consonants.
  7. Unless /w/ is a semivowel at the end of a diphthong, it is closer to [β̞].
  8. Behaves like [j], but closer to velar than palatal for most speakers. Some educated speakers (especially in urban areas) realize this phoneme exclusively as [j].


The pitch-accent of Souk developed from the pitch-register system of Old Souk, in which certain tones and phonation contrasts were dependent on each other, and neither could exist independently. For example, low tone only existed in vowel-coda syllables, and almost all falling-tone syllables were nasal-coda. This has evolved into the highly predictable modern pitch-accent system.

In modern Souk, there is the middle/level tone and the low-falling tone. High-falling tone long ago merged with low-falling, and rising with middle. With the exception of adpositions, all nasal-coda syllables feature low-falling tone. Any stop-coda syllable will have level tone, unless the vowel is long, in which case usually the tone will be low-falling. Vowel-coda syllables, short or long, by default have low-falling tone, unless the syllable is an adposition, a loanword, or features tabla (laryngealization).

With these guidelines, we know that rwaan (mango) is pronounced as low falling [r.wàn] and rwaat (macaw) is level [r.wat̚]. One would assume that the word tabla would have low tone; however, the final syllable is laryngealized, thus the word features level tone. As another example, consider phe:raaosa su:k [pʰe.əˀra.o.sɑ ↘sʉ.ək̚] (Souk language); due to the rule of low falling pitch on long vowels, as in su:k [↘sʉ.ək̚], we would assume that phe:raaosa [pʰe.əˀra.o.sɑ] would also be low falling, due to the long vowel e: [e.ə]. This would ordinarily be correct; however, [əˀra] is actually a genitive infix, and as such the root word is pheosa [pʰe.o.sɑ] (language), which has level pitch. The fact that the infix elongates the preceding vowel [e] does not thus produce nor contribute to a change in pitch.
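Because the guidelines above are almost fully predictable, they can be written down as a small decision function. The following Python sketch is only an illustration: the function name tone and its parameters are hypothetical, syllables are assumed to be already classified by coda type, and long is assumed to mean a lexically long vowel (not one lengthened by the genitive infix). The text says stop-coda long vowels are "usually" low-falling, so exceptions are not modelled, and the level tone for nasal-coda adpositions is an inference from the two-tone system.

```python
# Illustrative sketch; names and parameters are hypothetical.
def tone(coda, long=False, adposition=False, loanword=False, tabla=False):
    """Return 'level' or 'low-falling' for a syllable.

    coda: 'nasal', 'stop', or 'vowel' (open syllable)
    long: True only for a lexically long vowel
    """
    if coda == "nasal":
        # Adpositions are the stated exception for nasal codas.
        return "level" if adposition else "low-falling"
    if coda == "stop":
        return "low-falling" if long else "level"
    if coda == "vowel":
        if adposition or loanword or tabla:
            return "level"
        return "low-falling"
    # Other codas are not covered by the rules given in the text.
    raise ValueError(f"no rule stated for coda type {coda!r}")

print(tone("nasal"))              # rwaan
print(tone("stop"))               # rwaat
print(tone("stop", long=True))    # su:k
print(tone("vowel", tabla=True))  # final syllable of tabla
```

The four calls correspond to the worked examples in the paragraph above: rwaan, rwaat, su:k and the laryngealized final syllable of tabla.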

The pitch-accent system is a global system. In a given clause, the accented syllable features a sudden drop in pitch, and following syllables continue to fall gradually in pitch, including particles and postpositions (even if they have level pitch in isolation). This is why we usually describe a fall in pitch as a global fall.

khneu-kea phaan(1)-rul-si:-so tho:-nu
/kʰ.nøˀ.kɨ ↘pʰan.rʉl.siː.so ‖ tʰoːˀ.nʉ/
"This dish was cooked with peanuts and coconut noodles."
  1. Notice that phaan is the accented syllable, yet the following three bound morphemes continue to fall in pitch until the end of the compound word.

In the example above, the fall in pitch takes place over the syllables phaan-rul-si:-so, beginning at the ↘ mark in the transcription. The first of these syllables is the location of the accent, and the following syllables continue to fall in pitch (including the rest of the word and its postpositions). The words khneu and tho:, despite being vowel-final, are laryngealized and thus do not feature low-falling pitch.
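The global fall can be pictured as a simple relative contour. The Python sketch below is purely illustrative: the function name pitch_contour is hypothetical, and the integer steps are arbitrary units chosen for the example, since the text gives no measured pitch values. It covers a single accented stretch, such as the part of the example before the ‖ boundary.

```python
# Illustrative sketch; the numeric steps are arbitrary, not measured values.
def pitch_contour(syllables, accent):
    """Relative pitch per syllable: 0 before the accent, then a steady fall.

    syllables: list of syllables in one accented stretch
    accent:    index of the accented syllable
    """
    contour = []
    for i, _ in enumerate(syllables):
        if i < accent:
            contour.append(0)                    # level before the accent
        else:
            contour.append(-(i - accent + 1))    # drop, then keep falling
    return contour

stretch = ["kʰ.nøˀ", "kɨ", "pʰan", "rʉl", "siː", "so"]
print(pitch_contour(stretch, accent=2))
```

Here the accent falls on the third syllable (phaan), so the first two syllables stay level and each later syllable sits one step lower than the one before it.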


Old Souk allowed for virtually any consonant as well as some consonant clusters to exist in syllable coda. Due to the development of Old Souk as a more common, colloquial language, as well as the great influence of other local Austroasiatic languages, many of these coda phonemes merged with nasal consonants and unreleased stops. Words that underwent this merger developed a laryngealized sound, somewhat reminiscent of creaky voice or a glottal stop, which is pronounced just before the final consonant; most often syllables with this feature formerly had [h] in coda. This feature is known natively as tabla [tɑb.lɑˀ]; a word which, in Old Souk, would most likely have been pronounced [tɑhb.lɑhk].

Syllable structure

Most words are monosyllabic. Syllables follow the form (S)CV(X)(F), where C is any consonant (including a glottal plosive), V is any vowel (long or short), X is an approximant /j/ or /w/, and F is a single final consonant (nasal, unreleased stop, [b], [ð̞] or [s]). S represents a sesquisyllable, which forms a sort of cluster with the initial consonant. There is a list of all coda consonants in the next section. Souk is a mora-timed language, which means that any sesquisyllable represents its own mora, and is thus pronounced for the same amount of time as the mora of the rest of the syllable. Sesquisyllables are permitted only at the beginning of a word; that is to say, a multisyllabic word may not have a sesquisyllabic pattern on its second, third or any later syllable.
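The (S)CV(X)(F) template and the word-initial restriction on sesquisyllables can be checked mechanically. The Python sketch below is a minimal illustration under stated assumptions: the class Syllable and the functions fits_template and word_ok are hypothetical names, syllables are assumed to be already segmented into their parts, and the set of finals simply restates the classes named in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch; all names are hypothetical.
GLIDES = {"j", "w"}
# Finals as described in the text: nasals, unreleased stops, [b], [ð̞], [s].
FINALS = {"m", "n", "ŋ", "ɴ", "p̚", "t̚", "k̚", "b", "ð̞", "s"}

@dataclass
class Syllable:
    onset: str                    # C: obligatory initial consonant
    vowel: str                    # V: obligatory vowel, short or long
    sesqui: Optional[str] = None  # S: optional sesquisyllable
    glide: Optional[str] = None   # X: optional /j/ or /w/
    final: Optional[str] = None   # F: optional final consonant

def fits_template(s):
    """Check one syllable against (S)CV(X)(F)."""
    if not s.onset or not s.vowel:
        return False
    if s.glide is not None and s.glide not in GLIDES:
        return False
    if s.final is not None and s.final not in FINALS:
        return False
    return True

def word_ok(syllables):
    """All syllables fit the template; only the first may have a sesquisyllable."""
    if not all(fits_template(s) for s in syllables):
        return False
    return all(s.sesqui is None for s in syllables[1:])

kmoo = [Syllable(onset="m", vowel="u", sesqui="k")]
print(word_ok(kmoo))
```

A word whose second syllable carries a sesquisyllable would be rejected by word_ok, matching the restriction stated above.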


Main article: Kai-Souk Colonial Alphabet

The writing system of Souk utilizes many letters which represent different phonemes at different parts in a word or syllable, and many other such exceptions exist. As such, it would be counter-intuitive to present an ambiguous transliteration system; thus the Kai-Souk Colonial Alphabet, which is used to render most of the Kai-Souk languages into the Latin alphabet, attempts to present a straightforward representation of the sounds in the most common dialects.

This table represents the letters with their standard IPA values for Souk:

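Vowels | i | ea | u | oo | e | eu | ae | o | aa | a
IPA | i | ɨ | ʉ | u | e | ø | ɛ | o | a | ɑ

Consonants | m | n | ng | p | ph | t | th | k | kh | b | d | s | h | w | y | r | l
Onset | m | n | ŋ | p | pʰ | t | tʰ | k | kʰ | ɓ | ɗ | s | h | w | ɰ | ɾ | l
Coda | m | n | ɴ | | | | | | | b | ð̞ | s | | w | j | | l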

Being almost completely predictable, pitch accent and tabla are normally not indicated in romanization. If absolutely necessary, a drop in pitch can be represented with a grave accent (phàn) and tabla laryngealization with umlaut (tablä).
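Since several letters of the Colonial Alphabet are digraphs (ng, ph, th, kh, ea, oo, eu, ae, aa), reading a romanized word requires splitting it into letters first. The Python sketch below shows one possible approach, greedy longest-match tokenization; it is an assumption of this example, not a documented rule, and it will misread any word where two single letters happen to spell a digraph (for instance n followed by g). The names VOWELS, CONSONANTS, tokenize and vowels_ipa are hypothetical, and the value o for the letter o is inferred from transcriptions such as [pʰe.o.sɑ].

```python
# Illustrative sketch; names are hypothetical, mapping taken from this article.
VOWELS = {"i": "i", "ea": "ɨ", "u": "ʉ", "oo": "u", "e": "e",
          "eu": "ø", "ae": "ɛ", "o": "o", "aa": "a", "a": "ɑ"}
CONSONANTS = ["m", "n", "ng", "p", "ph", "t", "th", "k", "kh",
              "b", "d", "s", "h", "w", "y", "r", "l"]
# Longest letters first, so digraphs win over their single-letter prefixes.
LETTERS = sorted(list(VOWELS) + CONSONANTS, key=len, reverse=True)

def tokenize(word):
    """Split a romanized word into letters, keeping ':' and '-' as tokens."""
    tokens = []
    i = 0
    while i < len(word):
        if word[i] in ":-":
            tokens.append(word[i])
            i += 1
            continue
        for letter in LETTERS:
            if word.startswith(letter, i):
                tokens.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"unexpected character {word[i]!r} in {word!r}")
    return tokens

def vowels_ipa(word):
    """IPA values of the vowel letters in a word, in order."""
    return [VOWELS[t] for t in tokenize(word) if t in VOWELS]

print(tokenize("khneu"))
print(vowels_ipa("pheosa"))
```

Consonant values are not converted here because they depend on whether the letter stands in onset or coda position, which this sketch does not determine.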


Souk grammar is relatively straightforward albeit unique in many ways. Nouns in Souk have no gender or number, and they do not decline. Their position in a sentence determines their function or case. Verbs have many suffixes and some infixes which determine their aspect, mood, and voice, and adjectives behave more like nouns; all this we will go over below.


Souk does not have literal adjectives, as in 'a sad boy' or 'an angry man'. Instead, there are nouns which represent a quality, i.e. өċ" mad (melancholy, sadness) or ɂƨƪȷ°ɞ" khloong (anger); we call these abstract nouns. To use an abstract noun as an adjective, simply attach one of the genitive particles -ra or -sim to it; the resulting modifier can appear before or after its head:

dekun mad-ra
boy melancholy-GEN
'a sad boy'

khloong-sim laaw
anger-GEN man
'an angry man'
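The construction above can be expressed as a tiny helper. This Python sketch is only an illustration: the function name attributive and its parameters are hypothetical, and it assumes that the choice of particle and the position of the modifier are independent of each other, which the two examples alone do not confirm.

```python
# Illustrative sketch; names are hypothetical.
GENITIVE_PARTICLES = ("ra", "sim")

def attributive(head, quality, particle="ra", before=False):
    """Build 'head quality-GEN' or 'quality-GEN head' from an abstract noun."""
    if particle not in GENITIVE_PARTICLES:
        raise ValueError(f"unknown genitive particle {particle!r}")
    modifier = f"{quality}-{particle}"
    return f"{modifier} {head}" if before else f"{head} {modifier}"

print(attributive("dekun", "mad"))                           # 'a sad boy'
print(attributive("laaw", "khloong", "sim", before=True))    # 'an angry man'
```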

Sentence structure

The building-blocks of a sentence can be separated into five categories: the agent; its patient; the abstract verb-phrase, which treats the agent as an object or passivizes it; the literal verb-phrase, which denotes the action of the agent; and the dative or benefactive. Each category can be either a single word or a simple clause.

Consider the sentence,

mneus lea reas tam phu:n
inspire laity donate food monks
'The laity donated food to the monks'

The above translation is the literal semantic meaning of the sentence. The last four words are indeed 'laity donate food monks'; however, the first word mneus 'inspire' is an abstract verb, and treats the agent of the clause like an object. The agent of the abstract verb is inherently unknown; it is said the abstract verb represents a realm of understanding beyond the physical, treating actions and agents as one with each other.

Statements without an abstract verb sound very informal and almost immature. It is uncommon for even colloquial speech to go without an abstract verb, except in brief two- or three-word utterances and gestures. However, the abstract verb-phrase can be replaced with a relative clause for the agent, which is common in more complex sentences:

thaes thim mae:n kea lea reas tam phu:n
silence vow pronounce.PERF REL laity donate food monks
'The laity who have taken a vow of silence donated food to the monks'

In the above example, we can see that the abstract verb-phrase has been replaced by a relative clause. The relative clause works in the same way that the abstract verb-phrase normally works: it modifies the agent of the sentence, giving us more information about it. The dative/benefactive case is often ambiguous, as it can also be used in other ways, such as the following:

si yaa-yaaw me:m tho lu
hold politician choose.PERF wealth health
'The politicians chose wealth over health.'

In the above example, the dative acts as a NON-BENEFACTIVE; by placing a valent term such as 'wealth' before the dative, it becomes apparent in context that 'health' is disregarded and cast aside; compare to

si yaa-yaaw me:m tho ni lu
hold politician choose.PERF wealth CONJ health
'The politicians chose wealth and/for health.'

We can see here that the conjunction ni binds 'health' and 'wealth' together, thus removing any contrast they would produce in simple juxtaposition. On another note, the verb si (hold, behold) has a meaning of possession or temptation when used in the abstract. It thus shows that the politicians were barely able to control themselves, as if possessed by greed entirely. Si is also rarely used in abstract form to refer to a single agent; thus we can understand that we are referring to several politicians, and not just one.

In an intransitive clause, there is normally no abstract verb, and instead the initial verb is literal. Consider,

me:m-na-laa phae:s mad-ra
choose-IPFV-NEG lady melancholy-GEN
'The sad lady could not choose.'

Here we can see that the initial (and only) verb is a literal verb. If we want to indicate the same abstract idea which would exist in a transitive clause, we can use an abstract noun in the dative case:

me:m-na-laa phae:s mad-ra mad
choose-IPFV-NEG lady melancholy-GEN melancholy
'Through melancholy, the sad lady could not choose.'

We can see that the dative case here is more of an instrumental or comitative. Now if we decide to make this sentence transitive (i.e. we know what she could not choose), we will use the abstract verb sam, which means 'driven by melancholy' in the abstract and 'to be sad' in the literal:

sam phae:s mad-ra me:m-na-laa kho khing
sad.V lady melancholy-GEN choose-IPFV-NEG word speak
'Through melancholy, the sad lady could not choose words to say.'

Since the verb khing (speak) is in a position in the sentence where verbs do not normally appear, we can tell that the verb is used in its infinitive or nominal form.
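The word orders seen in the examples of this section can be summarized as a slot-filling routine. The Python sketch below is only an illustration inferred from those examples, not a full grammar: the function name linearize and its parameters are hypothetical, each slot is passed as a ready-made string, and a relative clause for the agent is passed in the abstract slot since it occupies the same position. The order used for a transitive clause with no abstract verb is an assumption, as the text only says such statements sound informal.

```python
# Illustrative sketch; names are hypothetical, orders inferred from the examples.
def linearize(agent, verb, patient=None, dative=None, abstract=None):
    """Arrange sentence parts in surface order.

    Intransitive (no patient, no abstract verb): verb, agent, dative
    Otherwise: abstract verb (or relative clause), agent, verb, patient, dative
    """
    if patient is None and abstract is None:
        parts = [verb, agent, dative]
    else:
        parts = [abstract, agent, verb, patient, dative]
    # Drop any slots that were left empty.
    return " ".join(p for p in parts if p)

print(linearize("lea", "reas", patient="tam", dative="phu:n", abstract="mneus"))
print(linearize("phae:s mad-ra", "me:m-na-laa", dative="mad"))
print(linearize("phae:s mad-ra", "me:m-na-laa", patient="kho khing", abstract="sam"))
```

The three calls rebuild the donation sentence, the intransitive sentence with an instrumental dative, and its transitive counterpart with the abstract verb sam.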

Mood, aspect and voice

