Here is part of a news report in a mystery language:

Mystery language

Do you know or can you guess which language it is and where it’s spoken?

  1. Arabic script. Definitely not any of the Iranian languages, including Kurdish. Maybe some Turkic language. Not even a familiar Turkic language though. I go for Uyghur.
    The rightmost word in the second line is “jahandar” something, which means king or world-owner in Persian. The word after the comma in the third line is very probably a deformed version of Arabic “nahayat”, meaning “end”.
    Looks like the letter “ain” is somehow indicating a vowel, since it’s so unexpectedly frequent.

  2. Not Uyghur is as much as I can say. It’s missing some of the letters and also has some that don’t belong to the Uyghur alphabet. It does look like an alphabetic rather than an abjad system, though.

  3. This is quite a mystery. The script is definitely alphabetic, but just as definitely neither Kurdish nor Uyghur, and just as clearly not Mandarin in Xiaoerjing script. I haven’t been able to find anything online in references to Arabic-derived scripts that remotely resembles this, either.

    The ‘ains used as vowels plus hamzated u and i are very unusual, and the heavy use of letters that are normally Arabic-specific is very strange, seeing that in most languages using Arabic-based scripts, when they crop up they are a dead giveaway for loanwords from Arabic. Yet there is nothing here that is remotely recognisable as a word of Arabic origin. It definitely looks like some kind of recent invention (if it isn’t in fact some kind of code) since all Arabic-derived scripts I have seen start with Arabic spelling conventions and then tack on whatever additions and adjustments are needed to represent their particular language. This one seems to have started from scratch and gone about setting up an alphabetic script almost as if it were striving to keep itself as different from the original Arabic script as possible while still using Arabic letters.

    I’m going to try analysing this a bit to see what the distribution of the letters is: maybe that will help somewhat…

  4. I’m pretty sure it’s a Turkic language now. It seems there is some sort of mix-up in the way the letters are coded (or perhaps this is a deliberate cypher), which accounts for the weird distribution of letters with a high proportion of very unusual letter forms. I transliterated this into a Latinesque hybrid with various phonetic symbols thrown in to distinguish letters. For the most part, my letters correspond to more or less standard transliterations of the various Arabic letters found here, except for the following:

    For the ‘ain (ع), which is clearly functioning as a vowel here, I first tried ‘o’, which helped me see patterns, especially what seemed to be a preponderance of Turkic-like suffixes (-lVr [plural], -dVn [dative?], -lVq=-lYk [-ity/-ness/-hood]). I have since changed it to a barred i (ɨ), which brings out the word shapes much better. For hamzated alif, ya and waw, I settled on umlauted a, i and u (ä, ï, ü), and for ya and waw I used the vowel equivalents i and u rather than the glide letters y and w.

    Here’s the provisional transcription:

    iuhan dïnüïr ẓamïrɨkar krïmnɨi jɨlgɨsɨdɨkɨ daθlɨq sudɨḍer, ẓð ẓäaɨnɨθ jahandarħɨlɨq ṣrɨnsɨṣɨnɨ mðndaq bɨr nɨr neħħe nðqtɨga iɨgɨnħaqlaidð: bïṣïn qɨlalaqdɨgan ẓɨšnɨ ẓetɨḍe qaldðrsaθ, nahaiɨtɨ qɨzɨqarlɨq netɨjɨḍe ẓïrɨšɨsen ẓerzɨidɨgɨnɨ del bašqɨlarnɨθ quledɨn kelmeidɨgan ẓɨšlardðr. ẓeḍer xeqnɨθ ṣðlɨdɨn ṣaidɨlɨnɨṣ ṣðl tïṣɨš ẓɨmkanɨiɨtɨ bulsɨla, herḍɨzmð iïnɨθdɨkɨ ṣšlnɨ xejlɨme

    It’s clear that the edh (ð) must correspond to a vowel of some sort. I leave it to other Omniglottists to play around with deciphering the rest of the cyphered letters. As I mentioned, I think it’s very clear now this is some Turkic language, not in its regular orthography, but either deliberately or unintentionally put into a cypher. As for me, I have to get off to take advantage of the IKEA winter sale before it ends this evening, haha!

    Good luck fellow OG-ists!

  5. The Arabic script is probably used as an alphabet, i.e. with vowels marked by letters instead of by vocalisation. I’ll guess that vowels are represented by ain, alif, waw, ya, dhal and ta marbuta. Hamza is quite frequent and may distinguish short and long vowels. Under this hypothesis, a transliteration into the Latin alphabet might look like this:

    iuhan dínúír żámíreka krénei jelgesedeke daþeq sudeđor, ży żázeneþ jahandarħeleq śre?seśene myndaq ber noħħo nyqtega iegenxaqalaidy:
    ?đ?n qelalaidegan żešne żoteđo qaldyrsáþ, nahaiete qezeqarleq notejeđo żírešeson.
    bašqlar qelelaidegan żešlarga qul teqeś żaúaro bu?a, x?ke senaś bíqešqa żorzeidegene dol bašqelarneþ quleden ko?oidegan żešlardyr.
    żođor xoqneþ śyleden śaideleneś śyl tíśeš żemkaneiete bulsela, horđezmy iýneþdeke śylne xojlemo.

    After writing this, I am inclined to reconsider whether ya represents a vowel, but am too lazy to rewrite it.

    It has a Turkic feel because of the long words and repeated suffixes (especially the Turkic plural “-lar”), even though no obvious vowel harmony is detectable; that may be due to imperfections in either the original writing system or my transliteration. Is it some variant of Tatar?

  6. Oops. Christopher Miller has provided the transcription before me, so my effort is useless.

  7. On a hunch that this is obviously an off version of an alphabetic Arabic-derived script, and therefore one of the Central Asian Turkic languages, most likely Uighur, I replaced the dotted zs (ẓ) with w, the edhs (ð) with schwa (ə), and the thetas (θ) with ‘ng’, a syllable-final consonant in Uyghur and a couple of other Central Asian Turkic languages. This got me wə (which would correspond to Turkish ve ‘and’) and several words that, when I google them, inevitably bring up pages in – you may have guessed it by now – Uighur: dangliq, nahayiti, qiziqarliq ‘funny’, basqilarning (the result for bašqilarning), kelmeydigan, xeqning, and bulsila. If you change məndaq to mundaq, qaldərsang to qaldursang and nəqtiga to noqtiga, those also get quite a few Uighur hits. Also, bir is ‘one’ in many Turkic languages. Despite my hunch about wə, nothing turns up with any of the words I have a w in, and only when I replace ə with another vowel do I get results. Also, none of the words I have a z in bring up any hits, so the z must be some other segment (vowel or consonant unclear to me so far…).

  8. BTW, for prase:

    Your transcription wasn’t useless, in fact it showed me that I had made a few mistakes in mine, which I have now corrected! Plus it helped confirm we are both on the right path! Cheers and Happy New Year!

  9. Hello,
  10. I have tried a bit of googling and agree with Christopher that Uyghur seems to be a reasonable hypothesis. I was a bit surprised that bashqilar probably means “others” in Uyghur; I expected it to have something in common with Turkish baş = head. However, I wonder why it isn’t written in the standard orthography. I hesitate to believe that it was deliberately encrypted to make the quiz more difficult.

  11. I’m not familiar with any language that uses this style of written alphabet, so I’m going to go out on a limb and say Persian, or possibly some dialect of it like Farsi or Dari?

  12. My hunch is that this was a plain old problem with text encoding and display. It’s just like when I see French or Spanish or German accented characters pop up as gobbledygook in my browser – no matter how much I play around with browser encoding settings – or when, because of the encoding I have active, some letters in English occasionally display as Chinese characters (!). Uyghur, because of its very unusual vowel letters that really sit on the periphery of the Arabic encoding space, is probably particularly vulnerable to being displayed incorrectly by a browser with an incompatible encoding setting, the end result being Arabic script gobbledygook. A Latin script analogue would be problems properly displaying a language with rare characters, such as Latvian, Turkish, Vietnamese or Yoruba.

    Poor Simon, I feel sorry for you being tripped up by technical problems two weeks in a row! Let’s hope the third time’s a charm…!

  13. The language is Uyghur (Уйғур /ئۇيغۇر) which is spoken mainly in the Xinjiang Uyghur Autonomous Region of China.

    The text comes from China Radio International. Maybe it was already not displaying correctly when I copied it from there – I had to copy it from the source code as it wouldn’t copy from the page itself.

  14. It looks like this is a problem specific to the CRI Uyghur website. The gif headers are all in correct Uyghur text but none of the HTML text is: it’s all filled with ‘ains for vowels and thaa’s for what should be a triple dotted kaaf (-ng). I tried the Unicode “What is Unicode” page in Uyghur: no problem there, and no problem at any of the other Uighur script web sites I looked at. Looks like the Chinese state media sure aren’t doing a good job at getting their message across, at least at this particular website!

  15. I am sorry but I can’t agree that it is Uyghur. The text is unreadable in any language. In this text, e.g., there are Arabic characters (`ayn, za emphatic) which are never used in Uyghur. I know enough Xinjiang Uyghur to read it. If you like I can supply you with a text in Uyghur.

  16. For Podolsky:

    I and others had the same reaction as you at first but then figured out what was going on. The first thing you need to do is click on Simon’s link to CRI International to see the precise text on their Uyghur language page that he copied and posted for this quiz: it’s the first large block at top left. Then you can follow how we figured out that it is beyond doubt Uyghur – as would be expected for something appearing on a Uyghur-language web page – but rendered nearly unrecognisable by a mess-up that left most letters misencoded (hence the ains and emphatic consonants everywhere, among other things). You can follow the steps of reasoning that showed this is misencoded Uyghur by reading the comments through, especially the following numbers:

    2, 5, 6-8, 10, 15 and 17
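
The substitution step described in comment 7 can be sketched in a few lines of Python. This is only an illustration, not something any commenter actually ran: the function name `respell` and the table `SUBSTITUTIONS` are made up here, and the ɨ → i entry is an inference from the words comment 7 lists (dangliq, qiziqarliq) rather than something stated outright. The sample words are taken from the provisional transcription in comment 4.

```python
import unicodedata

# Substitutions proposed in comment 7, plus one inferred from the words it lists.
# Escapes are used so the exact code points are unambiguous.
SUBSTITUTIONS = {
    "\u1e93": "w",       # dotted z -> w
    "\u00f0": "\u0259",  # edh -> schwa
    "\u03b8": "ng",      # theta -> ng
    "\u0268": "i",       # barred i -> i (inferred, hypothetical)
}


def respell(text: str) -> str:
    """Apply the substitutions to text written in comment 4's notation."""
    # NFC turns "z + combining dot below" into the single code point U+1E93,
    # so both ways of typing the dotted z are handled.
    normalized = unicodedata.normalize("NFC", text)
    return normalized.translate(str.maketrans(SUBSTITUTIONS))


# A few words from the provisional transcription in comment 4.
samples = [
    "da\u03b8l\u0268q",
    "xeqn\u0268\u03b8",
    "q\u0268z\u0268qarl\u0268q",
    "\u1e93\u00f0",
]

for word in samples:
    print(word, "->", respell(word))
```

Run on those samples, it turns daθlɨq into dangliq and xeqnɨθ into xeqning, two of the forms comment 7 reports searching for. A one-to-one table like this only works if the garbling itself was a consistent letter-for-letter swap, which is what the comments suggest but do not prove.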

