A proposal for a Latin‑based script for the Persian language
Spring 2025
h.farroukh@yahoo.de
Contents
- Foreword
- Sounds and Letters
- Alphabet
- Vowels
- Consonants
- Gemination
- Glottal Stop (mul)
- Elision, Epenthesis, and Sound Shift
- Contraction
- Allomorphs
- Free Spelling of Certain Proper Nouns
- Hyphenation
- Solid, Spaced, and Hyphenated Forms
- Spaced Form
- Solid Form
- Hyphen
- Numbers
- Capitalization
- Foreign Words
- Latin‑based Scripts
- Other Scripts
- Abbreviations
- General
- Frequently Used Abbreviations
- Punctuation
- Period (.)
- Question Mark (?)
- Exclamation Mark (!)
- Comma (,)
- Semicolon (;)
- Colon (:)
- Quotation Marks (“ ”)
- Hyphen (-)
- Dash (–)
- Ellipsis (…)
- Parentheses ( )
- Square Brackets [ ]
- Slash (/)
Foreword
While the Alefbā‑ye 2om project sought to generalize the standard Transcription procedure for Iranian toponymic items for use as a parallel script, PĀRSIG is a stand‑alone proposal to replace the current Persian script. In doing so, it introduces carefully chosen departures from Alefbā‑ye 2om with the aim of improving readability and simplifying orthographic rules.
In addition to minor refinements to compounds, numbers, contractions, gemination, conventional letter names, and the treatment of foreign words, the following major changes were introduced (see the full explanation below):
- Replacement of š and ž with s̄ and z̄ to improve readability in running text and yield a more harmonious alphabet.
- Revision of the glottal-stop (mul) rules so that the apostrophe no longer needs to be tracked for positional shifts.
- Enclitics written spaced rather than hyphenated to improve readability.
Sounds and Letters
Alphabet
ALPH-1 PĀRSIG has 29 letters, each with lowercase and uppercase forms.
Using the macron in place of the caron (as used in Alefbā‑ye 2om) improves legibility because the diacritic lies visually closer to its base letter. Applying macrons consistently to the base letters a, s, and z also yields a more harmonious alphabet. Although appropriate keyboard layouts with dedicated keys for s̄ and z̄ exist (and can be customized on mobile phones using apps like xkeyboard), dedicated single‑codepoint Unicode characters should ideally be adopted in the future.

Letters:
a b c d e f g h i j k l m n o p q r s t u v w x y z ā s̄ z̄
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ā S̄ Z̄
ALPH-2 Where diacritics cannot reasonably be used (e.g., in a homepage URL or an email address), they may be omitted: sedā, seda; hams̄ahri, hamsahri; moz̄de, mozde.
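For software that has to produce the diacritic-free fallback of ALPH-2 (for instance when deriving an email address from a name), the omission could be sketched in Python as follows. This is an illustration only, not part of the proposal: the function name `strip_macrons` is hypothetical, and the sketch assumes the macron is encoded either as a precomposed letter (ā) or as the combining macron U+0304 (s̄, z̄).

```python
import unicodedata

COMBINING_MACRON = "\u0304"  # assumed encoding of the macron on s and z


def strip_macrons(text: str) -> str:
    """Drop macrons only (ā -> a, s̄ -> s, z̄ -> z); other accents are kept."""
    # NFD splits precomposed letters such as ā into base letter + U+0304.
    decomposed = unicodedata.normalize("NFD", text)
    without_macrons = decomposed.replace(COMBINING_MACRON, "")
    # NFC recomposes any remaining accented letters (e.g. é in André).
    return unicodedata.normalize("NFC", without_macrons)


print(strip_macrons("hams\u0304ahri"))  # hamsahri
```

Only the macron is removed, so foreign letters such as é remain untouched.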
ALPH-3 In alphabetical order, diacritic letters are collated with their base letters:
A (Ā), B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S (S̄), T, U, V, W, X, Y, Z (Z̄).
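A sort key that follows ALPH-3 could look like the sketch below. The name `collation_key` is hypothetical. The tie-break between otherwise identical words (unmarked letter before the macron letter) is an assumption, since the rule above only says that diacritic letters are collated with their base letters.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def collation_key(word: str) -> tuple[str, str]:
    """Primary key: the word without macrons. Secondary key: the word itself."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base = decomposed.replace(COMBINING_MACRON, "")
    # Assumption: on a tie, the unmarked spelling sorts first.
    return (base, decomposed)


sample = ["zard", "s̄ab", "sib", "āb", "abr", "sag"]
ordered = sorted(sample, key=collation_key)
print(ordered)  # āb, abr, s̄ab, sag, sib, zard
```

With a plain code-point sort, āb and s̄ab would instead be pushed away from the other words beginning with a and s.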
ALPH-4 Each letter has a conventional name (adapted from IPA2 Pársik):
am (ām), be, ci, de, em, fi, ge, he, im, je, ki, li, mi, ne, om, pi, qu, re, se (s̄e), ti, um, vu, dove, xe, ye, ze (z̄e).
Other Latin letters may occur in foreign words or proper names: André, Miró.
Vowels
PĀRSIG has six vowels and one diphthong:
Sound | Letter | Example |
---|---|---|
[æ] | a | abr, abzār, sag, dast |
[ɒː] | ā | āb, āftāb, bahār, rahā |
[e] | e | emruz, esfand, sedā, sāde |
[iː] | i | in, injā, bim, tehrāni |
[o] | o | ostād, ordu, boz, do |
[ow] | ow | owbās̄, jelow, peyrow, mowz |
[uː] | u | bu, dust, jāru, xubi |
VOW-1 w occurs only after o; ow is always monosyllabic.
VOW-2 Short vowels are a, e, o. Long vowels are ā, i, u, and the diphthong ow.
VOW-3 (iy-avoidance). Do not write y after i when the y would only reflect a glide between vowels. Write the sequence i + vowel directly: xiābān (¬xiyābān), xubi e (¬xubi ye), biābān (¬biyābān).
The sequence iy is written only when an Arabic loanword has a geminated y pronounced iy (tahiye, vāqeiyat); see GEM-2.
Consonants
PĀRSIG has 22 consonants:
Sound | Letter | Example |
---|---|---|
[b] | b | baste, bāz, abr, tab |
[tʃ] | c | cerā, cand, āluce, gac |
[d] | d | dar, derāz, pedar, sard |
[f] | f | fanar, fer, sefid, kaf |
[g] | g | gāv, gaz, tagarg, sag |
[h] | h | hasti, hamin, rahā, rāh |
[dʒ] | j | jib, jāru, bāje, kaj |
[k] | k | kam, kenār, bikār, fandak |
[l] | l | leng, lagad, mālidan, kacal |
[m] | m | mu, mār, hamin, setam |
[n] | n | nāz, narm, benām, tan |
[p] | p | pās, pas, topol, gap |
[ɣ] | q | qam, qalam, raqam, duq |
[r] | r | rāst, raftan, sarā, tar |
[s] | s | sib, sābun, tasbit, kas |
[ʃ] | s̄ | s̄ab, s̄ib, nes̄astan, fars̄ |
[t] | t | tāb, tāze, otu, taxt |
[v] | v | vālā, vazes̄, navid, gāv |
[x] | x | xāb, xam, boxāri, paxme |
[j] | y | yek, yār, māye, ney |
[z] | z | zard, zohr, gozas̄t, rezāyat |
[ʒ] | z̄ | z̄arf, z̄āle, vāz̄e, kaz̄ |
Gemination
GEM-1 In Arabic loanwords, a consonant is sometimes geminated (tas̄did). In PĀRSIG, a geminated consonant is written twice: mokarrar, tamannā, moaddel, ezzat.
GEM-2 If the geminated letter in the Persian script is ye and is pronounced iy, we write iy (adapted from Dabire): tahiye, vāqeiyat.
GEM-3 A final consonant is usually not geminated. Gemination is reflected according to pronunciation: xat, dastxat, xatti, xattāt.
Glottal Stop (mul)
The term mul (adapted from IPA2 Pársik) denotes either a brief pause or the glottal stop. In Arabic loanwords, the glottal stop is rarely pronounced in Persian; more often there is only a slight pause, which we render with an apostrophe (’).
- GS-1 Mul is not written at the beginning of a word: onoq, otāq, aziz.
- GS-2 Mul is not written between two vowel sounds—whether within a word or across morpheme/word boundaries: sāat, moallem, faāl, jāmee, tarh e jāme e behdās̄t, exterā e bozorg, now e digar, sariosseyr, tolu e xors̄id.
- GS-3 Mul is not written word-finally after a vowel: morabba, exterā, jāme, sari, hamnow, tolu.
- GS-4 Inflection, derivation, and compounding do not affect whether an apostrophe is present: onoq → badonoq; otāq → hamotāqi; aziz → Abdolaziz; eddeā → poreddeā; sari → sarian; exterā → exterāi.
- GS-5 Otherwise, mul is written according to pronunciation: s̄am’, ba’d, mas’ul.
This system trades a small amount of phonetic precision for consistency; the result remains sufficient for comprehension in context. Its main advantage is that a written apostrophe never shifts position, unlike in earlier proposals, which required tracking possible movements. As a point of comparison, German has many more glottal stops inside compounds without any special symbol, yet syllabification remains unproblematic. On rare occasions, a form may be ambiguous, but context resolves it (just as readers disambiguate homographs like s̄ir), e.g., tarh e jāme e behdās̄t vs. jāme ye kohne.
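Rules GS-1 to GS-3 are positional, so a spell-checker could test them mechanically. The following Python sketch is one possible reading of those rules, not a reference implementation: `mul_violations` is a hypothetical name, the apostrophe is assumed to be U+2019, and the check is meant for single words only (it would also flag the digit apostrophe of NUM-5).

```python
import re
import unicodedata

APOSTROPHE = "\u2019"                 # assumed code point of the mul sign
SIMPLE_VOWEL = "[a\u0101eiou]"        # a, ā, e, i, o, u
VOWEL = "(?:ow|" + SIMPLE_VOWEL + ")"  # the diphthong ow counts as a vowel


def mul_violations(word: str) -> list[str]:
    """Return the GS rules that a written apostrophe in `word` breaks."""
    w = unicodedata.normalize("NFC", word).lower()
    found = []
    if w.startswith(APOSTROPHE):                        # GS-1: word-initial
        found.append("GS-1")
    if re.search(VOWEL + APOSTROPHE + SIMPLE_VOWEL, w):  # GS-2: between vowels
        found.append("GS-2")
    if re.search(VOWEL + APOSTROPHE + "$", w):           # GS-3: final, after a vowel
        found.append("GS-3")
    return found


print(mul_violations("mas\u2019ul"))      # []
print(mul_violations("s\u0101\u2019at"))  # ['GS-2']
```

Whether an apostrophe that passes these checks is actually pronounced (GS-5) cannot be decided from the spelling alone.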
Elision, Epenthesis, and Sound Shift
ELI-1 Elision, epenthesis, and sound shift are written directly, i.e., the spelling follows the actual pronunciation. In other languages, elisions are sometimes marked with an apostrophe; here, however, the apostrophe is reserved for the glottal stop. An extra sign does not aid legibility, especially since sound omission occurs in limited, predictable contexts in Persian and is very frequent in colloquial usage. (Such marking would considerably impair the readability of colloquial texts. Spanish, another largely phonetic orthography, also omits such marking.)
Examples: natavān, natvān; mehrbān, mehrabān; peyrow, peyravi.
Contraction
CON-1 Contraction at the beginning or end of a word or enclitic is written directly. If a contracted word or enclitic ends up vowelless, it is hyphenated to the word it is read together with:
gar (agar), v‑in (va in), sedā‑m (sedā yam).
Allomorphs
ALLO-1 Some Persian words have allomorphs (variant pronunciations). Spelling follows pronunciation:
ces̄m, cas̄m; ju, juy; jelow, jolow, jelo, jolo.
ALLO-2 Colloquial allomorphs are also spelled as pronounced:
bārān, bārun; digar, dige.
Free Spelling of Certain Proper Nouns
PROP-1 The spelling of personal names, company names, brands, or product names may deviate from these rules where customary: Arash, Āras̄; Saipa, Sāypā.
Hyphenation
HYPH-1 When breaking a word at the end of a line, do not split a syllable:
xā‑ne; le‑bāshā; lebās‑hā.
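The permitted break points of HYPH-1 can be approximated with a small syllabifier. The Python sketch below rests on an assumption that is not stated in this proposal: a Persian syllable has exactly one vowel and at most one consonant before it, so in a consonant cluster between two vowels only the last consonant opens the next syllable. The names `letters` and `syllables` are hypothetical; w and the apostrophe are simply treated as consonants.

```python
import unicodedata

VOWELS = set("a\u0101eiou")  # a, ā, e, i, o, u


def letters(word: str) -> list[str]:
    """Split a word into letters, keeping a combining macron with its base."""
    out: list[str] = []
    for ch in unicodedata.normalize("NFC", word):
        if unicodedata.combining(ch) and out:
            out[-1] += ch          # e.g. s + U+0304 stays one letter
        else:
            out.append(ch)
    return out


def syllables(word: str) -> list[str]:
    """Split a word into syllables; a line break may fall between any two."""
    ls = letters(word)
    nuclei = [i for i, l in enumerate(ls) if l.lower() in VOWELS]
    starts = []
    for prev, cur in zip(nuclei, nuclei[1:]):
        # One consonant (if any) before the vowel opens the new syllable.
        starts.append(cur - 1 if cur - 1 > prev else cur)
    bounds = [0] + starts + [len(ls)]
    return ["".join(ls[a:b]) for a, b in zip(bounds, bounds[1:])]


print(syllables("leb\u0101sh\u0101"))  # ['le', 'bās', 'hā']
```

For lebāshā this yields the two break points shown in the examples above (le‑bāshā, lebās‑hā).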
Solid, Spaced, and Hyphenated Forms
Spaced Form
- FORM-S-1 Simple infinitives and verbs are written spaced in all compound infinitives and verbs. The non‑verbal component is treated independently. Experts often disagree on what counts as a compound verb; this rule removes the burden of deciding borderline cases:
yād gereftan, yād nagereftan, yād migiram, yād nemigiram; dar raftan, dar naraftan, dar miravam, dar nemiravam; gerdeham āmadan.
- FORM-S-2 Proper names formed with the conjunctions e and o are written spaced, even if the name has only one stress:
Xiābān e Rudaki, Xāvar e Dur, Kas̄k o Bādemjān.
- FORM-S-3 Enclitics (unstressed, phonetically bound words) are written spaced. Consequently, possessive/objective pronouns, short forms of budan (“to be”), the conjunctions e and o, the indefinite article i, and the exclamation ā are spaced. This helps locate the stress: zamān e vs. zamāne, ketāb i vs. ketābi:
dast e man, man o to, doxtar emān, pesar i, bidār im.
Using hyphens for enclitics as a general rule would introduce too many hyphens into running text and reduce readability. Spaces are also easier and more natural to type than hyphens; therefore, enclitics are generally spaced, not hyphenated.
- FORM-S-4 Compound prepositions with e are written spaced: zir e, kenār e, barāy e, bedun e.
Solid Form
- FORM-D-1 Affixes (inflectional and derivational) are written solid with the base:
guyand, miguyand, nemiguyand, daftarhā, āqāyān, bozorgtar, bālātarin, bozorgi, dānes̄, dānes̄mand, bimārestān, hamkār, benām, bāadab, bikār.
- FORM-D-2 The default for compounds is solid unless otherwise specified in these rules:
ketābxāne, toxmemorq, goftogu, pākkon, barfpākkon, yāddās̄t, qulpeykar, qadboland, sarxorde, azxodgozas̄te, conin, conān, yekdigar, xis̄tan, ānce, injā, pasaz.
Hyphen
- FORM-H-1 Use hyphens in unfamiliar, occasion‑specific, or long compounds to aid parsing. Insert hyphens at natural sub‑compounds so that meaning remains clear:
raves̄ e sang‑dar‑miān, kam‑dardesarsāz, tāze‑be‑dowrān‑reside.
- FORM-H-2 Hyphenate fixed expressions used as a single word ad hoc:
hamin man‑bemiram‑nāz‑nakonhā.
- FORM-H-3 After a single letter or abbreviation used as a word (when inflected/derived/compounded), add a hyphen:
n‑om, g‑hā, p‑dār.
- FORM-H-4 In compounds with co‑equal elements, or reduplications/semi‑reduplications, a hyphen may be used:
irāni‑ālmāni, rafte‑rafte, jurāb‑murāb.
Numbers
Given their special behavior, numbers are treated separately.
- NUM-1 Whole and fractional numbers written out are solid; they are joined by o:
davāzdah (12), bistose (23), nohsadocehelopanj (945), semilyunosisadhezār (3,300,000), sepanjom (3/5), bistosepanjom (23/5), bist o sepanjom (20 3/5), bist o dosadom (20.02), bistodosadom (0.22).
- NUM-2 Compounds of numbers + words are solid:
panjruze, bistopanjsāle, sadhezārnafare, yāzdahdarsadi.
- NUM-3 Approximate/estimating ranges use hyphens:
se‑cāhār nafar, do‑se dāne, haftād‑has̄tād tā, cehel‑panjāh darsad.
- NUM-4 Digits + suffixes are solid; digits + words are hyphenated:
27om, 27i; 5‑ruze, 25‑sāle, 100,000‑nafare, 11‑darsadi.
- NUM-5 Omitted digits are shown with an apostrophe:
sāl e 1393, sāl e ’93.
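Rules NUM-4 and NUM-5 are mechanical enough to be expressed as small formatting helpers. The Python sketch below is illustrative only: the function names are hypothetical, the caller is assumed to know whether the attached element is a suffix or a word, and the use of the plain ASCII hyphen and of U+2019 for the apostrophe are assumptions.

```python
APOSTROPHE = "\u2019"  # assumed code point for the omitted-digits mark


def with_suffix(number: int, suffix: str) -> str:
    """NUM-4: digits + suffix are written solid, e.g. 27 + om."""
    return f"{number:,}{suffix}"


def with_word(number: int, word: str) -> str:
    """NUM-4: digits + word are joined by a hyphen, e.g. 25 + sāle."""
    return f"{number:,}-{word}"


def short_year(year: int) -> str:
    """NUM-5: keep the last two digits and mark the omission."""
    return f"{APOSTROPHE}{year % 100:02d}"


print(with_suffix(27, "om"))          # 27om
print(with_word(100000, "nafare"))    # 100,000-nafare
print(short_year(1393))               # ’93
```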
Capitalization
- CAP-1 Capitalize the first word of a sentence:
Man raftam. U goft: “Man raftam.” Xarid e xub i bud: Yek pirāhan o yek s̄alvār xaridam.
- CAP-2 In titles and headlines, the first word is capitalized. Capitalizing the other main words is optional:
Havādes e emruz, Havādes e Emruz.
- CAP-3 Forms of address and titles are capitalized before and after a personal name:
Ali Āqā, Āqā ye Bahrāmi, Maryam Xānom, Xānom e Panāhi, S̄āh Abbās, Rezā S̄āh, Karim Xān, Xāje Nasir.
- CAP-4 Capitalize each main word in a multi‑word proper noun:
Sārā Panāhi, Tehrān, Sepidrud, S̄āhnāme, Sāzmān e Melal e Mottahed, Jang e Jahāni ye Dovvom, Zeres̄kpolow bā Morq.
- CAP-5 Proper nouns are not capitalized in derivatives and compounds:
Tehrān, tehrāni; Xodā, xodās̄enās.
Foreign Words
Latin‑based Scripts
- FW-1 Loanwords are written according to Persian pronunciation:
pitzā, tāksi, rādio, oktobr, sigār.
- FW-2 Names of persons, companies, brands, and products are written 1:1, i.e., in their original spelling:
William Jones, BMW, New York Times, iPod, Windows.
- FW-3 Foreign words not integrated into Persian usage are written in italics in their original spelling:
München; London.
Other Scripts
- FW-OS-1 Words are written according to Persian pronunciation:
moallem, cāy, Z̄āpon.
- FW-OS-2 If a foreign word is neither a loanword nor a proper name/brand/product, write it in italics:
ader ka’san va nāvelhā, yo’refa mòmeno bessimā.
- FW-OS-3 The Arabic elements abu and al are written solid with the following word. Where al undergoes assimilation, write it solid with the preceding word as well. The element ebn is written solid with the preceding word:
Abolqāsem, Abuali Sinā, Alerāqi, Nasireddin, Ziāolhaq, sariosseyr, Hoseynebn e Ali, Isabn e Ja’far, Ebn e Batute.
Abbreviations
General
- ABBR-1 If a word is abbreviated with capital letters, omit the final period. Otherwise, abbreviations end with a period:
Tehrān; TEH, Teh.
- ABBR-2 All main elements should be reflected in the abbreviation of a phrase/compound. Compounds written in solid form count as one word. Otherwise, each word is abbreviated and the abbreviation is written solid:
cāhārrāh, cr.; hejri e s̄amsi, h.s̄.
- ABBR-3 Case in dotted abbreviations follows their spelled‑out forms:
Dus̄ize Panāhi, Du. Panāhi; Tehrān, Teh.; cāhārrāh, cr.
Frequently Used Abbreviations
- Calendars:
x. (xors̄idi); h.x. (hejri e xors̄idi); h.s̄. (hejri e s̄amsi); h.q. (hejri e qamari); m. (milādi, pas az milād); p.m. (pis̄ az milād).
- Iranian months:
FAR, far.; ORD, ord.; XOR, xor.; TIR, tir; MOR, mor.; S̄AH, s̄ah.; MEH, meh.; ĀBĀ, ābā.; ĀZA, āza.; DEY, dey; BAH, bah.; ESF, esf.
- Christian months:
Z̄ĀN, z̄ān.; FEV, fev.; MĀR, mār.; ĀVR, āvr.; ME, me; Z̄UA, z̄ua.; Z̄UL, z̄ul.; UT, ut; SEP, sep.; OKT, okt.; NOV, nov.; DES, des.
- Weekdays:
S̄A, s̄a.; YS̄, ys̄.; DS̄, ds̄.; SS̄, ss̄.; CS̄, cs̄.; PS̄, ps̄.; JO, jo. yā ĀD, ād.
- Time:
bd. (bāmdād), 09:15 bd.; bz. (ba’dazzohr), 02:00 bz.
Punctuation
Period (.)
- PUNC-PER-1 A declarative sentence or indirect question ends with a period:
Man raftam. Az man porsid, cerā xos̄hāl am.
- PUNC-PER-2 The decimal part of a number is separated by a period:
25.05, 2.735.
Question Mark (?)
- PUNC-QM-1 A question ends with a question mark:
Kojā raft? Cegune? Na?
Exclamation Mark (!)
- PUNC-EX-1 Use an exclamation mark after an exclamation, command, wish, or request:
Ce qas̄ang! Komak! Āfarin! Dar rā beband!
Comma (,)
A comma marks a short pause. Common uses include:
- PUNC-COM-1 Between a main and a subordinate clause:
Vaqt i resid, Mahin hanuz ānjā bud.
- PUNC-COM-2 To replace the conjunctions va or yā:
Sobhhā boland mis̄avi, sobhāne at rā mixori va be dānes̄gāh miravi. Rāmin, Narges yā Mahin ham mitavānad komak at konad.
- PUNC-COM-3 To include an apposition:
Rāmin, dust am, xeyli lāqar ast.
- PUNC-COM-4 After a vocative noun:
Ey ostād e bozorgvār, az s̄omā mamnun am. Xod yā, s̄okr et.
- PUNC-COM-5 Between a weekday and a date:
s̄anbe, 31om e farvardin e 1393; s̄a., 31 e far. e ’93.
- PUNC-COM-6 Insert commas every three digits to the left for numbers with four or more digits:
2,000; 2,025.05; 2,000,000.
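The digit grouping of PUNC-COM-6, combined with the decimal period of PUNC-PER-2, could be produced as in the following Python sketch. The helper name `group_digits` is hypothetical, and the input is assumed to be an unsigned number written with a period as decimal separator. Note that the examples elsewhere in this document leave years ungrouped (sāl e 1393), so a caller would have to skip such cases.

```python
def group_digits(number: str) -> str:
    """Insert a comma before every group of three digits in the whole part."""
    whole, dot, fraction = number.partition(".")
    # The "," format option groups the integer part in threes from the right;
    # the fractional part is passed through unchanged.
    return f"{int(whole):,}" + dot + fraction


for raw in ["2000", "2025.05", "2000000", "945"]:
    print(raw, "->", group_digits(raw))
```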
Semicolon (;)
- PUNC-SEM-1 Separates clauses more strongly than a comma but less than a period:
Mardom kār rā dust dārand; bedun e ān fekr mikonand, ke zende nistand.
- PUNC-SEM-2 Separates elements of a series when the items include internal punctuation:
Dar in dānes̄gāh mitavān res̄tehā ye pezes̄ki o dandānpezes̄ki; hoquq, eqtesād va jāmees̄enāsi xānd.
Colon (:)
- PUNC-COL-1 Before direct speech:
Āmuzgār porsid: “Ke pāsox rā midānad?”
- PUNC-COL-2 Before a complementary series separated by commas:
Mā be cand ciz niāz dārim: sālon, musiqi, qazā va nus̄idani.
- PUNC-COL-3 Before a gloss/translation:
Tas̄xis: bāzs̄enāsi.
- PUNC-COL-4 Before each utterance in a dialogue:
Navid: Cerā narafti? Narges: Vaqt nadās̄tam.
- PUNC-COL-5 Before an explanation or reason:
S̄ab rā ānjā gozarāndam: Mās̄in am xarāb s̄ode bud va tàmirgāh baste bud.
- PUNC-COL-6 Between hours and minutes:
07:45 bd.; 18:30.
Quotation Marks (“ ”)
- PUNC-QUO-1 Enclose quotations and direct speech:
“Doruq bozorgtarin gonāh be s̄omār miraft.” Āmuzgār porsid: “Ke pāsox rā midānad?”
- PUNC-QUO-2 Mark terms for commentary or first‑use emphasis:
Mahnāz fekr mikonad, ke to “afsorde” s̄odi. Morād e mā yek onsor e tarkibi’st, ke be ān “suratsāz” miguyand. Suratsāzhā bar do gune and …
Hyphen (-)
- PUNC-HYP-1 Replaces tā or az … tā in ranges:
Negāh konid be s. 21‑38. Dehxodā (1257‑1334); Sāāt e kār: s̄a.‑cs̄., 09:00‑18:00; Qatār e Tehrān‑Tabriz.
- PUNC-HYP-2 Replaces be or bar between two numbers:
Esm e s̄axs 1‑1 neves̄te mis̄avad. Irān 2‑1 Koveyt rā s̄ekast dād.
- PUNC-HYP-3 Avoid repetition using a hyphen:
jodā‑ va sarhamnevisi.
- PUNC-HYP-4 Separates day‑month‑year in dates:
31‑01‑1993, 31‑01‑93.
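The numeric date format of PUNC-HYP-4 could be generated as in this Python sketch. The name `format_date` and its `short` parameter are hypothetical; zero-padding of day and month is inferred from the examples above, and the ASCII hyphen is an assumption. The function only formats the numbers it is given and performs no calendar conversion or validation.

```python
def format_date(day: int, month: int, year: int, short: bool = False) -> str:
    """Format a date as day-month-year, optionally with a two-digit year."""
    year_part = f"{year % 100:02d}" if short else f"{year:04d}"
    return f"{day:02d}-{month:02d}-{year_part}"


print(format_date(31, 1, 1993))              # 31-01-1993
print(format_date(31, 1, 1993, short=True))  # 31-01-93
```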
Dash (–)
- PUNC-DASH-1 Marks a pause:
Hame tamām e ruz dar entezār budand – yekbāre āmad.
- PUNC-DASH-2 Encloses an extended remark:
Emruz sobh – hanuz dās̄tam bā mādar am sobhāne mixordam – be man telefon kard.
- PUNC-DASH-3 Indicates a speaker change:
– Be pedar gofti? – Bale.
- PUNC-DASH-4 Precedes the author/source of a quotation:
“Doruq bozorgtarin gonāh be s̄omār miraft.” – Herodot
Ellipsis (…)
PUNC-ELL-1 An ellipsis marks omitted text that is not essential or is easily inferred. A final period is not needed after an ellipsis at sentence end:
Pis̄nahād e Nasrin o … pazirofte s̄od. Bāzi e xub i bud … Fardā cekār mikoni?
Parentheses ( )
PUNC-PAR-1 Use parentheses for explanatory additions or alternatives. If a complete sentence is within parentheses, place a period before the closing parenthesis:
ru(y) ≈ ru yā ruy; In ketāb rā (moteassefāne) hanuz naxānde bud. Diruz be bāzār raftim. (Parvin ham āmade bud.)
Square Brackets [ ]
PUNC-SQ-1 Indicate a replacement option:
ce[a]s̄m ≈ ces̄m yā cas̄m.
Slash (/)
PUNC-SL-1 Express alternatives with a slash. If any alternative contains a space, pad the slash with spaces:
Vorudi e estaxr barāy e kudakān/bozorgsālān 8,000/12,000 Tumān ast. Darbāre ye safar be Āfriqā ye Jonubi / Keniā hanuz tasmim nagereftim.
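The spacing condition of PUNC-SL-1 can be stated compactly in code. In this Python sketch the name `join_alternatives` is hypothetical; it pads the slash only when at least one alternative contains a space.

```python
def join_alternatives(*alternatives: str) -> str:
    """Join alternatives with a slash, padded if any alternative has a space."""
    separator = " / " if any(" " in alt for alt in alternatives) else "/"
    return separator.join(alternatives)


print(join_alternatives("8,000", "12,000"))             # 8,000/12,000
print(join_alternatives("Āfriqā ye Jonubi", "Keniā"))   # Āfriqā ye Jonubi / Keniā
```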