Help:Multilingual support
Articles on the English Wikipedia may contain words or text written in different languages and scripts. To view and edit these articles correctly, you need to have the appropriate fonts installed and your operating system and browser configured correctly. This guide will help you do so.
Contents
- 1 Overview
- 2 Scripts
- 2.1 Armenian
- 2.2 Avestan
- 2.3 Canadian Aboriginal Syllabics
- 2.4 Cherokee
- 2.5 Coptic
- 2.6 Cuneiform
- 2.7 Deseret
- 2.8 East Asian
- 2.9 Ethiopic
- 2.10 Gothic
- 2.11 Indic
- 2.12 Lisu (Fraser alphabet)
- 2.13 Old Persian cuneiform
- 2.14 Runes
- 2.15 Sutton SignWriting
- 2.16 Syriac/Aramaic script
- 2.17 Tifinagh script
- 2.18 South East Asian
- 3 Special cases
- 4 See also
- 5 Notes
- 6 External links
Overview
Unicode
Articles on Wikipedia are encoded using Unicode (specifically UTF-8)[1], an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. Because UTF-8 is backwards compatible with ASCII, and most modern browsers have at least basic Unicode support, most users will experience little difficulty reading and editing Wikipedia.
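The ASCII compatibility mentioned above can be checked directly. The following is a minimal Python sketch; the helper name `utf8_length` is an illustrative assumption, not part of any Wikipedia tooling. It shows that pure ASCII text has identical bytes in both encodings, while other characters take two to four bytes in UTF-8.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_text = "Wikipedia"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Characters outside ASCII need more than one byte.
for ch in ["A", "\u00e9", "\u1234", "\U00010332"]:
    print(f"U+{ord(ch):04X} uses {utf8_length(ch)} byte(s) in UTF-8")
```

Characters up to U+007F take one byte, which is why software that only understands ASCII can still read the ASCII portions of a UTF-8 page.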
For older browsers, MediaWiki, the Wikipedia software, serves the wikitext in a safe mode upon editing. Characters that cannot be represented in ASCII are temporarily converted to hexadecimal character references, which look like &#x1234;. Existing hexadecimal character references get an additional leading zero, so that they are not converted to actual characters when the page is saved; they look like &#x01234;. Likewise, to create a hexadecimal character reference in safe mode, rather than the character itself, add a leading zero. You can check whether safe mode is in use by editing this section: if &#x4D; appears as &#x04D; rather than &#x4D;, safe mode is in use.
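The safe-mode round trip described above can be sketched in Python as follows. This is only an illustration of the idea, not MediaWiki's actual implementation; the function names `to_safe_mode` and `from_safe_mode` are assumptions made for this example.

```python
import re

# Matches a hexadecimal character reference such as &#x1234;
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_safe_mode(text: str) -> str:
    """Convert wikitext to an ASCII-only form for editing (sketch)."""
    # Step 1: protect references already in the text with one extra leading zero.
    protected = HEX_REF.sub(lambda m: "&#x0" + m.group(1) + ";", text)
    # Step 2: replace every non-ASCII character with a hexadecimal reference.
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch)) for ch in protected
    )


def from_safe_mode(text: str) -> str:
    """Reverse the conversion when the page is saved (sketch)."""
    def restore(match):
        digits = match.group(1)
        if digits.startswith("0"):
            # A protected reference: drop one zero and keep it as a reference.
            return "&#x" + digits[1:] + ";"
        # An unprotected reference stands for an actual character.
        return chr(int(digits, 16))

    return HEX_REF.sub(restore, text)


original = "\u1234 and &#x1234;"
edited_form = to_safe_mode(original)
restored = from_safe_mode(edited_form)
```

Because generated references never start with a zero, the leading zero is enough to tell a protected reference apart from a converted character.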
Font
Most computers running Microsoft Windows, Apple's OS X, or one of many Linux variants already have fonts installed that support Latin, Greek, Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean and the International Phonetic Alphabet. Many mobile devices, such as the iPhone and iPad, also include such fonts. Several historic and accented characters (used in the transliteration of foreign scripts) may be missing, though.
Microsoft fonts
Font | Included with | Scripts | Description |
---|---|---|---|
Arial Unicode MS [1] | | Western, Japanese, Hangul, Johab, Big5, GB 2312, Hebrew, Arabic, Greek, Turkish, Baltic, Central European, Celtic, Cyrillic, Thai and Vietnamese | Supports a wide range of scripts, but is of slightly lower quality than Arial because it lacks kerning and is not smoothed. Contains a minor bug that causes double-wide diacritics to be placed on the wrong characters. |
Lucida Sans Unicode [2] | | Western, Hebrew, Greek, Turkish, Baltic, Central European, Cyrillic | Has a much smaller character repertoire than Arial Unicode MS, but is more legible. |
Tahoma [3] | | Western, Hebrew, Arabic, Greek, Turkish, Baltic, Central European, Celtic, Cyrillic, Thai and Vietnamese | Has a much smaller character repertoire than Arial Unicode MS, but is more legible, especially (according to Meta) for Arabic and Persian characters. |
Microsoft Sans Serif [4] (not to be confused with MS Sans Serif) | | Western, Hebrew, Arabic, Greek, Turkish, Celtic, Baltic, Central European, Cyrillic, Thai, Vietnamese | Has better support for historical and accented Latin characters. |
Other available Unicode fonts
Bolded fonts are recommended.
Font | Typeface | License | Format | Encoding |
---|---|---|---|---|
Aboriginal | Sans-serif, Serif | Freeware | OpenType | Unicode 5.2 |
Charis SIL | Serif | Open Source | OpenType | Unicode 5.1 |
Code2002 | | Freeware (must not be altered) | TrueType | Unicode, plane 2 |
Code2001 0.919 | | Freeware (must not be altered) | TrueType | Unicode, plane 1 |
Code2000 1.171 | Sans-serif | Shareware (unrestricted) | TrueType | Unicode, plane 0 |
DejaVu (free font) | Sans-serif, Sans-mono, Serif | Open Source | OpenType | Unicode 5.1 |
Doulos SIL | Serif | Open Source | OpenType | Unicode 5.1 |
Everson Mono 3.2b4 | Sans-mono | Shareware | TrueType | Unicode |
Fonts for Ancient Scripts (Greek, Egyptian, cuneiform...) | Varying | No license, but may be used for any purpose | TrueType | Unicode |
Google Noto (Project to support all Unicode scripts) | Sans-serif, Serif | Open Source | OpenType | Unicode 6.2 |
Hanazono (80,000+ Chinese characters supported) | Ming (comparable to serifed typefaces) | Freeware (unrestricted) | TrueType | Unicode |
TITUS Cyberbit Basic | Serif | Non-commercial | TrueType, but requires Windows to install | Unicode 4.0 |
Quivira | Serif | Freeware | OpenType | Unicode 7.0 |
Browsers
- Internet Explorer
- supports Latin (although not all extended sets), Greek, Cyrillic, Arabic and Hebrew. East Asian and some Indic scripts are supported if the corresponding support has been installed in Windows. Because Internet Explorer uses only the default font for other scripts, those are usually not supported unless the default font covers them.
- Firefox
- tries to render any character using all the fonts available on the system, so multilingual support is generally good. The default rendering engine does not support complex script rendering, however. Some Linux distributions ship with a Pango-based rendering engine that does, although this may currently cause some display glitches with justified text.
- Opera
- tries to render any character using all the fonts available on the system so multilingual support is also good.[2] Opera uses the operating system to perform contextual glyph selection, ligature forming, character stacking, combining character support and other character shaping tasks.[3]
- Chrome
- Does not support the languages of India, but otherwise renders many characters. Renders Sinhala, Gurmukhi, and Tibetan scripts in the examples below, but not Devanagari (used for Hindi), Bengali, or any of the other official languages of India.
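The difference between using only the default font and trying every installed font can be illustrated with a small Python sketch. The font names and the coverage table below are hypothetical and chosen only for illustration; real browsers consult the character maps inside the installed font files.

```python
# Hypothetical coverage table: font name -> list of (first, last) code points.
# The font names are invented; the ranges are the Latin, Armenian and
# Ethiopic code point ranges.
FONT_COVERAGE = {
    "DefaultSans": [(0x0000, 0x024F)],
    "ExtraArmenian": [(0x0530, 0x058F)],
    "ExtraEthiopic": [(0x1200, 0x137F)],
}


def covers(font, ch):
    """Return True if the (hypothetical) font has a glyph for the character."""
    return any(first <= ord(ch) <= last for first, last in FONT_COVERAGE[font])


def pick_font(ch, fonts):
    """Return the first font in the list that covers the character, or None."""
    for font in fonts:
        if covers(font, ch):
            return font
    return None


armenian_letter = "\u0540"

# Default font only: the character cannot be displayed.
default_only = pick_font(armenian_letter, ["DefaultSans"])

# Per-character fallback across all installed fonts: a covering font is found.
with_fallback = pick_font(armenian_letter, list(FONT_COVERAGE))
```

With fallback, installing one additional font is enough to make a script display, which is why the font lists in the sections below matter.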
Scripts
Armenian
The Armenian alphabet is only used to write the Armenian language. It is supported by the following fonts:
- DejaVu Sans
- Noto Sans Armenian (direct download link), a font made by Google.
- Noto Serif Armenian (direct download link), the serifed version of the font made by Google.
- Segoe UI (Microsoft Windows font, available in Windows 7 and later, but supports Armenian only from Windows 8 onward)
- Sylfaen (Microsoft Windows font, available in Windows 2000 and later)
- Times LatArm
Correct rendering | Your computer |
---|---|
(image) | Հայաստան |
Avestan
The Avestan alphabet is used to write the Avestan language. It is supported by the following fonts:
- Ahuramazda
- Noto Sans Avestan (direct download link), a font made by Google.
Correct rendering | Your computer |
---|---|
(image) | 𐬯𐬭𐬀𐬊𐬔𐬁 |
Canadian Aboriginal Syllabics
Canadian Aboriginal syllabics are an abugida used to write a number of First Nations languages in Canada, including Cree, Ojibwe, Naskapi, Inuktitut, Blackfoot, Sayisi, and Carrier. They are supported by the following fonts:
- Aboriginal Sans (See above)
- Code2000 (See above)
- Euphemia (Microsoft Windows font, available in Windows Vista and later)
- Noto Sans Canadian Aboriginal, a font made by Google.
Correct rendering | Your computer |
---|---|
(image) | ᓀᐦᐃᔭᐍᐏᐣ |
Cherokee
Cherokee is supported by the following fonts:
- Cherokee Digohweli, from LanguageGeek
- Noto Sans Cherokee (direct download link), a font made by Google.
- Plantagenet Cherokee (Microsoft Windows font, available in Windows Vista and later)
Correct rendering | Your computer |
---|---|
(image) | ᎠᏂᏴᏫᏯ |
Coptic
The Coptic alphabet is used to write Coptic, the language used in Egypt before Arabic. Coptic is now used solely as a liturgical language. The alphabet is supported by the following fonts:
- Alphabetum is a commercial Unicode font, but it is the only font that provides Bohairic Coptic letters rather than Sahidic.
- GNU FreeSerif
- Noto Sans Coptic (direct download link), a font made by Google.
- Segoe UI Symbol (Microsoft Windows font, available in Windows 7 and later)
- Quivira: use this for the best Coptic letter and word spacing and sizing. It provides full Unicode support for all Coptic letters.
Correct rendering | Your computer |
---|---|
(image) | ⲙⲛⲧⲣⲙⲛⲕⲏⲙⲉ |
Cuneiform
The cuneiform script was primarily used to write Sumerian and Akkadian (including Assyrian and Babylonian). It is supported by the following fonts:
- Noto Sans Cuneiform (direct download link), a font made by Google.
- Segoe UI Historic (Microsoft Windows font, available in Windows 10 and later)
- Unicode Fonts for Oracc: Cuneiform Fonts offers several different cuneiform fonts.
Correct rendering | Your computer |
---|---|
(image) | 𒅎𒀝𒂵𒌈 |
Deseret
Deseret is supported by the following fonts:
- "Bee" Serif fonts
- "Bee" Sans Serif fonts
- Noto Sans Deseret (direct download link), a font made by Google.
- Segoe UI Symbol (Microsoft Windows font, available in Windows 7 and later)
Correct rendering | Your computer |
---|---|
(image) | 𐐔𐐯𐑅𐐨𐑉𐐯𐐻 𐐈𐑊𐑁𐐩𐐺𐐯𐐻 |
East Asian
Script | Correct rendering | Your computer |
---|---|---|
Traditional Chinese | (image) | 人人生來自由, |
Simplified Chinese | (image) | 人人生来自由, |
Japanese | (image) | すべての人間は、生まれながらにして自由であり、 |
Korean | (image) | 모든 인간은 태어날 때부터 |
Ethiopic
The Ethiopic syllabary is used in central east Africa for Amharic, Bilen, Oromo, Tigré, Tigrinya, and other languages. It evolved from the script for classical Ge'ez, which is now strictly a liturgical language. It is supported by the following fonts:
- Abyssinica SIL
- Ethiopia Jiret
- Everson Mono
- Noto Sans Ethiopic (direct download link), a font made by Google.
- Nyala (Microsoft Windows font, available in Windows Vista and later)
- TITUS Cyberbit (direct download link)
Correct rendering | Your computer |
---|---|
(image) | ኢትዮጵያ |
Gothic
The Gothic alphabet is supported by the following fonts:
- Cardo
- MPH 2B Damase
- Noto Sans Gothic (direct download link), a font made by Google.
- Robert Pfeffer’s Midjungards
- Robert Pfeffer’s Pfeffer Mediæval
- Robert Pfeffer’s Silubr
- Robert Pfeffer’s Skeirs
- Robert Pfeffer’s Ulfilas
- Segoe UI Historic (Microsoft Windows font, available in Windows 10 and later)
- Segoe UI Symbol (Microsoft Windows font, available in Windows 7 and later)
- Vulcanius
Correct rendering | Your computer |
---|---|
(image) | 𐌲𐌿𐍄𐌹𐍃𐌺 |
Indic
The following table compares how a correctly configured computer renders each script with how your computer renders it:
Script | Correct rendering | Your computer | Help page |
---|---|---|---|
Bengali | (image) | ক + ি → কি | Wikipedia:Bangla script display help |
Devanāgarī | (image) | क + ि → कि | Template:Devfonthelp |
Gujarati | (image) | ક + િ → કિ | |
Gurmukhī | (image) | ਕ + ਿ → ਕਿ | |
Kannada | (image) | ಕ + ಿ → ಕಿ | |
Malayalam | (image) | ക + െ → കെ | |
Oriya | (image) | କ + େ → କେ | |
Sinhala | (image) | ඵ + ේ → ඵේ | |
Tibetan | (image) | ར + ྐ + ྱ → རྐྱ | |
Tamil | (image) | க + ே → கே | |
Telugu | (image) | య + ీ → యీ | |
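Each row of the table above combines a consonant with a dependent vowel sign. In the text itself the syllable remains a sequence of separate code points, and it is the rendering engine and font that draw them as one unit. The following Python sketch lists the code points of such a sequence using the standard `unicodedata` module; the helper name `describe` is an assumption made for this example.

```python
import unicodedata


def describe(text):
    """List the code point, Unicode name and general category of each character."""
    return [
        ("U+{:04X}".format(ord(ch)), unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# Bengali consonant KA followed by the vowel sign I, as in the first table row.
bengali_ki = "\u0995\u09bf"

for code_point, name, category in describe(bengali_ki):
    print(code_point, name, category)
```

If the vowel sign appears detached or on the wrong side of the consonant on your computer, the stored text is still correct; only complex script rendering is missing.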
Lisu (Fraser alphabet)
The Fraser alphabet is used only to write the Lisu language. It is supported by the following fonts:
- DejaVu Sans
- Noto Sans Lisu (direct download link), a font made by Google.
- Segoe UI (Microsoft Windows font, available in Windows 7 and later, but supports Lisu only from Windows 8 onward)
Correct rendering | Your computer |
---|---|
(image) | ꓛꓬꓹ ꓡꓯꓺ ꓡꓯꓺ |
Old Persian cuneiform
The Old Persian cuneiform script was used to write the Old Persian language. The script is encoded in block "Old Persian", code points 103A0–103DF (Unicode.org chart). It is supported by the following fonts:
- Aegean (free font)
- Noto Sans Old Persian (direct download link), a font made by Google.
- Segoe UI Historic (Microsoft Windows font, available in Windows 10 and later)
Correct rendering | Your computer | Transliteration |
---|---|---|
(image) | 𐎣𐎲𐎢𐎪𐎡𐎹 | Kambujiya (Cambyses II) |
Runes
Runes are supported by the following fonts:
- Junicode, a free font mostly for Medieval scripts.
- Noto Sans Runic (direct download link), a font made by Google.
- Segoe UI Historic (Microsoft Windows font, available in Windows 10 and later)
- Segoe UI Symbol (Microsoft Windows font, available in Windows 7 and later)
Script | Correct rendering | Your computer |
---|---|---|
Elder Futhark (2nd to 8th centuries) | (image) | ᚠᚢᚦᚨᚱᚲ |
Anglo-Saxon runes (5th to 11th centuries) | (image) | ᚠᚢᚦᚩᚱᚳ |
Medieval runes (12th to 15th centuries) | (image) | ᚠᚢᚧᛆᚱᚴ |
Sutton SignWriting
Sutton SignWriting can be used to write any sign language. It is supported by the SignWriting 2010 typeface, which includes two TrueType fonts:
- SignWriting 2010 Fonts project on GitHub
- SignWriting 2010 TrueType Font and SignWriting 2010 Filling TrueType Font (direct downloads)
Correct rendering | Your computer |
---|---|
File:SignWriting-render-string.png | 𝧪𝪞𝪨 𝠀𝪛𝪩 𝠀𝪛𝪡 𝧪𝪤 |
Syriac/Aramaic script
Syriac and Aramaic scripts, like most Semitic scripts, flow from right to left, which can cause letters to appear in the wrong order. The template {{rtl-lang}} fixes this issue.
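Whether a run of text is right-to-left can be determined from the bidirectional class that Unicode assigns to each character. The following Python sketch uses the standard `unicodedata` module to find the direction of the first strongly directional character. It illustrates direction detection only and is not a description of how the template works; the function name `base_direction` is an assumption made for this example.

```python
import unicodedata


def base_direction(text):
    """Return 'rtl' or 'ltr' based on the first strongly directional character.

    Bidirectional classes 'R' and 'AL' mark right-to-left letters, and 'L'
    marks left-to-right letters. Returns None if no strong character is found.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return None


syriac_word = "\u0721\u0715\u0722"  # three Syriac letters
latin_word = "Wikipedia"

print(base_direction(syriac_word), base_direction(latin_word))
```

Digits and punctuation have weak or neutral classes, so they take their direction from the surrounding text, which is one reason mixed-direction lines can appear reordered.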
Most operating systems provide support for Syriac scripts natively[citation needed]; however, only the Madnḥāyā variety (ܡܕܢܚܝܐ) is rendered correctly. To render the Serṭā (ܣܪܛܐ) and Estrangelo (ܐܣܛܪܢܓܠܐ) varieties, additional fonts are needed. These scripts are supported by the following fonts:
- Aramaic Fonts: a large selection of free Aramaic TrueType fonts.
- Estrangelo Edessa (Microsoft Windows font, available in Windows XP and later)
- Meltho OpenType™ Syriac Fonts (free font)
- Noto Sans Syriac Eastern, Syriac Estrangela, and Syriac Western (direct download links). All three are made by Google.
Script | Correct rendering | Your computer |
---|---|---|
Madnḥāyā | (image) | ܒܪܹܝܼܫܝܼܬ݀ ܐܝܼܬ݂ܲܘܗ݇ܝ ܗ݇ܘܵܐ ܡܹܠܬܵ݀ܐ. |
Serṭā | (image) | ܒ݁ܪܺܝܫܺܝܬܼ ܐܻܝܬܼܰܘܗ̱ܝ ܗ̱ܘܳܐ ܡܶܠܬܼܳܐ. |
Estrangelo | (image) | ܒܪܝܫܝܬ ܐܝܬܘܗܝ ܗܘܐ ܡܠܬܐ. |
Tifinagh script
The Tifinagh alphabet is used to write the Berber languages. IRCAM (Institut Royal de la Culture Amazighe) provides a downloadable software suite, developed for Windows XP, that contains a Tifinagh keyboard and a font. The script is supported by the following fonts:
- Afus Deg Wfus
- Code2000
- DejaVu Fonts
- Ebrima (Microsoft Windows font, available in Windows 7 and later)
- Fixedsys Excelsior (a stylized ornamental font, not recommended for running text)
- Hapax Berbère
- MPH 2B Damase
- Tifinaghe-Ircam Unicode
Correct rendering | Your computer |
---|---|
(image) | ⵜⵉⴼⵉⵏⴰⵖ |
South East Asian
Balinese
The Balinese script is used to write the Balinese language. The script is encoded in block "Balinese", code points 1B00–1B7F (Unicode.org chart). It is supported by the following fonts:
- Aksara Bali (free OpenType font with keyboard driver)
- Noto Sans Balinese (direct download link), a font made by Google.
Correct rendering | (image) |
---|---|
Your computer | ᭚ᬲ᭄ᬯᬲ᭄ᬢᬶᬧ᭄ᬭᬧ᭄ᬢᬶᬭᬶᬂᬯᬶᬓᬶᬧᬾᬤᬶᬳᬩᬲᬩᬮᬶ᭟ |
Transliteration | Swasti Prapti ring Wikipédia Basa Bali |
Batak
The Batak alphabet is used to write the Batak languages. It is supported by the following fonts:
- Batak Unicode
- Noto Sans Batak (direct download link), a font made by Google.
- Pangururan
- Prada (direct download link)
Correct rendering | Your computer |
---|---|
(image) | ᯀᯂ᯲ᯘᯒ |
Burmese
The Burmese alphabet is used to write the Burmese language. The script is encoded in block "Myanmar", code points 1000–109F (Unicode.org chart). It is supported by the following fonts:
- Myanmar2
- Myanmar3 (also available from the BBC's website)
- Myanmar Census
- Myanmar Text (Microsoft Windows font, available in Windows 8 and later)
- Padauk (supports Graphite)
- Parabaik
- Parabaik Sans
- WinUni Innwa
Correct rendering | Your computer |
---|---|
(image) | ဃ + ြ → ဃြ |
Javanese
The Javanese script is used to write the Javanese language. It is supported by Unicode 5.2 and above. The script is commonly rendered with SIL Graphite, which is best supported by Firefox. More recently, however, it can also be rendered with OpenType and TrueType fonts, provided the right font is used. The script is supported by the following fonts:
- Adjisaka (direct download link), a free TrueType font.
- JG Aksara Jawa (direct download link). Not recommended: it uses code points from other scripts and will therefore cause other languages to render incorrectly.
- Noto Sans Javanese (direct download link), a font made by Google.
- Javanese Text (Microsoft Windows font, available in Windows 8 and later)
- Tuladha Jejeg, a free SIL Graphite font.
- Prada (direct download link)
Correct rendering | (image) |
---|---|
Your computer | ꧋ꦱꦸꦒꦼꦁꦫꦮꦸꦃꦮꦺꦤ꧀ꦠꦼꦤ꧀ꦲꦶꦁꦮꦶꦏꦶꦥꦺꦝꦶꦪꦃꦗꦮꦶ꧉ |
Transliteration | Sugeng Rawuh Wonten ing Wikipédia Jawi |
Lontara
The Lontara script is used to write the Buginese, Makassarese, and Mandar languages. The script is encoded in block "Buginese", code points 1A00–1A1F (Unicode.org chart). It is supported by the following fonts:
- Code2000
- Leelawadee UI (Microsoft Windows font, available in Windows 8 and later). Note that Leelawadee does not support the Lontara script; only the UI version does.
- MPH 2B Damase (direct download link)
- Saweri
- Prada (direct download link)
Correct rendering | Your computer | Transliteration |
---|---|---|
(image) | ᨅᨔ ᨕᨘᨁᨗ | Basa Ugi |
Old Tagalog/Baybayin
Baybayin (known as the Tagalog script in Unicode, and also called Alibata) is a pre-Spanish Philippine writing system from which modern minority scripts in the Philippines have descended. It is supported by the following fonts:
- Noto Sans Tagalog (direct download link), a font made by Google.
- Paul Morrow's Baybayin Fonts. Offers the most extensive list of Baybayin fonts for Windows and Macintosh operating systems.
- PUKBL is a free Unicode font that defines its own assignment of the Baybayin alphabet to a standard keyboard. Available for Windows and Linux users.
- Quivira is a proportional serif font that produces very readable text. Supports several scripts, among them the Baybayin script.
Correct rendering | Your computer |
---|---|
(image) | ᜀᜅ᜔ ᜊᜏᜆ᜔ ᜆᜂ ᜀᜌ᜔ ᜁᜐᜒᜈᜒᜎᜅ᜔ ᜈ ᜋᜌ᜔ ᜃᜍᜉᜆᜈ᜔, |
Sundanese
The Sundanese script is used to write the Sundanese language. The script is encoded in block "Sundanese", code points 1B80–1BBF (Unicode.org chart). It is supported by the following fonts:
- Noto Sans Sundanese (direct download link), a font made by Google.
- Sundanese Unicode (direct download link) (free font)
- Prada (direct download link)
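Several of the sections above quote the Unicode block and code point range in which a script is encoded. The following Python sketch shows how those ranges can be used to tell which block a character belongs to, for example to work out which font is missing when a character does not display. The table contains only the ranges quoted on this page, and the function name `block_of` is an assumption made for this example.

```python
# Block ranges as quoted in the sections above (first and last code point).
BLOCKS = {
    "Old Persian": (0x103A0, 0x103DF),
    "Balinese": (0x1B00, 0x1B7F),
    "Myanmar": (0x1000, 0x109F),
    "Buginese": (0x1A00, 0x1A1F),
    "Sundanese": (0x1B80, 0x1BBF),
}


def block_of(ch):
    """Return the name of the listed block containing the character, or None."""
    code_point = ord(ch)
    for name, (first, last) in BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None


def blocks_in(text):
    """Return the set of listed blocks used anywhere in the text."""
    return {block_of(ch) for ch in text} - {None}


sample = "\U000103A3\u1A05 A"
print(sorted(blocks_in(sample)))
```

Note that the block name is not always the script's common name: the Lontara script is encoded in the "Buginese" block and the Burmese script in the "Myanmar" block.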
Special cases
Esperanto
In edit box | In database and output |
---|---|
S | S |
Sx | Ŝ |
Sxx | Sx |
Sxxx | Ŝx |
Sxxxx | Sxx |
Sxxxxx | Ŝxx |
MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when a page is edited, the text is converted to a form designed to be easier to edit with a standard keyboard.
The characters to which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ. You may enter these directly in the edit box if you have the facilities to do so. However, when you edit the page again, you will see them encoded in a form such as Sx. This form is referred to as "x-sistemo" or "x-kodo". To preserve round-trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.
For example, the interlanguage link [[en:Luxury car]] to en:Luxury car has to be entered in the edit box as [[en:Luxxury car]] on eo:. This has caused problems with interwiki update bots in the past.
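The conversion rule shown in the table can be sketched in Python as follows. This is an illustration derived from the table and the description above, not MediaWiki's actual code; the function names are assumptions, and only the lowercase x is handled.

```python
import re

ACCENTED = "ĈĜĤĴŜŬĉĝĥĵŝŭ"
PLAIN = "CGHJSUcghjsu"
TO_PLAIN = dict(zip(ACCENTED, PLAIN))
TO_ACCENTED = dict(zip(PLAIN, ACCENTED))


def stored_to_edit(text):
    """Convert stored text to the x-sistemo form shown in the edit box."""
    def convert(match):
        letter, xs = match.group(1), match.group(2)
        if letter in TO_PLAIN:
            # Accented letter: plain letter plus an odd number of x's.
            return TO_PLAIN[letter] + "x" * (2 * len(xs) + 1)
        # Plain letter: the x's that follow it are doubled.
        return letter + "x" * (2 * len(xs))

    return re.sub("([" + ACCENTED + PLAIN + "])(x*)", convert, text)


def edit_to_stored(text):
    """Convert edit-box text back to the stored form."""
    def convert(match):
        letter, xs = match.group(1), match.group(2)
        count = len(xs)
        if count % 2 == 1:
            # Odd number of x's: the letter is accented.
            return TO_ACCENTED[letter] + "x" * (count // 2)
        return letter + "x" * (count // 2)

    return re.sub("([" + PLAIN + "])(x*)", convert, text)


print(stored_to_edit("[[en:Luxury car]]"))
```

An odd number of x's after one of the twelve letters therefore always signals an accented letter, and an even number a plain one, which is what makes the round trip lossless.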
Romanian
The Romanian alphabet contains an S-comma (Ș ș) and T-comma (Ț ț). These characters were added to Unicode 3.0 at the request of the Romanian standardization institute. As font support for these characters has been poor in the past, many computer users use the similar characters S-cedilla (Ş ş) and T-cedilla (Ţ ţ) instead. However, on Wikipedia it is recommended to use the correct characters with comma below.
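Replacing the cedilla forms with the recommended comma-below forms is a simple character substitution. The following Python sketch assumes the text is known to be Romanian, because S-cedilla is also a legitimate letter in other languages such as Turkish; the function name `use_comma_below` is an assumption made for this example.

```python
# Map the cedilla characters to their comma-below counterparts.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S-cedilla -> S-comma
    "\u015f": "\u0219",  # s-cedilla -> s-comma
    "\u0162": "\u021a",  # T-cedilla -> T-comma
    "\u0163": "\u021b",  # t-cedilla -> t-comma
})


def use_comma_below(romanian_text):
    """Replace S/T with cedilla by S/T with comma below in Romanian text."""
    return romanian_text.translate(CEDILLA_TO_COMMA)


fixed = use_comma_below("\u015ftiin\u0163\u0103")
print(fixed)
```

All other characters, including ă and â, pass through unchanged.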
See also
- Help:Multilingual support (East Asian)
- Help:Multilingual support (Indic)
- Help:Multilingual support for Android
- Help:Special characters
- Wikipedia:Amharic
- Wikipedia:Bangla script display help
- Wikipedia:Gothic Keyboarding
- Wikipedia:Gothic Unicode Fonts
- Wikipedia:Kannada support
- Help:Sinhala Font Guide
- List of typefaces included with Microsoft Windows
Notes
External links
- Alan Wood’s Unicode Resources: Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications
- SIL International: Computers and Writing Systems
- Unicode Font Guide For Free/Libre Open Source Operating Systems
- WAZU JAPAN's Gallery of Unicode Fonts
- [1] Until June 2005, when MediaWiki 1.5 came into use on the Wikimedia projects, articles on the English Wikipedia were encoded using ISO/IEC 8859-1 (although the additional characters from the Windows-1252 character set were used in practice). All characters from the ISO/IEC 10646 Universal Character Set could be accessed through numerical entities, as specified by the HTML 4.01 specification. Since then, nearly all pages have been converted to use Unicode directly. Old discussion on the topic can be read at Wikipedia talk:Unicode.
- [2] http://www.opera.com/support/kb/view/435/
- [3] http://www.opera.com/docs/specs/#text