Unleash Your Multilingual Mac

How to Read/Write Languages Other Than English on your Mac
by Tom Gewecke (tom at bluesky dot org)
Updated 12/18/2012

Introduction

One of the best-kept secrets about MacOS is the built-in support it contains for reading and writing languages beyond English, including ones that use non-Latin scripts and characters. This document explains these capabilities and provides various resources to help users exploit them to the maximum degree possible. Comments and additions from readers are most welcome.

In addition, readers may find it useful to consult my blog, where I try to post info on current developments in this area.

Basic Apple documentation can be found in the Help menu of the Finder if you put "languages" in the Question box.

Apple sponsors general user-to-user Support Communities in English, Japanese, Korean, and Chinese.

The place to ask for new language features and bug fixes is the feedback channel.


OS X 10.8 Mountain Lion

These comments are based on OS X 10.8, first issued 7/25/12. Email me if you would like to see a similar text for an earlier version of OS X. For info on OS 9, ask for the page on OS X 10.4, which was the last version able to run it.

Mountain Lion adds only a few new language features to what is available in Lion. See this page and this one for a summary.

Localization

OS X offers the choice of 30 system languages out of the box -- English, Japanese, French, German, Spanish, Italian, Dutch, Swedish, Danish, Norwegian, Finnish, Traditional Chinese, Simplified Chinese, Korean, Brazilian Portuguese, European Portuguese, Russian, Polish, Arabic, Czech, Hungarian, Croatian, Greek, Catalan, Hebrew, Romanian, Slovak, Thai, Ukrainian, and Turkish. These languages, which affect system-wide menus and dialogues, can also be changed, for your next login, via the Languages menu of the Language & Text pane in System Preferences. Just move your preferred language to the top of the list.

Sometimes other localizations produced by 3rd parties are made available by the Apple sites in specific countries.

Note that MS Office for Mac is monolingual. If you want to change to another language, see this page.

"Fast User Switching," activated in the Accounts preferences, enables you to quickly rotate your screen among different system languages if you set up separate users for them. Be careful to keep your keyboard the same for all login and logout operations, or you can find your password will not work.

If you poke the "Edit" button in the Language tab to see all varieties available, you get a list of over 130, the exact number depending on whether you have added any additional language fonts. The top language determines the localization of the OS (among the 30 available). Safari uses the order of languages in this list to tell sites what language it prefers, and OS X uses it to determine default fonts. So if Chinese is ahead of Japanese in this list, Chinese fonts should normally get first choice by the system in any ambiguous situation. Also the order will determine which localization will be used for any app which does not have the files needed for the language at the top of the list. You should make sure that any languages you want to read or write are on the list, as that may affect the list of encodings in Mail.app.
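
As a rough illustration of how an ordered language list can be passed to a web site, here is a small Python sketch that turns such a list into an HTTP Accept-Language value. The function name and the weighting scheme (each later language gets a slightly lower "q" value) are my own assumptions for the example; this is not a description of what Safari actually sends.

```python
def accept_language(preferred):
    """Build an Accept-Language value from an ordered list of language tags.

    Hypothetical helper: the 0.1 step between weights is an assumption
    made for illustration, not Safari's documented behavior.
    """
    parts = []
    for rank, tag in enumerate(preferred):
        if rank == 0:
            # The first language carries the implicit top weight (q=1.0).
            parts.append(tag)
        else:
            q = max(1.0 - 0.1 * rank, 0.1)
            parts.append(f"{tag};q={q:.1f}")
    return ", ".join(parts)


# Chinese ahead of Japanese, as in the example in the text.
header = accept_language(["zh-Hans", "ja", "en"])
print(header)
```

Moving a language up the list raises its weight in the resulting value, which is the general idea behind the ordering described above.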

The top language also determines the default sort order for lists, which can be set independently via a separate menu on the same preference tab. There can be some unexpected consequences if you put an unusual language in first place. Hawaiian sort order is not at all like English, for example.

The standard sort order in OS X is based on a Unicode system. More info can be found here. Numbers come before Latin characters, Greek comes after.
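
The ordering of digits, Latin, and Greek can be seen in a simplified way by sorting strings by their raw Unicode codepoints, which is what Python does by default. This is only an approximation made for illustration: the system's real collation is language-aware and more elaborate than plain codepoint order.

```python
# Sample list entries (invented for this example).
names = ["zebra", "\u03b1\u03bb\u03c6\u03b1", "42", "apple"]  # second item is Greek "alpha"

# Python compares strings codepoint by codepoint: digits (U+0030...) come
# before Latin letters (U+0061...), which come before Greek (U+03B1...).
by_codepoint = sorted(names)

for name in by_codepoint:
    print(f"U+{ord(name[0]):04X}  {name}")
```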

To avoid Unicode order in Contacts.app (where all names in scripts other than that of the OS are put at the end), use Card > Add New Field > Phonetic First/Last Name to create a Latin script name for each contact in another script. It should then sort with the Latin names.

Applications normally contain their own localizations independent of the system. An app for switching these can be found here. If you want to permanently run an app in a localization different from that of the OS, you will need to rename the .lproj folder found inside the app.

Apple's information on how to localize applications can be found here.

Note that the system language is distinct from the keyboard language, which determines what you can type. The latter is set from the Input Sources tab in the Language & Text pane. Spell checking is also set independently in the Text tab.

To change the language of the login page, see http://support.apple.com/kb/HT4102.

To get rid of system and app languages after they have been installed (normally to liberate hard drive space, about 50MB per language), some people use the program Monolingual, but if you are not careful it can do a lot of damage to your system. Personally I do not think the benefit is worth the risk.

Use the Formats Tab of the Language & Text pane to set your preferred locale for date, time, and number formats.

Typing Foreign Languages

In OS X you can select over 70 keyboards covering Arabic*, Australian, Austrian, Azeri, Armenian*, Bangla, Belgian, Brazilian, British*, Bulgarian*, Byelorussian, Canadian French, Cherokee*, Chinese*, Croatian*, Czech*, Danish, Dari, Devanagari*, Dutch, US, Estonian, Faroese, Finnish*, French*, Georgian, German, Greek*, Gujarati*, Gurmukhi (Punjabi)*, Hawaiian, Hebrew*, Hungarian, Icelandic, Inuktitut*, Irish*, Italian*, Japanese*, Jawi, Kannada*, Kazakh, Khmer, Korean*, Kurdish, Latvian, Lithuanian, Macedonian, Malayalam*, Maltese, Maori, Myanmar, Nepali, Norwegian*, Oriya*, Pashto, Persian*, Polish*, Portuguese, Romanian*, Russian*, Sami*, Serbian*, Slovak*, Slovenian, Spanish*, Swedish*, Swiss*, Tamil*, Telugu*, Thai*, Tibetan*, Turkish*, Uighur, Ukrainian, Urdu, Uzbek, Vietnamese*, and Welsh, plus Dvorak*, Colemak, US Extended, US International PC, and Unicode Hex (an asterisk indicates multiple options). In addition to the keyboards, you can choose the Character Viewer and the Keyboard Viewer.

For help with typing other languages and scripts, see these articles: traditional Mongolian, Syriac/Aramaic, Cuneiform, Mongolian Cyrillic, Braille, N'ko, Dhivehi/Maldivian, Nuosu Yi, Amharic, Orkhon/Old Turkic, Tagalog, Egyptian, and Lao.

The Apple Khmer and Myanmar layouts have bugs.

To activate the keyboards and palettes, go to the Apple menu, then to System Preferences, Language & Text, and Input Sources, and check the appropriate boxes. Also make sure to check the box for showing the Input Sources (also known as the "flag" menu) in the Menu Bar at the top right of the screen, plus the box for Keyboard Viewer. The Input Sources pane lets you see the possible keyboard shortcuts for switching scripts and keyboards. By default these are not active, but can be made so by poking the button which takes you to Keyboard Preferences/Keyboard Shortcuts. You may need to turn off the conflicting shortcut for Spotlight.

To see which key does what for a keyboard, use the Keyboard Viewer mentioned above. Pressing the physical Option, Shift, and Option+Shift keys will show what these combinations produce. To use Keyboard Viewer without having to touch any real keys, try turning on Sticky Keys in System Preferences/Universal Access.

To type "accented characters" you do not necessarily need to switch to a specialized language keyboard. Holding down the base letter on the keyboard will generate a popup menu where you can choose various accented versions (and there are ways to customize what appears in those menus). The standard Mac US keyboard also has "dead keys" for 5 common accents activated via the Option key, and the US Extended keyboard has dead keys (plus capability for inputting combining characters) for many other diacritical marks. A chart is here.
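
The difference between a precomposed accented letter and a base letter followed by a combining character can be sketched in Python. This is only an illustration of the Unicode side of things, using the standard unicodedata module; it does not show how any particular keyboard layout produces its output.

```python
import unicodedata

# A base letter followed by U+0301 COMBINING ACUTE ACCENT (two codepoints).
typed = "e" + "\u0301"

# NFC normalization merges the pair into a single precomposed character.
composed = unicodedata.normalize("NFC", typed)

# NFD normalization goes the other way and splits it apart again.
decomposed = unicodedata.normalize("NFD", composed)

print(len(typed), len(composed), unicodedata.name(composed))
```

Both forms should look the same on screen, but they are different sequences of codepoints, which can matter when searching or comparing text.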

Many of the available keyboards can be selected in the "flag" menu and used with all the Carbon and Cocoa programs that run on OS X. The Traditional Chinese, Simplified Chinese, Japanese, Korean, Tamil, Vietnamese Unikey, and Tibetan IM's are organized differently than the other keyboards. For Chinese (and sometimes also Japanese and Korean as concerns applications) the key info site is the Chinese-Mac FAQ User Guide.

Here are Apple's instructions for Chinese handwriting input.

A useful blog for Chinese users can be found at Zhongweb Chinese.

OS X Kotoeri includes an interesting "reverse conversion" command that will convert kanji text into kana, which can then also be transliterated into romaji. The Japanese IM can switch between Roman and direct Kana input via its Preferences pane (first tab, first item), and also allows you to choose your Roman input keyboard layout (first tab, third item). A chart of Kotoeri input codes can be found in the Kotoeri Help. Here is a good site for info on Japanese input.

The Vietnamese Unikey IM's include a menu item "Convert to Han-Nom," which lets you convert the modern Latin script into the Chinese characters used in ancient Vietnamese. Here is info on the use of the Telex, VNI, and VIQR layouts.

For users who need the capability of composing Asian languages in vertical, right-to-left format, or with "Ruby" annotations, Word 2004/2008/2011 or Open/Neo/LibreOffice or LightWay Text are probably the most practical choice. OS X TextEdit can also do vertical. Not all Asian fonts have proper typographical features for vertical text -- the Hiragino Japanese fonts that come with OS X do, however.

Input of RTL (Right-to-left) scripts like Arabic and Hebrew poses special challenges for word processors and other programs. The program Mellel is especially designed to deal with these. iWork apps and iBooks Author may handle copy/paste well, but keyboard input and formatting are probably too buggy to use. There is a new item in System Preferences/Language & Text/Text where you can activate a split cursor for bidirectional text and also set the direction to RTL, LTR, or Default for either selected text or paragraphs. See this note.

In TextEdit, for best results use rich text mode and activate the menu item Format/Text/Writing Direction/Right to Left. In Mail, you go to Format/Alignment/Writing Direction (and make sure the default font in Mail Preferences is set to Lucida Grande rather than Helvetica). For other programs it may help to use the add-on Direction Service or Writing Direction Menu.

The Pages app has bugs which make keyboard input impossible for particular characters (ZWJ, ZWNJ) which can be required for correct display of certain sequences in Arabic as well as S. Asian and SE Asian scripts.
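
For readers unsure what these two characters are, here is a short Python sketch that identifies them and builds a Persian word containing a ZWNJ. The sample word is my own choice, and the idea of preparing such text elsewhere and pasting it in is an untested assumption on my part, not a confirmed workaround for the Pages bug.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: keeps neighboring letters from joining
ZWJ = "\u200d"   # ZERO WIDTH JOINER: requests a joined form

# Persian "mi-khaham" (I want), written with a ZWNJ after the prefix "mi".
# Escapes are used so the invisible character is explicit in the source.
word = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"

for ch in (ZWNJ, ZWJ):
    # Both are invisible "format" characters (general category Cf).
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

print(word, len(word))
```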

Winsoft has special versions of Adobe products (including Tasmeem plugins for InDesign) and Filemaker for working with RTL scripts and other languages.

OS X comes with fonts, but no keyboard, for Amharic/Ethiopic and Lao. You can find sources for those by searching my blog on those terms.

Here is a way to use keyboard shortcuts to select individual keyboards (instead of using the "flag" menu).

Extra Keyboards

If you want to make your own keyboards, Apple Tech Note 2056 has some information on various options. In practice the easiest thing is to use Ukelele, which provides a simple drag/drop interface for creating custom layouts. This utility comes with a large number of sample layouts so you normally do not have to program all keys, just the ones you want to change.

OS X itself includes a facility for making custom Unicode input methods. Some details can be found here.

There are many custom keyboard layouts available on the internet -- a Google search is the best way to find them. Or contact me.

To install keyboards that you download or create yourself, put them in Users/username/Library/Keyboard Layouts (or in Library/Keyboard Layouts if all usernames need access to them, or if you need them for password/login windows). Then go to System Preferences/Language & Text/Input Sources and check the box for the new keyboard. You may need to log out and log in again to have it appear.
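
The copying step can also be scripted. The following Python sketch puts a layout file into the per-user or the system-wide Keyboard Layouts folder named above. The function name, its parameters, and the sample file name are hypothetical; writing to the system-wide folder would normally need administrator rights, and you still have to enable the layout in System Preferences afterwards.

```python
import shutil
from pathlib import Path


def install_layout(layout_file, home=None, all_users=False, root="/"):
    """Copy a keyboard layout file into a Keyboard Layouts folder.

    Hypothetical helper for illustration. With all_users=False the target is
    <home>/Library/Keyboard Layouts, otherwise <root>/Library/Keyboard Layouts.
    """
    base = Path(root) if all_users else (Path(home) if home else Path.home())
    target_dir = base / "Library" / "Keyboard Layouts"
    # Create the folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the newly created copy.
    return Path(shutil.copy2(layout_file, target_dir))


# Example call (file name is made up):
# install_layout("MyLayout.keylayout")
```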

Here is how to make a custom keyboard layout the system default.

Spell Checking and Other Dictionaries

OS X includes a system-wide spell-checker, which is accessible via the Edit/Spelling menu (Inspector > Text > More > Language in Pages). There is also an item System Preferences/Language & Text/Text to let you set spellchecking system wide without changing the OS language. In addition to US English, 10.8 has dictionaries for Australian, British, and Canadian English, German, Spanish, French, Italian, Dutch, Portuguese, Swedish, Danish, Polish and Russian.

For other languages you can add dictionaries from OpenOffice as explained here.

For info on making dictionaries, see this page.

MS Office comes with proofing tools for English, French, Spanish, Italian, Japanese, Norwegian, German, Danish, Swedish, Portuguese, Finnish, and Dutch.

A commercial alternative is SpellCatcher X.

For Norwegian spell checking try this page, Finnish here, Turkish here, Welsh here, and Hebrew here.

For translation, by far the best tool is Google Translate.

OS X 10.8.2 includes reference dictionaries for US English, British English, French, German, Spanish, Japanese, and Chinese. For others see this article.

Here is info on getting Apple's Dictionary Development Kit if you would like to create your own dictionary.

Fonts

A list of fonts included with Mountain Lion can be had here. OS X can make routine use of many Windows fonts. Note, however, that viewing complex scripts which require reordering, contextual shaping, or stacking of characters (such as Arabic, Devanagari, Tibetan, Classic Mongolian, and Thai) requires a combination of font and rendering engine technology. On the Mac this is accomplished via an AAT (Apple Advanced Typography) font and ATSUI, while Windows uses an OpenType font plus Uniscribe. The result is that when you select a Windows font in OS X, complex scripts may not display correctly, and an Apple font should be used if available.

OS X is gradually increasing OpenType support, and 10.8 can use Windows Arabic and Indic fonts in TextEdit (but not in Pages). Instructions for using Apple's font tools to add some AAT features to other fonts can be found here.

Apple's Color Emoji font has special characteristics. Text produced with it will probably not be seen on non-Apple devices and even in some Apple apps like Pages.

Some info on new Chinese fonts included in Mountain Lion can be found here.

Certain fonts provided by Apple for Malayalam and Telugu have bugs. The same is true for Khmer.

The Character Viewer, found in the "Flag" (Input Source) menu, was revamped in 10.7 and no longer offers a way to view and input all the characters in a particular font. For that you will need to use a program like PopChar or Ultra Character Map. It may also be possible to transplant the CharacterViewer.app from 10.6 and run it as an independent app in later versions of OS X.

Character Viewer is ideal for finding and inputting Unicode characters by range or category. But first you need to go to the "gear wheel" at the top left and select Customize List and add Unicode and any other categories you are interested in.

To change the default font used for certain scripts, see this page.

The behavior of fonts used for non-Roman scripts and for languages like Vietnamese can sometimes be adjusted to suit particular needs regarding glyph forms, ligatures, etc. Open the Font panel, select the font, hit the "gear wheel" at lower left, and select "Typography" to see any options which may be available.

If you want to make your own font, common apps are FontLab, FontForge, RoboFont, Type, and Glyphs.

Other OS X Features

Text-to-speech is available in 26 languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish.

The VoiceOver screenreader for blind and low vision users handles 22 languages: Arabic, English, Czech, Danish, Dutch, Finnish, French (France), German, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese (Portugal), Portuguese (Brazil), Russian, Spanish (Spain), Swedish, Turkish, Cantonese, Mandarin (China), and Mandarin (Taiwan).

10.8.2 can take dictation in English (US, Canada, UK, Australia), French (France, Canada), German, Japanese, Mandarin, Cantonese, Spanish, Korean, and Italian. To more easily change your dictation language, use DictationSwitcher.

Special features of particular interest to users in China are explained here.


iOS Devices

Language capabilities of all iOS devices are essentially the same and can be found in their tech specs.

The language coverage for Voice Control and VoiceOver/Speak Selection can be found here. Not mentioned in the specs is the availability (via download when the Define button is tapped when the proper keyboard is active) of reference dictionaries for English, British English, Japanese, Chinese, French, German, and Spanish.

Voice Dream is an excellent alternative Text-to-Speech app which can handle 20 languages.

To access extra characters on the keyboards for iOS devices you hold your finger on a key and a popup menu of accented characters should appear. Spellchecking follows the keyboard being used; to switch keyboards you use the "globe" key. Whether Japanese or Chinese fonts are used for Han characters may depend on the order of languages in the language setting list. To move one higher you have to switch the OS to that language and then back to what you want. There is no way to set encodings for email or webpages, and no way for users to add fonts or keyboard layouts, other than what you may find in the App Store.

A good app to try for writing Arabic/Hebrew is Textilus. Pages has bugs.

A list of iOS software (screen/virtual) and hardware keyboard layouts can be found here.

A list of iOS fonts is available here.

User Guides are available at Apple Manuals.

Languages available for the user interface and song info on music-only iPods can differ considerably by model and are also listed in their individual tech specs.


Email

OS X

The Mail program included with OS X is fully Unicode-savvy and automatically searches for glyphs in installed fonts for whatever encoding is indicated on the incoming text. The user can change the encoding for received messages from the Message/Text Encodings menu, and these can also be selected for outgoing messages. The range of encodings you have to choose from in Mail depends on the languages you have on the list in System Preferences/Language & Text/Language, which you can change using the Edit button. One shortcoming is that Mail cannot set the default encoding for incoming messages, which is tedious if you get a lot of mail with the wrong charset specified. The default encoding for outgoing messages in Mail is sensitive to the order of languages in System Preferences/Language & Text/Language, especially for Russian, Greek, Chinese, Japanese, and Korean. Before sending email in these you may want to test it with a message to yourself to see whether the default encoding is what your recipients will expect, and set it manually or adjust the preferences if necessary.

To force outgoing mail to be encoded as UTF-8, which can solve some problems, include a Unicode Dingbat (range 2700) in your sig.
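
Why a single Dingbat can have this effect is easy to sketch in Python, if one assumes that a mail program tries a list of charsets in order and uses the first one able to represent the whole message. That selection logic, the candidate list, and the function name are assumptions made for the example; I do not know Mail's actual algorithm.

```python
def pick_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first candidate charset that can encode all of the text.

    Hypothetical model of charset selection, for illustration only.
    """
    for name in candidates:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    # Nothing narrower worked, so fall back to Unicode.
    return "utf-8"


plain = "Regards, Tom"
accented = "Gr\u00fc\u00dfe, Tom"          # fits in ISO-8859-1
with_dingbat = accented + " \u2709"        # U+2709 ENVELOPE, Dingbats block

for message in (plain, accented, with_dingbat):
    print(pick_charset(message), "<-", message)
```

Since no legacy single-language charset in this list contains the Dingbat, only UTF-8 can carry the message once the sig includes one.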

Webmail

When doing Webmail, you are at the mercy of the behavior of the particular browser and web site being used when it comes to faithful transmission of non-English mail text. It is best to explore the settings for the site to see if anything special exists for unusual scripts, and set the encoding of the browser as best you can before composing or reading. Trial and error may be required to get it right, and sending yourself a test message is a good idea. iCloud webmail can operate in all languages as long as you check the UTF-8 box in its preferences. For the best multilingual email experience, use one of the standard mail programs rather than webmail.


iTunes and iBookstore

The Language Display Capabilities of iTunes should be the same as those of OS X, that is to say just about any language for which you can find a font.

Correct display of the language in song titles in iTunes depends on the language being properly encoded and identified in the ID3 tags of the song. If it isn't working right, you can try to fix the tags. Programs that may be useful for this are Unicode Rewriter and ID3Mod. If fixing the tags doesn't do the trick, the only alternative is to type titles in manually.
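
The kind of garbling that badly identified tags produce, and the way a repair tool can undo it, can be sketched in Python. The example assumes a Russian title stored as Windows Cyrillic (cp1251) bytes but displayed as if it were Latin-1; the title, the pair of encodings, and the function name are assumptions chosen for illustration, and real files may involve other encodings.

```python
def repair(garbled, wrong="latin-1", right="cp1251"):
    """Undo a wrong decoding: recover the original bytes, then decode them properly."""
    return garbled.encode(wrong).decode(right)


# A sample title (Russian for "song").
title = "\u041f\u0435\u0441\u043d\u044f"

# Simulate the fault: cp1251 bytes shown under the wrong charset.
garbled = title.encode("cp1251").decode("latin-1")

print(garbled)           # what the user would see in the song list
print(repair(garbled))   # the title as intended
```

If the wrong guess was a charset that cannot represent every byte, the original bytes may already be lost, in which case retyping the title is the only way out.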

Some information on the language support available in the iBookstore is provided here.


Apple TV

The Apple TV uses a version of OS X which has different language capabilities than Mac OS X or iOS. Localizations for the menus and dialogues (chosen at initial setup or afterwards via the Settings > Language menu) are English, British English, Japanese, French, German, Spanish, Italian, Dutch, Swedish, Danish, Norwegian, Finnish, Traditional Chinese, Simplified Chinese, Korean, Brazilian Portuguese, European Portuguese, Russian, Polish, Czech, Hungarian, Croatian, Greek, Catalan, Malay, Indonesian, Romanian, Slovak, Thai, Ukrainian, Vietnamese, and Turkish. VoiceOver is available in all of them to read the screen if necessary.

Here is info on generating special characters with the Apple TV screen keyboard. Non-Latin input is not yet supported that way, but you can in theory do it via the Remote app on an iOS device.


Unicode

Traditionally computer systems could deal with only a limited number of distinct characters at once. Handling diverse languages meant remapping the same 256 codes to different characters for each one, using a font specifically designed for it. Successful communication over the internet sometimes required synchronizing the fonts at each end and translating among a couple dozen mutually incompatible character set standards, a list of which you can find in the "character encoding menu" of any browser or email program.
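
The remapping problem is easy to demonstrate in Python by decoding one and the same byte under several of these older character sets. The three charsets below are just a sample picked for the example.

```python
# One byte value, interpreted under three legacy 8-bit character sets.
raw = bytes([0xE1])

decoded = {
    charset: raw.decode(charset)
    for charset in ("latin-1", "cp1251", "iso8859_7")
}

for charset, ch in decoded.items():
    # Show the character and the Unicode codepoint it corresponds to.
    print(f"{charset:10} {ch}  U+{ord(ch):04X}")
```

The same byte comes out as a Latin, a Cyrillic, or a Greek letter depending on which charset the receiving end assumes, which is why both ends had to agree.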

The development of Unicode, which is the agreed international standard for the unique encoding of all the characters used in different languages, changes this situation radically for the better. By creating a single character set that covers all scripts, Unicode allows the reading and writing of texts in any language, or the simultaneous display of many languages, without changing encodings and fonts. It should eventually become the common basis for text processing across all platforms and programs.

The most obvious practical implication of Unicode systems for users is that you don't switch languages/scripts by switching fonts. Instead you switch keyboard layouts, and the fonts take care of themselves.

The basic principle of Unicode is to assign a unique number (usually expressed in hexadecimal form) to every character. 1.1 million "codepoints" have been allocated for this purpose, divided among 17 "planes" with about 65,000 characters each. All characters in common use have been assigned to Plane 0, also known as the Basic Multilingual Plane (BMP), and some others have been placed into Planes 1, 2, and 14, as part of an ongoing process. Under the current version, Unicode 6.2, just over 110,000 characters have been allocated (plus about 137,000 codepoints reserved for private use), and various scripts are in the pipeline under consideration by various committees. For further information see the Roadmap to Unicode.
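
The arithmetic behind these figures can be checked with a few lines of Python. The helper name is my own; the sample characters are arbitrary picks from Planes 0, 1, and 2.

```python
PLANES = 17
PLANE_SIZE = 0x10000  # 65,536 codepoints per plane


def plane_of(ch):
    """Return the plane number of a character: its codepoint divided by 65,536."""
    return ord(ch) >> 16


total = PLANES * PLANE_SIZE
print(f"{total:,} codepoints in all")

# Latin A, a common Han character, an emoji, and a rare Han character.
for ch in ("A", "\u4e2d", "\U0001F600", "\U00020000"):
    print(f"U+{ord(ch):04X} is in plane {plane_of(ch)}")
```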

In practice Unicode data is represented by one of several possible "transformation formats," or UTF's. There are two common ones, UTF-16 and UTF-8. However, only UTF-8 is normally used over the internet. (Email also often has an additional "content transfer encoding," either "base64" or "quoted-printable," which is not related to language or character set issues.)
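
To make the layers concrete, here is a Python sketch that takes one short word through both transformation formats and then through the two content transfer encodings. The sample word is arbitrary, and big-endian UTF-16 without a byte order mark is chosen only to keep the byte count simple.

```python
import base64
import quopri

text = "caf\u00e9"

# Layer 1: the character encoding (a Unicode transformation format).
utf8 = text.encode("utf-8")        # ASCII letters take 1 byte, the accented one 2
utf16 = text.encode("utf-16-be")   # every character here takes 2 bytes

# Layer 2: the content transfer encoding, which only makes the bytes
# safe for mail transport and says nothing about language.
b64 = base64.b64encode(utf8)
qp = quopri.encodestring(utf8)

print(len(utf8), len(utf16))
print(b64, qp)

# Decoding the transfer encoding gives back exactly the same UTF-8 bytes.
assert base64.b64decode(b64) == utf8
assert quopri.decodestring(qp) == utf8
```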

Mac OS X has full Unicode support. TextEdit and other apps, including Character Viewer and the Unicode Hex keyboard layout, enable reading and writing of all Unicode characters. Custom keyboards, based on XML text files, can also be created to access and input any desired set of them.
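
The relation between a hex code and the character it stands for can be sketched in Python. The second helper computes the UTF-16 surrogate pair for a character outside the BMP, on the assumption (mine, for this example) that a hex input method taking four digits at a time would need such characters entered as two codes. Both function names are hypothetical.

```python
def from_hex(code):
    """Turn a hexadecimal codepoint such as '20AC' into the character itself."""
    return chr(int(code, 16))


def surrogate_pair(ch):
    """Return the two four-digit UTF-16 codes for a character beyond U+FFFF."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is in the BMP and needs no surrogates")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return f"{high:04X}", f"{low:04X}"


print(from_hex("20AC"))                  # the euro sign
print(surrogate_pair("\U0001F600"))      # an emoji from Plane 1
```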

A good Unicode utility is UnicodeChecker. It covers all 17 Unicode planes, can be searched by character block or name, and characters can be copy/pasted into TextEdit. This program also provides various useful items in the Services menu, including conversion between Unicode and HTML entities.
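
For those without such a utility at hand, the same kind of conversion between Unicode text and HTML entities can be approximated with Python's standard library. This is a generic sketch with function names of my own, not a description of how UnicodeChecker does it; note that it leaves ASCII characters such as "&" and "<" untouched.

```python
import html


def to_entities(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(markup):
    """Turn named, decimal, and hexadecimal entities back into characters."""
    return html.unescape(markup)


print(to_entities("caf\u00e9"))
print(from_entities("caf&eacute; &#233; &#xE9;"))
```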

For codepoints in the Unicode Private Use Area (PUA) used by Apple, see this page.

For info on Unicode and AppleScript, see here.


Copyright 2000,2012 by Thomas H. Gewecke