Setting up Microsoft Windows NT, 2000 or Windows XP to support Unicode supplementary characters

Windows uses UTF-16 internally, so supporting supplementary characters means supporting surrogate code points.

For Microsoft Windows to display the supplementary characters of the Unicode Character Standard you may need to adjust your Windows registry settings. Here are the instructions to modify Windows to support displaying supplementary characters. If you already understand Unicode supplementary characters and surrogate code points, go to What if my application uses UTF-8?. If not, the next few sections provide some background.


What are Supplementary Characters?

Supplementary characters are those characters in the Unicode Character Standard outside of the Basic Multilingual Plane (BMP). The BMP consists of the first 65,536 (64K) code points in Unicode, U+0000 through U+FFFF. The remaining characters are in the supplementary planes 1-16. Each plane consists of 65,536 code points.
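As a rough illustration of the plane layout described above, the plane of a code point can be read off from the bits above the low 16. This is a minimal sketch; the function names plane_of and is_supplementary are made up for this example and do not come from any library.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0-16) that holds a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 (65,536) code points, so the plane number
    # is whatever remains after dropping the low 16 bits.
    return code_point >> 16


def is_supplementary(code_point: int) -> bool:
    """True for code points outside the BMP (planes 1-16)."""
    return plane_of(code_point) >= 1


print(plane_of(0x0041))          # 0 (BMP)
print(plane_of(0x10400))         # 1
print(is_supplementary(0xFFFF))  # False, last code point of the BMP
```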

UTF-32, being based on 32-bit code units, can reference characters in all 17 planes directly. UTF-8, which encodes each character as a sequence of 1 to 4 bytes, can also reference the entire Unicode character space. UTF-16 is based on 16-bit code units. Because a 16-bit unit can take only 65,536 distinct values, some values are reserved for a more indirect encoding scheme that reaches all 17 planes (1,114,112 code points): each supplementary character is represented by a pair of reserved values. (The encoding scheme is analogous to double-byte character programming.) These reserved 16-bit values are called surrogate code points. To use these values properly and reference the entire Unicode character space, applications need to be programmed to recognize surrogate code points and to map each pair to the correct value for the associated character. In the case of Windows, a component called Uniscribe (USP10.dll) has the appropriate programming and is needed for display of supplementary characters.
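The indirect encoding scheme mentioned above can be sketched in a few lines. The arithmetic is the standard UTF-16 computation from the Unicode Standard; the function name to_surrogate_pair is hypothetical, and this is an illustration rather than the code Windows or Uniscribe actually runs.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into two 16-bit surrogate values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the first unit
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the second unit
    return high, low


cp = 0x10400  # a plane 1 code point
high, low = to_surrogate_pair(cp)
print(f"U+{cp:04X} -> {high:04X} {low:04X}")  # U+10400 -> D801 DC00
```

Python's own codec can be used as a cross-check: chr(0x10400).encode("utf-16-be") yields the same two code units as four bytes.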

Where are surrogate code points needed?

Surrogate code points are used only in UTF-16 to represent Unicode supplementary characters, those characters outside of the Basic Multilingual Plane (BMP). The code points for surrogates are in the ranges U+D800-U+DBFF (High Surrogates) and U+DC00-U+DFFF (Low Surrogates).
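Going the other way, a program reading UTF-16 has to recognize the two ranges above and combine a high surrogate with the low surrogate that follows it. The sketch below assumes the input is a plain list of 16-bit integers and passes unpaired surrogates through unchanged, which is one possible policy among several. All names in it are invented for this example.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def is_high_surrogate(unit: int) -> bool:
    return HIGH_START <= unit <= HIGH_END


def is_low_surrogate(unit: int) -> bool:
    return LOW_START <= unit <= LOW_END


def decode_utf16_units(units):
    """Turn a list of 16-bit code units into a list of scalar values."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Recombine the two 10-bit halves and add back the 0x10000 offset.
            result.append(0x10000
                          + ((unit - HIGH_START) << 10)
                          + (units[i + 1] - LOW_START))
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate left as-is.
            result.append(unit)
            i += 1
    return result


print([hex(v) for v in decode_utf16_units([0x0041, 0xD801, 0xDC00])])
# ['0x41', '0x10400']
```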

Further explanations can be found in FAQs, the Unicode Character Standard, and other information at the Unicode Consortium. See also the Resources Section of this document.

What if my application uses UTF-8?

Even though some applications use UTF-8, Windows converts the characters internally to UTF-16, so the registry setting is still relevant.
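To see why surrogates still matter for a UTF-8 application, it helps to look at what one supplementary character becomes after conversion. In this sketch Python's built-in codecs stand in for whatever conversion Windows performs internally; the helper name utf8_to_utf16_units is hypothetical.

```python
import struct


def utf8_to_utf16_units(data: bytes):
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    text = data.decode("utf-8")
    encoded = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack(f"<{len(encoded) // 2}H", encoded))


utf8_bytes = b"\xf0\x90\x90\x80"  # U+10400 as four UTF-8 bytes
units = utf8_to_utf16_units(utf8_bytes)
print([f"{u:04X}" for u in units])  # ['D801', 'DC00']
```

The single four-byte UTF-8 sequence turns into a surrogate pair, so anything downstream that handles the UTF-16 form has to cope with surrogates.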

Are there any prerequisites?

You may need to install a font that supports supplementary characters. To type such characters, you may also need to install an input method that allows their entry.

What about Microsoft Windows 95, 98, and ME systems?

Rumor has it that by installing Uniscribe (USP10.dll) on your system, Unicode-aware applications will gain support for supplementary characters. Caveat emptor. Microsoft hasn't said anything about supporting this configuration, but several people on the Unicode discussion list claim it works. However, you may need to install a recent version of USP10.dll. Older versions, which came with the original systems, will not have the supplementary character support.

What do the settings do?

There are 3 settings to consider.
1) The first setting causes Uniscribe to be loaded, to provide the programming for displaying supplementary characters.
2) The second setting is used to name both a Fixed and a Proportional font that supports supplementary characters, to be used by Internet Explorer.
3) The third setting is optional and applies to Windows XP systems only. It stores the face name of a font that supports plane 1 supplementary characters in a special registry entry. That font then becomes the default (fallback) font for plane 1 characters whenever an application does not explicitly designate another font. You can also set fallback fonts for other planes by suitably modifying the entry.

Instructions

Who needs to apply these registry settings?

The first setting causes Uniscribe to be loaded. Uniscribe gives Windows the ability to display supplementary characters. If you have trouble displaying supplementary characters, AND you are using a font that has glyphs for these characters, then consider applying the first setting. Users who have installed any of the language packs that cause Uniscribe to be loaded will not need to apply this setting, because the installation will already have made the appropriate changes.

The second setting is for Internet Explorer only and it makes the supplementary character font known to IE.

The third setting is for Windows XP systems only and is optional. Apply it if you want to name a font as the fallback font for the characters of a particular supplementary plane.

Setting 1

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
SURROGATE=(REG_DWORD)0x00000002

Setting 2

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
IEFixedFontName=[Surrogate Font Face Name]
IEPropFontName=[Surrogate Font Face Name]

Setting 3

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback]
Plane1=[Plane 1 Font Face name]
Plane2=[Plane 2 Font Face name]
... etc. ...
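One way to apply the three settings is to import a .reg file with the Registry Editor. The sketch below only builds the text of such a file, so it can be reviewed before anything touches the registry. It rests on several assumptions: the function name build_reg_file is made up, "Code2001" is only a placeholder font face name, the Setting 3 key is written with "Windows NT" (with a space, as in Setting 1), and the "Version 5.00" header is the one used by Windows 2000 and XP.

```python
def build_reg_file(surrogate_font, plane_fonts=None):
    """Return .reg file text for the settings described above.

    surrogate_font: face name for the two Internet Explorer values (Setting 2).
    plane_fonts: optional dict mapping plane number to a fallback font face
                 name (Setting 3, Windows XP only).
    """
    lines = [
        "Windows Registry Editor Version 5.00",  # header for Windows 2000/XP
        "",
        # Setting 1: cause Uniscribe to be loaded.
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]",
        '"SURROGATE"=dword:00000002',
        "",
        # Setting 2: fonts Internet Explorer uses for supplementary characters.
        r"[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]",
        f'"IEFixedFontName"="{surrogate_font}"',
        f'"IEPropFontName"="{surrogate_font}"',
        "",
    ]
    if plane_fonts:
        # Setting 3: per-plane fallback fonts (assumed key spelling, see above).
        lines.append(
            r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion"
            r"\LanguagePack\SurrogateFallback]"
        )
        for plane in sorted(plane_fonts):
            lines.append(f'"Plane{plane}"="{plane_fonts[plane]}"')
        lines.append("")
    return "\r\n".join(lines)


print(build_reg_file("Code2001", {1: "Code2001"}))
```

Back up the registry before importing a generated file, and replace the placeholder with the face name of a font that is actually installed.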

Example web pages

For example web pages using supplementary characters, and related information, see the Unicode example pages:
I18nGuy's Unicode plane 1 examples using UTF-16 characters
I18nGuy's Unicode plane 1 examples using UTF-8 characters and
I18nGuy's Unicode plane 1 examples using Numeric Character References (NCR).
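For readers unfamiliar with the third form: a Numeric Character Reference names a character by its scalar value, so a supplementary character is written as one reference, not as two surrogate references. A minimal sketch, with a hypothetical helper name to_ncr:

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal NCR."""
    # Python strings are sequences of code points, so ord() already gives
    # the scalar value even for supplementary characters.
    return "".join(
        f"&#x{ord(ch):X};" if ord(ch) > 0x7F else ch
        for ch in text
    )


print(to_ncr("A\U00010400"))  # A&#x10400;
```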

There is also an introductory page providing an overview of these pages and other related pages at:
I18nGuy's Introduction to: Examples of Unicode usage for business applications

Tom Gewecke's Unicode Examples Beyond BMP (UTF-8, PUA Plane 15)

References

The information on this page comes from:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_192r.asp

Go there to get more information on support for supplementary characters in Microsoft products.

Updated information has been provided by John McConnell of Microsoft, 2002-11-11 and included here. For reference, see these mails on the Unicode list:
original message from John
confirming message from Tex

Other Resources for Supplementary Characters

I welcome suggestions for additions to this list.

To find out how many characters are currently allocated in Unicode, see: The Number of Characters in Unicode.
To find out where a character is in Unicode, see Where is my character?.
To find out whether a script is in Unicode, or in the pipeline to be added to Unicode, see Unicode's Roadmaps.
To find web pages with supplementary characters, see the section Example web pages.
For definitions of Unicode-related terms, refer to the Unicode Glossary.
The Unicode Consortium also has a Frequently Asked Questions (FAQ) page on UTF & BOM which discusses surrogates.

Andrew West's web page Unicode Surrogate Pairs
David Perry's web page Using Plane 1 Characters

Input Methods tools: Tavultesoft Keyman
Input Methods tools: (Mac OS X) TN2056 Installable Keyboard Layouts
Character Map: Andrew West's BabelMap

Fonts: James Kass's CODE2001
Fonts: Hiragino Maru Gothic Pro W4 (Mac OS X, but works on Windows), MingUni, Simsun (Founder Extended)

Editors: Unipad
Editors: Andrew West's BabelPad

Convertors between scalar values and surrogate code points, or between UTFs:
Mark Davis's UTF Convertor (Java)
Michael Kaplan's UTF-16-->UTF-32 and back (Java)
I18nGuy's Conversion Table: Surrogates to Scalar Value/UTF-32 (300KB, HTML only)

More generally, see I18nGuy's Code pages at the touch of a button

This page last updated 2003-02-02.