The Number Of Characters In Unicode

Ken Whistler provided this numeric breakdown of the code points in the Unicode Character Standard, showing how many are assigned to each character type, how many are reserved for future use, and the number remaining. Links to the Unicode glossary are provided where each Unicode term is initially used. An index of Unicode terminology with glossary links is also provided.

For more information about how scripts are allocated in Unicode, and which scripts may be forthcoming, see the Unicode Consortium's web page Roadmaps to Unicode. To find out whether a particular character is in Unicode, see the page Where is my Character?. To understand the value of a comprehensive, universal character standard, see I18nGuy's Benefits of the Unicode Standard.


I18nGuy Home Page

The number of characters in the Unicode Character Standard version 3.2 is 95,221.

The number 95,221 is the sum of 95,156 graphic characters and 65 control codes.
It counts encoded characters only and omits private use area (PUA) characters.
The total number of assigned code points is 95,156 + 6,400 (BMP private use) + 131,068 (supplementary private use) + 65 (control codes) = 232,689.
The total number of designated code points is 232,689 assigned code points + 2,048 surrogate code points + 34 BMP NonCharacters + 32 supplementary NonCharacters = 234,803.
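
The arithmetic above can be reproduced with a short Python sketch. The 95,156 graphic-character count is taken from this page; the code point ranges used to derive the other terms are not stated here and are filled in from the standard's well-known range definitions, so treat them as assumptions of the example. All variable names are illustrative.

```python
# Sizes of the fixed categories, derived from their code point ranges.
# The ranges are assumptions of this sketch; the page gives only the counts.
BMP_PRIVATE_USE = 0xF8FF - 0xE000 + 1           # U+E000..U+F8FF
SUPP_PRIVATE_USE = 2 * (0x10000 - 2)            # planes 15 and 16, minus 2 noncharacters each
SURROGATES = 0xDFFF - 0xD800 + 1                # U+D800..U+DFFF
CONTROL_CODES = 32 + 1 + 32                     # U+0000..U+001F, U+007F, U+0080..U+009F
BMP_NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2   # U+FDD0..U+FDEF plus U+FFFE and U+FFFF
SUPP_NONCHARACTERS = 16 * 2                     # last two code points of planes 1..16

# Unicode 3.2 graphic characters, as stated on this page.
GRAPHIC_CHARACTERS = 95_156

encoded_characters = GRAPHIC_CHARACTERS + CONTROL_CODES
assigned = (GRAPHIC_CHARACTERS + BMP_PRIVATE_USE
            + SUPP_PRIVATE_USE + CONTROL_CODES)
designated = (assigned + SURROGATES
              + BMP_NONCHARACTERS + SUPP_NONCHARACTERS)

print(f"encoded characters:     {encoded_characters:,}")
print(f"assigned code points:   {assigned:,}")
print(f"designated code points: {designated:,}")
```

Deriving the constants from ranges rather than hard-coding them makes it easier to see where 6,400, 131,068, 2,048, 34, and 32 come from.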

(Thanks to Ken Whistler for this analysis and the contents of this page.)
Also see: Unicode Standard Annex #28 Unicode 3.2
For information on earlier versions of the Unicode Standard see http://www.unicode.org/unicode/standard/versions/
Confused? See the index of terms used on this page.

Ken Whistler's accounting of characters in Unicode through version 3.2.
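| Character Type / Unicode Version | 1.0 | 1.1 | 2.0 | 2.1 | 3.0 | 3.1 | 3.2 |
|---|---|---|---|---|---|---|---|
| BMP Alphas/Symbols | 4748 | 6309 | 6509 | 6511 | 10236 | 10238 | 11195 |
| Supplementary Alphas/Symbols | | | | | | 1691 | 1691 |
| Han (URO) | 20902 | 20902 | 20902 | 20902 | 20902 | 20902 | 20902 |
| Han (Extension A) | | | | | 6582 | 6582 | 6582 |
| Han (Extension B) | | | | | | 42711 | 42711 |
| Han Compatibility | 302 | 302 | 302 | 302 | 302 | 302 | 361 |
| Supplementary Han Compatibility | | | | | | 542 | 542 |
| Hangul Syllables | 2350 | 6656 | 11172 | 11172 | 11172 | 11172 | 11172 |
| Subtotal | 28302 | 34169 | 38885 | 38887 | 49194 | 94140 | 95156 |
| BMP Private Use | 5632 | 6400 | 6400 | 6400 | 6400 | 6400 | 6400 |
| Supplementary Private Use | | | 131068 | 131068 | 131068 | 131068 | 131068 |
| Surrogate Code Points | | | 2048 | 2048 | 2048 | 2048 | 2048 |
| Control Codes | 65 | 65 | 65 | 65 | 65 | 65 | 65 |
| BMP NonCharacters | 2 | 2 | 2 | 2 | 2 | 34 | 34 |
| Supplementary NonCharacters | | | 32 | 32 | 32 | 32 | 32 |
| BMP Reserved | 31535 | 24900 | 18136 | 18134 | 7827 | 7793 | 6777 |
| Supplementary Reserved | | | 917476 | 917476 | 917476 | 872532 | 872532 |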
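
The table can be cross-checked mechanically. The following Python sketch is an illustration only, not part of Ken Whistler's analysis: it transcribes the rows, treats blank cells as zero (an assumption), recomputes the Subtotal row, and sums every row in each column to see how many code points each version accounts for. The dictionary and function names are hypothetical.

```python
VERSIONS = ["1.0", "1.1", "2.0", "2.1", "3.0", "3.1", "3.2"]

# Rows that feed the Subtotal line. Blank table cells are entered as 0.
CHARACTER_ROWS = {
    "BMP Alphas/Symbols": [4748, 6309, 6509, 6511, 10236, 10238, 11195],
    "Supplementary Alphas/Symbols": [0, 0, 0, 0, 0, 1691, 1691],
    "Han (URO)": [20902] * 7,
    "Han (Extension A)": [0, 0, 0, 0, 6582, 6582, 6582],
    "Han (Extension B)": [0, 0, 0, 0, 0, 42711, 42711],
    "Han Compatibility": [302] * 6 + [361],
    "Supplementary Han Compatibility": [0, 0, 0, 0, 0, 542, 542],
    "Hangul Syllables": [2350, 6656] + [11172] * 5,
}

# The Subtotal row as printed in the table.
SUBTOTAL = [28302, 34169, 38885, 38887, 49194, 94140, 95156]

# Rows below the Subtotal line.
OTHER_ROWS = {
    "BMP Private Use": [5632] + [6400] * 6,
    "Supplementary Private Use": [0, 0] + [131068] * 5,
    "Surrogate Code Points": [0, 0] + [2048] * 5,
    "Control Codes": [65] * 7,
    "BMP NonCharacters": [2] * 5 + [34] * 2,
    "Supplementary NonCharacters": [0, 0] + [32] * 5,
    "BMP Reserved": [31535, 24900, 18136, 18134, 7827, 7793, 6777],
    "Supplementary Reserved": [0, 0] + [917476] * 3 + [872532] * 2,
}


def column_sums(rows):
    """Sum each version's column across all rows of the given mapping."""
    return [sum(column) for column in zip(*rows.values())]


subtotals = column_sums(CHARACTER_ROWS)
totals = [s + o for s, o in zip(subtotals, column_sums(OTHER_ROWS))]

for version, subtotal, total in zip(VERSIONS, subtotals, totals):
    print(f"Unicode {version}: subtotal {subtotal:>6,}  all rows {total:>9,}")
```

With the values as transcribed, the recomputed subtotals match the Subtotal row. The 1.0 and 1.1 columns sum to 65,536, the size of the BMP alone, and the columns from 2.0 onward sum to 1,114,112, which is 17 planes of 65,536 code points each.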

The following is a list of the terms used on this page, linked to the Unicode Glossary definitions.

Assigned Code Point, BMP, Character, Code Point
Compatibility, Control Code, Encoded Character, Han
Hangul, NonCharacter, Private Use, Reserved
Supplementary, Surrogate Code Point, Syllable, URO

Send comments to Tex Texin
This page was last updated 2003-10-09.