Ken Whistler provided this numeric breakdown of the code points in the Unicode Character Standard, showing how many are assigned to each character type, how many are reserved for future use, and the number remaining. Links to the Unicode glossary are provided where each Unicode term is initially used. An index of Unicode terminology with glossary links is also provided.
For more information about how scripts are allocated in Unicode, and which scripts may be forthcoming, see the Unicode Consortium's web page Roadmaps to Unicode. To find out whether a particular character is in Unicode, see the page Where is my Character?. To understand the value of a comprehensive, universal character standard, see I18nGuy's Benefits of the Unicode Standard.
The number of characters in the Unicode Character Standard version 3.2 is 95,221.
The number 95,221 is derived from: 95,156 graphic characters + 65 control codes. That corresponds to the number of encoded characters, omitting private use area (PUA) characters.
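Some of these counts can be cross-checked against a live Unicode database. The following is a minimal sketch, not part of Ken Whistler's analysis: it assumes Python 3 and its standard `unicodedata` module, which reflects whatever Unicode version the interpreter ships rather than 3.2. For that reason it only counts the general categories whose membership is fixed by Unicode's stability guarantees: control codes (`Cc`), private use (`Co`), and surrogates (`Cs`). The function name `count_categories` is hypothetical.

```python
import sys
import unicodedata
from collections import Counter


def count_categories(wanted=("Cc", "Co", "Cs")):
    """Count code points in the given general categories.

    Walks every code point from U+0000 to U+10FFFF. Only categories
    whose membership does not change between Unicode versions are
    useful here, because the interpreter's database is newer than 3.2.
    """
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(cp))
        if category in wanted:
            counts[category] += 1
    return counts


if __name__ == "__main__":
    counts = count_categories()
    print("control codes:", counts["Cc"])
    print("private use  :", counts["Co"])
    print("surrogates   :", counts["Cs"])
```

The private use count reported this way combines the BMP and supplementary private use figures. Graphic characters and reserved code points are not compared, since those change with every release.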
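The total number of assigned code points is 95,156 graphic characters + 6,400 BMP private use + 131,068 supplementary private use + 65 control codes = 232,689.
The total number of designated code points is (232,689 assigned code points) + (2,048 surrogate code points + 34 BMP noncharacters + 32 supplementary noncharacters) = 234,803.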
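The arithmetic above can be reproduced in a few lines. This is only an illustrative sketch: the constants are the Unicode 3.2 figures quoted on this page, and the names (`GRAPHIC`, `unicode_3_2_totals`, and so on) are made up for the example.

```python
# Unicode 3.2 counts as quoted on this page.
GRAPHIC = 95_156
CONTROL = 65
BMP_PRIVATE_USE = 6_400
SUPPLEMENTARY_PRIVATE_USE = 131_068
SURROGATES = 2_048
BMP_NONCHARACTERS = 34
SUPPLEMENTARY_NONCHARACTERS = 32


def unicode_3_2_totals():
    """Return the three totals discussed above as a dict."""
    # Encoded characters: graphic characters plus control codes.
    encoded = GRAPHIC + CONTROL
    # Assigned code points: encoded characters plus all private use.
    assigned = encoded + BMP_PRIVATE_USE + SUPPLEMENTARY_PRIVATE_USE
    # Designated code points: assigned plus surrogates and noncharacters.
    designated = (assigned + SURROGATES
                  + BMP_NONCHARACTERS + SUPPLEMENTARY_NONCHARACTERS)
    return {"encoded": encoded, "assigned": assigned, "designated": designated}


if __name__ == "__main__":
    for name, value in unicode_3_2_totals().items():
        print(f"{name}: {value:,}")
```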
(Thanks to Ken Whistler for this analysis and the contents of this page.)
Also see: Unicode Standard Annex #28 Unicode 3.2
For information on earlier versions of the Unicode Standard, see http://www.unicode.org/unicode/standard/versions/
Confused? See the index of terms used on this page.
Character Type \ Unicode Version | 1.0 | 1.1 | 2.0 | 2.1 | 3.0 | 3.1 | 3.2 |
---|---|---|---|---|---|---|---|
BMP Alphas/Symbols | 4748 | 6309 | 6509 | 6511 | 10236 | 10238 | 11195 |
Supplementary Alphas/Symbols | | | | | | 1691 | 1691 |
Han (URO) | 20902 | 20902 | 20902 | 20902 | 20902 | 20902 | 20902 |
Han (Extension A) | | | | | 6582 | 6582 | 6582 |
Han (Extension B) | | | | | | 42711 | 42711 |
Han Compatibility | 302 | 302 | 302 | 302 | 302 | 302 | 361 |
Supplementary Han Compatibility | | | | | | 542 | 542 |
Hangul Syllables | 2350 | 6656 | 11172 | 11172 | 11172 | 11172 | 11172 |
Subtotal | 28302 | 34169 | 38885 | 38887 | 49194 | 94140 | 95156 |
BMP Private Use | 5632 | 6400 | 6400 | 6400 | 6400 | 6400 | 6400 |
Supplementary Private Use | | | 131068 | 131068 | 131068 | 131068 | 131068 |
Surrogate Code Points | | | 2048 | 2048 | 2048 | 2048 | 2048 |
Control Codes | 65 | 65 | 65 | 65 | 65 | 65 | 65 |
BMP NonCharacters | 2 | 2 | 2 | 2 | 2 | 34 | 34 |
Supplementary NonCharacters | | | 32 | 32 | 32 | 32 | 32 |
BMP Reserved | 31535 | 24900 | 18136 | 18134 | 7827 | 7793 | 6777 |
Supplementary Reserved | | | 917476 | 917476 | 917476 | 872532 | 872532 |
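One way to sanity-check the table is to confirm that, for each version, the BMP rows account for all 65,536 code points of plane 0 and the supplementary rows account for all 1,048,576 code points of planes 1 to 16. The sketch below does that under one assumption: rows with fewer than seven values belong to the most recent versions, so blank cells fall in the earliest columns and count as zero. The helper names (`pad`, `plane_totals`) are hypothetical, and the Subtotal row is left out because it only repeats other rows.

```python
BMP_SIZE = 0x10000                  # 65,536 code points in plane 0
SUPPLEMENTARY_SIZE = 16 * 0x10000   # 1,048,576 code points in planes 1-16

VERSIONS = ("1.0", "1.1", "2.0", "2.1", "3.0", "3.1", "3.2")


def pad(*values):
    """Right-align a short row; blank leading cells count as 0."""
    return (0,) * (len(VERSIONS) - len(values)) + values


BMP_ROWS = [
    pad(4748, 6309, 6509, 6511, 10236, 10238, 11195),      # Alphas/Symbols
    pad(20902, 20902, 20902, 20902, 20902, 20902, 20902),  # Han (URO)
    pad(6582, 6582, 6582),                                 # Han (Extension A)
    pad(302, 302, 302, 302, 302, 302, 361),                # Han Compatibility
    pad(2350, 6656, 11172, 11172, 11172, 11172, 11172),    # Hangul Syllables
    pad(5632, 6400, 6400, 6400, 6400, 6400, 6400),         # Private Use
    pad(2048, 2048, 2048, 2048, 2048),                     # Surrogates
    pad(65, 65, 65, 65, 65, 65, 65),                       # Control Codes
    pad(2, 2, 2, 2, 2, 34, 34),                            # NonCharacters
    pad(31535, 24900, 18136, 18134, 7827, 7793, 6777),     # Reserved
]

SUPPLEMENTARY_ROWS = [
    pad(1691, 1691),                                # Alphas/Symbols
    pad(42711, 42711),                              # Han (Extension B)
    pad(542, 542),                                  # Han Compatibility
    pad(131068, 131068, 131068, 131068, 131068),    # Private Use
    pad(32, 32, 32, 32, 32),                        # NonCharacters
    pad(917476, 917476, 917476, 872532, 872532),    # Reserved
]


def plane_totals():
    """Return {version: (bmp_total, supplementary_total)}."""
    bmp = [sum(column) for column in zip(*BMP_ROWS)]
    supplementary = [sum(column) for column in zip(*SUPPLEMENTARY_ROWS)]
    return dict(zip(VERSIONS, zip(bmp, supplementary)))


if __name__ == "__main__":
    for version, (bmp, supplementary) in plane_totals().items():
        print(f"{version}: BMP={bmp:,} supplementary={supplementary:,}")
```

Under this reading, every BMP column sums to 65,536, and the supplementary columns sum to 1,048,576 from version 2.0 onward. The 1.0 and 1.1 supplementary columns are empty and therefore sum to zero.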
The following is a list of the terms used on this page, linked to the Unicode Glossary definitions.
Assigned Code Point | BMP | Character | Code Point |
Compatibility | Control Code | Encoded Character | Han |
Hangul | NonCharacter | Private Use | Reserved |
Supplementary | Surrogate Code Point | Syllable | URO |
Send comments to Tex Texin
This page was last updated 2003-10-09.