Locales, Locales, Locales, at Unicode Conference

It was "Locales, Locales, Locales" not "Location, location, location" at the 22nd International Unicode Conference held in San Jose this week (Sept. 9-13, 2002). Locales are important to Web Services. However, Locales aren't working well today, according to some at the Unicode Conference.


Chris Lilley of the World Wide Web Consortium (W3C) kicked off the conference with an exciting keynote demonstrating several dynamic Unicode graphics with SVG (Scalable Vector Graphics).

The Web Internationalization Panel later that afternoon was dominated by questions and issues surrounding Locales. "Locale" is a term used here to describe a collection of user preferences. It is a generalization of the concept used with Unix, Java and other programming environments. Unfortunately, although the concept is widespread, implementations vary considerably, and in some cases a locale is assumed to exist in environments where the concept doesn't apply.

For example, although HTML and XML both use a language identifier consisting of language and country codes, similar to Unix and Java, neither markup language expects the identifier to represent a user's preference for date, money, number or other data formats. It is purely a language identifier.
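
To make the distinction concrete, here is a minimal sketch in Python of what such an identifier actually contains. It assumes the tag syntax of RFC 3066 (a primary subtag of 1 to 8 letters, followed by optional hyphen-separated subtags of 1 to 8 letters or digits, with a two-letter second subtag read as an ISO 3166 country code). The function name parse_language_tag is hypothetical and chosen only for this illustration; it is not part of any library.

```python
import re

# Syntax assumed from RFC 3066: a 1-8 letter primary subtag, then optional
# 1-8 character alphanumeric subtags separated by hyphens.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def parse_language_tag(tag):
    """Split a language tag into (language, country, other_subtags).

    Hypothetical helper for illustration only.
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    primary, *rest = tag.split("-")
    country = None
    # A two-letter second subtag is interpreted as an ISO 3166 country code.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        country = rest[0].upper()
        rest = rest[1:]
    # Tags are case-insensitive; lowercase language and uppercase country
    # are only the conventional spelling.
    return primary.lower(), country, rest


print(parse_language_tag("fr-CA"))  # language "fr", country "CA"
print(parse_language_tag("en"))     # language "en", no country
```

Note what the result does not contain: nothing in the tag says how the reader wants dates, currency or numbers formatted. Any such preference would have to come from somewhere else, which is the gap the panel kept returning to.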

Attendees wanted to know which preferences should be included in a locale setting. What happens when a user visits another country: do the user's preferences change? If I visit France, should web search engines return results in French and/or give priority to French vendors' web pages? Several attendees complained about language and other changes, and poor search results, caused by a search engine detecting their new locale. This led to a discussion of privacy concerns. The desire for web servers to detect and cater to users' preferences conflicts with users' right to privacy.

Although these issues are not new, they are highlighted by the up-and-coming Web Services technology. Web Services will allow machines to talk to machines, without requiring human involvement. Processing is distributed, occurring on servers located anywhere in the world. Of course, the machines are processing requests that ultimately benefit human users, and the amount of information about those users that is sent along with their requests is a concern. Send too little information, or make too many mistaken assumptions, and the results are unsatisfactory. Reveal too much information and the user can be harmed.

Martin Dürst and Richard Ishida, who work on Internationalization for the W3C, pointed out that the new charter for the Internationalization Working Group includes a task force for Web Services. They are looking for more participants and will be examining many of the issues raised around Locales. They are starting to collect "use cases" to establish the requirements that Web Services must satisfy with respect to internationalization.

Peter Constable, a researcher for SIL International, gave two presentations that stoked the Locales fire. In the first (Toward a Model for Language Identification), Peter described the model that linguists should use in defining language identification for the IT market. He said the two notions "language" and "locale" are assumed to cover the full range of distinctions to be made, whereas in fact there are other distinct language-related notions that need to be reflected in systems of identification. His second paper (An Analysis of ISO 639) looked closely at ISO 639 language codes and pointed out several cases where the codes do not represent languages as precisely or accurately as either linguists or software developers need them to.

By the last day of the conference, Locales were on most people's minds. Tex Texin, XenCraft's "Xen Master", presented his paper on "What's wrong with Locales?" (PDF, 195Kb). He identified misconceptions that people have about the use of Locales in different development environments, and noted that standards bodies may have goals for developing the language and country codes that differ significantly from the IT industry's requirements. For example, software requires that these codes have long-term stability, since it is so hard to ensure compatible software versions on the scale of the World Wide Web. However, the codes sometimes change due to political or other events.

Tex's presentation led naturally into a two-session panel devoted to Locales. Tex moderated, while Cathy Wissink (Microsoft), Peter Constable (SIL), Addison Phillips (webMethods), and David Possin (Welocalize.com) responded to audience questions. A stimulating discussion kept the room fully occupied for 90 minutes. Tex created a summary of many of the key points that were made, and posted the summary, his paper ("What's wrong with Locales?", PDF, 195Kb) and the panel's paper (PDF, 177Kb) on the Locales section of his I18nGuy web site.

Related resources

For more information on Locales, check out the following sites.

Some of the problems with Locales were pointed out at the W3C Internationalization Workshop in January 2002. Suzanne Topping submitted a position statement on Locales, summarizing ongoing discussions of the Locales Yahoo Group.

Tex maintains a list of Locales-related resources and a web page for Issues and Advantages with the use of Locales. The conference papers on Locales are also available on his site.

Language identifiers are specified in the IETF documents RFC 1766 and its replacement, RFC 3066.

Addison Phillips has a proposal for Locales, "ULocales: Building Global Enterprise Web Services". After a comment period, he will submit it to the W3C.

To find out more about the world's 7,000+ languages, see SIL's Ethnologue: Languages of the World.