Monday, November 2, 2009

Section 24.1.  Overview and Terminology















Before we proceed, let's get some terminology straight. Globalization, internationalization, and localization are often used interchangeably, yet they actually have very different meanings. Table 24-1 defines each and explains how they are related.


Table 24-1. Fundamental terms and abbreviations

Term | Abbreviation | Definition
Globalization | g11n | Application development strategy focused on making applications multilingual and locale-independent. Globalization is accomplished through internationalization and localization.
Internationalization | i18n | The design or modification of an application to work with multiple locales.
Localization | l10n | The process of actually making an application work in each specific locale. l10n includes text translation. It is made easier with a proper i18n implementation.



If you haven't seen the abbreviations mentioned in Table 24-1 before, you may be confused about the numbers sandwiched between the two letters. These terms are often abbreviated by keeping the first and last letters and replacing everything between them with a count of the letters that were dropped. Globalization, for example, has 11 letters between the "g" and the "n," making the abbreviation g11n.
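The abbreviation rule is mechanical enough to sketch in a few lines. The following is only an illustration in Python, not anything Oracle provides; the function name numeronym is made up for this example.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter.

    Hypothetical helper for illustration only; it is not part of any
    Oracle or standard library API.
    """
    word = word.lower()
    # Words shorter than three letters have nothing between first and last.
    if len(word) < 3:
        return word
    inner_count = len(word) - 2  # letters between the first and the last
    return f"{word[0]}{inner_count}{word[-1]}"


for term in ("Globalization", "Internationalization", "Localization"):
    print(term, "->", numeronym(term))
```

Run against the three terms in Table 24-1, this reproduces the abbreviations listed there.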



It is true that Oracle supports localization to every region of the world. I have heard it suggested, though, that Oracle's localization support means that you can load English data and search it in Japanese. Not true! Oracle does not have a built-in linguistic translation engine that performs translations on the fly for you. If you have ever witnessed the results of a machine translation, you know that you would not want this kind of so-called functionality as a built-in "feature" anyway. Oracle supports localization, but it does not implement localization for you. That is still your job.


Additional terms used in this chapter are defined in Table 24-2; we'll expand on these in the following sections.


Table 24-2. Detailed globalization, localization, and internationalization terms

Term | Definition
Character encoding | Each character is a representation of a code point. Character encoding is the mapping between character and code point. The type of character encoding chosen for the database determines the ability to store and retrieve these code points.
Character set | Characters are grouped by language or region. Each regionalized set of characters is referred to as a character set.
Code point | Each character in every character set is given a unique identifier called a code point. This identifier is determined by the Unicode Consortium. Code points can represent a character in its entirety or can be combined with other code points to form complex characters. An example of a code point is \0053.
Glyph | A glyph is the graphical display of a character that is mapped to one or more code points. The code point definition in this table used the \0053 code point. The glyph this code point is mapped to is the capital letter S.
Multibyte characters | Most Western European characters require only a single byte to store them. Multibyte characters, such as those used in Japanese or Korean, require between two and four bytes to store a single character in the database.
NLS | National Language Support is the old name for Oracle's globalization architecture. Beginning with Oracle9i Database, it is officially referred to as Globalization Support, but you will see documentation and parameters that make reference to NLS for some time to come.
Unicode | Unicode is a standard for character encoding.
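To make the code point, glyph, and multibyte definitions more concrete, here is a small sketch in Python rather than PL/SQL. It assumes UTF-8 as the encoding purely for illustration; a database configured with a different character set may store the same characters in a different number of bytes. The helper name utf8_length and the sample characters are choices made for this example.

```python
import unicodedata


def utf8_length(ch: str) -> int:
    """Number of bytes needed to store one character in UTF-8 (assumed encoding)."""
    return len(ch.encode("utf-8"))


# Code point \0053 from Table 24-2 maps to the capital letter S.
letter = chr(0x0053)
letter_name = unicodedata.name(letter)  # official Unicode character name

# One character per byte-length class in UTF-8.
samples = {
    "S": letter,               # Latin capital letter, code point 0053
    "Japanese": "\u65e5",      # a common kanji, code point 65E5
    "Korean": "\ud55c",        # a Hangul syllable, code point D55C
    "rare kanji": "\U00020BB7",  # code point outside the first 65,536
}
byte_lengths = {label: utf8_length(ch) for label, ch in samples.items()}

# Two code points (e + combining acute accent) can combine into one character.
combined = unicodedata.normalize("NFC", "e\u0301")

print(letter, letter_name)
print(byte_lengths)
print(combined, len(combined))
```

The point of the sketch is that a code point is just a number, the glyph is what gets drawn for it, and the number of bytes used to store it depends on the character encoding chosen.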









