Code Page Overview
A code page defines the encoding used to represent the characters of one or more languages. An encoding assigns a number to each character in a character set. You use code pages to identify data that might be in different languages. For example, if you create a mapping to process Japanese data, you must select a Japanese code page for the source data.
When you choose a code page, the program or application for which you set the code page refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.
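The idea that a code page assigns a number to each character can be sketched in a few lines of Python. This is an illustration only, not part of PowerCenter: the helper names `encode_in_code_page` and `can_encode` are hypothetical, and the code page names are Python codec names assumed to correspond to US-ASCII, MS Latin1, Latin1, and IBM037.

```python
def encode_in_code_page(text, code_page):
    """Return the byte values that code_page assigns to the characters of text."""
    return list(text.encode(code_page))


def can_encode(text, code_page):
    """Return True if every character of text exists in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Python codec names assumed to match the code pages discussed in this section.
for code_page in ("ascii", "cp1252", "latin-1", "cp037"):
    print(code_page, encode_in_code_page("A", code_page))

# A character that a code page does not contain cannot be stored in it.
print(can_encode("é", "ascii"))    # accented letter is outside 7-bit ASCII
print(can_encode("é", "latin-1"))
```

The letter A maps to 65 in the ASCII-based code pages and to 193 in IBM037, which is why an application must know which code page applies to the data it reads.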
Most machines use one of the following code pages:
- US-ASCII (7-bit ASCII)
- MS Latin1 (MS 1252) for Windows operating systems
- Latin1 (ISO 8859-1) for UNIX operating systems
- IBM EBCDIC US English (IBM037) for mainframe systems
The US-ASCII code page contains all 7-bit ASCII characters. It is the most basic of all code pages and supports United States English. The US-ASCII code page is not compatible with any other code page. When you install the PowerCenter Client, PowerCenter Integration Service, or PowerCenter repository on a US-ASCII system, you must install all components on US-ASCII systems and run the PowerCenter Integration Service in ASCII mode.
MS Latin1 and Latin1 both support English and most Western European languages and are compatible with each other. When you install the PowerCenter Client, PowerCenter Integration Service, or PowerCenter repository on a system using one of these code pages, you can install the rest of the components on any machine using the MS Latin1 or Latin1 code pages.
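Although the two code pages are treated as compatible here, they are not byte-for-byte identical. The following sketch lists the byte values that the two code pages decode differently. It assumes that the Python codecs `cp1252` and `latin-1` correspond to MS Latin1 and Latin1, and `decode_or_none` is a hypothetical helper written for this illustration.

```python
def decode_or_none(byte_value, code_page):
    """Decode a single byte in code_page, or return None if the byte is undefined."""
    try:
        return bytes([byte_value]).decode(code_page)
    except UnicodeDecodeError:
        return None


# Collect every byte value whose meaning differs between the two code pages.
differing = [
    b for b in range(256)
    if decode_or_none(b, "cp1252") != decode_or_none(b, "latin-1")
]

print(f"{len(differing)} byte values differ, "
      f"from 0x{min(differing):02X} to 0x{max(differing):02X}")
```

In this sketch the differences are confined to the range 0x80 through 0x9F, where MS 1252 places characters such as the euro sign and typographic quotation marks. English and Western European letters share the same values in both code pages.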
You can use the IBM EBCDIC code page for the PowerCenter Integration Service process when you install it on a mainframe system. You cannot install the PowerCenter Client or PowerCenter repository on mainframe systems, so you cannot use the IBM EBCDIC code page for PowerCenter Client or PowerCenter repository installations.
UNIX Code Pages
In the United States, most UNIX operating systems have more than one code page installed and use the ASCII code page by default. If you want to run PowerCenter in an ASCII-only environment, you can use the ASCII code page and run the PowerCenter Integration Service in ASCII mode.
UNIX systems allow you to change the code page by changing the LANG, LC_CTYPE, or LC_ALL environment variable. For example, suppose you want to change the code page that an AIX machine uses. Use the following command in the C shell to view your environment:
locale
This results in the following output, in which “C” implies “ASCII”:
LANG="C"
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
To change the language to English and require the system to use the Latin1 code page, you can use the following command:
setenv LANG en_US.iso88591
When you check the locale again, it has been changed to use Latin1 (ISO 8859-1):
LANG="en_US.iso88591"
LC_CTYPE="en_US.iso88591"
LC_NUMERIC="en_US.iso88591"
LC_TIME="en_US.iso88591"
LC_ALL="en_US.iso88591"
For more information about changing the locale or code page of a UNIX system, see the UNIX documentation.
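As a rough illustration of how these variables interact, the following Python sketch applies the usual POSIX precedence for character handling (LC_ALL first, then LC_CTYPE, then LANG) and extracts the code set part of the locale name. The function names are hypothetical, and the exact spelling of locale names such as en_US.iso88591 varies between UNIX systems.

```python
import os


def effective_ctype_locale(env):
    """Return the locale name that governs character handling.

    Assumes the usual POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Falls back to "C" when none of the variables is set.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codeset_of(locale_name):
    """Return the code set part of a name such as en_US.iso88591, or None."""
    base = locale_name.split("@", 1)[0]  # drop an optional @modifier suffix
    if "." in base:
        return base.split(".", 1)[1]
    return None


current = effective_ctype_locale(os.environ)
print(current, codeset_of(current))
```

A name without a code set, such as C, yields None in this sketch. In that case the system default for the locale applies.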
Windows Code Pages
The Windows operating system is based on Unicode, but does not display the code page used by the operating system in the environment settings. However, you can make an educated guess based on the country in which you purchased the system and the language the system uses.
If you purchase Windows in the United States and use English as an input and display language, your operating system code page is MS Latin1 (MS1252) by default. However, if you install additional display or input languages from the Windows installation CD and use those languages, the operating system might use a different code page.
For more information about the default code page for your Windows system, contact Microsoft.
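If Python is available on the machine, one way to check which encoding a program sees by default is to ask the standard `locale` module. This is a minimal sketch, not a PowerCenter feature. The value it reports depends on the Python version and settings, and `preferred_encoding` is a hypothetical wrapper name.

```python
import locale


def preferred_encoding():
    """Return the name of the encoding Python considers the system default."""
    # Passing False asks Python not to change the process locale as a side effect.
    return locale.getpreferredencoding(False)


print(preferred_encoding())
```

On a United States English Windows system this typically reports cp1252, but treat the result as a hint rather than a definitive answer.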
Choosing a Code Page
Choose code pages based on the character data you use in mappings. Character data can be represented by character modes based on the character size. Character size is the storage space a character requires in the database. Different character sizes can be defined as follows:
- Single-byte. A character represented as a unique number between 0 and 255. One byte is eight bits. ASCII characters are single-byte characters.
- Double-byte. A character two bytes, or 16 bits, in size, represented as a unique number 256 or greater. Many Asian languages, such as Chinese, have double-byte characters.
- Multibyte. A character two or more bytes in size, represented as a unique number 256 or greater. Many Asian languages, such as Chinese, have multibyte characters.
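The character sizes above can be observed by encoding sample characters and counting the resulting bytes. The sketch below uses Python codec names (`gbk`, `shift_jis`, `utf-8`) as stand-ins for Chinese, Japanese, and Unicode code pages. The helper `byte_width` and the sample characters are chosen for illustration only.

```python
def byte_width(character, code_page):
    """Return the number of bytes code_page uses to store one character."""
    return len(character.encode(code_page))


samples = [
    ("A", "ascii"),      # single-byte
    ("é", "latin-1"),    # single-byte, value above 127
    ("中", "gbk"),        # double-byte Chinese character
    ("日", "shift_jis"),  # double-byte Japanese character
    ("ｱ", "shift_jis"),  # half-width katakana, one byte in the same code page
    ("中", "utf-8"),      # multibyte, more than two bytes
]

for character, code_page in samples:
    print(character, code_page, byte_width(character, code_page))
```

The two Shift-JIS samples show that a single code page can mix one-byte and two-byte characters, which is why storage size must be planned around the widest character the data can contain.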