Code Page Compatibility
Compatibility between code pages is essential for accurate data movement when the PowerCenter Integration Service runs in the Unicode data movement mode.
A code page can be compatible with another code page, or it can be a subset or a superset of another:
- Compatible. Two code pages are compatible when the characters encoded in the two code pages are virtually identical. For example, JapanEUC and JIPSE code pages contain identical characters and are compatible with each other. The PowerCenter repository and PowerCenter Integration Service process can each use one of these code pages and can pass data back and forth without data loss.
- Superset. A code page is a superset of another code page when it contains all the characters encoded in the other code page and additional characters not encoded in the other code page. For example, MS Latin1 is a superset of US-ASCII because it contains all characters in the US-ASCII code page.
Note: Informatica considers a code page to be a superset of itself and all other compatible code pages.
- Subset. A code page is a subset of another code page when all characters in the code page are also encoded in the other code page. For example, US-ASCII is a subset of MS Latin1 because all characters in the US-ASCII code page are also encoded in the MS Latin1 code page.
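The three relationships above can be illustrated with a minimal Python sketch. It is not Informatica's own check: it assumes Python's built-in single-byte codecs (`ascii`, `latin-1`) as stand-ins for the US-ASCII and Latin1 code pages, and the helper names `repertoire` and `relationship` are hypothetical.

```python
def repertoire(codec):
    """Return the set of characters a single-byte codec can decode."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # this byte value is not assigned in the codec
    return frozenset(chars)


def relationship(a, b):
    """Classify codec `a` relative to codec `b` by comparing character sets."""
    ra, rb = repertoire(a), repertoire(b)
    if ra == rb:
        return "compatible"
    if ra > rb:
        return "superset"
    if ra < rb:
        return "subset"
    return "incompatible"


print(relationship("latin-1", "ascii"))  # superset
print(relationship("ascii", "latin-1"))  # subset
```

The sketch only handles single-byte encodings, because it probes one byte at a time. Multibyte code pages would need a different enumeration strategy.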
For accurate data movement, the target code page must be a superset of the source code page. If the target code page is not a superset of the source code page, the PowerCenter Integration Service may not process all characters, resulting in incorrect or missing data. For example, Latin1 is a superset of US-ASCII. If you select Latin1 as the source code page and US-ASCII as the target code page, you might lose character data if the source contains characters that are not included in US-ASCII.
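The kind of loss described above can be reproduced in a few lines of Python. This is an illustration only: Python's `ascii` and `latin-1` codecs stand in for the US-ASCII and Latin1 code pages, and the `errors="replace"` behavior belongs to Python, not to the PowerCenter Integration Service.

```python
text = "caf\u00e9"  # the last character, e with acute accent, is in Latin-1 but not in US-ASCII

# Target repertoire contains every source character: the round trip is lossless.
latin1_bytes = text.encode("latin-1")
round_trip = latin1_bytes.decode("latin-1")
print(round_trip == text)  # True

# Target repertoire is smaller than the source: the character cannot be stored.
ascii_bytes = text.encode("ascii", errors="replace")
print(ascii_bytes)  # b'caf?'
```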
When you install or upgrade a PowerCenter Integration Service to run in Unicode mode, you must ensure code page compatibility among the domain configuration database, the Administrator tool, PowerCenter Clients, PowerCenter Integration Service process nodes, the PowerCenter repository, the Metadata Manager repository, and the machines hosting pmrep and pmcmd. In Unicode mode, the PowerCenter Integration Service enforces code page compatibility between the PowerCenter Client and the PowerCenter repository, and between the PowerCenter Integration Service process and the PowerCenter repository. In addition, when you run the PowerCenter Integration Service in Unicode mode, code pages associated with sessions must have the appropriate relationships:
- For each source in the session, the source code page must be a subset of the target code page. The PowerCenter Integration Service does not require code page compatibility between the source and the PowerCenter Integration Service process or between the PowerCenter Integration Service process and the target.
- If the session contains a Lookup or Stored Procedure transformation, the database or file code page must be a subset of the target that receives data from the Lookup or Stored Procedure transformation and a superset of the source that provides data to the Lookup or Stored Procedure transformation.
- If the session contains an External Procedure or Custom transformation, the procedure must pass data in a code page that is a subset of the target code page for targets that receive data from the External Procedure or Custom transformation.
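The session-level rules above form a chain: source, then lookup or stored procedure database, then target. The following sketch checks that chain under stated assumptions: Python single-byte codecs stand in for code pages, and `repertoire` and `check_session` are hypothetical helpers, not part of any Informatica API.

```python
def repertoire(codec):
    """Return the set of characters a single-byte codec can decode."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # unassigned byte value
    return frozenset(chars)


def check_session(source, lookup, target):
    """Return the violated rules; an empty list means the chain is consistent."""
    src, lkp, tgt = (repertoire(c) for c in (source, lookup, target))
    problems = []
    if not src <= tgt:
        problems.append("source is not a subset of target")
    if not src <= lkp:
        problems.append("lookup is not a superset of source")
    if not lkp <= tgt:
        problems.append("lookup is not a subset of target")
    return problems


print(check_session("ascii", "latin-1", "latin-1"))  # []
print(check_session("latin-1", "ascii", "latin-1"))
```

The second call reports that the lookup code page cannot hold every character the source can supply.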
Informatica uses code pages for the following components:
- Domain configuration database. The domain configuration database must be compatible with the code pages of the PowerCenter repository and Metadata Manager repository.
- Administrator tool. You can enter data in any language in the Administrator tool.
- PowerCenter Client. You can enter metadata in any language in the PowerCenter Client.
- PowerCenter Integration Service process. The PowerCenter Integration Service can move data in ASCII mode and Unicode mode. The default data movement mode is ASCII, which passes 7-bit ASCII or 8-bit ASCII character data. To pass multibyte character data from sources to targets, use the Unicode data movement mode. When you run the PowerCenter Integration Service in Unicode mode, it uses up to three bytes for each character to move data and performs additional checks at the session level to ensure data integrity.
- PowerCenter repository. The PowerCenter repository can store data in any language. You can use the UTF-8 code page for the PowerCenter repository to store multibyte data in the PowerCenter repository. The code page for the PowerCenter repository is the same as the database code page.
- Metadata Manager repository. The Metadata Manager repository can store data in any language. You can use the UTF-8 code page for the Metadata Manager repository to store multibyte data in the repository. The code page for the repository is the same as the database code page.
- Sources and targets. The sources and targets store data in one or more languages. You use code pages to specify the type of characters in the sources and targets.
- PowerCenter command line programs. You must also ensure that the code page for pmrep is a subset of the PowerCenter repository code page and the code page for pmcmd is a subset of the PowerCenter Integration Service process code page.
Most database servers use two code pages, a client code page to receive data from client applications and a server code page to store the data. When the database server is running, it converts data between the two code pages if they are different. In this type of database configuration, the PowerCenter Integration Service process interacts with the database client code page. Thus, code pages used by the PowerCenter Integration Service process, such as the PowerCenter repository, source, or target code pages, must be identical to the database client code page. The database client code page is usually identical to the operating system code page on which the PowerCenter Integration Service process runs. The database client code page is a subset of the database server code page.
For more information about specific database client and server code pages, see your database documentation.
Domain Configuration Database Code Page
The domain configuration database must be compatible with the code pages of the PowerCenter repository, Metadata Manager repository, and Model repository.
The Service Manager synchronizes the list of users in the domain with the list of users and groups in each application service. If a user name in the domain has characters that the code page of the application service does not recognize, characters do not convert correctly and inconsistencies occur.
Administrator Tool Code Page
The Administrator tool can run on any node in an Informatica domain. The Administrator tool code page is the code page of the operating system of the node. Each node in the domain must use the same code page.
The Administrator tool code page must be:
- A subset of the PowerCenter repository code page
- A subset of the Metadata Manager repository code page
- A subset of the Model Repository code page
PowerCenter Client Code Page
The PowerCenter Client code page is the code page of the operating system of the PowerCenter Client. To communicate with the PowerCenter repository, the PowerCenter Client code page must be a subset of the PowerCenter repository code page.
PowerCenter Integration Service Process Code Page
The code page of a PowerCenter Integration Service process is the code page of the node that runs the PowerCenter Integration Service process. Define the code page for each PowerCenter Integration Service process in the Administrator tool on the Processes tab.
However, on UNIX, you can change the code page of the PowerCenter Integration Service process by changing the LANG, LC_CTYPE or LC_ALL environment variable for the user that starts the process.
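When more than one of these variables is set, POSIX gives `LC_ALL` precedence over `LC_CTYPE`, and `LC_CTYPE` precedence over `LANG`. The sketch below models that lookup order and extracts the codeset portion of a locale name. The function names are hypothetical, and how the PowerCenter Integration Service maps a locale codeset to one of its own code pages is not covered here.

```python
def effective_ctype_locale(env):
    """Return the locale name that governs character handling, or None."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # POSIX treats an empty value like an unset variable
            return value
    return None  # the system falls back to an implementation-defined default


def codeset(locale_name):
    """Extract the codeset from 'language_territory.codeset@modifier'."""
    if not locale_name or "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1].split("@", 1)[0]


env = {"LANG": "en_US.ISO8859-1", "LC_ALL": "en_US.UTF-8"}
print(effective_ctype_locale(env))           # en_US.UTF-8
print(codeset(effective_ctype_locale(env)))  # UTF-8
```

In a real shell session you would pass `os.environ` instead of the hand-built dictionary used here.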
The code page of the PowerCenter Integration Service process must be:
- A subset of the PowerCenter repository code page
- A superset of the code page of the machine hosting pmcmd or a superset of the code page specified in the INFA_CODEPAGENAME environment variable
The code pages of all PowerCenter Integration Service processes must be compatible with each other. For example, you can use MS Windows Latin1 for a node on Windows and ISO-8859-1 for a node on UNIX.
A PowerCenter Integration Service configured for Unicode mode validates code pages when you start a session to ensure accurate data movement. It uses session code pages to convert character data. When the PowerCenter Integration Service runs in ASCII mode, it does not validate session code pages. It reads all character data as ASCII characters and does not perform code page conversions.
Each code page has associated sort orders. When you configure a session, you can select one of the sort orders associated with the code page of the PowerCenter Integration Service process. When you run the PowerCenter Integration Service in Unicode mode, it uses the selected session sort order to sort character data. When you run the PowerCenter Integration Service in ASCII mode, it sorts all character data using a binary sort order.
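The difference between a binary sort order and a language-aware sort order can be seen in a small Python sketch. It is only an approximation: Python compares strings by code point, which matches byte order in Latin1 for the characters used here, and `rough_linguistic_key` is a hypothetical stand-in for a real session sort order, not how PowerCenter implements one.

```python
import unicodedata

words = ["zebra", "Apple", "\u00e9clair", "banana"]

# Binary sort: compare raw character values, as described for ASCII mode.
binary = sorted(words)


def rough_linguistic_key(word):
    """Ignore case and accents so that accented words sort near their base letter."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


linguistic = sorted(words, key=rough_linguistic_key)

print(binary)      # accented word sorts last
print(linguistic)  # accented word sorts between "banana" and "zebra"
```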
If you run the PowerCenter Integration Service in the United States on Windows, consider using MS Windows Latin1 (ANSI) as the code page of the PowerCenter Integration Service process.
If you run the PowerCenter Integration Service in the United States on UNIX, consider using ISO 8859-1 as the code page for the PowerCenter Integration Service process.
If you use pmcmd to communicate with the PowerCenter Integration Service, the code page of the operating system hosting pmcmd must be identical to the code page of the PowerCenter Integration Service process.
The PowerCenter Integration Service generates the names of session log files, reject files, caches and cache files, and performance detail files based on the code page of the PowerCenter Integration Service process.
PowerCenter Repository Code Page
The PowerCenter repository code page is the code page of the data in the repository. The PowerCenter Repository Service uses the PowerCenter repository code page to save metadata in and retrieve metadata from the PowerCenter repository database. Choose the PowerCenter repository code page when you create or upgrade a PowerCenter repository. When the PowerCenter repository database code page is UTF-8, you can create a PowerCenter repository using UTF-8 as its code page.
The PowerCenter repository code page must be:
- Compatible with the domain configuration database code page
- A superset of the Administrator tool code page
- A superset of the PowerCenter Client code page
- A superset of the code page for the PowerCenter Integration Service process
- A superset of the code page of the machine hosting pmrep or a superset of the code page specified in the INFA_CODEPAGENAME environment variable
A global PowerCenter repository code page must be a subset of the local PowerCenter repository code page if you want to create shortcuts in the local PowerCenter repository that reference an object in a global PowerCenter repository.
If you copy objects from one PowerCenter repository to another PowerCenter repository, the code page for the target PowerCenter repository must be a superset of the code page for the source PowerCenter repository.
Metadata Manager Repository Code Page
The Metadata Manager repository code page is the code page of the data in the repository. The Metadata Manager Service uses the Metadata Manager repository code page to save metadata to and retrieve metadata from the repository database. The Administrator tool writes user and group information to the Metadata Manager Service. The Administrator tool also writes domain information in the repository database. The PowerCenter Integration Service process writes metadata to the repository database. Choose the repository code page when you create or upgrade a Metadata Manager repository. When the repository database code page is UTF-8, you can create a repository using UTF-8 as its code page.
The Metadata Manager repository code page must be:
- Compatible with the domain configuration database code page
- A superset of the Administrator tool code page
- A subset of the PowerCenter repository code page
- A superset of the code page for the PowerCenter Integration Service process
PowerCenter Source Code Page
The source code page depends on the type of source:
Regardless of the type of source, the source code page must be a subset of the code page of transformations and targets that receive data from the source. The source code page does not need to be a subset of the code page of transformations or targets that do not receive data from the source.
Note: Select IBM EBCDIC as the source database connection code page only if you access EBCDIC data, such as data from a mainframe extract file.
PowerCenter Target Code Page
The target code page depends on the type of target:
The target code page must be a superset of the code page of transformations and sources that provide data to the target. The target code page does not need to be a superset of the code page of transformations or sources that do not provide data to the target.
The PowerCenter Integration Service creates session indicator files, session output files, and external loader control and data files using the target flat file code page.
Note: Select IBM EBCDIC as the target database connection code page only if you access EBCDIC data, such as data from a mainframe extract file.
Command Line Program Code Pages
The pmcmd and pmrep command line programs require code page compatibility. pmcmd and pmrep use code pages when sending commands in Unicode. Other command line programs do not require code pages.
The code page compatibility for pmcmd and pmrep depends on whether you configured the code page environment variable INFA_CODEPAGENAME for pmcmd or pmrep. You can set this variable for either command line program or for both.
If you did not set this variable for a command line program, ensure the following requirements are met:
- If you did not set the variable for pmcmd, then the code page of the machine hosting pmcmd must be a subset of the code page for the PowerCenter Integration Service process.
- If you did not set the variable for pmrep, then the code page of the machine hosting pmrep must be a subset of the PowerCenter repository code page.
If you set the code page environment variable INFA_CODEPAGENAME for pmcmd or pmrep, ensure the following requirements are met:
- If you set INFA_CODEPAGENAME for pmcmd, the code page defined for the variable must be a subset of the code page for the PowerCenter Integration Service process.
- If you set INFA_CODEPAGENAME for pmrep, the code page defined for the variable must be a subset of the PowerCenter repository code page.
- If you run pmcmd and pmrep from the same machine and you set the INFA_CODEPAGENAME variable, the code page defined for the variable must be a subset of both the code page for the PowerCenter Integration Service process and the PowerCenter repository code page.
If the code pages are not compatible, the PowerCenter Integration Service process may not fetch the workflow, session, or task from the PowerCenter repository.
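The decision described in this section can be sketched as follows: use the code page named in INFA_CODEPAGENAME when the variable is set, otherwise use the code page of the hosting machine, then test the subset requirement. Several things here are assumptions made for illustration: the variable is pretended to hold a Python codec name rather than an Informatica code page name, an empty value is treated like an unset variable, and the helper names are hypothetical.

```python
def repertoire(codec):
    """Return the set of characters a single-byte codec can decode."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # unassigned byte value
    return frozenset(chars)


def cli_code_page(env, machine_code_page):
    """Pick the code page that the subset requirement applies to."""
    return env.get("INFA_CODEPAGENAME") or machine_code_page


def cli_is_compatible(env, machine_code_page, peer_code_page):
    """True when the command line code page is a subset of the peer's code page.

    The peer is the Integration Service process for pmcmd,
    or the PowerCenter repository for pmrep.
    """
    cli = repertoire(cli_code_page(env, machine_code_page))
    return cli <= repertoire(peer_code_page)


# Machine uses Latin-1, peer uses US-ASCII: only passes when the variable narrows it.
print(cli_is_compatible({}, "latin-1", "ascii"))                              # False
print(cli_is_compatible({"INFA_CODEPAGENAME": "ascii"}, "latin-1", "ascii"))  # True
```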
Code Page Compatibility Summary
The following image shows code page compatibility in the Informatica environment:
The following table summarizes code page compatibility between sources, targets, repositories, the Informatica Administrator, PowerCenter Client, and Integration Service process:
Component Code Page | Code Page Compatibility |
---|---|
Source (including relational, flat file, and XML file) | Subset of target. Subset of lookup data. Subset of stored procedures. Subset of External Procedure or Custom transformation procedure code page. |
Target (including relational, XML files, and flat files) | Superset of source. Superset of lookup data. Superset of stored procedures. Superset of External Procedure or Custom transformation procedure code page. Integration Service process creates external loader data and control files using the target flat file code page. |
Lookup and stored procedure database | Subset of target. Superset of source. |
External Procedure and Custom transformation procedures | Subset of target. Superset of source. |
Domain Configuration Database | Compatible with the PowerCenter Repository Service. Compatible with the Metadata Manager repository. |
PowerCenter Integration Service process | Compatible with its operating system. Subset of the PowerCenter repository. Subset of the Metadata Manager repository. Superset of the machine hosting pmcmd. Identical to other nodes running the PowerCenter Integration Service processes. |
PowerCenter repository | Compatible with the domain configuration database. Superset of PowerCenter Client. Superset of the nodes running the PowerCenter Integration Service process. Superset of the Metadata Manager repository. A global PowerCenter repository code page must be a subset of a local PowerCenter repository. |
PowerCenter Client | Subset of the PowerCenter repository. |
Machine running pmcmd | Subset of the PowerCenter Integration Service process. |
Machine running pmrep | Subset of the PowerCenter repository. |
Administrator Tool | Subset of the PowerCenter repository. Subset of the Metadata Manager repository. |
Metadata Manager repository | Compatible with the domain configuration database. Subset of the PowerCenter repository. Superset of the Administrator tool. Superset of the PowerCenter Integration Service process. |
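A few of the subset rows in the table can be expressed as data and checked mechanically. The sketch below is a simplification under stated assumptions: it covers only a selection of subset rules, ignores the "compatible" and "identical" rows, uses Python single-byte codecs in place of real code pages, and the names `SUBSET_RULES`, `repertoire`, and `violations` are hypothetical.

```python
def repertoire(codec):
    """Return the set of characters a single-byte codec can decode."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # unassigned byte value
    return frozenset(chars)


# (subset component, superset component) pairs taken from the summary table.
SUBSET_RULES = [
    ("PowerCenter Client", "PowerCenter repository"),
    ("PowerCenter Integration Service process", "PowerCenter repository"),
    ("Machine running pmcmd", "PowerCenter Integration Service process"),
    ("Machine running pmrep", "PowerCenter repository"),
    ("Administrator tool", "PowerCenter repository"),
    ("Administrator tool", "Metadata Manager repository"),
    ("Source", "Target"),
]


def violations(code_pages):
    """List every rule whose subset side is not contained in its superset side."""
    found = []
    for subset_name, superset_name in SUBSET_RULES:
        if subset_name not in code_pages or superset_name not in code_pages:
            continue  # component is not part of this environment
        small = repertoire(code_pages[subset_name])
        large = repertoire(code_pages[superset_name])
        if not small <= large:
            found.append(f"{subset_name} is not a subset of {superset_name}")
    return found


environment = {
    "PowerCenter repository": "latin-1",
    "PowerCenter Client": "ascii",
    "PowerCenter Integration Service process": "latin-1",
    "Machine running pmcmd": "latin-1",
    "Machine running pmrep": "ascii",
}
print(violations(environment))  # []
```

Keeping the rules in a list makes it easy to add or remove rows without touching the checking logic.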