Data Sources
To perform data subset, masking, and discovery operations, you must import source metadata into the TDM repository. You can import sources from the PowerCenter repository or from a source database. To perform data generation operations, you must import target metadata into the TDM repository.
When you create a project, add one or more sources to the project. You can add more than one type of source to the project. For example, you can add a flat file source and a relational source to the project. You can create constraints to create relationships between the sources and apply filter criteria for data subset and data masking.
Note: If the source data has a numeric column with precision greater than 28, you cannot apply a data masking or data subset condition to that column. The PowerCenter Integration Service passes the row and the column to the target without applying the condition.
TDM Source Definitions
Import source metadata from PowerCenter or directly from the source database to the TDM repository. The TDM project and PowerCenter folders share the same name to ensure that the PowerCenter Integration Service stores workflow information in the correct PowerCenter folder.
You can import source metadata directly from the source if the source is a relational database. If the source contains many tables, you can increase the performance if you import the metadata directly from the source instead of from PowerCenter.
When you import source definitions from the PowerCenter repository into the TDM repository, you can import all the definitions supported by PowerCenter. When you import source metadata from PowerCenter, you import the PowerCenter folder that contains source metadata. Test Data Manager creates a copy of the folder in the TDM repository.
To use flat file sources, you must import flat files from the PowerCenter repository. You cannot create flat file connections in Test Data Manager.
Note: Rowid is a reserved keyword. You cannot import Oracle sources that have rowid as a column name or a table name.
Hadoop Data Sources
You can perform data movement and data masking operations on Hadoop data sources.
You can use the following Hadoop connections: Hive and Hadoop Distributed File System (HDFS). You can create Hive and HDFS connections in Test Data Manager, and import the Hadoop data sources in to a project. In a Hadoop plan, you can select the Hadoop connections as source, target, or both.
The Hive database schema might contain temporary junk tables that are created when you run a mapping. The following sample formats are the junk tables in a Hive database schema:
w1413372528_infa_generatedsource_1_alpha_check
w1413372528_write_employee1_group_cast_alpha_check
Ensure that you do not select any temporary tables when you import data sources.
You can create a Hadoop plan to move data from Hadoop sources, flat files, or relational databases such as Oracle, DB2, ODBC-Sybase, and ODBC-Microsoft SQL Server into Hive or HDFS targets. You can also create a Hadoop plan when you want to move data between Hadoop sources and targets. If the source is HDFS, you can move data to a Hive or an HDFS target. If the source is Hive, you can move data to a Hive or an HDFS target.
To run a Hadoop plan, TDM uses Data Integration Service that is configured for pushdown optimization. When you generate and run the Hadoop plan, TDM generates the mappings and the Data Integration Service pushes the mappings to the Hadoop cluster to improve the performance.
You cannot perform data subset or data generation operations for Hadoop sources and targets.
XSD Data Sources
You can perform data subset, data masking, and data generation operations on XML sources.
Import XML source metadata from the PowerCenter repository. To import an XML source, you import the source definition from the PowerCenter repository. A workflow that you generate and run fails if you import the XML instead of the XML source definition (XSD).
You can view the list of XSD files in a project from the Files tab in the Discover view of the project. Highlight a file to view the properties in the tab below. You can open and view the properties of each file in a separate page.
The following image shows the file page in Test Data Manager:
The file page contains the following tabs:
- Properties
- Displays the name of the XSD file, project information, and a count of the objects with data masking, data generation, or data subset criteria assignments.
- View
- You can view the elements in a file as a map or in a grid. View the elements in the file and assign a data domain. To apply a masking rule to multiple elements, you can mark elements as similar value objects. You cannot perform tasks on all elements in the map or grid view. You can choose to view select objects from a list of options in the Map View. You can filter and search for elements and attributes based on general properties in the Grid View.
- Details
- Apply subset criteria, data masking rules, or data generation rules from the Details tab on the file page.
The Details tab has the following tabs:
- General
- View the properties of an element or attribute. You can add or edit a data domain assignment, and you can mark an attribute or element as restricted or sensitive. When you run a workflow, the PowerCenter Integration Service ignores elements and attributes that you mark as restricted.
- Data Masking
- Assign a policy and data masking rule or directly assign a data masking rule to an element or attribute.
- Data Generation
- Assign a data generation rule to an element or attribute, or create and assign a data generation rule.
- Criteria
- Specify the data subset criteria to apply to an element or attribute.
Editing the Properties of an Element in an XSD File
You can view the general properties of an element or attribute that you select in the map view or grid view. You can edit the properties that are not read-only.
1. Open the project and click the Files tab in the Discover view.
A list of XSD files in the project appears.
2. Select the required file to open the file in a separate page.
3. Select the element or attribute that you want to edit.
The properties of the element or attribute appear on the Details tab.
4. To open the Edit Properties window, click Edit on the General Properties tab.
The Edit Properties window lists the following properties:
Property | Description |
---|
Name | Read-only. Name of the element or attribute. |
XPath | Read-only. The path expression of the element or attribute in the XSD file. |
Data Type | Read-only. The data type of the element or attribute. |
Precision | Read-only. The precision for the element or attribute. The maximum number of digits or the maximum number of characters that the element or attribute can accommodate. |
Data Domain | The data domain assigned to the element or attribute. Select a data domain from the list. |
Restricted | The PowerCenter Integration Service does not write the value of a restricted element or attribute to a target. Select this option to mark the element or attribute as restricted. |
Sensitive | Indicates whether an element or attribute contains sensitive data. Select this option to mark the element or attribute as sensitive. |
5. You can edit the following properties as required:
- - To assign a data domain to the element or attribute, select a data domain from the list of data domains.
- - To mark the attribute or element as restricted, select the Restricted check box. During TDM operations the PowerCenter Integration Service ignores elements and attributes that you mark as restricted.
- - To mark the element or attribute as sensitive, select the Sensitive check box.
6. Click OK to save the changes.
XSD Import Guidelines
Read the following information before you import an XSD file into the PowerCenter repository from PowerCenter Designer:
- •To open the Change XMLViews Creation and Naming Options window, click Advanced Options in the Import XML Definition dialog box. Set the following options for the import:
- - Choose to override all infinite lengths with a low value. The value is the default length if the XSD file does not specify a length. You do not require a high value if the elements are of a small length. A high length value requires more memory and affects the performance. Select the option and enter an override value.
- - If the data contains elements that require a higher precision, manually set the precision of the element to the required value after you import the file. Open the file in PowerCenter Designer and select the element in the Schema Components panel. Edit the length on the Actions tab.
- - Choose to analyze elements and attributes in standalone XML as global declarations. Required in TDM to ensure correct mapping of the namespaces of the different elements and attributes. Import of the XSD in TDM fails if the namespaces of elements and attributes are not mapped correctly.
- - Use the default settings for the other options.
- - Enter a value for the Default length for anyType element mapped to string option.
- •In the XML Definition Creation Options page of the XML Wizard, select Hierarchy Relationships and the Normalized XML Views option. Use this option to create a root and expand the XML components under the root. When you choose normalized XML views, every element or attribute appears once. One-to-many relationships become separate XML views with keys to relate the views. If you create denormalized XML views, all elements and attributes display in one hierarchical group.
PowerExchange Nonrelational Sources
You can perform TDM operations on nonrelational sources that you define in PowerExchange.
You first need to define datamaps in PowerExchange and import them into PowerCenter. Create the policies, projects, and operations in TDM. Import the PowerCenter sources into a project in TDM.
You can perform TDM operations on the following PowerExchange nonrelational sources:
- •Adabas
- •IMS unload
- •Sequential
- •Hierarchical and nonhierarchical VSAM
Note: You cannot import a PowerExchange source directly into the TDM repository.
Importing Data Sources
Import data sources from PowerCenter or from an external database to a project in Test Data Manager.
1. To view a list of the projects in Test Data Manager, click Projects.
2. Click a project in the list to open it.
3. Click Actions > Import Metadata.
The Import Metadata window appears.
4. Choose one of the following options:
- - PowerCenter Repository. Import metadata from PowerCenter.
- - Datasource Connection. Import metadata from a database connection.
5. Choose whether you want to review the metadata changes before you import the data sources. Test Data Manager displays the information before the import runs. You can choose to skip the import option.
Test Data Manager shows you the metadata you are importing and the impact on rules and domain assignments or entities and groups in Test Data Manager.
6. Click Next.
7. Choose one of the following options:
- - If you selected PowerCenter Repository, select the PowerCenter folder that contains the data source you want to import. You can filter folders by the folder name or the description.
- - If you selected Datasource Connection, select the schema to import. You can filter schemas by schema name.
8. Click Next.
9. Select the files or tables that you want to import. You can filter the tables by data source, table name, or table description. If you choose a Hive database and if there are temporary junk tables present in the schema, ensure that you do not select those tables.
10. Click Next.
11. Choose when to import the sources. Choose one of the following options:
- - Import Now. Import the data source immediately. To run the import in the background, select Run import in the background.
- - Schedule Later. Schedule the import to occur at a specific date and time. Click the calendar to select a date. Use the hour and minute sliders to set the time.
12. Click Finish.
If the import is running, view the progress of the import job in the Monitor view. After the job finishes, access the imported metadata through the Data Sources details view.
Identity Columns in TDM
Read the following information before you perform TDM operations on identity columns:
- •You cannot move data from a source DBO schema to a target non-DBO schema on a Sybase database if the data contains an identity column and indexes.
- •If you enable the auto identity option in the database and there are no primary keys or unique keys in a table, Sybase adds the SYB_IDENTITY_COL column. You must mark the column as restricted. A workflow fails if you do not mark the SYB_IDENTITY_COL column as restricted.