User Guide > Projects > Data Sources
  

Data Sources

To perform data subset, masking, and discovery operations, you must import source metadata into the TDM repository. You can import sources from the PowerCenter repository or from a source database. To perform data generation operations, you must import target metadata into the TDM repository.
When you create a project, add one or more sources to the project. You can add more than one type of source to the project. For example, you can add a flat file source and a relational source to the project. You can create constraints to create relationships between the sources and apply filter criteria for data subset and data masking.
Note: If the source data has a numeric column with precision greater than 28, you cannot apply a data masking or data subset condition to that column. The PowerCenter Integration Service passes the row and the column to the target without applying the condition.

TDM Source Definitions

Import source metadata from PowerCenter or directly from the source database to the TDM repository. The TDM project and PowerCenter folders share the same name to ensure that the PowerCenter Integration Service stores workflow information in the correct PowerCenter folder.
You can import source metadata directly from the source if the source is a relational database. If the source contains many tables, you can increase the performance if you import the metadata directly from the source instead of from PowerCenter.
When you import source definitions from the PowerCenter repository into the TDM repository, you can import all the definitions supported by PowerCenter. When you import source metadata from PowerCenter, you import the PowerCenter folder that contains source metadata. Test Data Manager creates a copy of the folder in the TDM repository.
To use flat file sources, you must import flat files from the PowerCenter repository. You cannot create flat file connections in Test Data Manager.
Note: Rowid is a reserved keyword. You cannot import Oracle sources that have rowid as a column name or a table name.

Hadoop Data Sources

You can perform data movement and data masking operations on Hadoop data sources.
You can use the following Hadoop connections: Hive and Hadoop Distributed File System (HDFS). You can create Hive and HDFS connections in Test Data Manager, and import the Hadoop data sources in to a project. In a Hadoop plan, you can select the Hadoop connections as source, target, or both.
The Hive database schema might contain temporary junk tables that are created when you run a mapping. The following sample formats are the junk tables in a Hive database schema:
w1413372528_infa_generatedsource_1_alpha_check
w1413372528_write_employee1_group_cast_alpha_check
Ensure that you do not select any temporary tables when you import data sources.
You can create a Hadoop plan to move data from Hadoop sources, flat files, or relational databases such as Oracle, DB2, ODBC-Sybase, and ODBC-Microsoft SQL Server into Hive or HDFS targets. You can also create a Hadoop plan when you want to move data between Hadoop sources and targets. If the source is HDFS, you can move data to a Hive or an HDFS target. If the source is Hive, you can move data to a Hive or an HDFS target.
To run a Hadoop plan, TDM uses Data Integration Service that is configured for pushdown optimization. When you generate and run the Hadoop plan, TDM generates the mappings and the Data Integration Service pushes the mappings to the Hadoop cluster to improve the performance.
You cannot perform data subset or data generation operations for Hadoop sources and targets.

XSD Data Sources

You can perform data subset, data masking, and data generation operations on XML sources.
Import XML source metadata from the PowerCenter repository. To import an XML source, you import the source definition from the PowerCenter repository. A workflow that you generate and run fails if you import the XML instead of the XML source definition (XSD).
You can view the list of XSD files in a project from the Files tab in the Discover view of the project. Highlight a file to view the properties in the tab below. You can open and view the properties of each file in a separate page.
The following image shows the file page in Test Data Manager:
The XSD file page contains a Properties tab, Map or Grid view, and a Details tab. You can edit the properties and apply data subset criteria, data masking rules, and data generation rules from the Details tab.
The file page contains the following tabs:
Properties
Displays the name of the XSD file, project information, and a count of the objects with data masking, data generation, or data subset criteria assignments.
View
You can view the elements in a file as a map or in a grid. View the elements in the file and assign a data domain. To apply a masking rule to multiple elements, you can mark elements as similar value objects. You cannot perform tasks on all elements in the map or grid view. You can choose to view select objects from a list of options in the Map View. You can filter and search for elements and attributes based on general properties in the Grid View.
Details
Apply subset criteria, data masking rules, or data generation rules from the Details tab on the file page.
The Details tab has the following tabs:
General
View the properties of an element or attribute. You can add or edit a data domain assignment, and you can mark an attribute or element as restricted or sensitive. When you run a workflow, the PowerCenter Integration Service ignores elements and attributes that you mark as restricted.
Data Masking
Assign a policy and data masking rule or directly assign a data masking rule to an element or attribute.
Data Generation
Assign a data generation rule to an element or attribute, or create and assign a data generation rule.
Criteria
Specify the data subset criteria to apply to an element or attribute.

Editing the Properties of an Element in an XSD File

You can view the general properties of an element or attribute that you select in the map view or grid view. You can edit the properties that are not read-only.
    1. Open the project and click the Files tab in the Discover view.
    A list of XSD files in the project appears.
    2. Select the required file to open the file in a separate page.
    3. Select the element or attribute that you want to edit.
    The properties of the element or attribute appear on the Details tab.
    4. To open the Edit Properties window, click Edit on the General Properties tab.
    The Edit Properties window lists the following properties:
    Property
    Description
    Name
    Read-only. Name of the element or attribute.
    XPath
    Read-only. The path expression of the element or attribute in the XSD file.
    Data Type
    Read-only. The data type of the element or attribute.
    Precision
    Read-only. The precision for the element or attribute. The maximum number of digits or the maximum number of characters that the element or attribute can accommodate.
    Data Domain
    The data domain assigned to the element or attribute. Select a data domain from the list.
    Restricted
    The PowerCenter Integration Service does not write the value of a restricted element or attribute to a target. Select this option to mark the element or attribute as restricted.
    Sensitive
    Indicates whether an element or attribute contains sensitive data. Select this option to mark the element or attribute as sensitive.
    5. You can edit the following properties as required:
    6. Click OK to save the changes.

XSD Import Guidelines

Read the following information before you import an XSD file into the PowerCenter repository from PowerCenter Designer:

PowerExchange Nonrelational Sources

You can perform TDM operations on nonrelational sources that you define in PowerExchange.
You first need to define datamaps in PowerExchange and import them into PowerCenter. Create the policies, projects, and operations in TDM. Import the PowerCenter sources into a project in TDM.
You can perform TDM operations on the following PowerExchange nonrelational sources:
Note: You cannot import a PowerExchange source directly into the TDM repository.

Importing Data Sources

Import data sources from PowerCenter or from an external database to a project in Test Data Manager.
    1. To view a list of the projects in Test Data Manager, click Projects.
    2. Click a project in the list to open it.
    3. Click Actions > Import Metadata.
    The Import Metadata window appears.
    4. Choose one of the following options:
    5. Choose whether you want to review the metadata changes before you import the data sources. Test Data Manager displays the information before the import runs. You can choose to skip the import option.
    Test Data Manager shows you the metadata you are importing and the impact on rules and domain assignments or entities and groups in Test Data Manager.
    6. Click Next.
    7. Choose one of the following options:
    8. Click Next.
    9. Select the files or tables that you want to import. You can filter the tables by data source, table name, or table description. If you choose a Hive database and if there are temporary junk tables present in the schema, ensure that you do not select those tables.
    10. Click Next.
    11. Choose when to import the sources. Choose one of the following options:
    12. Click Finish.
    If the import is running, view the progress of the import job in the Monitor view. After the job finishes, access the imported metadata through the Data Sources details view.

Identity Columns in TDM

Read the following information before you perform TDM operations on identity columns: