Customized Data Objects
Customized data objects are physical data objects with one or more relational resources. Create a customized data object if you want to perform operations such as joining data, filtering rows, sorting ports, or running custom queries when the Data Integration Service reads source data. You can reuse a customized data object in a mapping, mapplet, or profile.
You can create customized data objects in projects and folders. You cannot change the customized data object from within a mapping, mapplet, or profile. If you change a customized data object in a project or folder, the Developer tool updates the object in all mappings, mapplets, and profiles that use the object.
The following figure shows a sample customized data object that is open in the editor:
Create a customized data object to perform the following tasks:
- •Create a custom query to replace the default query that the Data Integration Service runs to read the source data. The default query is a SELECT statement that references each column that the Data Integration Service reads from the source.
- •Define parameters for the data object. You can define and assign parameters in a customized data object to represent connections. You can define parameters for the connection name, table owner, and table name. When you run a mapping that uses the customized data object, you can define different values for the connection parameters at runtime.
- •Join source data that originates from the same source database. You can join multiple tables with primary key-foreign key relationships whether or not the relationships exist in the database.
- •Retain key relationships when you synchronize the object with the sources. If you create a customized data object that contains multiple tables, and you define key relationships that do not exist in the database, you can retain the relationships when you synchronize the data object.
- •Select distinct values from the source. If you choose Select Distinct, the Data Integration Service adds a SELECT DISTINCT statement to the default SQL query.
- •Filter rows when the Data Integration Service reads source data. If you include a filter condition, the Data Integration Service adds a WHERE clause to the default query.
- •Specify sorted ports. If you specify a number for sorted ports, the Data Integration Service adds an ORDER BY clause to the default SQL query.
- •Specify an outer join instead of the default inner join. If you include a user-defined join, the Data Integration Service replaces the join information specified by the metadata in the SQL query.
- •Add pre- and post-mapping SQL commands. The Data Integration Service runs pre-mapping SQL commands against the source database before it reads the source. It runs post-mapping SQL commands against the source database after it writes to the target.
You can create customized data objects from the following types of connections and objects:
- •DB2 i5/OS connections
- •DB2 z/OS connections
- •IBM DB2 connections
- •JDBC connections
- •Microsoft SQL Server connections
- •ODBC connections
- •Oracle connections
- •Relational data objects
You can also add sources to a customized data object through a custom SQL query.
Key Relationships
You can create key relationships between sources in a customized data object when the sources are relational resources. Key relationships allow you to join the sources within the customized data object.
Note: If a customized data object uses relational data objects as sources, you cannot create key relationships within the customized data object. You must create key relationships between the relational data objects instead.
When you import relational resources into a customized data object, the Developer tool retains the primary key information defined in the database. When you import related relational resources into a customized data object at the same time, the Developer tool also retains key relationship information. However, if you import related relational resources separately, you must re-create the key relationships after you import the objects into the customized data object.
When key relationships exist between sources in a customized data object, the Data Integration Service joins the sources based on the related keys in each source. The default join is an inner equijoin that uses the following syntax in the WHERE clause:
Source1.column_name = Source2.column_name
You can override the default join by entering a user-defined join or by creating a custom query.
To create key relationships in a customized data object, first create a primary key in the referenced source transformation. Then create the relationship in the source transformation that contains the foreign key.
The key relationships that you create exist in the customized data object metadata. You do not need to alter the source relational resources.
Customized Data Object Write Properties
The Data Integration Service uses write properties when it writes data to relational resources. To edit write properties, select the Input transformation in the Write view, and then select the Advanced properties.
The following table describes the write properties that you configure for customized data objects:
Property | Description |
---|
Truncate Hive Target Partition | Overwrites the partition in the Hive target in which the data is being inserted. To enable this option, you must also select the option to truncate target tables. You can truncate external partitioned tables and non-partitioned tables. Default is disabled. |
Load type | Type of target loading. Select Normal or Bulk. If you select Normal, the Data Integration Service loads targets normally. You can choose Bulk when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you specify Bulk for other database types, the Data Integration Service reverts to a normal load. Bulk loading can increase mapping performance, but it limits the ability to recover because no database logging occurs. Choose Normal mode if the mapping contains an Update Strategy transformation. If you choose Normal and the Microsoft SQL Server target name includes spaces, configure the following environment SQL in the connection object: SET QUOTED_IDENTIFIER ON |
Update override | Overrides the default UPDATE statement for the target. |
Delete | Deletes all rows flagged for delete. Default is enabled. |
Insert | Inserts all rows flagged for insert. Default is enabled. |
Create or replace table at run time | The Data Integration Service drops the target table at run time and replaces it with a table based on a target table that you identify. |
Truncate target table | Truncates the target before it loads data. Default is disabled. |
Update strategy | Update strategy for existing rows. You can select one of the following strategies: - - Update as update. The Data Integration Service updates all rows flagged for update.
- - Update as insert. The Data Integration Service inserts all rows flagged for update. You must also select the Insert target option.
- - Update else insert. The Data Integration Service updates rows flagged for update if they exist in the target and then inserts any remaining rows marked for insert. You must also select the Insert target option.
|
PreSQL | SQL command the Data Integration Service runs against the target database before it reads the source. The Developer tool does not validate the SQL. |
PostSQL | SQL command the Data Integration Service runs against the target database after it writes to the target. The Developer tool does not validate the SQL. |
Creating a Customized Data Object
Create a customized data object to add to a mapping, mapplet, or profile. After you create a customized data object, add sources to it.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
The New dialog box appears.
3. Select Relational Data Object and click Next.
The New Relational Data Object dialog box appears.
4. Click Browse next to the Connection option and select a connection to the database.
5. Click Create customized data object.
6. Enter a name for the customized data object.
7. Click Browse next to the Location option and select the project where you want to create the customized data object.
8. Click Finish.
The customized data object appears under Physical Data Objects in the project or folder in the Object Explorer view.
Add sources to the customized data object. You can add relational resources or relational data objects as sources. You can also use a custom SQL query to add sources.
Adding Relational Resources to a Customized Data Object
After you create a customized data object, add sources to it. You can use relational resources as sources.
Before you add relational resources to a customized data object, you must configure a connection to the database.
1. In the Connection Explorer view, select one or more relational resources in the same relational connection.
2. Right-click in the Connection Explorer view and select Add to project.
The Add to Project dialog box appears.
3. Select Add as related resource(s) to existing customized data object and click OK.
The Add to Data Object dialog box appears.
4. Select the customized data object and click OK.
5. If you add multiple resources to the customized data object, the Developer tool prompts you to select the resource to write to. Select the resource and click OK.
If you use the customized data object in a mapping as a write transformation, the Developer tool writes data to this resource.
The Developer tool adds the resources to the customized data object.
Adding Relational Data Objects to a Customized Data Object
After you create a customized data object, add sources to it. You can use relational data objects as sources.
1. Open the customized data object.
2. Select the Read view.
3. In the Object Explorer view, select one or more relational data objects in the same relational connection.
4. Drag the objects from the Object Explorer view to the customized data object Read view.
5. If you add multiple relational data objects to the customized data object, the Developer tool prompts you to select the object to write to. Select the object and click OK.
If you use the customized data object in a mapping as a write transformation, the Developer tool writes data to this relational data object.
The Developer tool adds the relational data objects to the customized data object.
Creating Keys in a Customized Data Object
Create key columns to identify each row in a source transformation. You can create one primary key in each source transformation.
1. Open the customized data object.
2. Select the Read view.
3. Select the source transformation where you want to create a key.
The source must be a relational resource, not a relational data object. If the source is a relational data object, you must create keys in the relational data object.
4. Select the Keys properties.
5. Click Add.
The New Key dialog box appears.
6. Enter a key name.
7. If the key is a primary key, select Primary Key.
8. Select the key columns.
9. Click OK.
10. Save the customized data object.
Creating Relationships within a Customized Data Object
You can create key relationships between sources in a customized data object.
The source transformation that you reference must have a primary key.
1. Open the customized data object.
2. Select the Read view.
3. Select the source transformation where you want to create a foreign key.
The source must be a relational resource, not a relational data object. If the source is a relational data object, you must create relationships in the relational data object.
4. Select the Relationships properties.
5. Click Add.
The New Relationship dialog box appears.
6. Enter a name for the foreign key.
7. Select a primary key from the referenced source transformation.
8. Click OK.
9. In the Relationships properties, select the foreign key columns.
10. Save the customized data object.