Database Ingestion and Replication > Configuring a database ingestion and replication task > Transform the data

Transform the data

You can apply trim transformations to selected tables and columns to remove spaces to the left or right of character column values. You can also define row-level filter rules that contain column conditions to filter out data rows retrieved from source tables before the data is applied to the target. Additionally, Access Management transformations allow you to protect sensitive data by applying filtering and data protections before your data reaches the target.

Note:

If you edit row-level filters in the task for a deployed job, you must Redeploy the job afterwards for the updated filters to take effect.

1On the Transform Data page, select the tables and columns to which you want to assign a transformation.

You can use the Find box to search for items based on table name or column name. Select Table Names or Columns next the Find box and then enter all or part of the name in the box. If found, the string is highlighted in the names of the listed items.

2To add a trim transformation, click Add Transformation (not the down-pointing arrow next to it).

Note:

You can apply trim transformations and row-level filters to the same tables and columns.

The How do you want to transform your data? dialog box appears.

3Click the + (Add a new row) icon to add a row. Then, in the Transformation Type list, select one of the following options:

- Trim Left. Trim spaces to the left of character column values.
- Trim Right. Trim spaces to the right of character column values.
- Trim. Trim spaces to the left of and to the right of character column values.

Click the Save icon to add the entry.

4Click Next to go to the Summary tab where you can review your transformation settings.

5If the settings are correct on the Summary tab, click Save to save them and return to the initial Transform Data page.

6To add another trim transformation type for a different table or set of tables, repeat steps 1 through 5.

When you return to the initial Transform Data page, the list shows the tables to which the trim transformations were applied. For example:

Shows Trim Right and Trim Left transformation have been applied respectively to the DEPARTMENTS and DISCOUNTS tables.

Tip:

You can remove a transformation assignment on the Transform Data page. Select the table with the unwanted transformation and click Clear All.

7To add row-level filters to tables and columns, select the tables and then click the down arrow next to Add Transformation and select Add Row Filter.

The Add Row Filter option is available only for Db2 for i, Db2 for LUW, Db2 for z/OS, Microsoft SQL Server, Oracle, and SAP HANA source tables. The task can use any load type.

The How do you want to filter your data? dialog box appears.

8Select the table and filter type to apply the filter conditions.

aSelect a table that you want to assign a filter to.
bSelect one of the following filter types:

▪ Basic
▪ Advanced

The default option is Basic.

9To add a Basic filter, complete the following substeps:

aClick the + (Add a new row) icon to add a row.
bUnder Column Name, select a column.

Columns with unsupported data types for row filtering are marked as "Not supported."

cUnder Operator, select an operator type to use with the value.
dUnder Value, select or enter a value, depending on the column type. Then click the Save icon on the right end of the row to save the condition.

For example, the following image shows a basic filter with conditions set for two columns: Shows two basic filters for two columns in the DISCOUNTS table.

Shows two basic filters for two columns in the DISCOUNTS table.

The following table describes the values that are valid for each column data type supported for filtering:

Column data type	Description
INTEGER	Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -2147483648 and 2147483647.
LONG	Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.
BIGINT	Enter a numeric value. You can use "+" and "-" only once before the number. Maximum length is 50 digits.
BIGDEC	Enter a numeric value. You can use "+" and "-" only once before the number. A decimal is allowed. Maximum length is 50 digits.
STRING	Enter text.
DATE	Use the date picker to select the date.
TIME	Enter the value in the format HH:MM:SS.MS, with milliseconds being optional and up to a maximum length is 9 digits. For example, 13:14:15.123456789
DATETIME	Use the date picker to select the date and time.
OFFSET_DATETIME	Use the date picker to select the date, time, and time zone.

Note:

Database Ingestion and Replication does not support BOOLEAN, BINARY, BLOB, CLOB, and graphic column data types.

eClick Validate to test the syntax of the specified condition.
fTo add another Basic condition for the same column or a different column in the table, repeat steps a through e.

The AND operator is used to combine the multiple conditions.

gClick Save to validate and save the changes.
hWhen done defining Basic filter conditions, click OK to return to the Transform Data page.

10To define an Advanced filter that consists of multiple conditions combined with the AND or OR operator, manually enter the conditions in the box.

Note:

If you entered a Basic filter conditions for a column and then switched to the Advanced filter, the Basic condition is displayed so that you can add to it to make a more complex filter.

aUnder Column Name, select a column and click the > arrow.

The column name appears in the Filter Condition box.

Note:

For combined load tasks, do not include columns that you expect will be updated during CDC processing. If the column is updated, it might become ineligible for replication and cause unpredictable results. In this case, you'd need to Resync the job.

bIn the Filter Condition box, type one or more conditions for the selected column. Manually enter conditions using the supported syntax and the appropriate operators, which can vary based on the column data type. You can also nest conditions using parentheses. See Syntax for row-level filtering. When done, click the Save icon on the right end of the row to save the advanced filter.

The following table describes the values that are valid for each column data type supported for filtering:

Column data type	Description
INTEGER	Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -2147483648 and 2147483647.
LONG	Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.
BIGINT	Enter a number value. You can use "+" and "-" only once before the number. Maximum length is 50 digits.
BIGDEC	Enter a numeric value. You can use "+" and "-" only once before the number. A decimal is allowed. Maximum length is 50 digits.
STRING	Enter an input attribute in single quotes (').
DATE	Enter the value in the format YYYY-MM-DD. Enter the input attribute in single quotes (').
TIME	Enter the value in the format HH:MM:SS.MS, with milliseconds (MS) being optional and up to a maximum length is 9 digits. Enter the input attribute in single quotes ('). For example, 13:14:15.123456789
DATETIME	Enter the date and time in the following format: YYYY-MM-DDTHH:MM:SS:MS For example, 2024-12-31T03:04:05.123456789 Enter the input attribute in single quotes (').
OFFSET_DATETIME	Enter the date, time, and time zone in the following format: YYYY-MM-DDTHH:MM:SS.MS+05:00 For example, 2024-03-15T10:03:04.123456789+05:00 Enter the input attribute in single quotes (').

Notes:

▪ Database Ingestion and Replication does not support BOOLEAN, BINARY, BLOB, CLOB, and graphic column data types.
▪ All date, time, and datetime values are matched against the source date and time. Time changes between Daylight Saving Time and standard time are not accommodated.

cClick Validate to test the syntax of the specified conditions.

Note:

Switching from Advanced to Basic filter type after creating or editing an Advanced filter condition, deletes all changes to the filter condition, even if you saved it.

dClick Save to validate and save the changes and then click OK to return to the Transform Data page.

Note:

Do not modify any column included in the filter after the task has been deployed. If you do so, the row-level filtering might not work properly.

The Filters column on the Transform Data page shows the applied filters as hyperlinks. Clicking the link opens the selected filter in edit mode. Tables with an advanced filter display Advanced next to their filter conditions in the Filters column.

11To configure Access Management, select the tables to which you want to apply data access policies and then click the down arrow next to Add Transformation and select Access Management.

Note:

The Access Management option is available only for SQL Server source tables and only if a feature flag is set for your organization. If Access Management isn't available, contact Informatica Global Customer Support to request access.

Select the Access Management option to apply data access policices to tables.

The Access Management dialog box appears.

12To add data access policies, configure the following fields:

Field	Description
Consumer	The role of the user accessing the data, defined within Data Governance and Catalog. This role applies to all tables in the database ingestion and replication task.
Usage	Identifies the usage context within which the user intends to use the data. Users create and manage usage contexts in Data Governance and Catalog.
Consistency Seed	Provides a value that represents the rule logic that the transformation applies to the data element classification. A consistency seed ensures that a rule's behavior for a data element classification will be consistent for the same consistency seed value, usage context, and user across all tasks. For example, you can ensure that a rule always replaces the last name "Smith" with "Abcd." If you want consistent tokenization between different Access Managenent transformations, insert the same consistency seed in other Access Managenent transformations. If you don't want consistent tokenization between different Access Managenent transformations, generate a unique seed.
Connection	Defines the connection used to retrieve the related data catalogs and data assets scanned by the Data Governance and Catalog scanners from the database referenced by this connection. By default, this is the same connection used to access the source tables in the configured database ingestion and replication task.
Catalog	Specifies the particular catalog within the collection of catalogs scanned by the Data Governance and Catalog scanners from the database referenced by the connection. This catalog is used to perform matching against the source tables to identify the corresponding data assets. By default, the most recently scanned catalog is selected.

The default data assets are displayed for all the tables you selected for the Access Management transformation in the previous step.

Using the associated connection and catalog, default data assets are retrieved by performing an exact match between the source table names and the cataloged data asset names. If a corresponding data asset is not found, either because the source table was not scanned or the data asset name does not exactly match the source table, the missing asset will be indicated with a red vertical line. You can select the Show Missing Data check box to filter and display only those source tables that do not have a corresponding data asset.

Click OK to return to the Transform Data page.

The Access Management column on the Transform Data page shows the selected data assets against every table as hyperlinks. Clicking the link opens the selected data asset in edit mode.

On the Transform Data page, clicking the Clear All button at the top right hand corner removes all filters, including trim transformations, row-level filters, and access management policies from the selected tables.

13When done, click Next.

Syntax for row-level filtering

If you create Advanced row-level filters when you define a database ingesition and replication task, ensure that you enter the filter conditions using the correct syntax. Otherwise, filter validation is likely to fail.

Operators

In an Advanced filter, you can use the following operators within a condition, depending on the column data type:

Operator	Description
=	Equals
!=	Does not equal
>	Greater than
>=	Greater than or equal to
<	Less than
<=	Less than or equal to
IS NULL	Contains a null
IS NOT NULL	Cannot contain a null
BETWEEN x AND y	Greater than or equal to x and less than or equal to y
NOT BETWEEN %s AND %s	Not greater than or equal to x and less than or equal to y
LIKE	A comparative operative for string columns only. Example: LIKE '%06%7__' . This condition matches against the following values: 06789, A06X789, AB06XY789", "06X789, and A06789. However, it does not match against these values: A06789Z, A0678, A6789, "".
NOT LIKE	A comparative operative for string columns only.
IN	True if the operand is equal to one of a list of expressions
NOT IN	True if the operand is NOT equal to one of a list of expressions
+ - / *	Numeric computation operators for addition, subtraction, division, and multiplication

Syntax rules

In Advanced filters, use with the following syntax rules:

•Enclose string, date, time, datetime, offset_datetime values in single quotation marks. For example: 'SomeName'
•In datetime and offset_datetime values, enter the character "T" between date and time. For example: '2024-11-15T00:00:00.000000001'

Data access policies overview

Understand how data access policies filter data and de-identify it.

Note:

You can use Access Management only if this feature is enabled for your organization. To get access to this feature, contact Informatica Global Customer Support

You create data access policies on the Data Access Management page in Data Governance and Catalog.

Data access policies can replace, transform, or redact values in a data set while maintaining the overall usefulness of the data. A data access policy can protect different values with different mappings, based on factors such as the intended user of the data and metadata classifications that users assign to the source data. Data access policies can help your organization comply with data privacy regulations such as the European General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

Rules in a data access policy can apply multiple data filters based on the following attribute types:

•Asset term
•Data element term
•Data element classification
•Data entity classification
•Order usage context
•User group

Access Policy transformations can apply the following types of data access policies:

•Data filter policies
•Data de-identification policies

Data filter policies are sets of data filter rules that limit, filter, or otherwise restrict user access to records within a data asset. Data filter rules do this by applying pre-defined filters that control access to rows or records of data.

Data filter rules evaluate data elements based on their data element classification and data type using standard operators compared to specified values. Where the rule criteria is satisfied, a flag is set in an additional filter field for subsequent processing.

Data de-identification policies are sets of data de-identification rules that apply pre-defined data protections to data element classifications. A data element classification is a categorization applied to fields within data assets to indicate the category of data such as birth dates, national identifiers, and postal codes.

Data de-identification rules can apply multiple data de-identification techniques, including the following operations:

•Retaining data
•Redacting all values of a given type such as birth dates
•Replacing specified field values with NULL
•Truncating values such as redacting the first three characters of a postal code
•Replacing values with consistently tokenized values such as always replacing "Smith" with "Abcd" or "1234" with "5678"
•Generalizing date values to the month, year, or decade
•Replacing values with a constant text value such as replacing all passwords with five asterisks
•Substituting values with values stored in a file
•Hashing values with standard algorithms

For more information about data access policies, see Data access policies in the Data Governance and Catalog online help.