Enterprise Data Lake

Apply Active Rules

Active rules are mapplets developed using the Developer tool. You can use active rules to apply complex transformations, such as aggregator and Data Quality transformations, to worksheets for matching and consolidation.

An active rule uses all rows within a data set as input. You can select multiple worksheets to use as inputs to the rule. The application adds a worksheet containing the rule output to the project.

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Delete Duplicate Rows

Effective in version 10.2.2, you can delete rows containing duplicate values from a worksheet.
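The effect of removing duplicate rows can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name `delete_duplicate_rows`, the representation of a worksheet as a list of dictionaries, and the choice to keep the first occurrence of each duplicate are all assumptions.

```python
def delete_duplicate_rows(rows, columns=None):
    """Return rows with duplicates removed, keeping the first occurrence.

    rows: list of dicts that share the same keys (a stand-in for a worksheet).
    columns: names of the columns to compare; None compares every column.
    Both the signature and the keep-first policy are hypothetical.
    """
    seen = set()
    unique = []
    for row in rows:
        compare = columns if columns is not None else sorted(row)
        key = tuple(row[name] for name in compare)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


worksheet = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Oslo"},
    {"id": 1, "city": "Oslo"},
]
deduplicated = delete_duplicate_rows(worksheet)
```

Passing a list of column names restricts the comparison to those columns, so rows that differ only in other columns are treated as duplicates.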

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Cluster and Categorize Column Data

Effective in version 10.2.2, you can cluster similar values in a column, and then categorize the values based on recommendations from Enterprise Data Lake. The application uses a phonetic algorithm to cluster similar values, and then suggests that you replace the less frequently occurring values with the most frequently occurring value.
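The clustering idea can be sketched as follows. The source does not name the phonetic algorithm that the application uses, so this sketch assumes Soundex, one well-known phonetic algorithm, purely for illustration. The function names `soundex` and `suggest_replacements` are hypothetical.

```python
from collections import Counter, defaultdict

# Soundex digit for each consonant group; vowels, H, W, and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(value):
    """Return the four-character Soundex code of value ("" if it has no letters)."""
    letters = [c for c in value.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (encoded + "000")[:4]


def suggest_replacements(values):
    """Map each less frequent value to the most frequent value in its cluster."""
    clusters = defaultdict(Counter)
    for value in values:
        clusters[soundex(value)][value] += 1
    suggestions = {}
    for counts in clusters.values():
        canonical = counts.most_common(1)[0][0]
        for value in counts:
            if value != canonical:
                suggestions[value] = canonical
    return suggestions


column = ["Smith", "Smith", "Smyth", "Smith", "Robert", "Rupert", "Robert"]
suggestions = suggest_replacements(column)
```

In this sketch, values that share a phonetic code form a cluster, and every value other than the most frequent one in the cluster receives a suggested replacement. The suggestions are returned rather than applied, which mirrors the recommend-then-categorize flow described above.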

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

CLAIRE-based Recommendations

Effective in version 10.2.2, the application uses the embedded CLAIRE machine learning discovery engine to provide recommendations when you prepare data.

When you view the Project page, the application displays alternate and additional recommendations derived from upstream data sources based on data lineage, as well as documented primary-foreign key relationships.

When you select a column in a worksheet during data preparation, the application displays suggestions to improve the data based on the column data type in the Column Overview panel.

When you perform a join operation on two worksheets, the application uses primary-foreign key relationships to indicate incompatible sampling when the desired key pairs have low overlap.
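One way to reason about low overlap between sampled key columns is to measure the fraction of shared key values. The sketch below is an assumption-laden illustration, not the CLAIRE engine's logic: the function names, the overlap formula, and the 0.1 threshold are all hypothetical.

```python
def key_overlap(left_keys, right_keys):
    """Fraction of distinct keys shared by two samples, relative to the smaller one."""
    left = {k for k in left_keys if k is not None}
    right = {k for k in right_keys if k is not None}
    if not left or not right:
        return 0.0
    return len(left & right) / min(len(left), len(right))


def sampling_looks_incompatible(left_keys, right_keys, threshold=0.1):
    """Flag a join whose sampled key columns barely overlap (hypothetical threshold)."""
    return key_overlap(left_keys, right_keys) < threshold


# Two samples that share only two of their four distinct keys.
overlap = key_overlap([1, 2, 3, 4], [3, 4, 5, 6])
```

A low ratio on sampled data does not prove that the full data sets fail to join. It only suggests that the two samples were drawn in a way that hides the matching keys.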

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Conditional Aggregation

Effective in version 10.2.2, you can use AND and OR logic to apply multiple conditions to IF calculations that you use when you create an aggregate worksheet in a project.
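The logic of a conditional aggregation that combines AND and OR can be sketched in Python. The column names, the sample rows, and the `sum_if` helper are hypothetical, and the sketch does not reproduce the application's expression syntax.

```python
from collections import defaultdict


def sum_if(rows, group_by, value, condition):
    """Sum the value column per group, counting only rows that satisfy condition."""
    totals = defaultdict(int)
    for row in rows:
        # Equivalent to IF(condition, value, 0): failing rows contribute 0.
        totals[row[group_by]] += row[value] if condition(row) else 0
    return dict(totals)


def shipped_and_large_or_priority(row):
    """(status is shipped AND amount > 100) OR the row is marked as priority."""
    return (row["status"] == "shipped" and row["amount"] > 100) or row["priority"]


orders = [
    {"region": "east", "status": "shipped", "amount": 150, "priority": False},
    {"region": "east", "status": "open", "amount": 300, "priority": False},
    {"region": "east", "status": "open", "amount": 40, "priority": True},
    {"region": "west", "status": "shipped", "amount": 90, "priority": False},
]
totals = sum_if(orders, "region", "amount", shipped_and_large_or_priority)
```

Groups in which no row satisfies the condition still appear in the result with a total of 0, which matches how an IF calculation with a default of 0 would behave in an aggregate.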

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Data Masking

Effective in version 10.2.2, Enterprise Data Lake integrates with Informatica Dynamic Data Masking, a data security product, to enable masking of sensitive data in data assets.

To enable data masking in Enterprise Data Lake, you configure the Dynamic Data Masking Server to apply masking rules to data assets in the data lake. You also configure the Informatica domain to enable Enterprise Data Lake to connect to the Dynamic Data Masking Server.

Dynamic Data Masking intercepts requests sent to the data lake from Enterprise Data Lake, and applies the masking rules to columns in the requested asset. When Enterprise Data Lake users view or perform operations on columns containing masked data, the actual data is fully or partially obfuscated based on the masking rules applied.
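The difference between full and partial obfuscation can be illustrated with a small Python sketch. It only shows the visible effect on column values. It does not represent how Dynamic Data Masking defines or applies rules, and every function name, rule, and column name here is an assumption.

```python
def mask_full(value, mask_char="*"):
    """Replace every character of the value with the mask character."""
    return mask_char * len(str(value))


def mask_partial(value, visible=4, mask_char="*"):
    """Mask everything except the last `visible` characters."""
    text = str(value)
    visible = max(visible, 0)
    hidden = max(len(text) - visible, 0)
    return mask_char * hidden + text[hidden:]


def apply_masking(row, rules):
    """Apply a per-column masking function; columns without a rule pass through."""
    return {col: rules[col](val) if col in rules else val
            for col, val in row.items()}


# Hypothetical rules: fully mask one column, partially mask another.
rules = {"ssn": mask_full, "card": mask_partial}
row = {"name": "Ada", "ssn": "123-45-6789", "card": "4111111111111111"}
masked = apply_masking(row, rules)
```

In this sketch the unmasked row never changes. The masked copy is what a user would see, which parallels the way masking is applied to the requested data rather than to the stored asset.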

For more information, see the "Masking Sensitive Data" chapter in the Informatica 10.2.2 Enterprise Data Lake Administrator Guide.

Localization

Effective in version 10.2.2, the user interface supports Japanese. You can also use non-Latin characters in project names and descriptions.

Partitioned Sources and Targets

Effective in version 10.2.2, Enterprise Data Lake can read data from partitioned sources during import, publish, or copy operations. The application can also append data to partitioned targets in the data lake during import, publish, copy, or upload operations.

Add Comments to Recipe Steps

Effective in version 10.2.2, you can add a comment to a recipe step. Use comments to improve collaboration and provide details to meet auditing requirements.

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Save a Recipe as a Mapping

Effective in version 10.2.2, you can save a recipe as a mapping, instead of publishing the recipe and creating a new output table.

You can save the mapping to the Model repository associated with the Enterprise Data Lake Service, or you can save the mapping to an .xml file. Developers can use the Developer tool to review and modify the mapping, and then execute the mapping when appropriate based on system resource availability.

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Amazon S3, ADLS, WASB, MapR-FS as Data Sources

Effective in version 10.2.2, you can prepare data in files stored in the following data sources: Amazon S3, ADLS, WASB, and MapR-FS.

You must create a resource in Enterprise Data Catalog for each data source containing data that you want to prepare. A resource is a repository object that represents an external data source or metadata repository. Scanners attached to a resource extract metadata from the resource and store the metadata in Enterprise Data Catalog.

For more information about creating resources in Enterprise Data Catalog, see the "Managing Resources" chapter in the Informatica 10.2.2 Catalog Administrator Guide.

Statistical Functions

Effective in version 10.2.2, you can apply the following statistical functions to columns in a worksheet when you prepare data:

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Date and Time Functions

Effective in version 10.2.2, you can apply the following date and time functions to columns in a worksheet when you prepare data:

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Math Functions

Effective in version 10.2.2, you can apply the following math functions to columns when you prepare data:

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Text Functions

Effective in version 10.2.2, you can apply the following text functions to columns when you prepare data:

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Window Functions

Effective in version 10.2.2, you can use window functions to perform operations on groups of rows within a worksheet. The group of rows on which a function operates is called a window, which you define with a partition key, an order by key, and optional offsets. A window function calculates a return value for every input row within the context of the window.

You can use window functions to perform the following tasks:

You can apply multiple window functions to a worksheet. For example, you might apply a function to calculate the sum of values for each row following the current row within a window, and then apply another function to calculate the average of the same values.

Enterprise Data Lake adds a column containing the results of each function you apply to the worksheet.
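The window concepts described above (partition key, order by key, optional offsets, and one result column per function) can be sketched in Python. This is a minimal illustration under stated assumptions: the `apply_window` helper, its parameters, and the sample column names are hypothetical and do not reflect the application's interface.

```python
from collections import defaultdict


def apply_window(rows, partition_key, order_key, value, func,
                 start=0, end=None, out="result"):
    """Add column `out` holding func over a window around each row.

    start and end are row offsets relative to the current row within its
    partition; end=None extends the window to the end of the partition.
    Rows whose window is empty receive None.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    result = []
    for part in partitions.values():
        part.sort(key=lambda r: r[order_key])
        for i, row in enumerate(part):
            lo = max(i + start, 0)
            hi = len(part) if end is None else max(min(i + end + 1, len(part)), 0)
            frame = [r[value] for r in part[lo:hi]]
            result.append({**row, out: func(frame) if frame else None})
    return result


def average(values):
    return sum(values) / len(values)


sales = [
    {"region": "east", "day": 1, "sales": 10},
    {"region": "east", "day": 2, "sales": 20},
    {"region": "east", "day": 3, "sales": 30},
    {"region": "west", "day": 1, "sales": 5},
]
# Sum, then average, of the rows that follow the current row in each region.
with_sum = apply_window(sales, "region", "day", "sales", sum,
                        start=1, out="sum_following")
with_both = apply_window(with_sum, "region", "day", "sales", average,
                         start=1, out="avg_following")
```

Each call returns new rows with one added column, so chaining two calls yields one result column per function, in line with the behavior described above.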

For more information, see the "Prepare Data" chapter in the Informatica 10.2.2 Enterprise Data Lake User Guide.

Purge Audit Events

Effective in version 10.2.2, you can run the infacmd edl purgeevents command to delete user activity events from the audit history database. You can optionally run the command to delete project history events from the database.

Spark Execution Engine

Effective in version 10.2.2, Enterprise Data Lake uses the Spark engine for resource-intensive activities such as asset publication, and to run active rule mapplets that use the Python transformation. Using the Spark engine for these activities provides better performance, and enables an Enterprise Data Lake deployment on Amazon Elastic MapReduce (EMR) to take advantage of autoscaling.
