Enterprise Data Lake
This section describes new Enterprise Data Lake features in version 10.2.1.
Column Data
Effective in version 10.2.1, you can use the following features when you work with columns in worksheets:
- You can group related values in a column into categories to make analysis easier.
- You can view the source of the data for a selected column in a worksheet. You might want to view the source of the data in a column to help you troubleshoot an issue.
- You can revert types or data domains inferred during sampling on columns to the source type. You might want to revert an inferred type or data domain to the source type if you want to use the column data in a formula.
For more information, see the "Prepare Data" chapter in the Informatica 10.2.1 Enterprise Data Lake User Guide.
Manage Data Lake Resources
Effective in version 10.2.1, you can use the Enterprise Data Lake application to add and delete Enterprise Data Catalog resources. Catalog resources represent the external data sources and metadata repositories from which scanners extract metadata that can be used in the data lake.
For more information, see the "Managing the Data Lake" chapter in the Informatica 10.2.1 Enterprise Data Lake Administrator Guide.
Data Preparation Operations
Effective in version 10.2.1, you can perform the following operations during data preparation:
- Pivot Data. You can use the pivot operation to reshape the data in selected columns in a worksheet into a summarized format. The pivot operation enables you to group and aggregate data for analysis, such as summarizing the average price of single family homes sold in each city for the first six months of the year.
- Unpivot Data. You can use the unpivot operation to transform columns in a worksheet into rows that contain the column data in key-value format. The unpivot operation is useful when you want to aggregate data in a worksheet into rows based on keys and corresponding values.
- Apply One Hot Encoding. You can use the one hot encoding operation to determine the existence of a string value in a selected column within each row in a worksheet. You might use the one hot encoding operation to convert categorical values in a worksheet to numeric values required by machine learning algorithms.
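The three operations above can be illustrated outside the product with a minimal Python sketch. It is not how Enterprise Data Lake implements them. The sample rows, the column names, and the helper functions pivot_mean, unpivot, and one_hot are all hypothetical and serve only to show the shape of the data before and after each operation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the column names are illustrative only.
sales = [
    {"city": "Austin", "month": "Jan", "price": 300000},
    {"city": "Austin", "month": "Feb", "price": 320000},
    {"city": "Dallas", "month": "Jan", "price": 250000},
    {"city": "Dallas", "month": "Feb", "price": 270000},
]


def pivot_mean(rows, group_col, value_col):
    """Pivot: group rows by group_col and average value_col per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {key: mean(values) for key, values in groups.items()}


def unpivot(rows, id_col, value_cols):
    """Unpivot: emit one key/value row for each listed column of each row."""
    return [
        {id_col: row[id_col], "key": col, "value": row[col]}
        for row in rows
        for col in value_cols
    ]


def one_hot(rows, column):
    """One hot encoding: add a 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    return [
        {**row, **{f"{column}_{cat}": int(row[column] == cat) for cat in categories}}
        for row in rows
    ]


# Average sale price per city (pivot).
avg_price = pivot_mean(sales, "city", "price")

# Month columns turned into key/value rows (unpivot).
wide = [{"city": "Austin", "Jan": 300000, "Feb": 320000}]
long_rows = unpivot(wide, "city", ["Jan", "Feb"])

# City names turned into numeric indicator columns (one hot encoding).
encoded = one_hot(sales, "city")
```

In this sketch, the pivot collapses four rows into one average per city, the unpivot expands one wide row into two key/value rows, and the encoding adds the numeric columns city_Austin and city_Dallas to every row.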
For more information, see the "Prepare Data" chapter in the Informatica 10.2.1 Enterprise Data Lake User Guide.
Prepare JSON Files
Effective in version 10.2.1, you can sample the hierarchical data in JavaScript Object Notation Lines (JSONL) files that you add to your project as the first step in data preparation. Enterprise Data Lake converts the JSON file structure into a flat structure, and presents the data in a worksheet that you use to sample the data.
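As a rough illustration of what converting a hierarchical structure into a flat one can mean, the following Python sketch reads JSONL text and flattens each record into a single-level row. The dotted column names, the handling of arrays by index, and the functions flatten and read_jsonl are assumptions made for this example. The flattening rules that Enterprise Data Lake actually applies are described in the User Guide and might differ.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys.

    Assumed convention for this sketch: object fields are joined with "."
    and array elements are addressed by their index. Empty objects and
    empty arrays produce no columns.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(index), value) for index, value in enumerate(obj))
    else:
        # Scalar value: this is a leaf, so it becomes one column.
        return {prefix: obj}

    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


def read_jsonl(text):
    """Parse JSONL text (one JSON document per line) into flat rows."""
    return [flatten(json.loads(line)) for line in text.splitlines() if line.strip()]


# Hypothetical two-record JSONL sample.
sample = (
    '{"id": 1, "owner": {"name": "Ana", "tags": ["a", "b"]}}\n'
    '{"id": 2, "owner": {"name": "Raj", "tags": []}}\n'
)
rows = read_jsonl(sample)
```

Each resulting row is a flat mapping, for example with the columns id, owner.name, owner.tags.0, and owner.tags.1 for the first record, which is the kind of tabular shape a worksheet can present.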
For more information, see the "Prepare Data" chapter in the Informatica 10.2.1 Enterprise Data Lake User Guide.
Recipe Steps
Effective in version 10.2.1, you can use the following features when you work with recipes in worksheets:
- You can reuse recipe steps created in a worksheet, including steps that contain complex formulas or rule definitions. You can reuse recipe steps within the same worksheet or in a different worksheet, including a worksheet in another project. You can copy and reuse selected steps from a recipe, or you can reuse the entire recipe.
- You can insert a step at any position in a recipe.
- You can add a filter or modify a filter applied to a recipe step.
For more information, see the "Prepare Data" chapter in the Informatica 10.2.1 Enterprise Data Lake User Guide.
Schedule Export, Import, and Publish Activities
Effective in version 10.2.1, you can schedule the exporting, importing, and publishing of data assets. Scheduling an activity enables you to export, import, or publish updated data assets on a recurring basis.
When you schedule an activity, you can create a new schedule, or you can select an existing schedule. You can use schedules created by other users, and other users can use schedules that you create.
For more information, see the "Scheduling Export, Import, and Publish Activities" chapter in the Informatica 10.2.1 Enterprise Data Lake User Guide.
Security Assertion Markup Language Authentication
Effective in version 10.2.1, the Enterprise Data Lake application supports Security Assertion Markup Language (SAML) authentication.
For more information about configuring SAML authentication, see the Informatica 10.2.1 Security Guide.
View Project Flows and Project History
Effective in version 10.2.1, you can view project flow diagrams and review the activities performed within a project.
You can view a flow diagram that shows you how worksheets in a project are related and how they are derived. The diagram is especially useful when you work on a complex project that contains numerous worksheets and includes numerous assets.
You can also review the complete history of the activities performed within a project, including activities performed on worksheets within the project. Viewing the project history might help you determine the root cause of issues within the project.
For more information, see the "Create and Manage Projects" chapter in the Informatica 10.2.1 Enterprise Data Lake User Guide.