Data Engineering Integration
Read this section to learn what's new for Data Engineering Integration in version 10.5.
EXTRACT_STRUCT Function
Effective in version 10.5, you can use the EXTRACT_STRUCT function in dynamic expressions to extract all elements from a dynamic struct port in an Expression transformation.
The EXTRACT_STRUCT function flattens dynamic struct ports. The expression for the output ports uses the dot operator to extract elements in the dynamic struct.
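For example, a minimal sketch of a dynamic expression, assuming a hypothetical dynamic struct port named Customer that contains the elements name and city:

```
EXTRACT_STRUCT(Customer)
```

The function generates an output port for each element, with expressions such as Customer.name and Customer.city that use the dot operator to reach into the struct.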
For more information, see the Informatica 10.5 Transformation Language Reference.
File Manager for Cloud File Preprocessing
Effective in version 10.5, you can perform file preprocessing tasks such as list, copy, rename, move, remove, and watch on the Microsoft Azure and Amazon AWS cloud ecosystems.
filemanager Commands
The following table describes the available commands for the filemanager utility:
| Command | Description |
| --- | --- |
| copy | Copies files on an Amazon AWS cloud ecosystem. |
| copyfromlocal | Copies files from a local system to a cloud ecosystem. |
| list | Lists files on a cloud ecosystem. |
| move | Moves files on a cloud ecosystem. |
| removefile | Deletes files from a cloud ecosystem. |
| rename | Renames files on a cloud ecosystem. |
| watch | Watches files that trigger a file processing event, mapping, or workflow on a cloud ecosystem. |
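Each command follows the same general invocation pattern; the options differ per command and are detailed in the Command Reference. A schematic sketch with placeholder options (no literal option names are shown here):

```
filemanager <command> <options>

filemanager list <options>    # list files on the cloud ecosystem
filemanager copy <options>    # copy files within Amazon AWS storage
```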
For more information, see the Informatica 10.5 Command Reference.
Mapping Audits
Effective in version 10.5, you can create an audit to validate the consistency and accuracy of data that is processed in a mapping.
An audit is composed of rules and conditions. Use a rule to compute an aggregated value for a single column of data. Use a condition to make comparisons between multiple rules or between a rule and constant values.
You can configure audits for the following mappings that run in the native environment or on the Spark engine:
- Read operations in Amazon S3, JDBC V2, Microsoft Azure SQL Data Warehouse, and Snowflake mappings.
- Read operations for complex files such as Avro, Parquet, and JSON in HDFS mappings.
- Read and write operations in Hive and Oracle mappings.
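To make the relationship between rules and conditions concrete, here is a conceptual sketch, assuming hypothetical rule names and an order_amount column; the actual audit is configured in the Developer tool rather than written as code:

```
rule src_total = SUM(order_amount)     // aggregate computed on the source data
rule tgt_total = SUM(order_amount)     // the same aggregate computed on the target data
condition       src_total = tgt_total  // the audit flags the run if the totals diverge
```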
For more information, see the Data Engineering Integration 10.5 User Guide.
Profile on the Databricks Cluster
Effective in version 10.5, you can run profiles on the Databricks cluster.
You can create and run profiles on the Databricks cluster in the Informatica Developer and Informatica Analyst tools. You can also perform data domain discovery and create scorecards on the Databricks cluster.
For more information about profiles on the Databricks cluster, see the Informatica 10.5 Data Discovery Guide.
Sensitive Data Recommendations and Insights by CLAIRE
Effective in version 10.5, CLAIRE artificial intelligence detects sensitive data in mapping sources when Enterprise Data Catalog is configured on the domain.
Recommendations list source columns that contain sensitive data based on data quality rules. You can also add custom types to the sensitive data that CLAIRE detects.
For more information about recommendations and insights, see the Data Engineering Integration User Guide.
Warm Pool Support for Ephemeral Clusters on Databricks
Effective in version 10.5, you can configure ephemeral Databricks clusters with warm pools. A warm pool is a pool of VM instances reserved for ephemeral cluster creation.
When you configure the warm pool instances in the Databricks environment, the instances wait on standby in a running state for ephemeral cluster creation. You can choose to have the instances remain on standby when the ephemeral clusters are terminated.
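As a rough illustration of the Databricks side, here is a minimal instance pool definition using fields from the Databricks Instance Pools API; the pool name, node type, and sizes are placeholders, and the Informatica-side settings are covered in the cluster workflows chapter:

```
{
  "instance_pool_name": "ephemeral-warm-pool",
  "node_type_id": "Standard_DS3_v2",
  "min_idle_instances": 4,
  "idle_instance_autotermination_minutes": 60
}
```

Here min_idle_instances keeps that many instances on standby in a running state for ephemeral cluster creation, and idle_instance_autotermination_minutes controls how long additional idle instances wait before they terminate.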
For more information, see the chapter on cluster workflows in the Data Engineering Integration User Guide.