Data Engineering Streaming
This section describes new Data Engineering Streaming features in version 10.4.0.
Confluent Schema Registry in Streaming Mappings
Effective in version 10.4.0, you can use Confluent Kafka as a source and a target in streaming mappings through a schema registry.
You can use Confluent Kafka to store and retrieve Apache Avro schemas in streaming mappings. The schema registry uses Kafka as its underlying storage mechanism.
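Outside the Developer tool, a schema registry round trip looks conceptually like the following sketch, which uses the confluent-kafka Python client. The registry URL, subject name, and schema are hypothetical placeholders, not values from this guide.

```python
# Hypothetical sketch: registering and retrieving an Avro schema with a
# Confluent schema registry through the confluent-kafka Python client.
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL

order_schema = Schema(
    schema_str="""
    {
      "type": "record",
      "name": "Order",
      "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "double"}
      ]
    }
    """,
    schema_type="AVRO",
)

# Register the schema under a subject; the registry persists it in Kafka.
schema_id = client.register_schema("orders-value", order_schema)

# Retrieve it later by ID, for example when deserializing messages.
print(client.get_schema(schema_id).schema_str)
```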
For more information, see the Data Engineering Streaming 10.4.0 User Guide.
Data Quality Transformations in Streaming Mappings
Effective in version 10.4.0, you can use data quality transformations in streaming mappings.
You can use the following data quality transformations in streaming mappings to apply the data quality process to streaming data (a small analogy follows the list):
- Address Validator transformation
- Classifier transformation
- Parser transformation
- Standardizer transformation
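For a sense of what one of these transformations does, a Standardizer-style rule replaces variant forms of a value with a single reference form. The following minimal Python analogy is illustrative only and is not Informatica's implementation:

```python
# Hypothetical analogy: normalize variant street-type tokens to one
# reference form, the kind of rule a Standardizer transformation applies.
STREET_VARIANTS = {"st": "Street", "st.": "Street", "str": "Street"}

def standardize_street(value: str) -> str:
    """Replace street-type variants in an address line."""
    return " ".join(
        STREET_VARIANTS.get(token.lower(), token) for token in value.split()
    )

print(standardize_street("100 Main st."))  # 100 Main Street
```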
For more information, see the Data Engineering Streaming 10.4.0 User Guide.
Ephemeral Cluster in Streaming Mappings
Effective in version 10.4.0, you can run a workflow to create an ephemeral cluster that runs mapping tasks and other tasks on a cloud platform cluster.
To resume data processing from the point at which a cluster was deleted, run streaming mappings on the ephemeral cluster and specify external storage and a checkpoint directory.
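The checkpoint directory is what makes the resume possible: because it lives on external storage, a replacement cluster can restart the stream from the last committed state. The following Spark Structured Streaming sketch is a hypothetical illustration of that mechanism, not the Informatica run-time; the broker, topic, and paths are placeholders.

```python
# Hypothetical sketch: a streaming query whose checkpoint lives on
# external storage, so a new (ephemeral) cluster resumes from the last
# committed offsets rather than reprocessing from scratch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("resume-demo").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://bucket/output")                          # placeholder
    .option("checkpointLocation", "s3a://bucket/checkpoints/demo")  # placeholder
    .start()
)
```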
For more information, see the Data Engineering Streaming 10.4.0 User Guide.
FileName Port in Amazon S3
Effective in version 10.4.0, when you create a data object write operation for Amazon S3 files, the FileName port appears by default.
At run time, the Data Integration Service creates a separate directory for each value in the FileName port and writes the target files to those directories.
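The resulting layout is easy to picture with a small, hypothetical Python sketch; the FileName values and file contents below are invented for illustration.

```python
# Hypothetical illustration of the FileName-port layout: one directory
# per distinct FileName value, with target files written inside each.
import os

rows = [
    {"FileName": "2019-12-01", "data": "a,1"},
    {"FileName": "2019-12-02", "data": "b,2"},
]

for row in rows:
    out_dir = os.path.join("target", row["FileName"])
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "part-0001.csv"), "a") as f:
        f.write(row["data"] + "\n")
# Produces target/2019-12-01/part-0001.csv and target/2019-12-02/part-0001.csv
```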
For more information, see the Data Engineering Streaming 10.4.0 User Guide.
Microsoft Azure Data Lake Storage Gen2
Effective in version 10.4.0, you can use Microsoft Azure Data Lake Storage Gen2 as a target in streaming mappings.
Azure Data Lake Storage Gen2 is built on Azure Blob Storage. Azure Data Lake Storage Gen2 has the capabilities of both Azure Data Lake Storage Gen1 and Azure Blob Storage. You can use Azure Databricks version 5.4 or Azure HDInsight version 4.0 to access the data stored in Azure Data Lake Storage Gen2.
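From Spark on Azure Databricks or HDInsight, a Gen2 location is addressed through the abfss scheme (the DFS endpoint). The sketch below is hypothetical; the container, account, and paths are placeholders, and authentication setup is omitted.

```python
# Hypothetical sketch: streaming writes to an ADLS Gen2 path via the
# abfss (DFS endpoint) scheme, using the built-in rate source so the
# example is self-contained.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-gen2-demo").getOrCreate()
stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

query = (
    stream.writeStream.format("parquet")
    .option("path", "abfss://container@account.dfs.core.windows.net/out")
    .option(
        "checkpointLocation",
        "abfss://container@account.dfs.core.windows.net/checkpoints",
    )
    .start()
)
```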
For more information, see the Data Engineering Streaming 10.4.0 User Guide.
Streaming Mappings in Azure Databricks
Effective in version 10.4.0, you can run streaming mappings in the Azure Databricks environment in Microsoft Azure cloud services.
Sources and Targets
You can run streaming mappings against the following sources and targets within the Databricks environment:
- Microsoft Azure Event Hubs
- Azure Data Lake Storage Gen2 (ADLS Gen2)
Transformations
You can add the following transformations to a Databricks streaming mapping (a windowing sketch follows the list):
- Aggregator
- Expression
- Filter
- Joiner
- Normalizer
- Rank
- Router
- Union
- Window
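Of these, the Window transformation is the streaming-specific one: it groups rows into time-based windows before downstream aggregation. As a hypothetical analogy (not the Informatica implementation), a tumbling-window aggregate in Spark Structured Streaming looks like this:

```python
# Hypothetical analogy for windowing: a tumbling-window count in Spark
# Structured Streaming, kept self-contained with the built-in rate source.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("window-demo").getOrCreate()
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

windowed = (
    stream.groupBy(F.window("timestamp", "10 seconds"))  # tumbling window
    .agg(F.count("*").alias("events"))
)

query = windowed.writeStream.outputMode("complete").format("console").start()
```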
Data Types
The following data types are supported:
- Array
- Bigint
- Date/time
- Decimal
- Double
- Integer
- Map
- Struct
- Text
- String
Workflows
You can develop cluster workflows that create ephemeral clusters in the Databricks environment, using either Azure Data Lake Storage Gen1 (ADLS Gen1) or Azure Data Lake Storage Gen2 (ADLS Gen2).
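Conceptually, an ephemeral Databricks cluster is a cluster created on demand and deleted once the run completes. The hypothetical sketch below drives that lifecycle directly through the Databricks Clusters REST API; it is not how the Informatica workflow operates internally, and the host, token, and node settings are placeholders.

```python
# Hypothetical sketch: create a short-lived Databricks cluster with the
# Clusters REST API and delete it after the run. All values are placeholders.
import requests

HOST = "https://adb-1234567890.12.azuredatabricks.net"  # placeholder workspace
HEADERS = {"Authorization": "Bearer <token>"}           # placeholder token

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers=HEADERS,
    json={
        "cluster_name": "ephemeral-run",
        "spark_version": "5.4.x-scala2.11",  # Databricks 5.4, per this section
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
    },
)
cluster_id = resp.json()["cluster_id"]

# ... submit and monitor the mapping job against cluster_id ...

requests.post(
    f"{HOST}/api/2.0/clusters/permanent-delete",
    headers=HEADERS,
    json={"cluster_id": cluster_id},
)
```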
For more information about streaming mappings in Azure Databricks, see the Data Engineering Streaming 10.4.0 User Guide.
Dynamic Mappings in Data Engineering Streaming
Effective in version 10.4.0, dynamic mapping support in Data Engineering Streaming is available for technical preview.
You can use Confluent Kafka data objects as dynamic sources and targets in a streaming mapping.
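Dynamic support pairs naturally with the schema registry: when the registered schema evolves, a mapping that resolves columns at run time can pick up the change. As a hypothetical illustration with the confluent-kafka client (the registry URL and subject are placeholders), fetching the latest registered version looks like this:

```python
# Hypothetical sketch: look up the latest schema version for a subject,
# the kind of metadata a dynamic mapping resolves at run time.
from confluent_kafka.schema_registry import SchemaRegistryClient

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder
latest = client.get_latest_version("orders-value")               # placeholder subject
print(latest.version, latest.schema.schema_str)
```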
Technical preview functionality is supported for evaluation purposes but is unwarranted and is not production-ready. Informatica recommends that you use it in non-production environments only. Informatica intends to include the preview functionality in an upcoming release for production use, but might choose not to, based on changing market or technical circumstances. For more information, contact Informatica Global Customer Support.