Data Engineering Streaming
This section describes new Data Engineering Streaming features in version 10.4.1.
FileName Port for ADLS Gen2
Effective in version 10.4.1, when you create a data object write operation for ADLS Gen2, the FileName port appears by default.
At run time, the Data Integration Service creates a separate directory for each value in the FileName port and writes the target files to those directories. You can use the FileName port in an ADLS Gen2 target to ingest CDC data from the PWX CDC Publisher.
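The write operation is configured in the Developer tool rather than in code, but the resulting layout can be pictured with a minimal PySpark sketch: partitioning a write by a FileName column produces one directory per distinct value, mirroring how the Data Integration Service organizes the target files. The storage account, container, and column values below are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("filename-port-sketch").getOrCreate()

# Hypothetical CDC rows; the FileName column drives the directory layout.
df = spark.createDataFrame(
    [("orders", 1, "insert"), ("orders", 2, "update"), ("customers", 9, "insert")],
    ["FileName", "id", "op"],
)

# One subdirectory per distinct FileName value under the ADLS Gen2 path.
(df.write
    .partitionBy("FileName")
    .mode("append")
    .csv("abfss://container@account.dfs.core.windows.net/cdc-target/"))
```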
For more information, see the Data Engineering Streaming 10.4.1 User Guide.
Ingest CDC Data from Multiple Kafka Topics
Effective in version 10.4.1, you can ingest CDC data from the PWX CDC Publisher across multiple Kafka topics into Data Engineering systems in one or more mappings.
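The topics are configured on the Kafka data object in the mapping, not in code. As a rough PySpark illustration of the same idea, Structured Streaming can subscribe one source to several CDC topics at once; the broker addresses and topic names here are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-topic-cdc-sketch").getOrCreate()

# Subscribe a single streaming source to several CDC topics.
cdc_stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "cdc.orders,cdc.customers")
    .option("startingOffsets", "earliest")
    .load())

# Each record keeps its source topic, so downstream logic can branch per topic.
query = (cdc_stream
    .selectExpr("topic", "CAST(value AS STRING) AS payload")
    .writeStream
    .format("console")
    .start())
```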
For more information, see the Data Engineering Streaming 10.4.1 User Guide.
JDBC V2 Lookup Transformation
Effective in version 10.4.1, you can use a JDBC data object read operation to look up data in a JDBC V2 table.
You can add a JDBC V2 data object read operation as a lookup in a mapping. You can then configure a lookup condition to look up data from the JDBC V2 table. You can run this mapping on a Databricks engine.
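In Spark terms, a lookup against a JDBC table amounts to reading the table through the JDBC source and joining it to the incoming data on the lookup condition. This is only a conceptual sketch; the connection URL, credentials, and table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-lookup-sketch").getOrCreate()

# Read the lookup table over JDBC.
lookup_df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/sales")
    .option("dbtable", "public.customer_dim")
    .option("user", "etl_user")
    .option("password", "secret")
    .load())

# Hypothetical incoming rows to enrich.
facts = spark.createDataFrame([(1, 250.0), (2, 99.5)], ["customer_id", "amount"])

# Lookup condition: match on customer_id, keep unmatched rows (left join).
enriched = facts.join(lookup_df, on="customer_id", how="left")
```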
For more information, see the Data Engineering Streaming 10.4.1 User Guide.
Parquet Data Format for Complex Targets
Effective in version 10.4.1, you can use the Parquet data format for complex targets.
You can use the Parquet data format for complex targets such as HDFS, ADLS Gen2, and Amazon S3 in streaming mappings.
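As a minimal PySpark sketch, a streaming write in Parquet format to an Amazon S3 path looks like the following; an HDFS or ADLS Gen2 URI works the same way. The bucket, paths, and the rate source used as a stand-in are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-target-sketch").getOrCreate()

# Stand-in streaming source for the sketch.
events = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 10)
    .load())

# Stream to the complex target in Parquet format.
query = (events.writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/streaming-target/")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/parquet-target/")
    .start())
```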
For more information, see the Data Engineering Streaming 10.4.1 User Guide.
Rollover Parameters in Amazon S3 and ADLS Gen2 Targets
Effective in version 10.4.1, you can configure separate rollover parameters for Amazon S3 and ADLS Gen2 targets to control the rollover time or size for each target.
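The rollover parameters themselves are target properties in the mapping, not code. Purely as a conceptual sketch, this hypothetical helper shows the decision being made: close the current target file when either the time threshold or the size threshold is crossed, with the thresholds set independently per target.

```python
import time

def should_roll_over(opened_at: float, bytes_written: int,
                     max_age_seconds: float, max_size_bytes: int) -> bool:
    """Return True when the current target file should be closed and rolled over."""
    too_old = (time.time() - opened_at) >= max_age_seconds
    too_big = bytes_written >= max_size_bytes
    return too_old or too_big

# Hypothetical thresholds: 5 minutes / 256 MB for the S3 target,
# 10 minutes / 512 MB for the ADLS Gen2 target.
opened_at = time.time() - 400  # file opened about 400 seconds ago
s3_roll = should_roll_over(opened_at, 10 * 1024**2, 300, 256 * 1024**2)    # True: too old
adls_roll = should_roll_over(opened_at, 10 * 1024**2, 600, 512 * 1024**2)  # False
```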
For more information, see the Data Engineering Streaming 10.4.1 User Guide.
Sources and Targets in Databricks
Effective in version 10.4.1, you can use Kafka and Confluent Kafka as sources and targets in streaming mappings in a Databricks environment.
You can run the streaming mappings in the Databricks environment on both the AWS cloud ecosystem and Microsoft Azure cloud services.
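As a hedged PySpark sketch of what such a mapping does on a Databricks cluster, the following reads from one Kafka topic and writes to another; Confluent Kafka differs mainly in the security-related options. The brokers, topics, and checkpoint path are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-databricks-sketch").getOrCreate()

source = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events-in")
    .load())

# Pass records through unchanged from source topic to target topic.
query = (source.selectExpr("key", "value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "events-out")
    .option("checkpointLocation", "/tmp/checkpoints/kafka-passthrough/")
    .start())
```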
For more information, see the Data Engineering Streaming 10.4.1 User Guide.
Streaming Mappings in AWS Databricks
Effective in version 10.4.1, you can run streaming mappings on the AWS Databricks service in the AWS cloud ecosystem.
You can use AWS Databricks to run mappings with the following functionality:
Sources and Targets
You can run streaming mappings against the following sources and targets within the Databricks environment (see the sketch after this list):
- Amazon S3
- Kinesis Streams
- Kinesis Firehose
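As one illustration of how these endpoints pair up in code on Databricks, the sketch below reads a Kinesis stream with the Databricks Runtime Kinesis connector and lands the records on Amazon S3. The stream name, region, and paths are hypothetical, and the exact connector options vary by runtime version.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kinesis-to-s3-sketch").getOrCreate()

# Read from Kinesis Streams (Databricks Runtime connector).
kinesis = (spark.readStream
    .format("kinesis")
    .option("streamName", "clickstream")
    .option("region", "us-east-1")
    .option("initialPosition", "latest")
    .load())

# Land the decoded records on Amazon S3.
query = (kinesis.selectExpr("CAST(data AS STRING) AS payload")
    .writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/clickstream/")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/clickstream/")
    .start())
```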
Transformations
You can add the following transformations to a Databricks streaming mapping in AWS (see the sketch after this list):
- Aggregator
- Expression
- Filter
- Joiner
- Normalizer
- Rank
- Router
- Union
- Window
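Two of these transformations translate directly to Structured Streaming concepts: a Window transformation corresponds to a time window on the event timestamp, and an Aggregator to a grouped aggregate over that window. The sketch below uses the built-in rate source as a hypothetical input.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, count

spark = SparkSession.builder.appName("window-aggregator-sketch").getOrCreate()

# Stand-in streaming source; provides timestamp and value columns.
events = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 5)
    .load())

# Window + Aggregator: tumbling 1-minute window with a count per window.
counts = (events
    .groupBy(window("timestamp", "1 minute"))
    .agg(count("*").alias("events_per_minute")))

query = (counts.writeStream
    .outputMode("complete")
    .format("console")
    .start())
```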
Data Types
AWS Databricks supports the same data types as Azure Databricks.
The following data types are supported (see the schema sketch after this list):
- Array
- Bigint
- Date/time
- Decimal
- Double
- Integer
- Map
- Struct
- Text
- String
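These names line up with Spark SQL types. As a hypothetical illustration, the schema below declares one field of each listed type in PySpark (Text and String both map to StringType).

```python
from pyspark.sql.types import (StructType, StructField, ArrayType, MapType,
                               StringType, LongType, IntegerType, DoubleType,
                               DecimalType, TimestampType)

record = StructType([
    StructField("id", LongType()),                                # Bigint
    StructField("qty", IntegerType()),                            # Integer
    StructField("price", DecimalType(10, 2)),                     # Decimal
    StructField("score", DoubleType()),                           # Double
    StructField("name", StringType()),                            # String / Text
    StructField("updated", TimestampType()),                      # Date/time
    StructField("tags", ArrayType(StringType())),                 # Array
    StructField("attrs", MapType(StringType(), StringType())),    # Map
    StructField("address", StructType([                           # Struct
        StructField("city", StringType()),
    ])),
])
```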
For more information, see the Data Engineering Streaming 10.4.1 User Guide.