Big Data Streaming
This section describes new Big Data Streaming features in version 10.2.2.
Azure Event Hubs Data Objects
Effective in version 10.2.2, you can deploy a streaming mapping that has an event hub as a source in the following distributions:
- Amazon EMR
- Azure HDInsight with ADLS storage
- Cloudera CDH
- Hortonworks HDP
Cross-account IAM Role in Amazon Kinesis Connection
Effective in version 10.2.2, you can use the cross-account IAM role to authenticate an Amazon Kinesis source.
Use the cross-account IAM role to share resources in one AWS account with users in a different AWS account without creating users in each account.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
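In general, cross-account access in AWS requires that the account owning the Kinesis stream attach a trust policy to the IAM role so that the other account can assume it. As a generic AWS illustration only (the account ID is a placeholder, not a value from this guide):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```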
Intelligent Structure Model
Effective in version 10.2.2, you can use intelligent structure models in Big Data Streaming.
You can incorporate an intelligent structure model in a Kafka, Kinesis, or Azure Event Hubs data object. When you add the data object to a mapping, you can process any input type that the model can parse.
The data object can accept and parse PDF forms, JSON, Microsoft Excel, Microsoft Word tables, CSV, text, or XML input files, based on the file that you used to create the model.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
Header Ports for Big Data Streaming Data Objects
Effective in version 10.2.2, some data objects contain default header ports that represent metadata associated with events. For example, the timestamp port contains the time at which the event is generated. You can use the header ports to group and process the data.
For more information about the header ports, see the Informatica Big Data Streaming 10.2.2 User Guide.
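The kind of grouping that header ports enable can be sketched in plain Python. This is a conceptual illustration only, not the Informatica API; the field names are hypothetical:

```python
from collections import defaultdict

# Hypothetical events, each carrying a generation timestamp as a
# header-style field alongside the payload (field names are
# illustrative, not the actual Informatica port names).
events = [
    {"timestamp": 1000, "payload": "a"},
    {"timestamp": 1000, "payload": "b"},
    {"timestamp": 2000, "payload": "c"},
]

# Group event payloads by the timestamp header value.
groups = defaultdict(list)
for event in events:
    groups[event["timestamp"]].append(event["payload"])

print(dict(groups))  # {1000: ['a', 'b'], 2000: ['c']}
```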
AWS Credential Profile in Amazon Kinesis Connection
Effective in version 10.2.2, you can use AWS credential profile based authentication in an Amazon Kinesis connection.
When you create an Amazon Kinesis connection, you can enter an AWS credential profile name. At run time, the mapping accesses the AWS credentials through the profile name listed in the AWS credentials file.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
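An AWS credentials file stores named profiles in a standard format (typically `~/.aws/credentials`). A generic example, with placeholder values and a hypothetical profile name:

```ini
[streaming_profile]
aws_access_key_id = <access key ID>
aws_secret_access_key = <secret access key>
```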
Spark Structured Streaming
Effective in version 10.2.2, Big Data Streaming uses Spark Structured Streaming to process streaming data.
Spark Structured Streaming is a scalable and fault-tolerant open source stream processing engine built on the Spark engine. It can handle the late arrival of streaming events and process streaming data based on the source timestamp.
The Spark engine runs the streaming mapping continuously. It reads the data, divides the data into micro-batches, processes the micro-batches, publishes the results, and then writes to a target.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
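The read, batch, process, and write cycle described above can be sketched in plain Python. This is a conceptual illustration of the micro-batch model, not the Spark or Informatica API; all names are hypothetical:

```python
def run_streaming_mapping(read_events, process, write_target, batch_size=3):
    """Conceptual micro-batch loop: read events continuously, divide
    them into micro-batches, process each batch, and write the results
    to a target."""
    buffer = []
    for event in read_events:          # continuous read from the source
        buffer.append(event)
        if len(buffer) == batch_size:  # a micro-batch is full
            write_target(process(buffer))
            buffer = []
    if buffer:                         # flush the final partial batch
        write_target(process(buffer))

results = []
run_streaming_mapping(
    read_events=iter(range(7)),
    process=lambda batch: [x * 2 for x in batch],
    write_target=results.append,
)
print(results)  # [[0, 2, 4], [6, 8, 10], [12]]
```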
Window Transformation
Effective in version 10.2.2, you can use the following features when you create a Window transformation:
- Watermark Delay
- The watermark delay is a threshold that defines how long late-arriving data can still be grouped and processed. If event data arrives within the threshold time, the data is processed and accumulated into the corresponding data group.
- Window Port
- The window port specifies the column that contains the timestamp values that are used to group the events. The accumulated data contains the timestamp value. Use the window port to group event time data that arrives late.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
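The interplay of the watermark delay and the window port can be sketched in plain Python. This is a conceptual illustration of event-time windowing with a watermark, not the Informatica Window transformation API; all names and values are hypothetical:

```python
def assign_windows(events, window_size, watermark_delay):
    """Group (event_time, value) pairs into tumbling event-time windows.

    The watermark is the maximum event time seen so far minus the
    watermark delay. A late event is still accumulated into its window
    while the watermark has not passed the end of that window;
    otherwise the event is discarded.
    """
    windows = {}        # window start time -> list of values
    max_event_time = 0  # drives the watermark forward
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - watermark_delay
        window_start = (event_time // window_size) * window_size
        if window_start + window_size > watermark:  # within the delay
            windows.setdefault(window_start, []).append(value)
        # else: the event arrived too late and is dropped
    return windows

events = [(1, "a"), (3, "b"), (12, "c"), (4, "d"), (2, "e"),
          (16, "f"), (1, "g")]
# With 10-unit windows and a watermark delay of 5, the late events
# "d" and "e" still land in window 0 because the watermark (12 - 5 = 7)
# has not passed that window's end (10). After "f" arrives, the
# watermark moves to 11, so the even later "g" is dropped.
print(assign_windows(events, window_size=10, watermark_delay=5))
# → {0: ['a', 'b', 'd', 'e'], 10: ['c', 'f']}
```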