Big Data Streaming
This section describes new Big Data Streaming features in version 10.2.2.
Azure Event Hubs Data Objects
Effective in version 10.2.2, you can deploy a streaming mapping that has an event hub as a source in the following distributions:
- Amazon EMR
- Azure HDInsight with ADLS storage
- Cloudera CDH
- Hortonworks HDP
Cross-account IAM Role in Amazon Kinesis Connection
Effective in version 10.2.2, you can use the cross-account IAM role to authenticate an Amazon Kinesis source.
Use the cross-account IAM role to share resources in one AWS account with users in a different AWS account without creating users in each account.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
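In general, cross-account access in AWS requires that the account owning the Kinesis stream attach a trust policy to the IAM role so that the other account can assume it. As a generic AWS illustration only (the account ID is a placeholder, not a value from this guide):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```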
Intelligent Structure Model
Effective in version 10.2.2, you can use intelligent structure models in Big Data Streaming.
You can incorporate an intelligent structure model in a Kafka, Kinesis, or Azure Event Hubs data object. When you add the data object to a mapping, you can process any input type that the model can parse.
The data object can accept and parse PDF forms, JSON, Microsoft Excel, Microsoft Word tables, CSV, text, or XML input files, based on the file that you used to create the model.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
Header Ports for Big Data Streaming Data Objects
Effective in version 10.2.2, some data objects contain default header ports that represent metadata associated with events. For example, the timestamp port contains the time at which the event is generated. You can use the header ports to group and process the data.
For more information about the header ports, see the Informatica Big Data Streaming 10.2.2 User Guide.
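The kind of grouping that header ports enable can be sketched in plain Python. This is a conceptual illustration only, not the Informatica API; the field names are hypothetical:

```python
from collections import defaultdict

# Hypothetical events, each carrying a generation timestamp as a
# header-style field alongside the payload (field names are
# illustrative, not the actual Informatica port names).
events = [
    {"timestamp": 1000, "payload": "a"},
    {"timestamp": 1000, "payload": "b"},
    {"timestamp": 2000, "payload": "c"},
]

# Group event payloads by the timestamp header value.
groups = defaultdict(list)
for event in events:
    groups[event["timestamp"]].append(event["payload"])

print(dict(groups))  # {1000: ['a', 'b'], 2000: ['c']}
```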
AWS Credential Profile in Amazon Kinesis Connection
Effective in version 10.2.2, you can use AWS credential profile based authentication in an Amazon Kinesis connection.
When you create an Amazon Kinesis connection, you can enter an AWS credential profile name. At run time, the mapping accesses the AWS credentials through the profile name listed in the AWS credentials file.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
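An AWS credentials file stores named profiles in a standard format (typically `~/.aws/credentials`). A generic example, with placeholder values and a hypothetical profile name:

```ini
[streaming_profile]
aws_access_key_id = <access key ID>
aws_secret_access_key = <secret access key>
```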
Spark Structured Streaming
Effective in version 10.2.2, Big Data Streaming uses Spark Structured Streaming to process streaming data.
Spark Structured Streaming is a scalable and fault-tolerant open source stream processing engine built on the Spark engine. It can handle the late arrival of streaming events and process streaming data based on the source timestamp.
The Spark engine runs the streaming mapping continuously. It reads the data, divides the data into micro-batches, processes the micro-batches, publishes the results, and then writes to a target.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
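The read, batch, process, and write cycle described above can be sketched in plain Python. This is a conceptual illustration of the micro-batch model, not the Spark or Informatica API; all names are hypothetical:

```python
def run_streaming_mapping(read_events, process, write_target, batch_size=3):
    """Conceptual micro-batch loop: read events continuously, divide
    them into micro-batches, process each batch, and write the results
    to a target."""
    buffer = []
    for event in read_events:          # continuous read from the source
        buffer.append(event)
        if len(buffer) == batch_size:  # a micro-batch is full
            write_target(process(buffer))
            buffer = []
    if buffer:                         # flush the final partial batch
        write_target(process(buffer))

results = []
run_streaming_mapping(
    read_events=iter(range(7)),
    process=lambda batch: [x * 2 for x in batch],
    write_target=results.append,
)
print(results)  # [[0, 2, 4], [6, 8, 10], [12]]
```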
Window Transformation
Effective in version 10.2.2, you can use the following features when you create a Window transformation:
- Watermark Delay
- The watermark delay is a threshold that defines how long late-arriving data can still be grouped and processed. If event data arrives within the threshold time, the data is processed and accumulated into the corresponding data group.
- Window Port
- The window port specifies the column that contains the timestamp values that are used to group the events. The accumulated data contains the timestamp value. Use the window port to group event time data that arrives late.
For more information, see the Informatica Big Data Streaming 10.2.2 User Guide.
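The interplay of the watermark delay and the window port can be sketched in plain Python. This is a conceptual illustration of event-time windowing with a watermark, not the Informatica Window transformation API; all names and values are hypothetical:

```python
def assign_windows(events, window_size, watermark_delay):
    """Group (event_time, value) pairs into tumbling event-time windows.

    The watermark is the maximum event time seen so far minus the
    watermark delay. A late event is still accumulated into its window
    while the watermark has not passed the end of that window;
    otherwise the event is discarded.
    """
    windows = {}        # window start time -> list of values
    max_event_time = 0  # drives the watermark forward
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - watermark_delay
        window_start = (event_time // window_size) * window_size
        if window_start + window_size > watermark:  # within the delay
            windows.setdefault(window_start, []).append(value)
        # else: the event arrived too late and is dropped
    return windows

events = [(1, "a"), (3, "b"), (12, "c"), (4, "d"), (2, "e"),
          (16, "f"), (1, "g")]
# With 10-unit windows and a watermark delay of 5, the late events
# "d" and "e" still land in window 0 because the watermark (12 - 5 = 7)
# has not passed that window's end (10). After "f" arrives, the
# watermark moves to 11, so the even later "g" is dropped.
print(assign_windows(events, window_size=10, watermark_delay=5))
# → {0: ['a', 'b', 'd', 'e'], 10: ['c', 'f']}
```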