Data Engineering Streaming

Cassandra is an open-source NoSQL database that is highly scalable and provides high availability. You can use Cassandra to store large amounts of data spread across data centers or when your applications require high write throughput.

For more information, see the Data Engineering Streaming 10.5 User Guide.

Dataproc

Effective in version 10.5, Data Engineering Streaming supports Google Dataproc for cluster configuration.

Google Cloud Storage Target on Google Dataproc

Effective in version 10.5, you can use Google Cloud Storage as a target in streaming mappings that run on a Google Dataproc cluster.

Google Dataproc is a lightweight implementation of Hadoop and Apache Spark on Google Cloud Platform. When you integrate Informatica Data Engineering Streaming with Dataproc, you configure an on-premises Informatica domain to run jobs on the Dataproc cloud cluster. You must configure the Dataproc cluster before you integrate the cluster with Data Engineering Streaming.

For more information, see the Data Engineering Streaming 10.5 User Guide.

Google PubSub

Effective in version 10.5, you can use Google PubSub as a source in streaming mappings.

Use a Google PubSub source to read messages from the configured Google Cloud PubSub subscription.

Google PubSub is an asynchronous messaging service that decouples services that produce events from services that process events. You can use Google PubSub as message-oriented middleware or for event ingestion and delivery in streaming analytics pipelines. Google PubSub offers durable message storage and real-time message delivery with high availability and consistent performance at scale. Google PubSub servers run in all the available Google Cloud regions around the world.
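The decoupling described above can be pictured with a small in-memory model. This is only an illustrative sketch that uses the Python standard library. It is not the Google PubSub API or the Informatica source implementation, and the `Topic` and `Subscription` classes are hypothetical names chosen for the example. It assumes a simplified model in which a subscription retains each message until the consumer acknowledges it.

```python
from collections import deque


class Subscription:
    """Toy subscription: retains messages until they are acknowledged."""

    def __init__(self):
        self._pending = deque()

    def deliver(self, message):
        # Called by the topic; the consumer does not need to be running.
        self._pending.append(message)

    def pull(self):
        # Return the oldest unacknowledged message without removing it.
        return self._pending[0] if self._pending else None

    def ack(self):
        # Acknowledge the oldest message so that it is not returned again.
        if self._pending:
            self._pending.popleft()


class Topic:
    """Toy topic: producers publish here and never see the consumers."""

    def __init__(self):
        self._subscriptions = []

    def attach(self, subscription):
        self._subscriptions.append(subscription)

    def publish(self, message):
        # Fan the message out to every attached subscription.
        for subscription in self._subscriptions:
            subscription.deliver(message)


if __name__ == "__main__":
    topic = Topic()
    subscription = Subscription()
    topic.attach(subscription)

    topic.publish("event-1")    # producer side
    print(subscription.pull())  # consumer side reads the message
    subscription.ack()          # consumer confirms processing
```

In this model, a message that is pulled but not acknowledged is returned again by the next `pull` call, which is the property that lets a consumer recover after a failure without the producer being involved.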

For more information, see the Data Engineering Streaming 10.5 User Guide.

High Precision Data Types

Effective in version 10.5, you can enable high-precision mode in streaming mappings.

In high-precision mode, the Spark engine supports decimal data types with precision up to 38 digits and a maximum scale of 38. The scale must be less than the precision.
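As a rough illustration of the precision and scale limits above, the following sketch checks whether a value fits a decimal type with a given precision and scale. It uses only the Python standard `decimal` module. The `fits_high_precision` function is a hypothetical helper written for this example and is not part of any Informatica or Spark API. It assumes that the precision counts all digits and the scale counts the digits after the decimal point.

```python
from decimal import Decimal

MAX_PRECISION = 38  # limit stated in the text above


def fits_high_precision(value: str, precision: int, scale: int) -> bool:
    """Return True if `value` fits a decimal(precision, scale) type.

    Hypothetical helper for illustration only.
    """
    # Rules taken from the text: precision up to 38, scale less than precision.
    if not (1 <= precision <= MAX_PRECISION and 0 <= scale < precision):
        return False
    _, digits, exponent = Decimal(value).as_tuple()
    if not isinstance(exponent, int):  # NaN or Infinity
        return False
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fractional_digits <= scale and integer_digits <= precision - scale


if __name__ == "__main__":
    print(fits_high_precision("12345.678", 10, 3))  # 5 integer and 3 fractional digits
    print(fits_high_precision("12345.678", 6, 3))   # only 3 integer digits allowed
    print(fits_high_precision("1" * 39, 38, 0))     # 39 digits exceed the limit
```

The value is passed as a string so that all digits are preserved. Converting through a binary floating-point number first would discard digits long before the 38-digit limit is reached.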

For more information, see the Data Engineering Streaming 10.5 User Guide.

Kudu

Effective in version 10.5, you can use Kudu as a target in streaming mappings.

Kudu is a columnar storage manager developed for the Apache Hadoop platform. You can use Kudu to store data in tables. Kudu has a simple data model in which each table has a primary key made up of one or more columns, each with a defined type. Kudu tables have a columnar structure that helps to vectorize and compress data easily. Use Kudu to perform real-time analytics on fast data. You can use Kudu for fast data searches, updates, and inserts.

For more information, see the Data Engineering Streaming 10.5 User Guide.

Python Transformation in Databricks

Effective in version 10.5, you can add a Python transformation to streaming mappings in the Databricks environment on the AWS or Azure platforms.