Streaming Ingestion and Replication sources

You can ingest high-volume, real-time data from supported streaming sources to on-premises and cloud targets. You can ingest data in the form of events or messages.
You can use the following data sources in a streaming ingestion and replication task:
  - Amazon Kinesis Streams
  - AMQP
  - Azure Event Hubs Kafka
  - Business 360 Events
  - Flat file
  - Google PubSub
  - JMS
  - Kafka
  - MQTT
  - OPC UA
  - REST V2
To determine the connectors to use for these source types, see Connectors and Connections > Streaming Ingestion and Replication connectors.

Amazon Kinesis Streams sources

Use a Kinesis Streams source to read data from an Amazon Kinesis Stream. To create a Kinesis Streams source connection, use the Kinesis connection type.
Kinesis Streams is a real-time data stream processing service that Amazon Kinesis offers within the AWS ecosystem. Kinesis Streams is a customizable option that you can use to build custom applications to process and analyze streaming data. Because Kinesis Streams does not automatically scale to meet data inflow demand, you must manually provision enough capacity to meet system needs.
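For example, each shard accepts writes of up to 1 MB or 1,000 records per second, so a stream that receives 5 MB of data per second needs at least five shards.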
Before you use a Kinesis stream source, perform the following tasks:
  1. In the Amazon Kinesis console, create and configure an Amazon Kinesis stream.
  2. In the Amazon Web Services (AWS) Identity and Access Management (IAM) service, create a user.
  3. Download the access key and secret access key that are generated during the user creation process.
  4. Associate the user with a group that has permissions to write to the Kinesis stream.
Streaming Ingestion and Replication does not support profile-based and cross-account authentication. The Amazon Web Services credentials used for Amazon Kinesis must have permissions to access the Amazon DynamoDB and Amazon CloudWatch services.
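To verify that the stream and the credentials are set up correctly, you can write a test record with the AWS SDK. The following Python sketch assumes the boto3 library and a hypothetical stream named orders-stream; the region and credentials are placeholders.

    import boto3

    # Create a Kinesis client with the access key pair downloaded
    # during user creation. Region and keys are placeholders.
    kinesis = boto3.client(
        "kinesis",
        region_name="us-east-1",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Write one record. A streaming ingestion and replication task
    # configured with a Kinesis source can then read it from the stream.
    kinesis.put_record(
        StreamName="orders-stream",
        Data=b'{"order_id": 42, "status": "created"}',
        PartitionKey="42",  # determines the shard that receives the record
    )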

AMQP sources

Use an Advanced Message Queuing Protocol (AMQP) source to read messages from an AMQP message queue. To create an AMQP source connection, use the AMQP connection type.
AMQP is a message-oriented standard with queuing, routing, reliability and security features. AMQP is a wire-level, platform-agnostic protocol that you can use to facilitate business transactions by passing real-time message streams.
Using the AMQP connector, you can read messages from AMQP brokers, monitor a message queue, and handle subscribe patterns for brokered messaging. The streaming ingestion and replication task uses RabbitMQ as the AMQP broker. RabbitMQ is a fast, scalable, and durable distributed message broker. RabbitMQ uses the AMQP 0-9-1 messaging protocol for the secure transfer of messages.
In a streaming ingestion and replication task, you can use an AMQP source to subscribe to a stream of incoming messages. The AMQP broker stores each message in the message queue until the streaming ingestion and replication job retrieves the message from the queue. When the job retrieves a message, it acknowledges receipt of the message, and the broker removes the acknowledged message from the queue.
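This acknowledge-and-remove flow is the standard AMQP 0-9-1 consumer pattern. The following Python sketch shows the same pattern in a standalone client, assuming the pika library, a local RabbitMQ broker, and a hypothetical queue named task_queue.

    import pika

    # Connect to a local RabbitMQ broker and declare a durable queue.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="task_queue", durable=True)

    def on_message(ch, method, properties, body):
        print("received:", body)
        # Acknowledge receipt; the broker then removes the message
        # from the queue.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="task_queue", on_message_callback=on_message)
    channel.start_consuming()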
You can use the AMQP source when you have long-running tasks that you want to run as reliable background jobs. You can also choose to use an AMQP source for communication between applications where one part of the system needs to notify another part, such as order handling in a webshop.

Azure Event Hubs Kafka sources

You can configure a Kafka source to connect to Azure Event Hubs. To create an Azure Event Hubs Kafka source connection, use the Kafka connection type.
When you create a standard or dedicated tier Event Hubs namespace, the Kafka endpoint for the namespace is enabled by default. You can then use the Event Hubs-enabled Kafka connection as a source connection when you configure a streaming ingestion and replication task. Enter the Event Hubs name as the topic name.
The Azure Event Hubs source information that you enter while configuring a streaming ingestion and replication task is the same as that of a standard Kafka source configuration. For more information about Azure Event Hubs Kafka source properties, see Azure Event Hubs Kafka source properties.
Configure the following properties while creating a Kafka connection in Administrator:
For more information about creating an Azure Event Hubs Kafka source connection, see the Connections help.
Note: Event Hubs for Kafka is available only on standard and dedicated tiers. The basic tier doesn't support Kafka on Event Hubs.
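For reference, the following Python sketch shows the Kafka client settings that the Event Hubs Kafka endpoint typically expects, assuming the confluent-kafka library; the namespace, connection string, and Event Hubs name are placeholders.

    from confluent_kafka import Consumer

    consumer = Consumer({
        # The Kafka endpoint of the namespace listens on port 9093.
        "bootstrap.servers": "mynamespace.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        # Event Hubs authenticates Kafka clients with the namespace
        # connection string passed as the SASL password.
        "sasl.username": "$ConnectionString",
        "sasl.password": "Endpoint=sb://mynamespace.servicebus.windows.net/;...",
        "group.id": "demo-consumer",
        "auto.offset.reset": "earliest",
    })

    # The Event Hubs name serves as the Kafka topic name.
    consumer.subscribe(["my-event-hub"])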

Business 360 Events sources

You can use Business 360 Events to publish events from Business 360 applications to supported targets, such as Kafka, Amazon S3, and flat files. To create a Business 360 Events source connection, use the Business 360 Events connection type.
Before you use a Business 360 Events source, perform the following tasks:

Flat File sources

Use a flat file as a source to read incoming real-time data. Configure a flat file connection to read data from flat files that are stored in the same directory.
A streaming ingestion and replication task reads each row in a flat file source and ingests the data to a configured target. When a flat file is continuously updated in real time, the streaming ingestion and replication task reads only the newly added content instead of reading the complete file again.
Streaming Ingestion and Replication can read data from delimited flat files. The delimiter character must be a carriage return (\r), a line feed (\n), or a combination of both.
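The following Python sketch illustrates this tail-style incremental read, in which only newly appended rows are consumed; the file path is hypothetical, and a real ingestion task would also read the rows that exist before it starts.

    import time

    def follow(path):
        with open(path, "r") as f:
            f.seek(0, 2)  # start at the end of the file; skip existing rows
            while True:
                line = f.readline()
                if line:
                    yield line.rstrip("\r\n")  # rows end in \r, \n, or \r\n
                else:
                    time.sleep(1)  # wait for the producer to append more rows

    for row in follow("/data/incoming/events.csv"):
        print("new row:", row)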

Google PubSub sources

Use a Google PubSub source to read messages from the configured Google Cloud PubSub subscription. To create a Google PubSub source connection, use the Google PubSub connection type.
Google PubSub is an asynchronous messaging service that decouples services that produce events from services that process events. You can use Google PubSub as a messaging-oriented middleware or for event ingestion and delivery for streaming analytics pipelines. Google PubSub offers durable message storage and real-time message delivery with high availability and consistent performance at scale. You can run Google PubSub servers in all the available Google Cloud regions around the world.
Before you use the Google PubSub connector, you must ensure that you meet the following prerequisites:
In a streaming ingestion and replication task, you can use a Google PubSub source to subscribe to messages from a Google PubSub topic.
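The subscription model works the same way in a standalone client. The following Python sketch assumes the google-cloud-pubsub library and hypothetical project and subscription IDs.

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path("my-project", "my-subscription")

    def callback(message):
        print("received:", message.data)
        message.ack()  # acknowledge so Pub/Sub does not redeliver the message

    # Pull messages asynchronously from the configured subscription.
    future = subscriber.subscribe(subscription_path, callback=callback)
    try:
        future.result()  # block and process messages until interrupted
    except KeyboardInterrupt:
        future.cancel()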

JMS sources

Use a JMS source to read data from a JMS provider. To create a JMS source connection, use the JMS connection type.
JMS providers are message-oriented middleware systems that send JMS messages. The JMS source reads JMS messages either from a JMS provider message queue or from a JMS provider based on the message topic. You can use a JMS source with IBM MQ, Oracle WebLogic JMS, and TIBCO JMS.
The JMS source can read the following JMS message types:
JMS message delivery destination types
You can choose one of the following JMS message delivery destination types:
  - Queue. The JMS source reads messages from a point-to-point message queue.
  - Topic. The JMS source reads messages that are published to a message topic.

Kafka sources

Use a Kafka source to read messages from a Kafka topic. To create a Kafka source connection, use the Kafka connection type.
Kafka is a publish-subscribe messaging system. It is an open-source distributed streaming platform that persists the streaming data in a Kafka topic. Any number of systems that need the data in real time can then read a topic. Kafka can serve as an interim staging area for streaming data that different downstream consumer applications can consume.
Kafka runs as a cluster of one or more servers, each of which is called a broker. Kafka brokers stream data in the form of messages. These messages are published to a topic. When you create a Kafka source, you create a Kafka consumer to read messages from a Kafka topic.
In a streaming ingestion and replication task, you can use a Kafka source to subscribe to a stream of incoming data. When you configure a Kafka source to read from a Kafka topic, you can specify the topic name or use a Java supported regular expression to subscribe to all topics that match a specified pattern.
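A streaming ingestion and replication task accepts a Java regular expression for the pattern. The following standalone Python sketch shows the equivalent pattern subscription, assuming the confluent-kafka library, a local broker, and hypothetical topic names that start with "orders-"; in that library, a leading "^" marks an entry as a regular expression rather than a literal topic name.

    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "orders-consumer",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["^orders-.*"])  # matches orders-us, orders-eu, ...

    while True:
        msg = consumer.poll(1.0)
        if msg is not None and msg.error() is None:
            print(msg.topic(), msg.value())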
You can use the same Kafka connection to create an Amazon Managed Streaming for Apache Kafka (Amazon MSK) or a Confluent Kafka source connection. You can then use the Amazon MSK source or the Confluent Kafka source in a streaming ingestion and replication task to read messages from an Apache Kafka or a Confluent Kafka topic.

MQTT sources

Use an MQTT source to read data from an MQ Telemetry Transport (MQTT) broker. To create an MQTT source, use the MQTT connection type.
MQTT is a publish-subscribe messaging system. It is a simple, lightweight, and persistent messaging protocol. It is designed for constrained devices and low-bandwidth, high-latency, or unreliable networks. Both publishers and subscribers are MQTT clients. MQTT decouples the publisher from the subscriber, so a broker manages the client connections.
An MQTT broker receives all messages, filters the messages, determines which client subscribed to each message, and then sends messages to the subscribed clients. If multiple MQTT sources connect to one MQTT broker, each connection must have a unique identifier. When you run a streaming ingestion and replication job to ingest data from an MQTT source, Streaming Ingestion and Replication first writes the data to an internal queue before writing the data to a target.
Note: An MQTT source must have a unique client identifier. If two MQTT sources have the same client identifier, the MQTT broker rejects both clients and the streaming ingestion and replication job enters the Running with Warning state.
Streaming Ingestion and Replication supports MQTT Quality of Service (QoS) level 1. Level 1 indicates that the client sends the message to the broker at least once, but the message might be delivered more than once. After the broker acknowledges the message receipt, the client deletes the message from the outbound queue. The QoS level is restricted to client-to-broker or broker-to-client communication.
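The following Python sketch shows a standalone subscriber with a unique client identifier and a QoS 1 subscription, assuming the paho-mqtt 1.x client library and a hypothetical broker host.

    import paho.mqtt.client as mqtt

    # The client identifier must be unique per broker connection.
    client = mqtt.Client(client_id="ingest-client-01")

    def on_message(client, userdata, msg):
        print(msg.topic, msg.payload)

    client.on_message = on_message
    client.connect("broker.example.com", 1883)
    # QoS 1: the message is delivered at least once and may arrive
    # more than once.
    client.subscribe("sensors/#", qos=1)
    client.loop_forever()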

OPC UA sources

Use an OPC UA source to read messages from an OPC UA application tag. To create an OPC UA source connection, use the OPCUA connection type.
Open Platform Communications (OPC) is one of the important communication protocols for Industry 4.0 and the Industrial Internet of Things (IIoT). OPC Unified Architecture (OPC UA) is a machine-to-machine communication protocol used for industrial automation. OPC UA provides a flexible and adaptable mechanism to move data between enterprise systems, monitoring devices, and sensors that interact with real-world data. You can use OPC UA to establish communication for simple downtime status or for massive amounts of highly complex plant-wide information.
The OPC UA source is a client that collects data from OPC servers. Data points in OPC are tags that represent data from devices and provide real-time access to data. In a streaming ingestion and replication task, you can create an OPC UA source to read the incoming data based on the list of tags that you provide. You must specify the tags in a JSON array format.
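For illustration only, a JSON array of tags might look like the following; the node identifiers are hypothetical and use the common OPC UA NodeId notation, and the exact schema that the task expects is defined in the task configuration.

    [
      "ns=2;s=Device1.Temperature",
      "ns=2;s=Device1.Pressure"
    ]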

REST V2 sources

Use a REST V2 source to read data from a web service application. To create a REST V2 source connection, use the REST V2 connection type.
The REST V2 connector is a generic connector for cloud applications that expose a REST API. It supports Swagger specification version 2.0. The Swagger specification file contains the operation ID, path parameters, query parameters, header fields, and payload details.
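For illustration, the following minimal Swagger 2.0 fragment for a hypothetical API shows the elements that the connector reads: the operation ID and the path, query, and header parameters.

    {
      "swagger": "2.0",
      "info": { "title": "Orders API", "version": "1.0" },
      "host": "api.example.com",
      "basePath": "/v1",
      "paths": {
        "/orders/{id}": {
          "get": {
            "operationId": "getOrder",
            "parameters": [
              { "name": "id", "in": "path", "required": true, "type": "string" },
              { "name": "expand", "in": "query", "type": "string" },
              { "name": "x-api-key", "in": "header", "type": "string" }
            ],
            "responses": { "200": { "description": "The requested order." } }
          }
        }
      }
    }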
When you create a REST V2 source connection in Administrator for a streaming ingestion and replication task, you can configure one of the following REST authentication types: