Property | Description |
---|---|
Name | A name for the streaming ingestion and replication task. The names of streaming ingestion and replication tasks must be unique within the organization. Task names can contain alphanumeric characters, spaces, and underscores. Names must begin with an alphabetic character or underscore. Task names are not case-sensitive. |
Location | The project folder to store the task. |
Runtime Environment | Runtime environment that contains the Secure Agent. The Secure Agent runs the task. For streaming ingestion and replication tasks, the Cloud Hosted Agent is not supported and does not appear in the Runtime Environment list. Serverless runtime environments are also not supported. |
Description | Optional. Description of the task. Maximum length is 4,000 characters. |
Property | Description |
---|---|
Connection | Name of the Amazon Kinesis Stream source connection. |
Connection Type | The Amazon Kinesis connection type. The connection type populates automatically based on the connection that you select. |
Stream | Name of the Kinesis Stream from which you want to read data. |
Property | Description |
---|---|
Append GUID to DynamoDB table name | Specifies whether or not to add a GUID as a suffix to the Amazon DynamoDB table name. If disabled, you must enter the Amazon DynamoDB table name. Default is enabled. |
Amazon DynamoDB | Amazon DynamoDB table name to store the checkpoint details of the Kinesis source data. The Amazon DynamoDB table name is generated automatically. However, if you enter a name of your choice, the streaming ingestion and replication task prefixes the given name to the auto-generated name. |
Property | Description |
---|---|
Connection | Name of the AMQP source connection. |
Connection Type | The AMQP connection type. The connection type populates automatically based on the connection that you select. |
Queue | Name of the existing AMQP queue from which the streaming ingestion and replication task reads the messages. This queue is pre-defined by the AMQP administrator. |
Auto Acknowledge messages | You can choose True or False. If you choose True, the AMQP broker automatically acknowledges the received messages. |
Batch Size | The maximum number of messages to pull in a single session. Default is 10 messages. |
Property | Description |
---|---|
Connection | Name of the Kafka source connection. |
Connection Type | The Kafka connection type. The connection type populates automatically based on the connection that you select. |
Topic | Name of the event hub from which you want to read the events. You can either enter the topic name manually or fetch the metadata of the Kafka-enabled Event Hubs connection. When you fetch the metadata, the Select Source Object dialog box appears, showing all the available topics. |
Property | Description |
---|---|
Consumer Configuration Properties | Comma-separated list of configuration properties for the consumer to connect to Kafka. Specify the values as key-value pairs. For example, key1=value1, key2=value2. The group.id property of Kafka consumer is autogenerated. You can override this property. |
Property | Description |
---|---|
Connection | Select a Business 360 Events connection to read events from the Business 360 data store. |
Connection Type | The connection type, that is, Business 360 Events. Note: You can't modify this attribute. |
Business Object | Select the publishing event asset from which you want to read the data. |
Property | Description |
---|---|
Connection | Name of the flat file source connection. |
Connection Type | The Flat file connection type. The connection type appears automatically based on the connection that you select. |
Initial Start Position | Starting position from which the task starts reading data in the file to tail. |
Tailing Mode | Determines whether to tail a single file or multiple files, based on the logging pattern. |
File | Absolute path and name of the file to tail, or a regular expression to find the files to tail. In multiple files mode, enter the base directory. |
Connection Property | Description |
---|---|
Rolling Filename Pattern | Name pattern for the file that rolls over. If the file to tail rolls over, the file name pattern is used to identify files that have rolled over. The underlying streaming ingestion Secure Agent recognizes this file pattern. When the Secure Agent restarts, and the file has rolled over, it picks up from where it left off. You can use asterisk (*) and question mark (?) as wildcard characters to indicate that the files are rolled over in the same directory. For example, ${filename}.log.*. Here, asterisk (*) represents the successive version numbers that would be appended to the file name. |
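The wildcard pattern behaves like a file glob. The following is a minimal sketch, assuming a hypothetical base file named app.log, of how a pattern such as app.log.* identifies files that have rolled over; it only illustrates the matching, not the Secure Agent's internal logic.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class RolloverPatternCheck {
    public static void main(String[] args) {
        // Glob equivalent of the rolling filename pattern app.log.*
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:app.log.*");

        String[] candidates = {"app.log", "app.log.1", "app.log.2024-05-01", "other.log"};
        for (String name : candidates) {
            // Only the rolled-over versions (app.log.1, app.log.2024-05-01) match.
            System.out.println(name + " -> " + matcher.matches(Path.of(name)));
        }
    }
}
```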
Property | Description |
---|---|
Connection | Name of the Google PubSub source connection. |
Connection Type | The Google PubSub connection type. The connection type populates automatically based on the connection that you select. |
Subscription | Name of the subscription on the Google PubSub service from which messages should be pulled. The Google PubSub connection supports only the pull delivery type for a subscription. |
Batch Size | Maximum number of messages that the Cloud service bundles together in a batch. Default is 1. |
Property | Description |
---|---|
Connection | Name of the JMS source connection. |
Connection Type | JMS connection type. The connection type populates automatically based on the connection that you select. |
Destination Type | Type of destination that the source service sends the JMS message to. You can choose Queue or Topic. Default is Queue. |
Shared Subscription | Enables multiple consumers to access a single subscription. Applies to the topic destination type. Default is false. |
Durable Subscription | Enables inactive subscribers to retain messages and deliver retained messages when the subscribers reconnect. Applies to the topic destination type. Default is false. |
Subscription Name | Name of the subscription. Applies to the topic destination type, when the topic subscription is sharable, durable, or both. If no value is specified, the ingestion service generates a unique subscription name. |
JMS Destination | Name of the queue or topic that the JMS provider delivers the message to. Note: If a JMS connection is created with the JMS Weblogic server, the queue or topic JMS Destination must start with a period, followed by a slash (./). For example: ./<JMS Server module name>!<Queue or topic name> For more information about connecting Streaming Ingestion and Replication to an Oracle Weblogic JMS server, see Informatica Knowledge Base article 000186952. |
Property | Description |
---|---|
Client ID | Optional. Unique identifier that identifies the JMS connection. The streaming ingestion and replication task generates a unique client ID if a value isn't specified for an unshared durable subscription. |
Property | Description |
---|---|
Connection | Name of the Kafka source connection. |
Connection Type | The Kafka connection type. The connection type populates automatically based on the connection that you select. |
Topic | Kafka source topic name or a Java-supported regular expression for the Kafka source topic name pattern to read the events from. You can either enter the topic name manually or fetch the metadata of the Kafka connection. When you fetch the metadata, the Select Source Object dialog box appears, showing all the topics or topic patterns available in the Kafka broker. Note: When you add a new Kafka source topic to a streaming ingestion and replication job that is in the Up and Running state, redeploy the job immediately to avoid data loss from the new topics. |
Property | Description |
---|---|
group.id | Specifies the name of the consumer group that the Kafka consumer belongs to. If the consumer group doesn't exist when the Kafka consumer is constructed, it is created automatically. This property is auto-generated. You can override this property. |
auto.offset.reset | Specifies the behavior of the consumer when there is no committed position or when an offset is out of range. You can set auto.offset.reset to earliest, latest, or none. When you read data from a Kafka topic or use a topic pattern and the offset of the last checkpoint is deleted during message recovery, provide the following property to recover the messages from the next available offset: auto.offset.reset=earliest. Otherwise, the streaming ingestion and replication task reads data from the latest available offset. |
message-demarcator | The Kafka source receives messages in batches. You can include all Kafka messages for a given topic and partition in a single batch. This property provides a string to use as a demarcation between multiple Kafka messages. If you don't provide a value, each Kafka message is triggered as a single event. You can use delimiters such as a space, comma, semicolon, or tab as demarcators. For example: message-demarcator=${literal(' '):unescapeXml()} message-demarcator=${literal(','):unescapeXml()} message-demarcator=${literal(';'):unescapeXml()} message-demarcator=${literal('	'):unescapeXml()} |
max.poll.records | Specifies the maximum number of records returned in a single call to poll. For example, max.poll.records=100000 |
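For reference, the following sketch shows how the standard Kafka client properties above (group.id, auto.offset.reset, max.poll.records) would look on a plain Java consumer. The broker address, group name, and topic are assumptions, and message-demarcator is a task-level batching option rather than a Kafka client property. In the task itself you supply these values as key=value pairs.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerPropertiesSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "siar-consumer-group");       // group.id override
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");         // auto.offset.reset
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100000");            // max.poll.records
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        // The streaming ingestion and replication task handles polling and
        // checkpointing itself; this block only illustrates the property semantics.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));                            // assumed topic
        }
    }
}
```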
Property | Description |
---|---|
Connection | Name of the MQTT source connection. |
Connection Type | The MQTT connection type. The connection type populates automatically based on the connection that you select. |
Topic | Name of the MQTT topic. |
Connection Property | Description |
---|---|
Client ID | Optional. Unique identifier that identifies the connection between the MQTT source and the MQTT broker. The client ID identifies the file-based persistence store that the MQTT source uses to store messages while they are being processed. If you do not specify a client ID, the streaming ingestion and replication task uses the client ID provided in the MQTT connection. If the MQTT connection also does not specify a client ID, the task generates a unique client ID. |
Max Queue Size | Optional. The maximum number of messages that the processor can store in memory at the same time. Default is 1024. |
Property | Description |
---|---|
Connection | Name of the OPC UA source connection. |
Connection Type | The OPC UA connection type. The connection type populates automatically based on the connection that you select. |
Tag List Specified As | Format in which the list of tags is specified. You can specify the tags as a list or as the path to a file that contains the list of tags. |
Tags or File Path | List of tags or path to the file containing the list of tags to be read, specified as a JSON array. The list of tags or file path cannot exceed 2048 characters. |
Minimum Publish Interval | The minimum publish interval of subscription notification messages, in milliseconds. Set this property to a lower value to detect the rapid change of data. Default is 1,000 milliseconds. |
Property | Description |
---|---|
Connection | Name of the REST V2 source connection. |
Connection Type | The REST V2 connection type. The connection type populates automatically based on the connection that you select. |
REST Endpoints | List of REST endpoints specified in the input Swagger file. These endpoints appear based on the chosen REST connection. |
Scheme | List of schemes specified in the Swagger definition. The selected scheme is used to create the URL. |
Poll Interval | Interval between two consecutive REST calls. Default is 10 seconds. |
Action on Unsuccessful Response codes | Action to take when a REST call returns an unsuccessful response code. |
Property | Description |
---|---|
Connection | Name of the Amazon Kinesis Data Firehose target connection. |
Connection Type | The Amazon Kinesis connection type. The connection type populates automatically based on the connection that you select. |
Stream Name/Expression | Kinesis stream name or a regular expression for the Kinesis stream name pattern. Use the $expression$ format for the regular expression. $expression$ evaluates the data and sends the matching data to capturing group 1. |
Property | Description |
---|---|
Connection | Name of the Amazon Kinesis Stream target connection. |
Connection Type | The Amazon Kinesis connection type. The connection type populates automatically based on the connection that you select. |
Stream Name/Expression | Kinesis stream name or a regular expression for the Kinesis stream name pattern. Use the $expression$ format for the regular expression. $expression$ evaluates the data and sends the matching data to capturing group 1. |
Property | Description |
---|---|
Connection | Name of the Amazon S3 target connection. |
Connection Type | The Amazon S3 V2 connection type. The connection type populates automatically based on the connection that you select. |
Object Name/Expression | Amazon S3 file name or a regular expression for the Amazon S3 file name pattern. Use the $expression$ format for a regular expression. $expression$ evaluates the data and sends the matching data to capturing group 1. |
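The capturing-group behavior can be pictured with plain Java regular expressions. This is a minimal sketch, assuming a hypothetical payload and pattern, of how the value matched by capturing group 1 can supply an object name; it is not the task's internal implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroupSketch {
    public static void main(String[] args) {
        // One capturing group: group 1 captures the region code from the event data.
        Pattern pattern = Pattern.compile("region=(\\w+)");
        Matcher matcher = pattern.matcher("region=emea;device=sensor42");

        if (matcher.find()) {
            // The matched group-1 value determines where the record is routed.
            String objectName = matcher.group(1) + "-events.json";   // "emea-events.json"
            System.out.println(objectName);
        }
    }
}
```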
Property | Description |
---|---|
Partitioning Interval | Optional. The time interval according to which the streaming ingestion task creates partitions in the Amazon S3 bucket. To use this option, you must add a ${Timestamp} expression to the object name in the Object Name/Expression field. Default is none. For more information, see Amazon S3 target. |
Minimum Upload Part Size | Optional. Minimum upload part size when uploading a large file as a set of multiple independent parts, in megabytes. Use this property to tune the file load to Amazon S3. Default value is 5120 MB. |
Multipart Upload Threshold | Optional. Multipart upload minimum threshold that determines when to upload objects in multiple parts in parallel. Default value is 5120 MB. |
Property | Description |
---|---|
Connection | Name of the Azure Event Hubs target connection. |
Connection Type | The Azure Event Hubs connection type. The connection type populates automatically based on the connection that you select. |
Event Hub | The name of the Azure event hub. |
Property | Description |
---|---|
Shared Access Policy Name | Optional. The name of the Event Hub Namespace Shared Access Policy. The policy must apply to all data objects that are associated with this connection. To read from an event hub, the policy must have Listen permission. To write to an event hub, the policy must have Send permission. |
Shared Access Policy Primary Key | Optional. The primary key of the Event Hub Namespace Shared Access Policy. |
Property | Description |
---|---|
Connection | Name of the Databricks target connection. |
Connection Type | The Databricks connection type. The connection type populates automatically based on the connection that you select. |
Staging Location | Relative directory path to store the staging files. |
Target Table Name | Name of the Databricks table to append. |
Property | Description |
---|---|
Target Database Name | Overrides the database name provided in the Databricks connection in Administrator. |
Property | Description |
---|---|
Connection | Name of the flat file target connection. |
Connection Type | The flat file connection type. The connection type appears based on the connection that you select. |
Staging Directory Location | Path to the staging directory on the Secure Agent. Specify the staging directory in which to stage the files when you write data to a flat file target. Ensure that the directory has sufficient space and that you have write permissions to the directory. |
Rollover Size * | The file size, in KB, at which the task moves the file from the staging directory to the target. For example, set the rollover size to 1 MB and name the file target.log. If the source service sends 5 MB to the target, the streaming ingestion and replication task first creates the target.log.<timestamp> file. When the size of target.log.<timestamp> reaches 1 MB, the task rolls the file over. |
Rollover Events Count * | Number of events or messages to accumulate for file rollover. For example, if you set the rollover events count to 1000, the task rolls the file over when the file accumulates 1000 events. |
Rollover Time * | Length of time, in milliseconds, for a target file to roll over. After the time period has elapsed, the target file rolls over. For example, if you set rollover time as 1 hour, the task rolls the file over when the file reaches a period of 1 hour. |
File Name | The name of the file that the task creates on the target. |
* Specify a value for at least one rollover option to perform target file rollover. |
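The three rollover options act as independent thresholds: the task rolls the staged file over as soon as any configured threshold is reached. The following sketch only illustrates that interaction; the threshold values are examples, not defaults.

```java
public class RolloverCheck {
    static final long ROLLOVER_SIZE_BYTES  = 1024L * 1024L;       // example: 1 MB
    static final long ROLLOVER_EVENT_COUNT = 1000L;               // example: 1000 events
    static final long ROLLOVER_TIME_MILLIS = 60L * 60L * 1000L;   // example: 1 hour

    // Roll over when any configured threshold is reached.
    static boolean shouldRollOver(long sizeBytes, long eventCount, long fileAgeMillis) {
        return sizeBytes >= ROLLOVER_SIZE_BYTES
                || eventCount >= ROLLOVER_EVENT_COUNT
                || fileAgeMillis >= ROLLOVER_TIME_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(shouldRollOver(512 * 1024, 250, 10_000));        // false: no threshold reached
        System.out.println(shouldRollOver(2 * 1024 * 1024, 250, 10_000));   // true: size threshold reached
    }
}
```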
Property | Description |
---|---|
Connection | Name of the Google BigQuery V2 target connection. |
Connection Type | The Google BigQuery V2 connection type. The connection type populates automatically based on the connection that you select. |
Dataset Name | Name of the Google BigQuery dataset. The dataset must exist in the Google Cloud Platform. |
Table name | Name of the Google BigQuery table to which the task inserts data in JSON format. |
Property | Description |
---|---|
Connection | Name of the Google Cloud Storage target connection. |
Connection Type | The Google Cloud Storage connection type. The connection type populates automatically based on the connection that you select. |
Number of Retries | The number of times the streaming ingestion and replication task retries to write to the Google Cloud Storage target. Default is 6. |
Bucket | The container to store, organize, and access objects that you upload to Google Cloud Storage. |
Key | Name of the Google Cloud Storage target object. |
Property | Description |
---|---|
Proxy Host | Host name of the outgoing proxy server that the Secure Agent uses. |
Proxy Port | Port number of the outgoing proxy server. |
Content Type | The file content type. You can specify any MIME type, such as application/json, multipart, text, or html. These values are not case sensitive. Default is text. |
Object ACL | Access control associated with the uploaded object. Choose the type of access control to apply. |
Server Side Encryption Key | Server-side encryption key for the Google Cloud Storage bucket. Required if the Google Cloud Storage bucket is encrypted with SSE-KMS. |
Content Disposition Type | Type of RFC-6266 Content Disposition to be attached to the object. Choose either Inline or Attachment. |
Property | Description |
---|---|
Connection | Name of the Google PubSub target connection. |
Connection Type | The Google PubSub connection type. The connection type populates automatically based on the connection that you select. |
Topic | Name of the target Google PubSub topic. |
Batch Size | Maximum number of messages that the Cloud service bundles together in a batch. Default is 1. |
Property | Description |
---|---|
Connection | Name of the JDBC V2 target connection. |
Connection Type | The JDBC V2 connection type. The connection type populates automatically based on the connection that you select. |
Table name | Name of the table to which the task inserts data in JSON format. |
Property | Description |
---|---|
Connection | Name of the Kafka target connection. |
Connection Type | The Kafka connection type. The connection type populates automatically based on the connection that you select. |
Topic Name/Expression | Kafka topic name or a Java-supported regular expression for the Kafka topic name pattern. Use the $expression$ format for the regular expression. $expression$ evaluates the data and sends the matching data to capturing group 1. You can either enter the topic name manually or fetch the metadata of the Kafka connection. When you fetch the metadata, the Select Target Object dialog box appears, showing all the topics available in the Kafka broker. However, Kafka topic name patterns do not appear in the list. |
Property | Description |
---|---|
Producer Configuration Properties | The configuration properties for the producer. |
Metadata Fetch Timeout in milliseconds | The maximum time, in milliseconds, to wait for the metadata to be fetched. |
Batch Flush Size in bytes | The batch size, in bytes, at which a streaming ingestion and replication task flushes events to the target. |
Property | Description |
---|---|
Connection | Name of the Microsoft Azure Data Lake Storage Gen2 target connection. |
Connection Type | The ADLS Gen2 connection type. The connection type populates automatically based on the connection that you select. |
Write Strategy | The operation type used to write data to the ADLS Gen2 file. If the file exists in ADLS Gen2 storage, you can choose to overwrite, append to, fail, or roll over the existing file. Default is Append. |
Interim Directory | Path to the staging directory in ADLS Gen2. Specify the staging directory where you want to stage the files when you write data to ADLS Gen2. Ensure that the directory has sufficient space and you have write permissions to the directory. Applicable when you select the Write Strategy as Rollover. When you configure an ADLS Gen2 target in a streaming ingestion and replication job and do not specify any value for the rollover properties, the files remain in the interim directory. When you stop or undeploy the streaming ingestion and replication job, the files in the interim directory are moved to the target location by default. |
Rollover Size | Target file size, in kilobytes (KB), at which to trigger rollover. Applicable when you select the Write Strategy as Rollover. |
Rollover Events Count | Number of events or messages that you want to accumulate for the rollover. Applicable when you select the Write Strategy as Rollover. |
Rollover Time | Length of time, in milliseconds, for a target file to roll over. After the time period has elapsed, the target file rolls over. Applicable when you select the Write Strategy as Rollover. |
File Name/Expression | File name or a regular expression for the file name pattern. Use the $expression$ format for the regular expression. $expression$ evaluates the data and sends the matching data to capturing group 1. |
Property | Description |
---|---|
Filesystem Name Override | Overrides the default file system name provided in the connection. This file system name is used to write to a file at run time. |
Directory Override | Overrides the default directory path. The ADLS Gen2 directory that you use to write data. Default is the root directory. The directory path that you specify when you create the target overrides the path specified when you create the connection. |
Compression Format | Optional. Compression format to use before the streaming ingestion and replication task writes data to the target file. Default is None. To read a compressed file from the data lake storage, the compressed file must have specific extensions. If the extensions used to read the compressed file are not valid, the Secure Agent does not process the file. |
Property | Description |
---|---|
Transformation Type | Select Combiner. |
Transformation Name | Name of the Combiner transformation. |
Minimum Number of Events | Minimum number of events to collect before the transformation combines the events into a single event. Default is 1. |
Maximum Aggregate Size | Maximum size of the combined events in megabytes. If not specified, this transformation waits to meet any of the other two conditions before combining the events. |
Time Limit | Maximum time to wait before combining the events. If not specified, the transformation waits until one of the other conditions is met before combining the events, which can be indefinitely. |
Delimiter | Symbols used to specify divisions between data strings in the transformed data. Applicable only for the binary data format. |
Append the delimiter character to the last record in each batch | When batches contain many events or records, you can choose whether to append the delimiter character to the last record in each batch. This enables the delimiter character to act as the division between batches. |
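The delimiter options can be pictured as a simple join. The sketch below, with made-up event strings, shows the difference the trailing-delimiter option makes when consecutive batches are written to the same target.

```java
import java.util.List;

public class CombinerSketch {
    // Join the events in a batch with the configured delimiter; optionally
    // append the delimiter after the last record so batch boundaries stay visible.
    static String combine(List<String> events, String delimiter, boolean appendTrailingDelimiter) {
        String combined = String.join(delimiter, events);
        return appendTrailingDelimiter ? combined + delimiter : combined;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("event1", "event2", "event3");
        System.out.println(combine(batch, "|", false));   // event1|event2|event3
        System.out.println(combine(batch, "|", true));    // event1|event2|event3|
    }
}
```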
Property | Description |
---|---|
Transformation Type | Select Filter. |
Transformation Name | Name of the Filter transformation. |
Filter Type | Type of filter used to evaluate the incoming data. |
Expression | Expression for the filter type that you select. |
Property | Description |
---|---|
Transformation Type | Select Format Converter. |
Transformation Name | Name of the Format Converter transformation. |
Convert to Format | The streaming ingestion and replication task converts incoming data to the selected format. Currently, the Format Converter transformation converts the incoming data only to Parquet format. |
Date Format * | Enter the format of dates in input fields. For example, MM/dd/yyyy. |
Time Format * | Enter the format of time in input fields. For example, HH/mm/ss. |
Timestamp Format * | Enter the format of timestamps in input fields. For example, the epoch timestamp for 10/11/2021 12:04:41 GMT (MM/dd/yyyy HH:mm:ss) is 1633953881 and the timestamp in milliseconds is 1633953881000. |
Expect Records as Array | Determines whether to expect a single record or an array of records. Select this property to expect arrays in each record. Applies only to XML incoming messages. By default, this property is deselected. |
* If the format is not specified, it is considered in milliseconds since the epoch (Midnight, January 1, 1970, GMT). |
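The epoch example in the table can be reproduced with standard Java time APIs; this check assumes the timestamp is interpreted in GMT, as stated above.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EpochCheck {
    public static void main(String[] args) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");
        LocalDateTime timestamp = LocalDateTime.parse("10/11/2021 12:04:41", format);

        long epochSeconds = timestamp.toEpochSecond(ZoneOffset.UTC);   // 1633953881
        long epochMillis  = epochSeconds * 1000L;                      // 1633953881000

        System.out.println(epochSeconds + " " + epochMillis);
    }
}
```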
Property | Description |
---|---|
Transformation Type | Select Java. |
Transformation Name | Name of the Java transformation. |
Classpath | The JAR file used to run the Java code. You can use a separator to include multiple JAR files. On UNIX, use a colon to separate multiple classpath entries. On Windows, use a semicolon to separate multiple classpath entries. For example, /home/user/commons-text-1.9.jar; /home/user/json-simple-1.1.1.jar |
Import code | Import third-party, built-in, and custom Java packages. You can import multiple packages. Use a semicolon to separate multiple packages. You can use the following syntax to import packages: import <package name> For example, import java.io.*; |
Main code | A Java code that provides the transformation logic. For example, JSONParser parser = new JSONParser(); try { JSONObject object = (JSONObject) parser.parse(inputData); object.put("age", 23); outputData=object.toJSONString(); } catch (ParseException e) { throw new RuntimeException(); } |
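For readability, the Main code example above is also shown as a self-contained sketch. The wrapper class and sample input are illustrative additions; inside the transformation, inputData and outputData are provided by the task, and the json-simple library from the Classpath example is assumed.

```java
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class MainCodeExample {
    // Mirrors the Main code logic: parse the incoming event, add a field,
    // and return the modified JSON.
    static String transform(String inputData) {
        JSONParser parser = new JSONParser();
        String outputData;
        try {
            JSONObject object = (JSONObject) parser.parse(inputData);
            object.put("age", 23);
            outputData = object.toJSONString();
        } catch (ParseException e) {
            throw new RuntimeException();
        }
        return outputData;
    }

    public static void main(String[] args) {
        // Prints the sample event with the added age field.
        System.out.println(transform("{\"name\":\"test\"}"));
    }
}
```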
Property | Description |
---|---|
Transformation Type | Select Jolt. |
Transformation Name | Name of the Jolt transformation. |
Jolt Specification | Enter a JSON structure to add a chain of multiple operations, such as shift, default, cardinality, remove, modify-overwrite-beta, and sort. For example, [ { "operation": "shift", "spec": { "breadbox": "counterTop" } } ] For example, [ { "operation": "default", "spec": { "counterTop": { "loaf1": { "slices": [ "slice1", "slice2", "slice3", "slice4" ] } } } } ] For example, [ { "operation": "cardinality", "spec": { "counterTop": { "loaf1": { "slices": "ONE" } } } } ] For example, [ { "operation": "remove", "spec": { "counterTop": { "loaf2": "", "jar1": "" } } } ] For example, [ { "operation": "modify-overwrite-beta", "spec": { "counterTop": { "jar2": { "contents": "=toUpper" } } } } ] For example, [ { "operation": "sort" } ] The following is an example of a Jolt specification involving multiple operations: Input: {"name":"test"} [{ "operation": "shift", "spec": { "name": "testname" } }, { "operation": "default", "spec": { "city": ["Anantapur", "Bangalore", "Hyderabad"] } }, { "operation": "cardinality", "spec": { "city": "ONE" } }, { "operation": "remove", "spec": { "age": "" } }, { "operation": "modify-overwrite-beta", "spec": { "city": "=toUpper" } }, { "operation": "sort"}] Note: If the input records don't match the Jolt specification, the transformation writes null records to the target. |
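Specifications like the ones above can be tried out locally with the open-source Jolt library before you paste them into the transformation. The following sketch assumes the com.bazaarvoice.jolt jolt-core and json-utils artifacts on the classpath; the task itself needs only the JSON specification.

```java
import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

public class JoltSpecCheck {
    public static void main(String[] args) {
        // A two-operation chain: shift renames "name" to "testname", then sort orders the keys.
        String spec = "[{\"operation\":\"shift\",\"spec\":{\"name\":\"testname\"}},"
                    + "{\"operation\":\"sort\"}]";
        String input = "{\"name\":\"test\"}";

        Chainr chainr = Chainr.fromSpec(JsonUtils.jsonToList(spec));
        Object output = chainr.transform(JsonUtils.jsonToMap(input));

        System.out.println(JsonUtils.toJsonString(output));   // {"testname":"test"}
    }
}
```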
Property | Description |
---|---|
Transformation Type | Select Python. |
Transformation Name | Name of the Python transformation. |
Script Input Type | Python script input type. You can either enter the Python script in the Script Body field or provide the path to the Python script in the Script Path field. |
Python Path | Directory to the Python path libraries. |
Property | Description |
---|---|
Transformation Type | Select Splitter. |
Transformation Name | Name of the Splitter transformation. |
Split Type | Split condition used to evaluate the incoming data. |
Line Split Count | The maximum number of lines that each output split file contains, excluding header lines. |
Byte Sequence | Specified sequence of bytes on which to split the content. |
Property | Description |
---|---|
Split Expression | Split condition used to evaluate the incoming data. |
JSONPath Expression | A JSONPath expression that specifies the array element to split into JSON or scalar fragments. The default JSONPath expression is $. |
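As an illustration of the default expression $, the whole incoming array is selected and each of its entries becomes one output fragment. The sketch below uses the Jayway json-path library purely to demonstrate the idea; it is not part of the task configuration, and the sample data is made up.

```java
import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class JsonSplitSketch {
    public static void main(String[] args) {
        String json = "[{\"id\":1},{\"id\":2},{\"id\":3}]";

        // "$" selects the incoming array itself; each entry becomes one fragment.
        List<Object> fragments = JsonPath.read(json, "$");
        for (Object fragment : fragments) {
            // Each entry would be emitted as its own fragment by the Splitter.
            System.out.println(fragment);
        }
    }
}
```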
Property | Description |
---|---|
Split Depth | The XML nesting depth to start splitting the XML fragments. The default split depth is 1. |