Kafka Data Objects
A Kafka data object is a physical data object that represents data in a Kafka stream. After you configure a Messaging connection, create a Kafka data object to write to Apache Kafka brokers.
Kafka runs as a cluster of one or more servers, each of which is called a broker. Kafka brokers stream data in the form of messages. The messages are published to a topic. When you write data to a Kafka messaging stream, specify the name of the topic that you publish to. You can also write to a Kerberized Kafka cluster.
Kafka topics are divided into partitions. Spark Streaming can read the partitions of a topic in parallel, which increases throughput and scales the number of messages processed. Message ordering is guaranteed only within a partition. For optimal performance, use multiple partitions.
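Because ordering is guaranteed only within a partition, producers typically route all messages that share a key to the same partition. The following Python sketch illustrates that routing logic with a simple CRC32 hash. This is an illustration only: Kafka's default partitioner uses a murmur2 hash, not CRC32, but the principle is the same.

```python
import zlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    Illustrative only: Kafka's default partitioner uses murmur2,
    not CRC32, but the routing principle is identical -- equal keys
    always land in the same partition, preserving per-key ordering.
    """
    return zlib.crc32(key) % num_partitions

# Messages with the same key always go to the same partition.
p1 = assign_partition(b"sensor-42", 4)
p2 = assign_partition(b"sensor-42", 4)
assert p1 == p2
```

Because only per-partition ordering is guaranteed, choosing a key that groups related messages (for example, a device ID) is what preserves their relative order end to end.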
When you write to Kafka brokers, you can use the partitionId, Key, and TopicName output ports. You can override these ports when you create the mapping. You can create or import a Kafka data object.
After you create a Kafka data object, create a write operation. You can use the Kafka data object write operation as a target in Streaming mappings. If you want to configure high availability for the mapping, ensure that the Kafka cluster is highly available.
When you configure the data operation properties, specify the format in which the Kafka data object writes data. You can specify XML, JSON, Avro, or Flat as the format. When you specify XML format, you must provide an XSD file. When you specify Avro format, provide a sample Avro schema in an .avsc file. When you specify JSON or Flat format, you must provide a sample file.
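For example, the sample schema you provide for the Avro format is a standard Avro .avsc file. The record and field names below are hypothetical; this is only a minimal sketch of the expected shape:

```json
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "reading",   "type": "double"},
    {"name": "timestamp", "type": "long"}
  ]
}
```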
You can pass any payload format directly from source to target in Streaming mappings. You can project columns in binary format to pass a payload from source to target in its original form or to pass a payload format that is not supported.
Streaming mappings can read, process, and write hierarchical data. You can use array, struct, and map complex data types to process the hierarchical data. You assign complex data types to ports in a mapping to flow hierarchical data. Ports that flow hierarchical data are called complex ports.
For more information about processing hierarchical data, see the Informatica Big Data Management User Guide.
For more information about Kafka clusters, Kafka brokers, and partitions, see
http://kafka.apache.org/082/documentation.html.
Kafka Data Object Overview Properties
The Data Integration Service uses overview properties when it reads data from or writes data to a Kafka broker.
Overview properties include general properties that apply to the Kafka data object. They also include object properties that apply to the resources in the Kafka data object. The Developer tool displays overview properties for Kafka messages in the Overview view.
General Properties
The following table describes the general properties that you configure for Kafka data objects:
Property | Description |
---|---|
Name | The name of the Kafka data object. |
Description | The description of the Kafka data object. |
Connection | The name of the Kafka connection. |
Objects Properties
The following table describes the objects properties that you configure for Kafka data objects:
Property | Description |
---|---|
Name | The name of the topic or topic pattern of the Kafka data object. |
Description | The description of the Kafka data object. |
Native Name | The native name of the Kafka data object. |
Path Information | The type and name of the topic or topic pattern of the Kafka data object. |
Column Properties
The following table describes the column properties that you configure for Kafka data objects:
Property | Description |
---|---|
Name | The name of the Kafka data object. |
Native Name | The native name of the Kafka data object. |
Type | The native data type of the Kafka data object. |
Precision | The maximum number of significant digits for numeric data types, or the maximum number of characters for string data types. |
Scale | The scale of the data type. |
Description | The description of the Kafka data object. |
Access Type | The type of access the port or column has. |
Kafka Data Object Write Operation Properties
The Data Integration Service uses write operation properties when it writes data to a Kafka broker.
General Properties
The Developer tool displays general properties for Kafka targets in the Write view.
The following table describes the general properties that you view for Kafka targets:
Property | Description |
---|---|
Name | The name of the Kafka broker. This property is read-only. |
Description | The description of the Kafka broker. |
Ports Properties
Ports properties for a physical data object include port names and port attributes such as data type and precision.
The following table describes the ports properties that you configure for Kafka targets:
Property | Description |
---|---|
Name | The name of the resource. |
Type | The native data type of the resource. |
Precision | The maximum number of significant digits for numeric data types, or the maximum number of characters for string data types. |
Scale | The scale of the data type. |
Description | The description of the resource. |
Run-time Properties
The run-time properties display the name of the connection.
The following table describes the run-time property that you configure for Kafka targets:
Property | Description |
---|---|
Connection | Name of the Kafka connection. |
Target Properties
The target properties list the targets of the Kafka data object.
The following table describes the target property that you can configure for Kafka targets:
Property | Description |
---|---|
Target | The target that the Kafka data object writes to. You can add or remove targets. |
Advanced Properties
The Developer tool displays the advanced properties for Kafka targets in the Input transformation in the Write view.
The following table describes the advanced properties that you configure for Kafka targets:
Property | Description |
---|---|
Operation Type | Specifies the type of data object operation. This is a read-only property. |
Metadata Fetch Timeout in milliseconds | The time after which the Data Integration Service stops trying to fetch the metadata. |
Batch Flush Time in milliseconds | The interval after which the data is published to the target. |
Batch Flush Size in bytes | The batch size of the events after which the data is written to the target. |
Producer Configuration Properties | The configuration properties for the producer. If the Kafka data object writes data to a Kafka cluster that is configured for Kerberos authentication, include the following properties: security.protocol=SASL_PLAINTEXT, sasl.kerberos.service.name=kafka, sasl.mechanism=GSSAPI |
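Written as a standard Kafka producer properties fragment, the Kerberos settings listed in the table look like this (the service name kafka follows the example in the table; adjust it to match your cluster's configuration):

```
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.mechanism=GSSAPI
```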
For more information about Kafka broker properties, see
http://kafka.apache.org/082/documentation.html.
Column Projections Properties
The Developer tool displays the column projection properties in the Properties view of the write operation.
To specify column projection properties, double-click the write operation and select the data object. The following table describes the column projection properties that you configure for Kafka targets:
Property | Description |
---|---|
Column Name | The field in the target that the data object writes to. This property is read-only. |
Type | The native data type of the target. This property is read-only. |
Enable Column Projection | Indicates that you use a schema to publish the data to the target. By default, the data is streamed in binary format. To change the streaming format, select this option and specify the schema format. |
Schema Format | The format in which you stream data to the target. You can select one of the following formats: XML, JSON, Avro, or Flat. |
Schema | Specify the XSD schema for the XML format, a sample file for the JSON format, or an .avsc file for the Avro format. For the Flat file format, configure the schema to associate a flat file with the Kafka target. When you provide a sample file, the Data Integration Service uses the UTF-8 code page when writing the data. |
Column Mapping | The mapping of the data object to the target. Click View to see the mapping. |
Project Column as Complex Data Type | Project columns as complex data type for hierarchical data. For more information, see the Informatica Big Data Management User Guide. |