AmazonKinesis Data Objects
An AmazonKinesis data object is a physical data object that represents data in an Amazon Kinesis Data Stream. After you create an AmazonKinesis connection, create an AmazonKinesis data object to read from Amazon Kinesis Data Streams.
Kinesis Data Streams is a real-time data stream processing option that Amazon Kinesis offers within the AWS ecosystem. It is a customizable option for users who want to build custom applications to process and analyze streaming data. You must manually provision enough capacity to meet system needs.
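As a rough illustration of manual capacity provisioning, the following sketch estimates how many shards a stream might need. It assumes the per-shard limits that AWS publishes for provisioned streams (1 MB per second or 1,000 records per second for writes, and 2 MB per second for reads). The function name and the workload figures are hypothetical; confirm the current limits in the AWS documentation before you rely on the result.

```python
import math

# Per-shard limits for provisioned Kinesis Data Streams, as published by AWS.
# These values are assumptions in this sketch; verify them before use.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def estimate_shards(avg_record_kb, records_per_sec, consumers=1):
    """Return a rough shard count for a workload (hypothetical helper)."""
    write_mb = avg_record_kb * records_per_sec / 1024.0
    # Every consumer reads the full stream, so read volume scales with consumers.
    read_mb = write_mb * consumers
    by_write_bytes = write_mb / WRITE_MB_PER_SEC
    by_write_count = records_per_sec / WRITE_RECORDS_PER_SEC
    by_read_bytes = read_mb / READ_MB_PER_SEC
    # The tightest limit decides the shard count; a stream has at least one shard.
    return max(1, math.ceil(max(by_write_bytes, by_write_count, by_read_bytes)))


if __name__ == "__main__":
    print(estimate_shards(avg_record_kb=2, records_per_sec=3000))
```

The estimate takes the largest of the three per-shard ratios, so whichever limit the workload reaches first determines the shard count.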
When you configure the AmazonKinesis data object, specify the name of the Amazon Kinesis Data Stream that you read from. After you create the data object, create a read operation to read data from an Amazon Kinesis Data Stream. You can then add the data object read operation as a source in streaming mappings.
When you configure the data operation properties, specify the format in which the data object reads data. When you read from Amazon Kinesis Data Stream sources, you can read data in JSON, XML, Avro, Flat, or binary format. When you specify XML format, you must provide an XSD file. When you specify Avro format, you must provide a sample Avro schema in a .avsc file. When you specify JSON or Flat format, you must provide a sample file.
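To make the sample-file requirement concrete, the following sketch writes a small Avro schema to a .avsc file and a matching JSON sample file. The record layout, field names, and file names are hypothetical and only illustrate what such files can look like; they are not taken from the product.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical record layout; the field names are illustrative only.
AVRO_SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "temperature", "type": "double"},
        {"name": "event_time", "type": "long"},
    ],
}

# One sample message whose keys match the schema fields above.
SAMPLE_RECORD = {"device_id": "router1", "temperature": 21.5, "event_time": 1700000000}


def write_samples(directory):
    """Write a .avsc schema and a JSON sample file; return both paths."""
    directory = Path(directory)
    avsc_path = directory / "sensor_reading.avsc"
    json_path = directory / "sensor_reading_sample.json"
    avsc_path.write_text(json.dumps(AVRO_SCHEMA, indent=2), encoding="utf-8")
    json_path.write_text(json.dumps(SAMPLE_RECORD), encoding="utf-8")
    return avsc_path, json_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_samples(tmp):
            print(path.name)
```

An Avro schema file is itself a JSON document, which is why the standard json module is enough to produce it here.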
You can pass any payload format directly from source to target in streaming mappings. You can project columns in binary format to pass a payload from source to target in its original form or to pass a payload format that is not supported.
Streaming mappings can read, process, and write hierarchical data. You can use array, struct, and map complex data types to process the hierarchical data. You assign complex data types to ports in a mapping to flow hierarchical data. Ports that flow hierarchical data are called complex ports.
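The following sketch shows what hierarchical data can look like in an Avro schema and which fields would carry complex data. It assumes that an Avro array corresponds to the array type, an Avro record to the struct type, and an Avro map to the map type. The schema, the field names, and the complex_ports helper are hypothetical.

```python
# Hypothetical Avro schema with one primitive field and one field of each complex kind.
SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "items", "type": {"type": "array", "items": "string"}},
        {"name": "customer", "type": {
            "type": "record",
            "name": "Customer",
            "fields": [{"name": "name", "type": "string"}],
        }},
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
    ],
}

# Assumed correspondence between Avro types and the complex data types.
AVRO_TO_COMPLEX = {"array": "array", "record": "struct", "map": "map"}


def complex_ports(schema):
    """Return {field name: complex type} for fields that hold hierarchical data."""
    result = {}
    for field in schema["fields"]:
        field_type = field["type"]
        # Only nested type definitions (dicts) can be complex; primitive names
        # and unions are skipped in this simplified sketch.
        if isinstance(field_type, dict) and field_type.get("type") in AVRO_TO_COMPLEX:
            result[field["name"]] = AVRO_TO_COMPLEX[field_type["type"]]
    return result


if __name__ == "__main__":
    print(complex_ports(SCHEMA))
```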
Note: You cannot run a mapping with an AmazonKinesis data object on a MapR distribution.
For more information about processing hierarchical data, see the Informatica Big Data Management User Guide.
For more information about Kinesis Data Streams, see the Amazon Web Services documentation.
AmazonKinesis Data Object Overview Properties
Overview properties include general properties that apply to the AmazonKinesis data object. The Developer tool displays overview properties of the data object in the Overview view.
You can configure the following overview properties for AmazonKinesis data objects:
- General
- You can configure the following general properties for the AmazonKinesis data object:
- - Name. Name of the AmazonKinesis data object.
- - Description. Description of the AmazonKinesis data object.
- - Native Name. Name of the AmazonKinesis data object.
- - Path Information. The path of the data object in AmazonKinesis. For example, /DeliveryStreams/router1
- Column
- You can configure the name, native name, data type, precision, scale, and description of the columns in the AmazonKinesis resource.
- Advanced
- The following are the advanced properties for the AmazonKinesis data object:
- - Amazon Resource Name. The Kinesis resource that the AmazonKinesis data object is reading from or writing to.
- - Type. The type of delivery stream that the AmazonKinesis data object is reading from or writing to. The delivery stream is either Kinesis Stream or Firehose DeliveryStream.
- - Number of Shards. Specify the number of shards that the Kinesis Stream is composed of. This property is not applicable for Firehose DeliveryStream.
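The Amazon Resource Name and Type properties are related, because the ARN itself names the AWS service and the resource type. The following sketch, which assumes the standard AWS ARN layout (arn:partition:service:region:account:resource), splits an ARN and derives the stream type from it. The parse_kinesis_arn helper, the account ID, and the stream name are hypothetical.

```python
def parse_kinesis_arn(arn):
    """Split a Kinesis or Firehose ARN into its parts (hypothetical helper)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, _partition, service, region, account, resource = parts
    # The resource part has the form "<resource type>/<name>".
    resource_type, _, name = resource.partition("/")
    if service == "kinesis" and resource_type == "stream":
        kind = "Kinesis Stream"
    elif service == "firehose" and resource_type == "deliverystream":
        kind = "Firehose DeliveryStream"
    else:
        raise ValueError(f"unsupported resource: {service}:{resource}")
    return {"type": kind, "region": region, "account": account, "name": name}


if __name__ == "__main__":
    # Illustrative ARN with a placeholder account ID.
    print(parse_kinesis_arn("arn:aws:kinesis:us-east-1:123456789012:stream/router1"))
```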
AmazonKinesis Data Object Read Operation Properties
The Data Integration Service uses read operation properties when it reads data from AmazonKinesis Streams.
General Properties
The Developer tool displays general properties for AmazonKinesis sources in the Read view.
The following table describes the general properties for the AmazonKinesis data object read operation:
Property | Description |
---|---|
Name | The name of the AmazonKinesis data object. This property is read-only. You can edit the name in the Overview view. When you use the AmazonKinesis stream as a source in a mapping, you can edit the name in the mapping. |
Description | The description of the AmazonKinesis data object operation. |
Ports Properties
Ports properties for a physical data object include port names and port attributes such as data type and precision.
The following table describes the ports properties that you configure for AmazonKinesis stream sources:
Property | Description |
---|---|
Name | The name of the source. |
Type | The native data type of the source. |
Precision | The maximum number of significant digits for numeric data types, or the maximum number of characters for string data types. |
Detail | The detail of the data type. |
Scale | The scale of the data type. |
Description | The description of the resource. |
Sources Properties
The sources properties list the resources of the Amazon Kinesis data object.
The following table describes the sources property that you can configure for Amazon Kinesis Streams sources:
Property | Description |
---|---|
Sources | The sources which the Amazon Kinesis data object reads from. You can add or remove sources. |
Run-time Properties
The run-time properties include properties that the Data Integration Service uses when reading data from the source at run time.
The run-time property for AmazonKinesis Stream sources includes the name of the AmazonKinesis connection.
Advanced Properties
The following table describes the advanced properties for AmazonKinesis Stream sources:
Property | Description |
---|---|
Operation Type | Specifies the type of data object operation. This is a read-only property. |
Guaranteed Processing | Guaranteed processing ensures that the mapping processes messages published by the sources and delivers them to the targets at least once. If a failure occurs, some messages might be delivered more than once, but every message is processed. If the external source or the target is not available, the mapping stops running to avoid data loss. Select this option for guaranteed delivery of data streamed from the AmazonKinesis Stream. |
Degree of Parallelism | The number of processes that run in parallel within a shard. Specify a value that is less than or equal to the number of shards. |
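Because guaranteed processing delivers messages at least once, a target can receive the same message again after a failure. One common way to tolerate this downstream is to make writes idempotent. The following sketch assumes that each message reaches the target together with its shard ID and sequence number, which this document does not confirm. The DedupingSink class is hypothetical and keeps its state in memory only.

```python
class DedupingSink:
    """Hypothetical target-side sink that ignores redelivered messages."""

    def __init__(self):
        # Keys already written. A real sink would persist and bound this state.
        self._seen = set()
        self.rows = []

    def write(self, shard_id, sequence_number, payload):
        """Store payload once per (shard_id, sequence_number); return True if stored."""
        key = (shard_id, sequence_number)
        if key in self._seen:
            return False  # Duplicate delivery: skip without side effects.
        self._seen.add(key)
        self.rows.append(payload)
        return True


if __name__ == "__main__":
    sink = DedupingSink()
    sink.write("shardId-000000000000", "1", "first")
    sink.write("shardId-000000000000", "1", "first")  # redelivered after a failure
    print(len(sink.rows))
```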
Column Projections Properties
The following table describes the column projection properties that you configure for Amazon Kinesis Stream sources:
Property | Description |
---|---|
Column Name | The name of the field that contains data. This property is read-only. |
Type | The native data type of the resource. This property is read-only. |
Enable Column Projection | Indicates that you use a schema to read the data that the source streams. By default, the data is streamed in binary format. To change the format in which the data is processed, select this option and specify the schema format. |
Schema Format | The format in which the source processes data. You can select one of the following formats: XML, JSON, Avro, or Flat. |
Schema | The schema or sample for the selected format. Specify an XSD file for the XML format, a sample JSON file for the JSON format, a .avsc file for the Avro format, or a sample file for the Flat format. |
Column Mapping | The mapping of source data to the data object. Click View to see the mapping. |
Project Column as Complex Data Type | Project columns as complex data type for sources with hierarchical data. Select this option if the source has hierarchical data. For more information on hierarchical data, see the Informatica Big Data Management User Guide. |
Configuring Schema for Flat Files
Configure the schema for flat files when you configure column projection properties.
1. On the Column Projection tab, enable column projection and select the flat schema format.
The column projection properties page appears.
2. On the column projection properties page, configure the following properties:
- - Sample Metadata File. Select a sample file.
- - Code page. Select the UTF-8 code page.
- - Format. Format in which the source processes data. Default value is Delimited. You cannot change it.
3. Click Next.
4. On the delimited format properties page, configure the following properties:
Property | Description |
---|---|
Delimiters | Specify the character that separates entries in the file. Default is a comma (,). You can specify only one delimiter at a time. If you select Other and specify a custom delimiter, the delimiter must be a single character. |
Text Qualifier | Specify the character used to enclose text that should be treated as one entry. Use a text qualifier to disregard the delimiter character within text. Default is No quotes. The escape character must be a single character. |
Preview Options | Specify the escape character. The row delimiter is not applicable because only one row is created at a time. |
Maximum rows to preview | Specify the number of rows of data that you want to preview. |
5. Click Next to preview the flat file data object.
If required, you can change the column attributes. The timestampWithTZ data type is not supported.
6. Click Finish.
The data object opens in the editor.
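The delimiter, text qualifier, and escape character settings in the procedure above behave much like the options of a generic delimited-text parser. The following sketch uses the Python standard csv module to illustrate how these three settings interact on a single row. It is not how the Developer tool parses the sample file, and parse_row is a hypothetical helper.

```python
import csv
import io


def parse_row(line, delimiter=",", text_qualifier=None, escape_char=None):
    """Parse one delimited row into a list of entries (hypothetical helper)."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    if escape_char is not None and len(escape_char) != 1:
        raise ValueError("escape character must be a single character")
    options = {"delimiter": delimiter, "escapechar": escape_char}
    if text_qualifier is None:
        # "No quotes": quote characters are treated as ordinary text.
        options["quoting"] = csv.QUOTE_NONE
    else:
        options["quotechar"] = text_qualifier
    rows = list(csv.reader(io.StringIO(line), **options))
    return rows[0] if rows else []


if __name__ == "__main__":
    # The text qualifier keeps the comma inside the second entry.
    print(parse_row('router1,"New York, NY",42', text_qualifier='"'))
```

Without a text qualifier, a delimiter inside text can still be preserved by prefixing it with the escape character.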