HBase Data Objects

An HBase data object is a physical data object that represents data in an HBase resource. After you create an HBase connection, create an HBase data object with a write data operation to write data to an HBase table.
When you create an HBase data object, you can select an HBase table and view all the column families in the table. You can specify the column names in the column family if you know the column name and data type, or you can search the rows in the HBase table and specify the columns.
You can write to a column family or to a single binary column. When you create the data object, specify the column families to which you want to write, or choose to write all the data as a single stream of binary data.

Data Object Column Configuration

When you want to write data to columns in a column family, you can specify the columns when you create the HBase data object.
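You can write data to columns in one of the following ways:
  • Add the columns in the column families.
  • Search for the columns in the column family and add the columns.
  • Get all the columns in a column family as a single stream of binary data.
When you manually create the data object, use the following format for the column name: columnFamily__columnQualifier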
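As an illustration of the naming format, the following Python sketch builds and splits column names. The helper names to_column_name and split_column_name are hypothetical and are not part of any product API. The sketch also assumes that the column family name does not itself contain a double underscore.

```python
from typing import Tuple

SEPARATOR = "__"  # double underscore between family and qualifier


def to_column_name(column_family: str, column_qualifier: str) -> str:
    """Build a data object column name (hypothetical helper)."""
    return f"{column_family}{SEPARATOR}{column_qualifier}"


def split_column_name(column_name: str) -> Tuple[str, str]:
    """Split on the first separator; assumes the family has no '__'."""
    family, sep, qualifier = column_name.partition(SEPARATOR)
    if not sep or not family or not qualifier:
        raise ValueError(
            f"not in columnFamily__columnQualifier format: {column_name!r}"
        )
    return family, qualifier


print(to_column_name("personal", "empName"))
print(split_column_name("personal__empName"))
```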

Add Columns

When you create a data object, you can specify the columns in one or more column families in an HBase table.
When you add an HBase table as the resource for an HBase data object, all the column families in the HBase table appear. If you know the details of the columns in the column families, you can select a column family and add the column details. In the Column Families dialog box, select the column family to which you want to add the columns. Column details include column name, data type, precision, and scale.
Although data is stored in binary format in HBase tables, you can specify the associated data type of the column to transform the data. To avoid data errors or incorrect data, verify that you specify the correct data type for the columns.
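To see why the declared data type matters, consider how the same bytes read under two different types. The following Python sketch uses only the standard library. It assumes the value was stored as a 4-byte big-endian integer, which is how the HBase Bytes.toBytes(int) utility encodes an int. The variable names are illustrative.

```python
import struct

# Assumption: the cell holds an int written as 4 big-endian bytes.
stored = struct.pack(">i", 1234)

# Declaring the column as an integer recovers the original value.
as_int = struct.unpack(">i", stored)[0]

# Declaring the same column as a string misreads the bytes as text.
as_text = stored.decode("latin-1")

print(as_int)
print(repr(as_text))
```

The integer reading returns the value that was written, while the string reading yields control characters rather than the digits 1234.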
Verify that you specify valid column details when you add columns to avoid unexpected run-time behaviors. If you do not specify a value for a column when you write data to an HBase table, the Data Integration Service specifies a null value for the column at run time.
If the HBase table has more than one column family, you can add column details for multiple column families when you create the data object. Select one column family at a time and add the column details. The column family name is the prefix for all the columns in the column family for unique identification.

Search and Add Columns

When you create a data object, you can search the rows in an HBase table to identify the columns in the table and select the columns you want to add.
When you do not know the columns in an HBase table, you can search the rows in the table to identify all the columns and the occurrence percentage of each column. You can infer whether a column name is valid based on the number of times the column occurs in the table. For example, if column name eName occurs rarely while column name empName occurs in a majority of rows, you can infer that the column name is empName.
When you search and add columns, you can specify the maximum number of rows to search and the occurrence percentage value for a column. If you specify the maximum number of rows as 100 and the column occurrence percentage as 90, all columns that appear at least 90 times in 100 rows appear in the results. You can select the columns in the results to add the columns to the data object.
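The arithmetic behind the occurrence percentage can be sketched in Python as follows. This is a minimal model of the search, not the Developer tool's implementation. The function find_columns, its parameters, and the sample column names are assumptions made for illustration.

```python
from collections import Counter


def find_columns(rows, max_rows=100, min_occurrence_pct=90):
    """Return {column: occurrence %} for columns meeting the threshold.

    rows is a sequence of collections of column names, one per row.
    Hypothetical helper; it only models the search arithmetic.
    """
    sample = list(rows)[:max_rows]
    if not sample:
        return {}
    # Count each column at most once per row.
    counts = Counter(col for row in sample for col in set(row))
    return {
        col: 100.0 * n / len(sample)
        for col, n in counts.items()
        if 100.0 * n / len(sample) >= min_occurrence_pct
    }


# 100 sample rows: empName appears in 95 of them, eName in only 5.
rows = [{"personal__empName"}] * 95 + [{"personal__eName"}] * 5
result = find_columns(rows, max_rows=100, min_occurrence_pct=90)
print(result)
```

With these sample rows, only the column that occurs in at least 90 of the 100 rows is returned, which mirrors the eName and empName example.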

Get All Columns

Binary data or data that can be converted to a byte array can be stored in an HBase column. You can read from and write to an HBase table in bytes.
When you create a data object, you can choose to get all the columns in a column family as a single stream of binary data.
Use the HBase data object as a target to write data in all the columns in the source data object as a single column of binary data in the target HBase table.
The Data Integration Service generates the data in the binary column based on the protobuf format. Protobuf format is an open source format to describe the data structure of binary data. The protobuf schema is described as messages.
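The schema that the Data Integration Service generates is not described here. As a general illustration of how protobuf lays out a message as bytes, the following Python sketch hand-encodes two fields by following the public protobuf wire format. The field numbers, the field meanings, and the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Length-delimited field (wire type 2): tag, length, UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Varint field (wire type 0): tag, then the value as a varint."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


# Hypothetical message: field 1 = employee name, field 2 = employee id.
message = encode_string_field(1, "Ana") + encode_int_field(2, 150)
print(message.hex())
```

Each field is written as a tag that combines the field number and the wire type, followed by the value. A reader needs the message definition to map the field numbers back to names and data types.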

HBase Object Overview Properties

The Data Integration Service uses overview properties when it writes data to an HBase resource.
Overview properties include general properties that apply to the HBase data object. They also include object properties that apply to the resources in the HBase data object. The Developer tool displays overview properties for HBase resources in the Overview view.

General Properties

The following table describes the general properties that you configure for the HBase data objects:
Property | Description
Name | Name of the HBase data object.
Location | The project or folder in the Model repository where you want to store the HBase data object.
Native Name | Native name of the HBase data object.
Path | Path to the HBase data object.

Add Column Properties

In the Column Families dialog box, select the column family to which you want to add the columns. The following table describes the column properties that you configure when you associate columns with column families:
Property | Description
Name | Name of the column in the column family.
Type | Data type of the column.
Precision | Precision of the data.
Scale | Scale of the data.

HBase Data Object Write Operation Properties

The Data Integration Service uses write operation properties when it writes data to an HBase resource.
HBase data object write operation properties include run-time properties that apply to the HBase data object.

Advanced Properties

The Developer tool displays the advanced properties for HBase targets in the Advanced view.
The following table describes the advanced properties that you can configure for an HBase data object in a streaming mapping:
Property | Description
Operation Type | The type of data object operation. This is a read-only property.
Date Time Format | Format of the columns of the date data type. Specify the date and time formats by using any of the Java date and time pattern strings, for example yyyy-MM-dd HH:mm:ss.
Auto Flush | Optional. Indicates whether to enable Auto Flush to run each Put operation immediately. Default is disabled.

You can set Auto Flush to the following values:
  • Enabled (true). The Data Integration Service runs each Put operation immediately as it receives the operation. The service does not buffer or delay the Put operations, and it does not retry operations on failure. Writes are slower because the service cannot run operations in bulk. However, you do not lose data, because the Data Integration Service writes the data immediately.
  • Disabled (false). The Data Integration Service accepts multiple Put operations before it makes a remote procedure call to perform the write operations. If the Data Integration Service stops working before it flushes pending data writes to HBase, that data is lost. Disable Auto Flush if you need to optimize performance.
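The trade-off between the two Auto Flush settings can be modeled with a small Python sketch. This is a toy simulation and not the HBase client or the Data Integration Service API. The class BufferedWriter, its buffer_limit batch size, and the row values are assumptions made for illustration.

```python
class BufferedWriter:
    """Toy model of Put buffering; not the HBase or product API."""

    def __init__(self, auto_flush: bool, buffer_limit: int = 3):
        self.auto_flush = auto_flush
        self.buffer_limit = buffer_limit  # hypothetical batch size
        self.pending = []   # Puts held in memory on the client side
        self.stored = []    # Puts that reached the simulated table
        self.rpc_calls = 0  # number of simulated remote procedure calls

    def put(self, row):
        self.pending.append(row)
        # With auto flush, every Put is sent at once; otherwise wait
        # until the buffer holds buffer_limit Puts.
        if self.auto_flush or len(self.pending) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if self.pending:
            self.stored.extend(self.pending)
            self.pending.clear()
            self.rpc_calls += 1


immediate = BufferedWriter(auto_flush=True)
buffered = BufferedWriter(auto_flush=False)
for row in ["r1", "r2", "r3", "r4"]:
    immediate.put(row)
    buffered.put(row)

# If the process stopped here, the buffered writer would lose "r4".
print(immediate.rpc_calls, immediate.pending)
print(buffered.rpc_calls, buffered.pending)
```

In this model the writer with auto flush enabled makes one call per Put and never holds unwritten data, while the writer with auto flush disabled makes fewer calls but still holds one unwritten Put at the end of the loop.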