Hive Targets

A mapping that is running in the Hadoop environment can write to a Hive target.
A Hive target can be an internal table or an external table. Internal Hive tables are managed by Hive and are also known as managed tables. External Hive tables are managed by an external source such as HDFS, Amazon S3, Azure Blob, WASB, or ADLS.
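For reference, the following HiveQL sketch shows the two table types. The table names and the location path are illustrative only:
CREATE TABLE managed_orders (id INT, amount DOUBLE)
STORED AS ORC;

CREATE EXTERNAL TABLE external_orders (id INT, amount DOUBLE)
STORED AS ORC
LOCATION '/data/external_orders';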
Consider the following restrictions when you configure a Hive target in a mapping that runs in the Hadoop environment:
When a mapping creates or replaces a Hive table, the type of table that the mapping creates depends on the run-time engine that you use to run the mapping.
The following table shows the table type for each run-time engine:
Run-Time Engine    Resulting Table Type
Blaze              MANAGED_TABLE
Spark              EXTERNAL_TABLE
Hive               MANAGED_TABLE
You can design a mapping to truncate an internal or external Hive table that is bucketed and partitioned.
In a mapping that runs on the Spark engine or the Blaze engine, you can create a custom DDL query that creates or replaces a Hive table at run time. However, with the Blaze engine, you cannot use a backtick (`) character in the DDL query. The backtick character is required in HiveQL when you include special characters or keywords in a query, as shown in the example after these restrictions.
The Spark engine can write to bucketed Hive targets. Bucketing and partitioning of Hive tables can improve performance by reducing data shuffling and sorting.
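For example, a custom DDL query might use backticks to enclose column names that are HiveQL reserved keywords. The table and column names below are illustrative only:
CREATE TABLE sales_2019
(`date` STRING, `order` INT)
STORED AS PARQUET;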

PreSQL and PostSQL Commands

You can create SQL commands for Hive targets and run them to execute SQL statements such as INSERT, UPDATE, and DELETE on the Hive target.
PreSQL is an SQL command that runs against the Hive target before the mapping reads from the source. PostSQL is an SQL command that runs against the Hive target after the mapping writes to the target.
You can use PreSQL and PostSQL on the Spark engine. The Data Integration Service does not validate PreSQL or PostSQL commands for a Hive target.

Pre-Mapping SQL Commands

PreSQL is an SQL command that runs against a Hive target before the mapping reads from a source.
For example, you might use a Hive target in a mapping. The data stored in the Hive target contains old records from the previous day and you must delete the old records in the Hive target before you run the mapping that writes new records to the target. To delete the old records in the Hive target, you can configure a PreSQL command.
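A PreSQL command for this scenario might look like the following sketch. The table and column names are assumptions, and a DELETE statement requires the Hive target table to be transactional (ACID):
DELETE FROM sales_daily
WHERE load_date < CURRENT_DATE;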

Post-Mapping SQL Commands

PostSQL is an SQL command that runs against a Hive target after the mapping writes to the target.
For example, you might use a Hive target in a mapping that runs in a test environment. After the mapping writes to the target, you might want to delete the records stored in the Hive target so that you can test the same Hive target again in the next mapping run. To delete the records in the Hive target, you can configure a PostSQL command.
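A PostSQL command for this scenario might look like the following sketch, assuming a Hive target table named test_results:
TRUNCATE TABLE test_results;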

Truncating Hive Targets

Truncate Hive target tables to delete the table contents. You can truncate internal and external Hive tables in the Hadoop environment.
Truncate a Hive table in the Hive table properties. The truncation properties are located on the Advanced tab of the Hive table properties: Truncate target table and Truncate Hive Target Partition.
To truncate the entire Hive table, select the Truncate target table option. To truncate a partition in the Hive table, select both the Truncate target table and the Truncate Hive Target Partition options.
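In HiveQL terms, truncating a partition corresponds to a statement such as the following sketch. The table name and partition key are illustrative only, and the mapping performs the truncation through the table properties rather than through this statement:
TRUNCATE TABLE sales_daily PARTITION (country = 'US');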
Consider the following restrictions when you truncate a Hive table in a mapping that runs in the Hadoop environment:

Updating Hive Targets with an Update Strategy Transformation

For mappings that run on the Spark engine, you can use Hive MERGE statements to perform Update Strategy tasks. When a query uses a MERGE statement instead of INSERT, UPDATE or DELETE statements, processing is more efficient.
To use Hive MERGE, set the Hive MERGE option to true in the Advanced Properties of the Update Strategy transformation.
The mapping ignores the Hive MERGE option and the Data Integration Service uses INSERT, UPDATE and DELETE to perform the operation under the following scenarios:
The mapping log contains results of the operation, including whether restrictions affected results.
When the update affects partitioning or bucketing columns, updates to the columns are omitted.
Note: The Developer tool and the Data Integration Service do not validate against these restrictions. If the Update Strategy expression violates them, the mapping might produce unexpected results.
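For reference, a Hive MERGE statement takes roughly the following form. The table and column names are illustrative and do not represent the statement that the Data Integration Service generates:
MERGE INTO customer_target AS t
USING customer_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET name = s.name, city = s.city
WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.city);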

Rules and Guidelines for Hive Targets on the Blaze Engine

You can include Hive targets in an Informatica mapping that runs on the Blaze engine.
Consider the following rules and guidelines when you configure a Hive target in a mapping that runs on the Blaze engine:

RCFile as Hive Tables

The Blaze engine can read from and write to RCFile tables as Hive tables. However, the Blaze engine supports only the ColumnarSerDe SerDe. In Hortonworks, the default SerDe for an RCFile is LazyBinaryColumnarSerDe. To read from and write to an RCFile table, you must create the table by specifying the SerDe as org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
For example:
CREATE TABLE TEST_RCFILE
(id int, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE;
You can also set the default RCFile SerDe from Ambari or Cloudera Manager. Set the property hive.default.rcfile.serde to org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

Compressed Hive Tables

The Blaze engine can read from and write to Hive tables that are compressed. However, to read from a compressed Hive table or write to a Hive table in compressed format, you must set the compression property in the TBLPROPERTIES clause of the table.
The property name and value are not case sensitive. Depending on the file format, the table property can take different values.
The following table lists the property names and values for different file formats:
File Format    Table Property Name         Table Property Values
Avro           avro.compression            BZIP2, deflate, Snappy
ORC            orc.compress                Snappy, ZLIB
Parquet        parquet.compression         GZIP, Snappy
RCFile         rcfile.compression          Snappy, ZLIB
Sequence       sequencefile.compression    BZIP2, GZIP, LZ4, Snappy
Text           text.compression            BZIP2, GZIP, LZ4, Snappy
Note: Unlike the Hive engine, the Blaze engine does not write data in the default ZLIB compressed format when it writes to a Hive target stored as ORC format. To write in a compressed format, alter the table to set the TBLPROPERTIES clause to use ZLIB or Snappy compression for the ORC file format.
The following text shows sample commands to create a table and alter a table:
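The statements below are a sketch only; the table name, columns, and compression codec are illustrative and not taken from the product documentation:
CREATE TABLE orc_sales
(id int, amount double)
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'ZLIB');

ALTER TABLE orc_sales SET TBLPROPERTIES ('orc.compress' = 'SNAPPY');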