Target

The target defines the Hive or HDFS destination for the source tables that you ingest from the relational database.
You specify different target properties depending on the scenario that you select in the specification definition. If you configure the specification to ingest data from a relational database to Hive, you configure a Hive connection and Hive table properties to define the target. If you configure the specification to ingest data from a relational database to HDFS, you configure an HDFS connection and an ingestion directory to define the target.
The mass ingestion solution performs a full load of all source data to the target. It does not provide an option to append only recently updated data. Each time that you run the mass ingestion specification, the existing data in the Hive or HDFS target is deleted and replaced with the data that the ingestion job loads from the source.

Configuring a Hive Target

Configure a Hive target to ingest data to a Hive table. When you configure the mass ingestion specification to ingest data to a Hive target, you configure a Hive connection and Hive properties to define the target.
You can ingest data to an internal or external Hive table. Internal Hive tables are managed by Hive. External Hive tables are unmanaged tables. For an external Hive table, you can specify an external location such as Amazon S3, Azure Blob, HDFS, WASB, or ADLS.
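For example, an external Hive table over an Amazon S3 location might be created with DDL similar to the following. This is only an illustrative sketch: the column list, bucket name, and storage format are hypothetical, and the mass ingestion job generates the actual table definition.
CREATE EXTERNAL TABLE PRODUCT (PRODUCT_ID INT, PRODUCT_NAME STRING, PRICE DECIMAL(10,2)) STORED AS PARQUET LOCATION 's3a://example-bucket/temp/PRODUCT/'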
The following image shows the Target page for a Hive target. The page lists the properties that define the Hive target and includes a note that partition and cluster options are available on the Table Parameters page.
The following table describes the properties that you can configure to define the Hive target:
Target Connection
Required. The Hive connection used to access the Hive storage target.
If changes are made to the available Hive connections, refresh the browser or log out and log back in to the Mass Ingestion tool.
Target Schema
Required. The schema that defines the target tables.
Target Table Prefix
The prefix added to the names of the target tables.
Enter a string. You can enter alphanumeric and underscore characters. The prefix is not case sensitive.
Target Table Suffix
The suffix added to the names of the target tables.
Enter a string. You can enter alphanumeric and underscore characters. The suffix is not case sensitive.
Hive Options
Select this option to configure the Hive target location.
DDL Query
Select this option to configure a custom DDL query that defines how data from the source tables is loaded to the target tables.
Storage Format
Required. The storage format of the target tables. You can select Text, Avro, Parquet, or ORC. Default is Text.
External Table
Select this option if the table is external.
External Location
The external location of the Hive target. By default, tables are written to the default Hive warehouse directory.
A sub-directory is created under the specified external location for each source table that is ingested. For example, if you enter /temp, a source table named PRODUCT is ingested to the external location /temp/PRODUCT/.
Configure partition and cluster properties for specific target Hive tables when you configure the transformation override.

DDL Query

When you configure a mass ingestion specification to ingest data to a Hive target, you can configure a custom DDL query to define how data from the source tables is loaded to the target tables.
You can define the DDL query to customize the target table or specify additional parameters. The target table contains the columns that you define in the DDL query.
To define a DDL query, use SQL statements and placeholders. Use the placeholders to fetch the table name, column list, and column names. The Data Integration Service substitutes the placeholders with actual values at run time according to the tables that you ingest. You must enclose the placeholders within curly brackets. For example, {INFA_TABLE_NAME}.
You can use the following placeholders:
INFA_TABLE_NAME
Fetches the target table name at run time.
INFA_COLUMN_LIST
Fetches a list of columns in the target table at run time.
For example, you might ingest a table CUSTOMER. To define how you want to ingest the table in the target, you can enter the following DDL query:
CREATE TABLE {INFA_TABLE_NAME} ({INFA_COLUMN_LIST}) CLUSTERED BY (LAST_NAME) INTO 10 BUCKETS STORED AS TEXTFILE
At run time, the Data Integration Service substitutes {INFA_TABLE_NAME} with CUSTOMER, and it substitutes {INFA_COLUMN_LIST} with the list of columns that appear in the table CUSTOMER. The Data Integration Service might expand the DDL query to the following query:
CREATE TABLE CUSTOMER (FIRST_NAME STRING, LAST_NAME STRING, EMAIL STRING, GENDER STRING, CREDIT_CARD DECIMAL(38,0), CREDIT_CARD_TYPE STRING, STATE STRING, USSTATE STRING, CITY STRING) CLUSTERED BY (LAST_NAME) INTO 10 BUCKETS STORED AS TEXTFILE
Note: You cannot use a placeholder to specify the partition columns or the clustered by columns. When you specify partition or clustered by columns, enter the column names explicitly in the DDL query.
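For example, to partition the target table by the USSTATE column, you might enter a query similar to the following. This is a sketch: it assumes that USSTATE exists in the source table and that the ingestion configuration excludes it from {INFA_COLUMN_LIST}, because Hive does not allow a partition column to also appear in the table column list.
CREATE TABLE {INFA_TABLE_NAME} ({INFA_COLUMN_LIST}) PARTITIONED BY (USSTATE STRING) STORED AS TEXTFILE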
The following image shows the option to configure a DDL query for a Hive target. When DDL Query is selected, the storage format and external location options are replaced with a field for the query.

Configuring an HDFS Target

Configure an HDFS target to ingest data to a flat file on HDFS. When you configure the mass ingestion specification to ingest data to an HDFS target, you configure an HDFS connection and an ingestion directory to define the target.
The following image shows the Target page for an HDFS target. The page lists the properties that define the HDFS target and includes a note that partition and cluster options are available on the Table Parameters page.
The following table describes the properties that you can configure to define the HDFS target:
Target Connection
Required. The HDFS connection used to access the HDFS storage target.
If changes are made to the available HDFS connections, refresh the browser or log out and log back in to the Mass Ingestion tool.
Target Table Prefix
The prefix added to the names of the target files.
Enter a string. You can enter alphanumeric and underscore characters. The prefix is not case sensitive.
Target Table Suffix
The suffix added to the names of the target files.
Enter a string. You can enter alphanumeric and underscore characters. The suffix is not case sensitive.
Ingestion Directory
Required. The target directory on HDFS. A sub-directory is created under the ingestion directory for each source that is ingested.
If the specified directory already exists, the directory is replaced.
For example, if you enter /temp, a source table named PRODUCT is ingested to the directory /temp/PRODUCT/.
Compression
Required. The compression format for the target files. You can select None, Gzip, Bzip2, LZO, Snappy, or Custom. If you select Custom, enter the compression codec. Default is None.
Compression Codec
If you select custom compression, enter the fully qualified name of a class that implements the Hadoop CompressionCodec interface.
Delimiters
The delimiter used to separate data in the target files. You can select Comma, Semicolon, Space, Tab, or Other. If you select Other, you define a custom delimiter.
Other Delimiter
Required if you select Other as the delimiter. Enter a custom delimiter.

Compression Codec

When you configure a mass ingestion specification to ingest data to an HDFS target directory, you can configure a compression codec to write the ingested data to a compressed file.
You can select None, Gzip, Bzip2, LZO, Snappy, or Custom compression. If you specify a custom compression codec, you must specify the fully qualified name of a class that implements the Hadoop CompressionCodec interface.
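For reference, the codec classes that ship with standard Hadoop distributions include the following. These names are an illustration; verify the classes that are available in your Hadoop environment.
org.apache.hadoop.io.compress.GzipCodec
org.apache.hadoop.io.compress.BZip2Codec
org.apache.hadoop.io.compress.SnappyCodec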