Lookup Transformation in the Hadoop Environment
How the Lookup transformation is processed in the Hadoop environment depends on the engine that runs the transformation.
Lookup Transformation Support on the Blaze Engine
Mapping validation fails in the following situations:
- The cache is configured to be shared, named, persistent, dynamic, or uncached. The cache must be a static cache.
If you add a data object that uses Sqoop as a Lookup transformation in a mapping, the Data Integration Service does not run the mapping through Sqoop. It runs the mapping through JDBC.
Lookup Transformation Support on the Spark Engine
Some processing rules for the Spark engine differ from the processing rules for the Data Integration Service.
Mapping Validation
Mapping validation fails in the following situations:
- Case sensitivity is disabled.
- The lookup condition in the Lookup transformation contains a binary data type.
- The cache is configured to be shared, named, persistent, dynamic, or uncached. The cache must be a static cache.
The mapping fails in the following situation:
- The transformation is unconnected and used with a Joiner or Java transformation.
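The Spark engine rules above can be read as a checklist. The following Python sketch encodes them as a pre-flight check. It is illustrative only: `LookupSettings` and `spark_lookup_problems` are hypothetical names, not part of any Informatica API, and the field values are assumptions chosen to mirror the wording of the rules.

```python
from dataclasses import dataclass


@dataclass
class LookupSettings:
    """Hypothetical summary of a Lookup transformation's configuration."""
    cache_type: str = "static"    # "static", "shared", "named", "persistent", "dynamic", or "uncached"
    case_sensitive: bool = True
    condition_types: tuple = ()   # data types that appear in the lookup condition
    connected: bool = True
    used_with: tuple = ()         # transformations that use an unconnected lookup


def spark_lookup_problems(settings):
    """Return the documented Spark engine restrictions that the settings violate."""
    problems = []
    if not settings.case_sensitive:
        problems.append("validation: case sensitivity is disabled")
    if "binary" in settings.condition_types:
        problems.append("validation: lookup condition contains a binary data type")
    if settings.cache_type != "static":
        problems.append("validation: cache must be a static cache")
    if not settings.connected and any(t in ("Joiner", "Java") for t in settings.used_with):
        problems.append("run time: unconnected lookup used with a Joiner or Java transformation")
    return problems


# Example: a dynamic cache with case sensitivity disabled violates two rules.
for problem in spark_lookup_problems(LookupSettings(cache_type="dynamic", case_sensitive=False)):
    print(problem)
```

The sketch separates validation failures from the run-time failure only by a text prefix. The actual checks are performed by the Data Integration Service.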
Multiple Matches
When you choose to return the first, last, or any value on multiple matches, the Lookup transformation returns any value.
If you configure the transformation to report an error on multiple matches, the Spark engine drops the duplicate rows and does not include the rows in the logs.
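Because the Spark engine does not log the rows it drops, it can help to check the lookup source for duplicate keys before you run the mapping. The following Python sketch shows one way to do that on sample data. The function name `duplicate_lookup_keys`, the column names, and the sample rows are assumptions for illustration and are not part of the product.

```python
from collections import Counter


def duplicate_lookup_keys(rows, key_columns):
    """Return the lookup-condition key tuples that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    # Counter keeps first-seen order, so the result follows the input order.
    return [key for key, n in counts.items() if n > 1]


# Hypothetical lookup source rows: cust_id 1 appears twice.
lookup_rows = [
    {"cust_id": 1, "region": "EU"},
    {"cust_id": 2, "region": "US"},
    {"cust_id": 1, "region": "APAC"},
]

print(duplicate_lookup_keys(lookup_rows, ["cust_id"]))  # prints [(1,)]
```

Any key that this check reports is a key for which the lookup has multiple matches, so the return value for that key is not deterministic on the Spark engine.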
Lookup Transformation Support on the Hive Engine
If you add a data object that uses Sqoop as a Lookup transformation in a mapping, the Data Integration Service does not run the mapping through Sqoop. It runs the mapping through JDBC.
When you run a mapping that contains a Lookup transformation, the Data Integration Service creates lookup cache .jar files. Hive copies the lookup cache .jar files to the following temporary directory: /tmp/<user_name>/hive_resources. The Hive parameter hive.downloaded.resources.dir determines the location of the temporary directory. After the mapping completes, you can delete the lookup cache .jar files specified in the LDTM log to reclaim disk space.
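The cleanup step can be scripted. The following Python sketch deletes a given list of .jar files from a resources directory. It assumes that you have already collected the file names from the LDTM log by hand, because the log format is not described here. The function name `delete_lookup_cache_jars` is hypothetical.

```python
from pathlib import Path


def delete_lookup_cache_jars(resources_dir, jar_names):
    """Delete the named .jar files from resources_dir and return the names removed."""
    base = Path(resources_dir)
    removed = []
    for name in jar_names:
        # Use only the file name so that nothing outside resources_dir is touched.
        candidate = base / Path(name).name
        if candidate.suffix == ".jar" and candidate.is_file():
            candidate.unlink()
            removed.append(candidate.name)
    return removed
```

Call the function with the directory that hive.downloaded.resources.dir points to and the .jar names taken from the LDTM log. Names that do not end in .jar or that do not exist in the directory are skipped, so the function does not remove unrelated files.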
Mapping Validation
Mapping validation fails in the following situations:
- The cache is configured to be shared, named, persistent, dynamic, or uncached. The cache must be a static cache.
- The lookup is a relational Hive data source.
The mapping fails in the following situation:
- The lookup is unconnected.