You can create mappings in SQL ELT mode to read data from Databricks, Amazon S3, or Microsoft Azure Data Lake Storage Gen2, load it to Databricks, and perform all of the data transformation within Databricks.
To create a mapping in SQL ELT mode, you create a mapping, and then select Mapping - SQL ELT as the mapping type. You're then prompted to choose Databricks as the cloud ecosystem and a Databricks target connection. If your organization doesn't have any Databricks connections, you're prompted to create one.
After you choose the target connection, the Mapping Designer opens with a Source transformation and a Target transformation already on the canvas for you to configure.
Sources in mappings in SQL ELT mode
When you configure the source connection in the Source transformation, you can choose an Amazon S3 V2, a Databricks, or a Microsoft Azure Data Lake Storage Gen2 connection.
Amazon S3 V2 source properties
You can configure the following properties for an Amazon S3 V2 source:
•Source connection - Parameter, Allow parameter to be overridden at run time
•Source Type - Single, Parameter
•Format - Delimited, Avro, JSON, ORC, and Parquet
•Source Type - File and directory
•Folder Path
•File Name
For information on how to configure the supported properties, see the help for Amazon S3 V2 Connector.
Databricks source properties
You can configure the following properties for a Databricks source:
•Source connection - Parameter, Allow parameter to be overridden at run time
•Source Type - Single, Query, Parameter
•Database Name
•Table Name
•SQL Override
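For example, a SQL override for a Databricks source might look like the following query. The schema, table, and column names here are illustrative:
SELECT customer_id, order_date, amount FROM sales.orders WHERE amount > 0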
Microsoft Azure Data Lake Storage Gen2 source properties
You can configure the following properties for a Microsoft Azure Data Lake Storage Gen2 source:
•Source connection - Parameter, Allow parameter to be overridden at run time
•Source Type - Single, Parameter
•Format - Delimited, Avro, JSON, ORC, and Parquet
•Filesystem Name Override
•Source Type - File and directory
•Allow Wildcard Characters
•Directory Override
•File Name Override
For information on how to configure the supported properties, see the help for Microsoft Azure Data Lake Storage Gen2 Connector.
Targets in mappings in SQL ELT mode
When you configure a Target transformation in a mapping in SQL ELT mode, you can use only a Databricks connection.
You can configure the following properties in a Databricks target transformation in a mapping in SQL ELT mode:
•Target Type - Single Object, Parameter, Create New Target at Runtime
•Operation - Insert, Update, Delete, Data Driven
•Target Database Name
•Target Table Name
•Write Disposition - Append, Truncate
•Update Mode - Update as Update, Update else Insert
For information on how targets in mappings in SQL ELT mode behave differently from targets in other types of mappings, see Targets in mappings in SQL ELT mode in Mappings in the Data Integration help.
Transformations in mappings in SQL ELT mode
A mapping in SQL ELT mode includes transformations that Databricks can process.
You can use the following transformations in a mapping in SQL ELT mode:
•Source
•Target
•Aggregator
•Expression
•Filter
•Joiner
•Lookup
•Rank
•Router
•Sorter
•Union
Functions in mappings in SQL ELT mode
When you create expressions within a mapping in SQL ELT mode, you must use the native functions and expression syntax of Databricks and not Informatica functions and expression syntax.
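For example, to replace null values with a default, use the Databricks NVL() function instead of an Informatica IIF(ISNULL()) expression. The port name in this sketch is hypothetical:
Informatica syntax: IIF(ISNULL(amount), 0, amount)
Databricks syntax: NVL(amount, 0)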
You can use the following native functions in a mapping in SQL ELT mode:
Aggregate functions
ANY(), ANY_VALUE(), APPROX_COUNT_DISTINCT(), APPROX_PERCENTILE(), AVG(), BIT_AND(), BIT_OR(), BIT_XOR(), BITMAP_CONSTRUCT_AGG(), BITMAP_OR_AGG(), BOOL_AND(), BOOL_OR(), CORR(), COUNT(), COUNT_IF(), COUNT_MIN_SKETCH(), COVAR_POP(), COVAR_SAMP(), EVERY(), FIRST(), FIRST_VALUE(), HLL_SKETCH_AGG(), HLL_UNION_AGG(), KURTOSIS(), LAST(), LAST_VALUE(), MAX(), MAX_BY(), MEAN(), MEDIAN(), MIN(), MIN_BY(), MODE(), PERCENTILE(), PERCENTILE_APPROX(), REGR_AVGX(), REGR_AVGY(), REGR_COUNT(), REGR_INTERCEPT(), REGR_R2(), REGR_SLOPE(), REGR_SXX(), REGR_SXY(), REGR_SYY(), SKEWNESS(), SOME(), STD(), STDDEV(), STDDEV_POP(), STDDEV_SAMP(), SUM(), TRY_AVG(), TRY_SUM(), VAR_POP(), VAR_SAMP(), VARIANCE()
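For example, an Aggregator transformation expression might use an aggregate function as follows. The port name is hypothetical:
APPROX_COUNT_DISTINCT(customer_id)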
AI functions
AI_ANALYZE_SENTIMENT(), AI_FIX_GRAMMAR(), AI_GEN(), AI_SIMILARITY(), AI_SUMMARIZE(), AI_TRANSLATE()
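For example, the following expression returns the sentiment of a text column. The port name is hypothetical:
AI_ANALYZE_SENTIMENT(review_text)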
Cast functions
BOOLEAN(), TO_CHAR(), TO_NUMBER(), TO_VARCHAR(), TRY_TO_NUMBER()
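For example, the following expression converts a string to a number based on a format string:
TO_NUMBER('12.34', '99.99')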
Date and time functions
ADD_MONTHS(), CURDATE(), CURRENT_DATE(), CURRENT_TIMESTAMP(), CURRENT_TIMEZONE(), DATE(), DATE_ADD(), DATE_FORMAT(), DATE_FROM_UNIX_DATE(), DATE_PART(), DATE_SUB(), DATE_TRUNC(), DATEADD(), DATEDIFF(), DAY(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), FROM_UNIXTIME(), FROM_UTC_TIMESTAMP(), GETDATE(), HOUR(), LAST_DAY(), MAKE_DATE(), MAKE_TIMESTAMP(), MINUTE(), MONTH(), MONTHS_BETWEEN(), NEXT_DAY(), NOW(), QUARTER(), SECOND(), TIMESTAMP(), TIMESTAMP_MICROS(), TIMESTAMP_MILLIS(), TIMESTAMP_SECONDS(), TO_DATE(), TO_TIMESTAMP(), TO_UNIX_TIMESTAMP(), TO_UTC_TIMESTAMP(), TRUNC(), TRY_TO_TIMESTAMP(), UNIX_DATE(), UNIX_MICROS(), UNIX_MILLIS(), UNIX_SECONDS(), UNIX_TIMESTAMP(), WEEKDAY(), WEEKOFYEAR(), YEAR()
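For example, the following expressions parse a string into a date and add 30 days to a date column. The port name is hypothetical:
TO_DATE('2024-01-15', 'yyyy-MM-dd')
DATE_ADD(order_date, 30)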
Miscellaneous functions
ASSERT_TRUE(), COALESCE(), CURRENT_CATALOG(), CURRENT_DATABASE(), CURRENT_METASTORE(), CURRENT_SCHEMA(), CURRENT_USER(), ELT(), EQUAL_NULL(), GREATEST(), HASH(), HLL_SKETCH_ESTIMATE(), HLL_UNION(), IF(), IFF(), IFNULL(), INPUT_FILE_BLOCK_LENGTH(), INPUT_FILE_BLOCK_START(), INPUT_FILE_NAME(), IS_ACCOUNT_GROUP_MEMBER(), IS_MEMBER(), ISNOTNULL(), ISNULL(), JAVA_METHOD(), LEAST(), LUHN_CHECK(), MONOTONICALLY_INCREASING_ID(), NULLIF(), NVL(), RAISE_ERROR(), REFLECT(), SECRET(), SESSION_USER(), SPARK_PARTITION_ID(), TRY_REFLECT(), TYPEOF(), USER(), UUID(), VERSION(), XXHASH64()
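For example, the following expression returns the first non-null value among its arguments. The port names are hypothetical:
COALESCE(phone, mobile, 'unknown')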
Numeric functions
ABS(), ACOS(), ACOSH(), ASIN(), ASINH(), ATAN(), ATAN2(), ATANH(), BIGINT(), BIT_COUNT(), BIT_GET(), BIT_REVERSE(), BITMAP_BIT_POSITION(), BITMAP_BUCKET_NUMBER(), BROUND(), CBRT(), CEIL(), CEILING(), CONV(), COS(), COSH(), COT(), CSC(), DECIMAL(), DEGREES(), DOUBLE(), E(), EXP(), EXPM1(), FACTORIAL(), FLOAT(), FLOOR(), GETBIT(), HYPOT(), INT(), ISNAN(), LN(), LOG(), LOG10(), LOG1P(), LOG2(), MOD(), NANVL(), NEGATIVE(), PI(), PMOD(), POSITIVE(), POW(), POWER(), RADIANS(), RAND(), RANDN(), RANDOM(), RINT(), ROUND(), SEC(), SHIFTLEFT(), SHIFTRIGHT(), SHIFTRIGHTUNSIGNED(), SIGN(), SIGNUM(), SIN(), SINH(), SMALLINT(), SQRT(), TAN(), TANH(), TINYINT(), TRY_ADD(), TRY_DIVIDE(), TRY_MULTIPLY(), TRY_SUBTRACT(), WIDTH_BUCKET()
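For example, the following expression multiplies an amount by a tax factor and rounds the result to two decimal places. The port name is hypothetical:
ROUND(amount * 1.08, 2)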
String and binary functions
AES_DECRYPT(), AES_ENCRYPT(), ASCII(), BASE64(), BIN(), BINARY(), BIT_LENGTH(), BITMAP_COUNT(), BTRIM(), CHAR(), CHAR_LENGTH(), CHARACTER_LENGTH(), CHARINDEX(), CHR(), CONCAT(), CONCAT_WS(), CONTAINS(), CRC32(), DECODE(), ENCODE(), ENDSWITH(), FIND_IN_SET(), FORMAT_NUMBER(), HEX(), INITCAP(), INSTR(), LCASE(), LEFT(), LEN(), LENGTH(), LEVENSHTEIN(), LOCATE(), LOWER(), LPAD(), LTRIM(), MASK(), MD5(), OCTET_LENGTH(), OVERLAY(), PARSE_URL(), POSITION(), REGEXP_COUNT(), REGEXP_EXTRACT(), REGEXP_INSTR(), REGEXP_REPLACE(), REGEXP_SUBSTR(), REPEAT(), REPLACE(), REVERSE(), RIGHT(), RPAD(), RTRIM(), SHA(), SHA1(), SHA2(), SOUNDEX(), SPACE(), SPLIT_PART(), STARTSWITH(), STRING(), SUBSTR(), SUBSTRING(), SUBSTRING_INDEX(), TO_BINARY(), TRANSLATE(), TRIM(), TRY_AES_DECRYPT(), TRY_TO_BINARY(), UCASE(), UNBASE64(), UNHEX(), UPPER(), URL_DECODE(), URL_ENCODE()
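For example, the following expression removes all non-digit characters from a string column. The port name is hypothetical:
REGEXP_REPLACE(phone, '[^0-9]', '')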
Window functions
ANY(), ANY_VALUE(), APPROX_COUNT_DISTINCT(), APPROX_PERCENTILE(), AVG(), BIT_AND(), BIT_OR(), BIT_XOR(), BITMAP_CONSTRUCT_AGG(), BITMAP_OR_AGG(), BOOL_AND(), BOOL_OR(), CORR(), COUNT(), COUNT_IF(), COUNT_MIN_SKETCH(), COVAR_POP(), COVAR_SAMP(), CUME_DIST(), DENSE_RANK(), EVERY(), FIRST(), FIRST_VALUE(), HLL_SKETCH_AGG(), HLL_UNION_AGG(), KURTOSIS(), LAG(), LAST(), LAST_VALUE(), LEAD(), MAX(), MAX_BY(), MEAN(), MEDIAN(), MIN(), MIN_BY(), MODE(), NTH_VALUE(), NTILE(), PERCENT_RANK(), PERCENTILE(), PERCENTILE_APPROX(), RANK(), REGR_AVGX(), REGR_AVGY(), REGR_COUNT(), REGR_INTERCEPT(), REGR_R2(), REGR_SLOPE(), REGR_SXX(), REGR_SXY(), REGR_SYY(), ROW_NUMBER(), SKEWNESS(), SOME(), STD(), STDDEV(), STDDEV_POP(), STDDEV_SAMP(), SUM(), TRY_AVG(), TRY_SUM(), VAR_POP(), VAR_SAMP(), VARIANCE()
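For example, the following expression numbers each customer's rows from the most recent order to the oldest. The port names are hypothetical:
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC)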
For more information about the native functions and expression syntax of Databricks, see Databricks functions in the Databricks documentation.
Operators in mappings in SQL ELT mode
When you run a mapping in SQL ELT mode, Data Integration converts the expressions in each transformation by determining the equivalent operators in the database.
You can push the following operators to Databricks:
+, -, *, /, %, ||, =, !=, >, <, >=, <=, AND, OR, NOT
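For example, the following filter condition combines comparison and logical operators. The port names are hypothetical:
amount >= 100 AND region != 'EU'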
Rules and guidelines in mappings in SQL ELT mode
Consider the following rules and guidelines when you run mappings in SQL ELT mode:
Functions
•When you use the following functions, ensure that all the arguments are of the same data type, as shown in the example after this list:
- COALESCE
- DECODE
- ELT
- GREATEST
- IF
- IFF
- IFNULL
- LEAST
- NVL
- NVL2
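For example, in the following COALESCE expression all arguments are of the date data type. The port names are hypothetical:
COALESCE(ship_date, order_date, DATE'1970-01-01')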
•When you use the following functions, the number of optional arguments must not exceed five:
- COALESCE
- CONCAT
- CONCAT_WS
- COUNT
- ELT
- GREATEST
- HASH
- JAVA_METHOD
- LEAST
- REFLECT
- TRY_REFLECT
- XXHASH64
In the following example of the CONCAT_WS function, the first argument ',' is mandatory and the other arguments are optional:
CONCAT_WS(',', 'This', 'is', 'a', 'Spark', 'SQL')
The output of this function is This,is,a,Spark,SQL.
•When you use the LEAD() or LAG() function and the default value that the expression returns is of Boolean data type, specify the function in the following format:
LEAD(flag,3,BOOLEAN(TRUE)) or LAG(flag,3,BOOLEAN(TRUE))
•Do not use keywords or function names as column names or output port names.
•When you configure a Filter transformation and specify a simple filter condition on columns of the date or timestamp data type in a Databricks table, you must pass the data through the TO_DATE() or TO_TIMESTAMP() function as an expression in the filter condition.
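For example, the following filter condition compares a date column to a literal. The port name is hypothetical:
TO_DATE(order_date) = TO_DATE('2023-01-15', 'yyyy-MM-dd')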
•You cannot use nested aggregate functions in an Aggregator transformation. For example, you cannot specify SUM(AVG(col1)) in the aggregate function. To nest aggregations, add multiple Aggregator transformations to the mapping.
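For example, to achieve the effect of SUM(AVG(col1)), configure the first Aggregator transformation with an output port such as avg_col1 = AVG(col1) grouped by a key, and configure a downstream Aggregator transformation with SUM(avg_col1). The port names are hypothetical.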
•When you use the COUNT_MIN_SKETCH() function in an Aggregator or Expression transformation, ensure that the columns that you pass as epsilon and confidence arguments are of double data type.
If the fields are not of double data type, use an Expression transformation to first convert the column data type to double, and then use the returned columns in a downstream Aggregator or Expression transformation.
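For example, assuming hypothetical port names, first convert the arguments in an Expression transformation:
eps_d = DOUBLE(epsilon_raw)
conf_d = DOUBLE(confidence_raw)
Then use the converted ports in the downstream transformation:
COUNT_MIN_SKETCH(item_id, eps_d, conf_d, 42)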