You can create mappings in SQL ELT mode to read data from Databricks, Amazon S3, or Microsoft Azure Data Lake Storage Gen2, load it to Databricks, and perform all of the data transformation within Databricks.
To create a mapping in SQL ELT mode, you create a mapping, and then select Mapping - SQL ELT as the mapping type. You're then prompted to choose Databricks as the cloud ecosystem and a Databricks target connection. If your organization doesn't have any Databricks connections, you're prompted to create one.
After you choose the target connection, the Mapping Designer opens with a Source transformation and a Target transformation already on the canvas for you to configure.
Sources in mappings in SQL ELT mode
When you configure the source connection in the Source transformation, you can choose an Amazon S3 V2, a Databricks, or a Microsoft Azure Data Lake Storage Gen2 connection.
Amazon S3 V2 source properties
You can configure the following properties for an Amazon S3 V2 source:
•Source connection - Parameter, Allow parameter to be overridden at run time
•Source Type - Single, Parameter
•Format - Delimited, Avro, JSON, ORC, and Parquet
•Source Type - File and directory
•Folder Path
•File Name
For information on how to configure the supported properties, see the help for Amazon S3 V2 Connector.
Databricks source properties
You can configure the following properties for a Databricks source:
•Source connection - Parameter, Allow parameter to be overridden at run time
•Source Type - Single, Query, Parameter
•Database Name
•Table Name
•SQL Override
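For example, a SQL override for a Databricks source might look like the following query. The schema, table, and column names here are illustrative:
SELECT customer_id, order_date, amount FROM sales.orders WHERE amount > 0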
Microsoft Azure Data Lake Storage Gen2 source properties
You can configure the following properties for a Microsoft Azure Data Lake Storage Gen2 source:
•Source connection - Parameter, Allow parameter to be overridden at run time
•Source Type - Single, Parameter
•Format - Delimited, Avro, JSON, ORC, and Parquet
•Filesystem Name Override
•Source Type - File and directory
•Allow Wildcard Characters
•Directory Override
•File Name Override
For information on how to configure the supported properties, see the help for Microsoft Azure Data Lake Storage Gen2 Connector.
Targets in mappings in SQL ELT mode
When you configure a Target transformation in a mapping in SQL ELT mode, you can use only a Databricks connection.
You can configure the following properties in a Databricks target transformation in a mapping in SQL ELT mode:
•Target Type - Single Object, Parameter, Create New Target at Runtime
•Operation - Insert, Update, Delete, Data Driven
•Target Database Name
•Target Table Name
•Write Disposition - Append, Truncate
•Update Mode - Update as Update, Update else Insert
For information on how targets in mappings in SQL ELT mode behave differently from targets in other types of mappings, see Targets in mappings in SQL ELT mode in Mappings in the Data Integration help.
Transformations in mappings in SQL ELT mode
A mapping in SQL ELT mode includes transformations that Databricks can process.
You can use the following transformations in a mapping in SQL ELT mode:
•Source
•Target
•Aggregator
•Expression
•Filter
•Joiner
•Lookup
•Rank
•Router
•Sorter
•Union
Functions in mappings in SQL ELT mode
When you create expressions within a mapping in SQL ELT mode, you must use the native functions and expression syntax of Databricks and not Informatica functions and expression syntax.
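For example, to replace null values with a default, use the Databricks NVL() function instead of an Informatica IIF(ISNULL()) expression. The port name in this sketch is hypothetical:
Informatica syntax: IIF(ISNULL(amount), 0, amount)
Databricks syntax: NVL(amount, 0)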
You can use the following native functions in a mapping in SQL ELT mode:
Aggregate functions
ANY(), ANY_VALUE(), APPROX_COUNT_DISTINCT(), APPROX_PERCENTILE(), AVG(), BIT_AND(), BIT_OR(), BIT_XOR(), BITMAP_CONSTRUCT_AGG(), BITMAP_OR_AGG(), BOOL_AND(), BOOL_OR(), CORR(), COUNT(), COUNT_IF(), COUNT_MIN_SKETCH(), COVAR_POP(), COVAR_SAMP(), EVERY(), FIRST(), FIRST_VALUE(), HLL_SKETCH_AGG(), HLL_UNION_AGG(), KURTOSIS(), LAST(), LAST_VALUE(), MAX(), MAX_BY(), MEAN(), MEDIAN(), MIN(), MIN_BY(), MODE(), PERCENTILE(), PERCENTILE_APPROX(), REGR_AVGX(), REGR_AVGY(), REGR_COUNT(), REGR_INTERCEPT(), REGR_R2(), REGR_SLOPE(), REGR_SXX(), REGR_SXY(), REGR_SYY(), SKEWNESS(), SOME(), STD(), STDDEV(), STDDEV_POP(), STDDEV_SAMP(), SUM(), TRY_AVG(), TRY_SUM(), VAR_POP(), VAR_SAMP(), VARIANCE()
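For example, an Aggregator transformation expression might use an aggregate function as follows. The port name is hypothetical:
APPROX_COUNT_DISTINCT(customer_id)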
AI functions
AI_ANALYZE_SENTIMENT(), AI_FIX_GRAMMAR(), AI_GEN(), AI_SIMILARITY(), AI_SUMMARIZE(), AI_TRANSLATE()
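For example, the following expression returns the sentiment of a text column. The port name is hypothetical:
AI_ANALYZE_SENTIMENT(review_text)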
Cast functions
BOOLEAN(), TO_CHAR(), TO_NUMBER(), TO_VARCHAR(), TRY_TO_NUMBER()
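For example, the following expression converts a string to a number based on a format string:
TO_NUMBER('12.34', '99.99')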
Date and time functions
ADD_MONTHS(), CURDATE(), CURRENT_DATE(), CURRENT_TIMESTAMP(), CURRENT_TIMEZONE(), DATE(), DATE_ADD(), DATE_FORMAT(), DATE_FROM_UNIX_DATE(), DATE_PART(), DATE_SUB(), DATE_TRUNC(), DATEADD(), DATEDIFF(), DAY(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), FROM_UNIXTIME(), FROM_UTC_TIMESTAMP(), GETDATE(), HOUR(), LAST_DAY(), MAKE_DATE(), MAKE_TIMESTAMP(), MINUTE(), MONTH(), MONTHS_BETWEEN(), NEXT_DAY(), NOW(), QUARTER(), SECOND(), TIMESTAMP(), TIMESTAMP_MICROS(), TIMESTAMP_MILLIS(), TIMESTAMP_SECONDS(), TO_DATE(), TO_TIMESTAMP(), TO_UNIX_TIMESTAMP(), TO_UTC_TIMESTAMP(), TRUNC(), TRY_TO_TIMESTAMP(), UNIX_DATE(), UNIX_MICROS(), UNIX_MILLIS(), UNIX_SECONDS(), UNIX_TIMESTAMP(), WEEKDAY(), WEEKOFYEAR(), YEAR()
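For example, the following expressions parse a string into a date and add 30 days to a date column. The port name is hypothetical:
TO_DATE('2024-01-15', 'yyyy-MM-dd')
DATE_ADD(order_date, 30)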
Miscellaneous functions
ASSERT_TRUE(), COALESCE(), CURRENT_CATALOG(), CURRENT_DATABASE(), CURRENT_METASTORE(), CURRENT_SCHEMA(), CURRENT_USER(), ELT(), EQUAL_NULL(), GREATEST(), HASH(), HLL_SKETCH_ESTIMATE(), HLL_UNION(), IF(), IFF(), IFNULL(), INPUT_FILE_BLOCK_LENGTH(), INPUT_FILE_BLOCK_START(), INPUT_FILE_NAME(), IS_ACCOUNT_GROUP_MEMBER(), IS_MEMBER(), ISNOTNULL(), ISNULL(), JAVA_METHOD(), LEAST(), LUHN_CHECK(), MONOTONICALLY_INCREASING_ID(), NULLIF(), NVL(), RAISE_ERROR(), REFLECT(), SECRET(), SESSION_USER(), SPARK_PARTITION_ID(), TRY_REFLECT(), TYPEOF(), USER(), UUID(), VERSION(), XXHASH64()
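For example, the following expression returns the first non-null value among its arguments. The port names are hypothetical:
COALESCE(phone, mobile, 'unknown')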
Numeric functions
ABS(), ACOS(), ACOSH(), ASIN(), ASINH(), ATAN(), ATAN2(), ATANH(), BIGINT(), BIT_COUNT(), BIT_GET(), BIT_REVERSE(), BITMAP_BIT_POSITION(), BITMAP_BUCKET_NUMBER(), BROUND(), CBRT(), CEIL(), CEILING(), CONV(), COS(), COSH(), COT(), CSC(), DECIMAL(), DEGREES(), DOUBLE(), E(), EXP(), EXPM1(), FACTORIAL(), FLOAT(), FLOOR(), GETBIT(), HYPOT(), INT(), ISNAN(), LN(), LOG(), LOG10(), LOG1P(), LOG2(), MOD(), NANVL(), NEGATIVE(), PI(), PMOD(), POSITIVE(), POW(), POWER(), RADIANS(), RAND(), RANDN(), RANDOM(), RINT(), ROUND(), SEC(), SHIFTLEFT(), SHIFTRIGHT(), SHIFTRIGHTUNSIGNED(), SIGN(), SIGNUM(), SIN(), SINH(), SMALLINT(), SQRT(), TAN(), TANH(), TINYINT(), TRY_ADD(), TRY_DIVIDE(), TRY_MULTIPLY(), TRY_SUBTRACT(), WIDTH_BUCKET()
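For example, the following expression multiplies an amount by a tax factor and rounds the result to two decimal places. The port name is hypothetical:
ROUND(amount * 1.08, 2)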
String and binary functions
AES_DECRYPT(), AES_ENCRYPT(), ASCII(), BASE64(), BIN(), BINARY(), BIT_LENGTH(), BITMAP_COUNT(), BTRIM(), CHAR(), CHAR_LENGTH(), CHARACTER_LENGTH(), CHARINDEX(), CHR(), CONCAT(), CONCAT_WS(), CONTAINS(), CRC32(), DECODE(), ENCODE(), ENDSWITH(), FIND_IN_SET(), FORMAT_NUMBER(), HEX(), INITCAP(), INSTR(), LCASE(), LEFT(), LEN(), LENGTH(), LEVENSHTEIN(), LOCATE(), LOWER(), LPAD(), LTRIM(), MASK(), MD5(), OCTET_LENGTH(), OVERLAY(), PARSE_URL(), POSITION(), REGEXP_COUNT(), REGEXP_EXTRACT(), REGEXP_INSTR(), REGEXP_REPLACE(), REGEXP_SUBSTR(), REPEAT(), REPLACE(), REVERSE(), RIGHT(), RPAD(), RTRIM(), SHA(), SHA1(), SHA2(), SOUNDEX(), SPACE(), SPLIT_PART(), STARTSWITH(), STRING(), SUBSTR(), SUBSTRING(), SUBSTRING_INDEX(), TO_BINARY(), TRANSLATE(), TRIM(), TRY_AES_DECRYPT(), TRY_TO_BINARY(), UCASE(), UNBASE64(), UNHEX(), UPPER(), URL_DECODE(), URL_ENCODE()
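For example, the following expression removes all non-digit characters from a string column. The port name is hypothetical:
REGEXP_REPLACE(phone, '[^0-9]', '')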
Window functions
ANY(), ANY_VALUE(), APPROX_COUNT_DISTINCT(), APPROX_PERCENTILE(), AVG(), BIT_AND(), BIT_OR(), BIT_XOR(), BITMAP_CONSTRUCT_AGG(), BITMAP_OR_AGG(), BOOL_AND(), BOOL_OR(), CORR(), COUNT(), COUNT_IF(), COUNT_MIN_SKETCH(), COVAR_POP(), COVAR_SAMP(), CUME_DIST(), DENSE_RANK(), EVERY(), FIRST(), FIRST_VALUE(), HLL_SKETCH_AGG(), HLL_UNION_AGG(), KURTOSIS(), LAG(), LAST(), LAST_VALUE(), LEAD(), MAX(), MAX_BY(), MEAN(), MEDIAN(), MIN(), MIN_BY(), MODE(), NTH_VALUE(), NTILE(), PERCENT_RANK(), PERCENTILE(), PERCENTILE_APPROX(), RANK(), REGR_AVGX(), REGR_AVGY(), REGR_COUNT(), REGR_INTERCEPT(), REGR_R2(), REGR_SLOPE(), REGR_SXX(), REGR_SXY(), REGR_SYY(), ROW_NUMBER(), SKEWNESS(), SOME(), STD(), STDDEV(), STDDEV_POP(), STDDEV_SAMP(), SUM(), TRY_AVG(), TRY_SUM(), VAR_POP(), VAR_SAMP(), VARIANCE()
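For example, the following expression numbers each customer's rows from the most recent order to the oldest. The port names are hypothetical:
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC)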
For more information about the native functions and expression syntax of Databricks, see Databricks functions in the Databricks documentation.
Operators in mappings in SQL ELT mode
When you run a mapping in SQL ELT mode, Data Integration converts the expressions in each transformation by determining the equivalent operators in the database.
You can push the following operators to Databricks:
+, -, *, /, %, ||, =, !=, >, <, >=, <=, AND, OR, NOT
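For example, the following filter condition combines comparison and logical operators. The port names are hypothetical:
amount >= 100 AND region != 'EU'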
Rules and guidelines in mappings in SQL ELT mode
Consider the following rules and guidelines when you run mappings in SQL ELT mode:
Functions
•When you use the following functions, ensure that all the arguments are of the same data type, as shown in the example after this list:
- COALESCE
- DECODE
- ELT
- GREATEST
- IF
- IFF
- IFNULL
- LEAST
- NVL
- NVL2
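For example, in the following COALESCE expression all arguments are of the date data type. The port names are hypothetical:
COALESCE(ship_date, order_date, DATE'1970-01-01')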
•When you use the following functions, the number of optional arguments must not exceed five:
- COALESCE
- CONCAT
- CONCAT_WS
- COUNT
- ELT
- GREATEST
- HASH
- JAVA_METHOD
- LEAST
- REFLECT
- TRY_REFLECT
- XXHASH64
In the following example of the CONCAT_WS function, the first argument ',' is mandatory and the other arguments are optional:
CONCAT_WS(',', 'This', 'is', 'a', 'Spark', 'SQL')
The output of this function is This,is,a,Spark,SQL.
•When you use the LEAD() or LAG() function and the default value that the expression returns is of Boolean data type, specify the function in the following format:
LEAD(flag,3,BOOLEAN(TRUE)) or LAG(flag,3,BOOLEAN(TRUE))
•Do not use keywords or function names as column names or output port names.
•When you configure a Filter transformation and specify a simple filter condition on columns of the date or timestamp data type in a Databricks table, you must pass the data through the TO_DATE() or TO_TIMESTAMP() function as an expression in the filter condition.
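For example, the following filter condition compares a date column to a literal. The port name is hypothetical:
TO_DATE(order_date) = TO_DATE('2023-01-15', 'yyyy-MM-dd')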
•You cannot use nested aggregate functions in an Aggregator transformation. For example, you cannot specify SUM(AVG(col1)) in the aggregate function. To nest aggregations, add multiple Aggregator transformations to the mapping.
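For example, to achieve the effect of SUM(AVG(col1)), configure the first Aggregator transformation with an output port such as avg_col1 = AVG(col1) grouped by a key, and configure a downstream Aggregator transformation with SUM(avg_col1). The port names are hypothetical.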
•When you use the COUNT_MIN_SKETCH() function in an Aggregator or Expression transformation, ensure that the columns that you pass as epsilon and confidence arguments are of double data type.
If the fields are not of double data type, use an Expression transformation to first convert the column data type to double, and then use the returned columns in a downstream Aggregator or Expression transformation.
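For example, assuming hypothetical port names, first convert the arguments in an Expression transformation:
eps_d = DOUBLE(epsilon_raw)
conf_d = DOUBLE(confidence_raw)
Then use the converted ports in the downstream transformation:
COUNT_MIN_SKETCH(item_id, eps_d, conf_d, 42)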