Transformation Data Types
The following table describes the transformation data types:
| Data Type | Size in Bytes | Description |
| --- | --- | --- |
| Array | Unlimited number of characters. | Complex data type. You can use arrays with complex sources and targets. |
| Bigint | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Precision of 19, scale of 0. Integer value. |
| Binary | Precision | 1 to 104,857,600 bytes. You cannot use binary data for flat file sources. |
| Date/Time | 16 bytes | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. Precision of 29, scale of 9 (precision to the nanosecond). Combined date/time value. |
| Decimal | 8 bytes (if high precision is off or precision is greater than 38); 16 bytes (if precision <= 18 and high precision is on); 20 bytes (if precision > 18 and <= 28); 24 bytes (if precision > 28 and <= 38) | Decimal value with declared precision and scale. Scale must be less than or equal to precision. For transformations that support precision up to 38 digits, the precision is 1 to 38 digits, and the scale is 0 to 38. For transformations that support precision up to 28 digits, the precision is 1 to 28 digits, and the scale is 0 to 28. If you specify a precision greater than the maximum number of digits, the Data Integration Service converts decimal values to double in high precision mode. |
| Double | 8 bytes | Double-precision floating-point numeric value. You can edit the precision and scale. The scale must be less than or equal to the precision. |
| Integer | 4 bytes | -2,147,483,648 to 2,147,483,647. Precision of 10, scale of 0. Integer value. |
| Map | Unlimited number of characters. | Complex data type. You can use maps with complex sources and targets. |
| String | Unicode mode: (precision + 1) * 2; ASCII mode: precision + 1 | 1 to 104,857,600 characters. Fixed-length or varying-length string. |
| Struct | Unlimited number of characters. | Complex data type. You can use structs with complex sources and targets. |
| Text | Unicode mode: (precision + 1) * 2; ASCII mode: precision + 1 | 1 to 104,857,600 characters. Fixed-length or varying-length string. |
| timestampWithTZ | 40 bytes | Aug 1, 1947 A.D. to Dec 31, 2040 A.D., -12:00 to +14:00. Precision of 36, scale of 9 (precision to the nanosecond). Timestamp with Time Zone data type does not support the following time zone regions: AFRICA_CAIRO, AFRICA_MONROVIA, EGYPT, and AMERICA_MONTREAL. |
Integer Data Types
You can pass integer data from sources to targets and perform transformations on integer data. The transformation language supports Bigint and Integer data types.
The transformation integer data types represent exact values.
Integer Values in Calculations
When you use integer values in calculations, the Data Integration Service sometimes converts integer values to floating-point numbers before it performs the calculation. For example, to evaluate MOD( 12.00, 5 ), the Data Integration Service converts the integer value “5” to a floating-point number before it performs the division operation. The Data Integration Service converts integer values to double or decimal values depending on whether you enable high precision.
The Data Integration Service converts integer values in the following arithmetic operations:
| Arithmetic Operation | High Precision Disabled | High Precision Enabled |
| --- | --- | --- |
| Functions and calculations that cannot introduce decimal points. For example, integer addition, subtraction, and multiplication, and functions such as CUME, MOVINGSUM, and SUM. | No conversion | Decimal |
| Non-scientific functions and calculations that can introduce decimal points. For example, integer division, and functions such as AVG, MEDIAN, and PERCENTILE. | Double | Decimal |
| All scientific functions and the EXP, LN, LOG, POWER, and SQRT functions. | Double | Double |
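As an analogy only (a Python sketch, not the transformation language), the table distinguishes operations such as integer addition, which cannot introduce decimal points, from operations such as division, which require conversion to a wider numeric type:

from decimal import Decimal

a, b = 7, 2
print(a + b)                     # 9: integer addition cannot introduce decimal points
print(a / b)                     # 3.5: division converts the result to a floating-point (double-like) value
print(Decimal(a) / Decimal(b))   # 3.5: with a decimal type, the quotient is computed exactly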
The transformation Double data type supports precision of up to 15 digits, while the Bigint data type supports precision of up to 19 digits. Therefore, precision loss can occur in calculations that produce Bigint values with precision of more than 15 digits.
For example, an expression transformation contains the following calculation:
POWER( BIGINTVAL, EXPVAL )
Before it performs the calculation, the Data Integration Service converts the inputs to the POWER function to double values. If the BIGINTVAL port contains the Bigint value 9223372036854775807, the Data Integration Service converts this value to 9.22337203685478e+18, losing the last 4 digits of precision. If the EXPVAL port contains the value 1.0 and the result port is a Bigint, this calculation produces a row error because the result, 9223372036854780000, exceeds the maximum Bigint value.
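The loss of exactness is easy to reproduce with standard double-precision arithmetic. The following Python sketch is an analogy only; Python keeps about 17 significant digits of a double where the Data Integration Service stores 15, so the rounded value differs slightly, but the maximum Bigint value still cannot be represented exactly:

big = 9223372036854775807            # maximum Bigint value (2**63 - 1)
as_double = float(big)               # the POWER inputs are converted to double
print(as_double)                     # 9.223372036854776e+18
result = as_double ** 1.0
print(int(result))                   # 9223372036854775808
print(int(result) > big)             # True: the result exceeds the maximum Bigint value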
When you use an Integer data type in a calculation that can produce decimal values and you enable high precision, the Data Integration Service converts the integer values to decimal values.
For transformations that support the Decimal data type with precision up to 28 digits, precision loss does not occur in a calculation unless the result produces a value with precision greater than 28 digits in high precision mode. In this case, the Data Integration Service stores the result as a double. If the port precision is less than or equal to 28 digits and the result produces a value greater than 28 digits in high precision mode, the Data Integration Service rejects the row.
For transformations that support the Decimal data type with precision up to 38 digits, precision loss does not occur in a calculation unless the result produces a value with precision greater than 38 digits in high precision mode. In this case, the Data Integration Service stores the result as a double. If the port precision is less than or equal to 38 digits and the result produces a value greater than 38 digits in high precision mode, the Data Integration Service rejects the row.
Integer Constants in Expressions
The Integration Service interprets constants in an expression as floating-point values, even if the calculation produces an integer result. For example, in the expression INTVALUE + 1000, the Integration Service converts the integer value “1000” to a double value if high precision is not enabled. It converts the value 1000 to a decimal value if high precision is enabled. To process the value 1000 as an integer value, create a variable port with an Integer data type to hold the constant and modify the expression to add the two ports.
NaN Values
NaN (Not a Number) is a value that is usually returned as the result of an operation on invalid input operands, especially in floating-point calculations. For example, when an operation attempts to divide zero by zero, it returns a NaN result.
Operating systems and programming languages may represent NaN differently. For example, the following list shows valid string representations of NaN:
- nan
- NaN
- NaN%
- NAN
- NaNQ
- NaNS
- qNaN
- sNaN
- 1.#SNAN
- 1.#QNAN
The Integration Service converts QNAN values to 1.#QNAN on Win64EMT platforms. 1.#QNAN is a valid representation of NaN.
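A minimal Python sketch of how NaN behaves; note that Python's float() parser accepts only the nan, NaN, and NAN spellings from the list above (optionally signed), not the quiet or signaling variants:

import math

nan = float("NaN")                               # parse a valid string representation
print(math.isnan(nan))                           # True
print(nan == nan)                                # False: NaN never compares equal, even to itself
print(math.isnan(float("inf") - float("inf")))   # True: an invalid operation returns NaN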
Write Integer Values to Flat Files
When writing integer values to a fixed-width flat file, the file writer does not verify that the data is within range. For example, the file writer writes the result 3,000,000,000 to a target Integer column if the field width of the target column is at least 13. The file writer does not reject the row even though the result is outside the valid range for Integer values.
Binary Data Type
If a mapping includes binary data, set the precision for the transformation binary data type so that the Integration Service can allocate enough memory to move the data from source to target.
You cannot use binary data types for flat file sources.
Date/Time Data Type
The Date/Time data type handles years from 1 A.D. to 9999 A.D. in the Gregorian calendar system. Years beyond 9999 A.D. cause an error.
The Date/Time data type supports dates with precision to the nanosecond. The data type has a precision of 29 and a scale of 9. Some native data types have a smaller precision. When you import a source that contains datetime values, the import process imports the correct precision from the source column. For example, the Microsoft SQL Server Datetime data type has a precision of 23 and a scale of 3. When you import a Microsoft SQL Server source that contains Datetime values, the Datetime columns in the mapping source have a precision of 23 and a scale of 3.
The Integration Service reads datetime values from the source to the precision specified in the mapping source. When the Integration Service transforms the datetime values, it supports precision up to 29 digits. For example, if you import a datetime value with precision to the millisecond, you can use the ADD_TO_DATE function in an Expression transformation to add nanoseconds to the date.
If you write a Date/Time value to a target column that supports a smaller precision, the Integration Service truncates the value to the precision of the target column. If you write a Date/Time value to a target column that supports a larger precision, the Integration Service inserts zeroes in the unsupported portion of the datetime value.
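As an analogy only (Python's datetime type stores microseconds rather than nanoseconds, so this sketch does not reproduce the Data Integration Service behavior exactly), truncation and zero-padding look like this:

from datetime import datetime

value = datetime(2024, 6, 16, 19, 1, 25, 376999)        # precision to the microsecond
# Write to a column that supports only milliseconds: the extra digits are truncated.
truncated = value.replace(microsecond=value.microsecond // 1000 * 1000)
print(truncated.isoformat())            # 2024-06-16T19:01:25.376000
# Write to a column with a larger precision: the unsupported portion is simply zeros.
print(truncated.isoformat() + "000")    # 2024-06-16T19:01:25.376000000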
Timestamp with Time Zone
Timestamp with Time Zone is a variant of the Timestamp data type that includes either a time zone offset, TIME_ZONE_HOUR:TIME_ZONE_MINUTE, with or without daylight saving time, or a time zone region name. The time zone offset is the difference, in hours and minutes, between the local time zone and UTC (Coordinated Universal Time).
For example, '16-JUN-08 07.01.25.376000 PM -06:00'. In the example, -06:00 is the time zone offset.
Another example is '05-JUN-2008 07:01:25.376000 PM America/Los_Angeles'. In this example, America/Los_Angeles is the time zone region name.
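As an illustration only (a Python sketch, not the Developer tool or the Data Integration Service), the first example parses into a timestamp whose UTC offset is -06:00:

from datetime import datetime

ts = datetime.strptime("16-JUN-08 07.01.25.376000 PM -06:00",
                       "%d-%b-%y %I.%M.%S.%f %p %z")
print(ts.isoformat())   # 2008-06-16T19:01:25.376000-06:00
print(ts.utcoffset())   # -1 day, 18:00:00, that is, an offset of -06:00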
When you import the Timestamp with Time Zone data type into the Developer tool, the associated transformation data type is timestampWithTZ.
timestampWithTZ has a precision of 36 and a scale of 9. The Timestamp with Time Zone displacement value range is -12:00 < UTC < +14:00.
The Data Integration Service can process the Timestamp with Time Zone data type in Oracle and flat file data objects. The Timestamp with Time Zone data type is applicable only to data within the range of August 1947 through December 2040.
Timestamp with Time Zone data type does not support the following time zone regions in the Developer tool:
- AFRICA_CAIRO
- AFRICA_MONROVIA
- EGYPT
- AMERICA_MONTREAL
Timestamp with Time Zone data type uses the IANA standard for the Time Zone Database version 2015b. To avoid data corruption when using the Timestamp with Time Zone data type, ensure that the Oracle DST patch 24 is present on the Oracle server and client. When you enable data object caching, ensure that the Oracle DST patch 24 is present on the database server used for data object caching.
If the patch is not present, download the Oracle DST patch 24 from the Oracle website. To verify that the Oracle server uses the Oracle DST patch 24, run the following command in the command prompt at the Oracle server:
SELECT VERSION FROM v$timezone_file;
Verify that the version appears as 24. You can also verify the version by running the following command at the Oracle server:
SELECT TZ_VERSION FROM registry$database;
Verify that the TZ_VERSION appears as 24.
Timestamp with Local Time Zone
Timestamp with Local Time Zone is a variant of the Timestamp data type where the timestamp data is normalized to the database time zone. The time zone displacement is not part of the column data. When the Data Integration Service reads the data, Oracle returns the data in the time zone of the Data Integration Service.
When you import the Timestamp with Local Time Zone data type into the Developer tool, the associated transformation data type is date/time.
For example, '04-APR-10 10.27.451 AM'
Timestamp (9) with Local Time Zone has a precision of 29 and a scale of 9. It is mapped to the date/time (29,9) transformation data type.
To set the default session time zone when the Data Integration Service reads or writes the Timestamp with Local Time Zone data, specify the ORA_SDTZ environment variable. You can set the ORA_SDTZ environment variable to any of the following values:
- Operating system local time zone ('OS_TZ')
- Database time zone ('DB_TZ')
- Absolute offset from UTC (for example, '-05:00')
- Time zone region name (for example, 'America/Los_Angeles')
Supported Time Zones for Oracle
The time zones file in the following install location contains the list of all time zones that Oracle supports, except the four time zones AFRICA_CAIRO, AFRICA_MONROVIA, EGYPT, and AMERICA_MONTREAL:
<Informatica installation directory>/services/shared/timezones/timezones.txt
If Oracle adds support for additional time zones later, you can place a new file that includes the new time zones in the same install location.
Decimal and Double Data Types
You can pass decimal and double data from sources to targets and perform transformations on decimal and double data.
The transformation language supports the following data types:
- Decimal
- For transformations that support precision up to 38 digits, the precision is 1 to 38 digits, and the scale is 0 to 38. For transformations that support precision up to 28 digits, the precision is 1 to 28 digits, and the scale is 0 to 28. You cannot use decimal values with a scale greater than the precision or with a negative precision. Transformations display any range that you assign to the Decimal data type, but the Data Integration Service supports precision only up to 38 digits or 28 digits depending on the transformation.
- When you enable high precision and the port precision is greater than 38 digits or 28 digits, depending on the transformation, the Data Integration Service stores the result as a double.
- Double
- Double-precision floating-point numeric value. You can edit the precision and scale. The scale must be less than or equal to the precision.
Decimal and Double Values in Calculations
The following table lists how the Data Integration Service handles decimal values based on high precision configuration:
| Port Data Type | Precision | High Precision Off | High Precision On |
| --- | --- | --- | --- |
| Decimal | 0 to 15 | Decimal | Decimal |
| Decimal | 15 to 38 for transformations that support the Decimal data type with precision up to 38 digits. 15 to 28 for transformations that support the Decimal data type with precision up to 28 digits. | Double | Decimal |
| Decimal | Over 38 for transformations that support the Decimal data type with precision up to 38 digits. Over 28 for transformations that support the Decimal data type with precision up to 28 digits. | Double | Double |
When you enable high precision, the Data Integration Service converts numeric constants in any expression function to Decimal. If you do not enable high precision, the Data Integration Service converts numeric constants to Double.
To ensure the maximum precision for numeric values greater than 28 or 38 digits, depending on the transformation, truncate or round any large numbers before you perform calculations or transformations with the transformation functions.
For transformations that support Decimal data type of precision up to 38 digits, use the Decimal data type and enable high precision to ensure precision of up to 38 digits.
Do not use the Double data type for data that you use in an equality condition, such as a lookup or join condition.
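A minimal Python sketch of why double values are unreliable in an equality condition and decimal values are not:

from decimal import Decimal

print(0.1 + 0.2 == 0.3)    # False: binary doubles cannot represent these values exactly
print(0.1 + 0.2)           # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True: decimal arithmetic is exact here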
Precision Loss for Decimal and Double Values
Precision loss can occur with decimal and double data types in a calculation when the result produces a value with a precision greater than the maximum allowed digits.
If the precision is greater than the maximum allowed digits, the Data Integration Service stores the result as a double.
Precision loss can also occur during intermediate calculations if there is a change in the precision or scale of intermediate results. The precision and scale of intermediate results can affect the precision or scale of the final result.
Precision Loss when High Precision is Disabled
If you disable high precision, the Data Integration Service converts decimal values to double. Precision loss occurs if the decimal value has a precision greater than 15 digits. For example, you have a mapping with Decimal (20,0) that passes the number 40012030304957666903. If you disable high precision, the Data Integration Service converts the decimal value to double and passes 4.00120303049577 x 10^19.
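The same loss is visible in any double-precision environment. The following Python sketch is an analogy only; Python prints about 17 significant digits where the Data Integration Service stores 15, but in both cases the trailing digits of the 20-digit value are lost:

value = 40012030304957666903       # the Decimal (20,0) value from the example above
as_double = float(value)
print(as_double)                   # approximately 4.00120303049577e+19
print(int(as_double) == value)     # False: the original 20-digit value cannot be recovered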
Additional precision loss can occur if you use expressions in transformations that have intermediate calculations. If you disable high precision, the Data Integration Service converts the intermediate result to double and uses the intermediate result with lower precision to perform the following calculation in the expression. As a result, the Data Integration Service might produce a different output for expressions based on the order of the arguments. For example, when you disable high precision, the output for tan(90*$pi /180) is different from tan(90/180*$pi). The Data Integration Service performs the intermediate calculations in these expressions as follows:
tan(90*$pi/180) = tan(1.5707963267950003) = -9.649380295141232E12
tan(90/180*$pi) = tan(1.570796326795) = -9.670099380792184E12
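You can reproduce the effect by passing the two intermediate values above to any double-precision tan implementation. A minimal Python sketch (exact results can vary slightly across platforms and math libraries):

import math

print(math.tan(1.5707963267950003))   # approximately -9.6494e+12
print(math.tan(1.570796326795))       # approximately -9.6701e+12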
Rounding Methods for Double Values
Due to differences in system run-time libraries and the computer system where the database processes double data type calculations, the results may not be as expected. The double data type conforms to the IEEE 754 standard. Changes to the database client library, different versions of a database, or changes to a system run-time library affect the binary representation of mathematically equivalent values. Also, many system run-time libraries implement the round-to-even or the symmetric arithmetic method. The round-to-even method states that if a number falls midway between the next higher or lower number, it rounds to the nearest value with an even least significant digit. For example, with the round-to-even method, 0.125 is rounded to 0.12. The symmetric arithmetic method rounds the number to the next higher digit when the last digit is 5 or greater. For example, with the symmetric arithmetic method, 0.125 is rounded to 0.13 and 0.124 is rounded to 0.12.
To provide calculation results that are less susceptible to platform differences, the Integration Service stores the 15 significant digits of double data type values. For example, if a calculation on Windows returns the number 1234567890.1234567890, and the same calculation on UNIX returns 1234567890.1234569999, the Integration Service converts this number to 1234567890.1234600000.
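A minimal Python sketch of the two rounding methods, using the decimal module:

from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

x = Decimal("0.125")
print(x.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))   # 0.12: round-to-even
print(x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))     # 0.13: symmetric arithmetic
print(Decimal("0.124").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))   # 0.12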
String Data Types
The transformation data types include two string data types: String and Text.
Although the String and Text data types support the same precision up to 104,857,600 characters, the Integration Service uses String to move string data from source to target and Text to move text data from source to target. Because some databases store text data differently than string data, the Integration Service needs to distinguish between the two types of character data. In general, the smaller string data types, such as Char and Varchar, display as String in transformations, while the larger text data types, such as Text, Long, and Long Varchar, display as Text.
Use String and Text interchangeably within transformations. However, in Lookup transformations, the target data types must match. The database drivers need to match the string data types with the transformation data types, so that the data passes accurately. For example, Varchar in a lookup table must match String in the Lookup transformation.
Complex Data Types
Use complex data types to represent multiple data values in a single row or column position in a transformation. You use complex data types to enable mappings that run on the Spark engine to directly read, process, and write hierarchical data.
Ports that process hierarchical data are called complex ports. You assign complex data types to complex ports in a mapping to process hierarchical data. You specify the complex data type to use to process data from a complex port in the port type configuration in the Developer tool.
You can use the following complex data types in transformations in mappings that run on the Spark engine:
- array
- An array is an ordered collection of elements. The elements can be primitive data types such as integers or strings, or complex data types such as arrays, structs, or maps. All elements in the array must be of the same data type.
- map
- A map contains an unordered collection of key-value pairs. The value can be a primitive data type or a complex data type. All values in the map must be of the same data type.
- struct
- A struct is a collection of elements of different data types. A struct data type is conceptually similar to a table row. The data type contains a fixed number of named fields, each with a predefined data type.
Complex data types can contain primitive data types or complex data types. There are no restrictions on the number of characters contained within a complex data type.
You can use a nested data type in a complex port. A nested data type is a complex data type that contains other complex data types. For example, you can create an array of structs, or a struct containing an array of other structs. A nested data type can contain up to 10 levels of nesting.
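As a conceptual analogy only (Python containers, not transformation data types or Developer tool syntax), an array of structs and a struct that contains a nested array can be pictured as follows:

# Array of structs: every element has the same named fields.
orders = [
    {"order_id": 101, "amount": 49.99},
    {"order_id": 102, "amount": 15.00},
]
# Struct that contains an array: a fixed set of named fields, one of which holds a list.
customer = {
    "name": "John Baer",
    "phones": ["205-128-6478", "722-515-2889"],
}
print(orders[0]["order_id"], customer["phones"][1])   # 101 722-515-2889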
Array Data Type
An array data type represents an ordered collection of elements. To pass, generate, or process array data, assign array data type to ports.
An array is a zero-based indexed list. An array index indicates the position of the array element. For example, the array index 0 indicates the first element in an array. The transformation language includes operators to access array elements and functions to generate and process array data.
An array can be one-dimensional or multidimensional. A one-dimensional array is a linear array. A multidimensional array is an array of arrays. Array transformation data types can have up to five dimensions.
Format
array <data_type> []
The following table describes the arguments for this data type:
| Argument | Description |
| --- | --- |
| array | Name of the array column or port. |
| data_type | Data type of the elements in an array. The elements can be primitive data types or complex data types. All elements in the array must be of the same data type. |
| [] | Dimension of the array represented as subscript. A single subscript [] represents a one-dimensional array. Two subscripts [][] represent a two-dimensional array. Elements in each dimension are of the same data type. |
The elements of an array do not have names. The number of elements in an array can be different for each row.
Array Examples
- One-dimensional array
- The following array column represents a one-dimensional array of string elements that contains customer phone numbers:
custphone string[]
The following example shows data values for the custphone column:
| custphone |
| --- |
| [205-128-6478,722-515-2889] |
| [107-081-0961,718-051-8116] |
| [107-031-0961,NULL] |
- Two-dimensional array
- The following array column represents a two-dimensional array of string elements that contains customer work and personal email addresses.
email_work_pers string[][]
The following example shows data values for the email_work_pers column:
| email_work_pers |
| --- |
| [john_baer@xyz.com,jbaer@xyz.com][john.baer@fgh.com,jbaer@ijk.com] |
| [bobbi_apperley@xyz.com,bapperl@xyz.com][apperlbob@fgh.com,bobbi@ijk.com] |
| [linda_bender@xyz.com,lbender@xyz.com][l.bender@fgh.com,NULL] |
Map Data Type
A map data type represents an unordered collection of key-value pair elements. A map element is a key and value pair that maps one thing to another. To pass, generate, or process map data, assign map data type to ports.
The key must be of a primitive data type. The value can be of a primitive or complex data type. A map data type with values of a complex data type is a nested map. A nested map can contain up to three levels of nesting of map data type.
The transformation language includes a subscript operator to access map elements. It also includes functions to generate and process map data.
Format
map <primitive_type -> data_type>
The following table describes the arguments for this data type:
| Argument | Description |
| --- | --- |
| map | Name of the map column or port. |
| primitive_type | Data type of the key in a map element. The key must be of a primitive data type. |
| data_type | Data type of the value in a map element. The value can be of a primitive or complex data type. |
Map Example
The following map column represents map data with an integer key and a string value to map customer ids with customer names:
custid_name <integer -> string>
The following example shows data values for the custid_name column:
| custid_name |
| --- |
| <26745 -> 'John Baer'> |
| <56743 -> 'Bobbi Apperley'> |
| <32879 -> 'Linda Bender'> |
Struct Data Type
A struct data type represents a collection of elements of different data types. A struct data type has an associated schema that defines the structure of the data. To pass, generate, or process struct data, assign struct data type to ports.
The schema for the struct data type determines the element names and the number of elements in the struct data. The schema also determines the order of elements in the struct data. Informatica uses complex data type definitions to represent the schema of struct data.
The transformation language includes operators to access struct elements. It also includes functions to generate and process struct data and to modify the schema of the data.
Format
struct {element_name1:value1 [, element_name2:value2, ...]}
Schema for the struct is of the following format:
schema {element_name1:data_type1 [, element_name2:data_type2, ...]}
The following table describes the arguments for this data type:
| Argument | Description |
| --- | --- |
| struct | Name of the struct column or port. |
| schema | A definition of the structure of data. Schema is a name-type pair that determines the name and data type of the struct elements. |
| element_name | Name of the struct element. |
| value | Value of the struct element. |
| data_type | Data type of the element value. The element values can be of a primitive or complex data type. Each element in the struct can have a different data type. |
Struct Example
The following schema is for struct data to store customer addresses:
address
{st_number:integer,st_name:string,city:string,state:string,zip:string}
The following example shows struct data values for the cust_address column:
| cust_address |
| --- |
| {st_number:154,st_name:Addison Ave,city:Redwood City,state:CA,zip:94065} |
| {st_number:204,st_name:Ellis St,city:Mountain View,state:CA,zip:94043} |
| {st_number:357,st_name:First St,city:Sunnyvale,state:CA,zip:94085} |