Complex Data Types
A complex data type is a transformation data type that represents multiple data values in a single column position. The data values are called elements. Elements in a complex data type can be of primitive or complex data types. Use complex data types to process hierarchical data in mappings that run on the Spark engine.
Transformation data types include the following data types:
- Primitive data type
  A transformation data type that represents a single data value in a single column position. Data types such as decimal, integer, and string are primitive data types. You can assign primitive data types to ports in any transformation.
- Complex data type
  A transformation data type that represents multiple data values in a single column position. Data types such as array, map, and struct are complex data types. You can assign complex data types to ports in some transformations for the Spark engine.
- Nested data type
  A complex data type that contains elements of complex data types. Complex data types such as an array of structs or a struct with an array of other structs are nested data types.
The following table lists the complex data types:
Complex Data Type | Description |
---|---|
array | An array is an ordered collection of elements. The elements can be of a primitive or complex data type. All elements in the array must be of the same data type. |
map | A map is an unordered collection of key-value pair elements. The key must be of a primitive data type. The value can be of a primitive or complex data type. |
struct | A struct is a collection of elements of different data types. The elements can be of primitive or complex data types. Struct has a schema that defines the structure of the data. |
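The distinction between primitive, complex, and nested data types can be sketched as a small type model. The following Python example is an illustration only and is not part of the transformation language. The class names `Primitive`, `Array`, `Map`, and `Struct`, the helper functions, and the sample element name `amount` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    name: str  # for example "string", "integer", or "decimal"


@dataclass(frozen=True)
class Array:
    element: "DataType"  # all elements share this one data type


@dataclass(frozen=True)
class Map:
    key: Primitive  # the key must be of a primitive data type
    value: "DataType"  # the value can be primitive or complex


@dataclass(frozen=True)
class Struct:
    elements: tuple  # ordered (element_name, DataType) pairs, i.e. the schema


DataType = Union[Primitive, Array, Map, Struct]


def is_complex(data_type):
    """A complex data type holds multiple values in one column position."""
    return isinstance(data_type, (Array, Map, Struct))


def is_nested(data_type):
    """A nested data type is a complex type with complex-typed elements."""
    if isinstance(data_type, Array):
        return is_complex(data_type.element)
    if isinstance(data_type, Map):
        return is_complex(data_type.value)
    if isinstance(data_type, Struct):
        return any(is_complex(t) for _, t in data_type.elements)
    return False


string = Primitive("string")
phones = Array(string)  # array of strings: complex, not nested
# Hypothetical array of structs: complex and nested
bonuses = Array(Struct((("amount", Primitive("decimal")),)))
```

In this sketch, `is_nested(phones)` is false and `is_nested(bonuses)` is true, which mirrors the definition of an array of structs as a nested data type.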
The following image shows primitive, complex, and nested data types assigned to ports in a transformation. The image contains the following callouts:
1. Primitive data types
2. Complex data types
3. Nested data type
The ports emp_name and emp_sal are of primitive data types. The ports emp_phone, emp_id_dept, and emp_address are of complex data types. The port emp_bonus is of a nested data type: an array that contains elements of type struct.
Array Data Type
An array data type represents an ordered collection of elements. To pass, generate, or process array data, assign the array data type to ports.
An array is a zero-based indexed list. An array index indicates the position of the array element. For example, the array index 0 indicates the first element in an array. The transformation language includes operators to access array elements and functions to generate and process array data.
An array can be one-dimensional or multidimensional. A one-dimensional array is a linear array. A multidimensional array is an array of arrays. Array transformation data types can have up to five dimensions.
Format
array <data_type> []
The following table describes the arguments for this data type:
Argument | Description |
---|---|
array | Name of the array column or port. |
data_type | Data type of the elements in an array. The elements can be primitive data types or complex data types. All elements in the array must be of the same data type. |
[] | Dimension of the array, represented as subscripts. A single subscript [] represents a one-dimensional array. Two subscripts [][] represent a two-dimensional array. Elements in each dimension are of the same data type. |
The elements of an array do not have names. The number of elements in an array can be different for each row.
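To make the format concrete, the following Python sketch splits an array declaration into its name, element data type, and number of dimensions, and rejects declarations with more than five dimensions. It is an illustration only. The function `parse_array_declaration` and the regular expression are hypothetical and do not describe how the product parses declarations.

```python
import re

MAX_DIMENSIONS = 5  # limit stated in the text above

# name, then element data type, then one or more [] subscripts
_DECLARATION = re.compile(r"^\s*(\w+)\s+(.+?)\s*((?:\[\])+)\s*$")


def parse_array_declaration(declaration):
    """Return (name, element_type, dimensions) for 'name type[]...' text."""
    match = _DECLARATION.match(declaration)
    if match is None:
        raise ValueError(f"not an array declaration: {declaration!r}")
    name, element_type, subscripts = match.groups()
    dimensions = subscripts.count("[]")  # one [] per dimension
    if dimensions > MAX_DIMENSIONS:
        raise ValueError(f"{dimensions} dimensions exceed {MAX_DIMENSIONS}")
    return name, element_type, dimensions
```

For example, `parse_array_declaration("custphone string[]")` yields a one-dimensional array of string elements, and a declaration with six subscripts raises `ValueError`.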
Array Examples
- One-dimensional array
  The following array column represents a one-dimensional array of string elements that contains customer phone numbers:
custphone string[]
The following example shows data values for the custphone column:
custphone |
---|
[205-128-6478,722-515-2889] |
[107-081-0961,718-051-8116] |
[107-031-0961,NULL] |
- Two-dimensional array
  The following array column represents a two-dimensional array of string elements that contains customer work and personal email addresses:
email_work_pers string[][]
The following example shows data values for the email_work_pers column:
email_work_pers |
---|
[john_baer@xyz.com,jbaer@xyz.com][john.baer@fgh.com,jbaer@ijk.com] |
[bobbi_apperley@xyz.com,bapperl@xyz.com][apperlbob@fgh.com,bobbi@ijk.com] |
[linda_bender@xyz.com,lbender@xyz.com][l.bender@fgh.com,NULL] |
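The example values above can be modeled with Python lists to show how zero-based indexes address elements in one-dimensional and two-dimensional arrays. This is an illustration only and does not use transformation language operators. The helper `element_at` is hypothetical, and NULL is represented as `None`.

```python
# One row per list entry; each row holds a one-dimensional array.
custphone = [
    ["205-128-6478", "722-515-2889"],
    ["107-081-0961", "718-051-8116"],
    ["107-031-0961", None],  # NULL element
]

# One row per list entry; each row holds a two-dimensional array.
email_work_pers = [
    [["john_baer@xyz.com", "jbaer@xyz.com"],
     ["john.baer@fgh.com", "jbaer@ijk.com"]],
    [["bobbi_apperley@xyz.com", "bapperl@xyz.com"],
     ["apperlbob@fgh.com", "bobbi@ijk.com"]],
    [["linda_bender@xyz.com", "lbender@xyz.com"],
     ["l.bender@fgh.com", None]],
]


def element_at(array, *indexes):
    """Follow one zero-based index per dimension and return the element."""
    value = array
    for index in indexes:
        value = value[index]
    return value


# Index 0 selects the first element of each row's array.
first_phones = [element_at(row, 0) for row in custphone]

# Two indexes address a two-dimensional array: inner array, then element.
second_array_first_email = element_at(email_work_pers[0], 1, 0)
```

Here `first_phones` holds the first phone number of each row, and `second_array_first_email` is `john.baer@fgh.com`, the first element of the second inner array in the first row.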
Map Data Type
A map data type represents an unordered collection of key-value pair elements. A map element is a key-value pair that maps a key to a value. To pass, generate, or process map data, assign the map data type to ports.
The key must be of a primitive data type. The value can be of a primitive or complex data type. A map data type with values of a complex data type is a nested map. A nested map can contain up to three levels of nesting of map data type.
The transformation language includes a subscript operator to access map elements. It also includes functions to generate and process map data.
Format
map <primitive_type -> data_type>
The following table describes the arguments for this data type:
Argument | Description |
---|---|
map | Name of the map column or port. |
primitive_type | Data type of the key in a map element. The key must be of a primitive data type. |
data_type | Data type of the value in a map element. The value can be of a primitive or complex data type. |
Map Example
The following map column represents map data with an integer key and a string value that maps customer IDs to customer names:
custid_name <integer -> string>
The following example shows data values for the custid_name column:
custid_name |
---|
<26745 -> 'John Baer'> |
<56743 -> 'Bobbi Apperley'> |
<32879 -> 'Linda Bender'> |
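As a rough analogy, the custid_name values can be modeled with Python dictionaries, where a subscript with the key returns the mapped value. This is an illustration only and not transformation language syntax. The helpers `lookup` and `has_primitive_keys` are hypothetical, the Python types in `PRIMITIVE_KEY_TYPES` are assumed stand-ins for primitive data types, and returning `None` for a missing key is a choice made for this sketch.

```python
# One row per list entry; each row holds a map of customer ID to name.
custid_name = [
    {26745: "John Baer"},
    {56743: "Bobbi Apperley"},
    {32879: "Linda Bender"},
]

# Assumed stand-ins for primitive data types in this sketch.
PRIMITIVE_KEY_TYPES = (int, float, str)


def lookup(map_value, key):
    """Return the value for key, or None when the key is absent."""
    return map_value.get(key)


def has_primitive_keys(map_value):
    """Check the rule that every map key is of a primitive data type."""
    return all(isinstance(key, PRIMITIVE_KEY_TYPES) for key in map_value)


# Subscript access by key on the first row's map.
first_name = custid_name[0][26745]
```

Note that a Python dictionary preserves insertion order, whereas the map data type is described as unordered, so code that models map data this way should not rely on element order.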
Struct Data Type
A struct data type represents a collection of elements of different data types. A struct data type has an associated schema that defines the structure of the data. To pass, generate, or process struct data, assign the struct data type to ports.
The schema for the struct data type determines the element names and the number of elements in the struct data. The schema also determines the order of elements in the struct data. Informatica uses complex data type definitions to represent the schema of struct data.
The transformation language includes operators to access struct elements. It also includes functions to generate and process struct data and to modify the schema of the data.
Format
struct {element_name1:value1 [, element_name2:value2, ...]}
The schema for the struct has the following format:
schema {element_name1:data_type1 [, element_name2:data_type2, ...]}
The following table describes the arguments for this data type:
Argument | Description |
---|---|
struct | Name of the struct column or port. |
schema | A definition of the structure of data. A schema is a set of name-type pairs that determine the name and data type of each struct element. |
element_name | Name of the struct element. |
value | Value of the struct element. |
data_type | Data type of the element value. The element values can be of a primitive or complex data type. Each element in the struct can have a different data type. |
Struct Example
The following schema is for struct data to store customer addresses:
address {st_number:integer,st_name:string,city:string,state:string,zip:string}
The following example shows struct data values for the cust_address column:
cust_address |
---|
{st_number:154,st_name:Addison Ave,city:Redwood City,state:CA,zip:94065} |
{st_number:204,st_name:Ellis St,city:Mountain View,state:CA,zip:94043} |
{st_number:357,st_name:First St,city:Sunnyvale,state:CA,zip:94085} |
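The relationship between a schema and struct data can be sketched as a conformance check: a struct value matches the schema when it has the same element names in the same order and each value is of the declared data type. The following Python example is an illustration only. The names `ADDRESS_SCHEMA` and `conforms` are hypothetical, and the Python types `int` and `str` are assumed stand-ins for the integer and string data types.

```python
# Ordered (element_name, data_type) pairs for the address schema.
ADDRESS_SCHEMA = (
    ("st_number", int),
    ("st_name", str),
    ("city", str),
    ("state", str),
    ("zip", str),
)

# Struct values from the cust_address example, one per row.
cust_address = [
    {"st_number": 154, "st_name": "Addison Ave", "city": "Redwood City",
     "state": "CA", "zip": "94065"},
    {"st_number": 204, "st_name": "Ellis St", "city": "Mountain View",
     "state": "CA", "zip": "94043"},
    {"st_number": 357, "st_name": "First St", "city": "Sunnyvale",
     "state": "CA", "zip": "94085"},
]


def conforms(struct_value, schema):
    """True when names, order, and value types all match the schema."""
    expected_names = [name for name, _ in schema]
    if list(struct_value) != expected_names:  # names and their order
        return False
    return all(
        isinstance(struct_value[name], data_type)
        for name, data_type in schema
    )
```

In this sketch every row in `cust_address` conforms to `ADDRESS_SCHEMA`, while a value with a missing element, a reordered element, or a numeric `zip` does not.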
Rules and Guidelines for Complex Data Types
Consider the following rules and guidelines when you work with complex data types:
- A nested data type can contain up to 10 levels of nesting.
- A nested map can contain up to three levels of nesting of map data types.
- An array data type cannot directly contain an element of type array. Use multidimensional arrays to create a nested array. For example, an array with two dimensions is an array of arrays.
- A multidimensional array can contain up to five levels of nesting. The array dimension determines the levels of nesting.
- Each array in a multidimensional array must have elements of the same data type.