
Complex Data Types

A complex data type is a transformation data type that represents multiple data values in a single column position. The data values are called elements. Elements in a complex data type can be of primitive or complex data types. Use complex data types to process hierarchical data in mappings that run on the Spark engine.
Transformation data types include the following data types:
Primitive data type
A transformation data type that represents a single data value in a single column position. Data types such as decimal, integer, and string are primitive data types. You can assign primitive data types to ports in any transformation.
Complex data type
A transformation data type that represents multiple data values in a single column position. Data types such as array, map, and struct are complex data types. You can assign complex data types to ports in some transformations for the Spark engine.
Nested data type
A complex data type that contains elements of complex data types. Complex data types such as an array of structs or a struct with an array of other structs are nested data types.
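The three categories above can be told apart mechanically. The following is a minimal Python sketch, not Informatica code: the descriptor classes ArrayType, MapType, and StructType and the classify function are hypothetical names introduced only for illustration, and primitive types are assumed to be represented by plain type-name strings.

```python
from dataclasses import dataclass

# Hypothetical type descriptors used only for this illustration.
# A primitive type is represented by its name as a plain string, e.g. "string".

@dataclass(frozen=True)
class ArrayType:
    element: object          # primitive name or another descriptor

@dataclass(frozen=True)
class MapType:
    key: str                 # keys must be primitive, so a type name
    value: object            # primitive name or another descriptor

@dataclass(frozen=True)
class StructType:
    fields: tuple            # tuple of (element_name, type) pairs

def element_types(data_type):
    """Return the types of the elements that a complex type contains."""
    if isinstance(data_type, ArrayType):
        return [data_type.element]
    if isinstance(data_type, MapType):
        return [data_type.value]
    if isinstance(data_type, StructType):
        return [field_type for _, field_type in data_type.fields]
    return []

def classify(data_type):
    """Classify a type as 'primitive', 'complex', or 'nested'."""
    if isinstance(data_type, str):
        return "primitive"
    # A complex type that contains at least one complex element is nested.
    if any(not isinstance(t, str) for t in element_types(data_type)):
        return "nested"
    return "complex"
```

Under these assumptions, classify(ArrayType("string")) returns "complex", while an array of structs such as ArrayType(StructType((("amount", "decimal"),))) is classified as "nested".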
The following table lists the complex data types:
Complex Data Type | Description
array | An array is an ordered collection of elements. The elements can be of a primitive or complex data type. All elements in the array must be of the same data type.
map | A map is an unordered collection of key-value pair elements. The key must be of a primitive data type. The value can be of a primitive or complex data type.
struct | A struct is a collection of elements of different data types. The elements can be of primitive or complex data types. Struct has a schema that defines the structure of the data.
The following image shows primitive, complex, and nested data types assigned to ports in a transformation:
[Image: The Ports tab in the Properties view of a transformation contains ports of primitive, complex, and nested data types.]
  1. Primitive data types
  2. Complex data types
  3. Nested data type
The ports emp_name and emp_sal are of primitive data types. The ports emp_phone, emp_id_dept, and emp_address are of complex data types. The port emp_bonus is of a nested data type: an array that contains elements of type struct.

Array Data Type

An array data type represents an ordered collection of elements. To pass, generate, or process array data, assign the array data type to ports.
An array is a zero-based indexed list. An array index indicates the position of the array element. For example, the array index 0 indicates the first element in an array. The transformation language includes operators to access array elements and functions to generate and process array data.
An array can be one-dimensional or multidimensional. A one-dimensional array is a linear array. A multidimensional array is an array of arrays. Array transformation data types can have up to five dimensions.

Format

array <data_type> []
The following table describes the arguments for this data type:
Argument | Description
array | Name of the array column or port.
data_type | Data type of the elements in an array. The elements can be primitive data types or complex data types. All elements in the array must be of the same data type.
[] | Dimension of the array represented as subscript. A single subscript [] represents a one-dimensional array. Two subscripts [][] represent a two-dimensional array. Elements in each dimension are of the same data type.
The elements of an array do not have names. The number of elements in an array can be different for each row.
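To make the format concrete, the following Python sketch parses a declaration written in the format above and counts its dimensions. It is an illustration only: the function name parse_array_declaration is hypothetical, and the pattern assumes the element type is a single word, so complex element types are not handled.

```python
import re

MAX_DIMENSIONS = 5  # the text states that arrays can have up to five dimensions

# "<name> <data_type>" followed by one or more "[]" subscripts.
# Assumption: the element type is a single word such as "string".
_ARRAY_DECL = re.compile(r"^\s*(\w+)\s+(\w+)\s*((?:\[\])+)\s*$")

def parse_array_declaration(declaration):
    """Return (name, element_type, dimensions) for an array declaration."""
    match = _ARRAY_DECL.match(declaration)
    if match is None:
        raise ValueError(f"not an array declaration: {declaration!r}")
    name, element_type, subscripts = match.groups()
    dimensions = len(subscripts) // 2  # each "[]" subscript is two characters
    if dimensions > MAX_DIMENSIONS:
        raise ValueError(f"{dimensions} dimensions exceeds the limit of {MAX_DIMENSIONS}")
    return name, element_type, dimensions
```

For example, parse_array_declaration("prices decimal[][]") returns ("prices", "decimal", 2), where prices is a made-up port name.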

Array Examples

One-dimensional array
The following array column represents a one-dimensional array of string elements that contains customer phone numbers:
custphone string[]
The following example shows data values for the custphone column:
custphone
[205-128-6478,722-515-2889]
[107-081-0961,718-051-8116]
[107-031-0961,NULL]
Two-dimensional array
The following array column represents a two-dimensional array of string elements that contains customer work and personal email addresses:
email_work_pers string[][]
The following example shows data values for the email_work_pers column:
email_work_pers
[john_baer@xyz.com,jbaer@xyz.com][john.baer@fgh.com,jbaer@ijk.com]
[bobbi_apperley@xyz.com,bapperl@xyz.com][apperlbob@fgh.com,bobbi@ijk.com]
[linda_bender@xyz.com,lbender@xyz.com][l.bender@fgh.com,NULL]
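As a rough analogy, the example values above can be modeled with Python lists, which are also zero-based. This sketch does not use the transformation language operators. The helper names element_at and dimensions are hypothetical, and NULL is represented here by None.

```python
# Rows of the custphone column: one-dimensional arrays of strings.
custphone_rows = [
    ["205-128-6478", "722-515-2889"],
    ["107-081-0961", "718-051-8116"],
    ["107-031-0961", None],  # NULL element
]

# Rows of the email_work_pers column: two-dimensional arrays of strings.
email_work_pers_rows = [
    [["john_baer@xyz.com", "jbaer@xyz.com"], ["john.baer@fgh.com", "jbaer@ijk.com"]],
    [["bobbi_apperley@xyz.com", "bapperl@xyz.com"], ["apperlbob@fgh.com", "bobbi@ijk.com"]],
    [["linda_bender@xyz.com", "lbender@xyz.com"], ["l.bender@fgh.com", None]],
]

def element_at(array, *indexes):
    """Follow one zero-based index per dimension and return the element."""
    value = array
    for index in indexes:
        value = value[index]
    return value

def dimensions(array):
    """Count the nesting depth of a list by following its first elements."""
    depth = 0
    while isinstance(array, list):
        depth += 1
        array = array[0] if array else None
    return depth
```

With this model, element_at(custphone_rows[0], 0) is the first phone number in the first row, and element_at(email_work_pers_rows[0], 1, 0) is the first address in the second inner array of the first row.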

Map Data Type

A map data type represents an unordered collection of key-value pair elements. A map element is a key and value pair that maps one thing to another. To pass, generate, or process map data, assign the map data type to ports.
The key must be of a primitive data type. The value can be of a primitive or complex data type. A map data type with values of a complex data type is a nested map. A nested map can contain up to three levels of nesting of map data type.
The transformation language includes a subscript operator to access map elements. It also includes functions to generate and process map data.

Format

map <primitive_type -> data_type>
The following table describes the arguments for this data type:
Argument | Description
map | Name of the map column or port.
primitive_type | Data type of the key in a map element. The key must be of a primitive data type.
data_type | Data type of the value in a map element. The value can be of a primitive or complex data type.
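The rule that the key must be primitive can be checked when a declaration is read. The following Python sketch is illustrative only: parse_map_declaration is a hypothetical helper, and PRIMITIVE_TYPES is an assumed, non-exhaustive set built from the three primitive types this section names.

```python
import re

# Assumption: only the primitive types named in this section are listed here.
PRIMITIVE_TYPES = {"decimal", "integer", "string"}

# "<name> <key_type -> value_type>", following the format shown above.
_MAP_DECL = re.compile(r"^\s*(\w+)\s*<\s*(\w+)\s*->\s*(.+?)\s*>\s*$")

def parse_map_declaration(declaration):
    """Return (name, key_type, value_type) for a map declaration."""
    match = _MAP_DECL.match(declaration)
    if match is None:
        raise ValueError(f"not a map declaration: {declaration!r}")
    name, key_type, value_type = match.groups()
    if key_type not in PRIMITIVE_TYPES:
        raise ValueError(f"map key must be a primitive type, got {key_type!r}")
    # The value type is returned as written; it may be primitive or complex.
    return name, key_type, value_type
```

For a made-up declaration such as "scores <string -> decimal>", the function returns ("scores", "string", "decimal"). A key type outside the assumed primitive set raises ValueError.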

Map Example

The following map column represents map data with an integer key and a string value to map customer IDs to customer names:
custid_name <integer -> string>
The following example shows data values for the custid_name column:
custid_name
<26745 -> 'John Baer'>
<56743 -> 'Bobbi Apperley'>
<32879 -> 'Linda Bender'>
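The example values above can be modeled as Python dictionaries, with one dictionary per row. This is a sketch under that assumption, not the transformation language subscript operator. The helper names find_name and keys_are_primitive are hypothetical, and int, float, and str stand in for primitive key types.

```python
# Rows of the custid_name column: each row holds one integer -> string pair.
custid_name_rows = [
    {26745: "John Baer"},
    {56743: "Bobbi Apperley"},
    {32879: "Linda Bender"},
]

def keys_are_primitive(map_value):
    """Check that every key is a Python stand-in for a primitive type."""
    return all(isinstance(key, (int, float, str)) for key in map_value)

def find_name(rows, customer_id):
    """Return the name mapped to customer_id in the first row that has it."""
    for row in rows:
        if customer_id in row:
            return row[customer_id]
    return None  # no row contains the key
```

Looking up 56743 with find_name(custid_name_rows, 56743) yields the name stored for that customer ID, and an unknown ID yields None.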

Struct Data Type

A struct data type represents a collection of elements of different data types. A struct data type has an associated schema that defines the structure of the data. To pass, generate, or process struct data, assign the struct data type to ports.
The schema for the struct data type determines the element names and the number of elements in the struct data. The schema also determines the order of elements in the struct data. Informatica uses complex data type definitions to represent the schema of struct data.
The transformation language includes operators to access struct elements. It also includes functions to generate and process struct data and to modify the schema of the data.

Format

struct {element_name1:value1 [, element_name2:value2, ...]}
Schema for the struct is of the following format:
schema {element_name1:data_type1 [, element_name2:data_type2, ...]}
The following table describes the arguments for this data type:
Argument | Description
struct | Name of the struct column or port.
schema | A definition of the structure of data. The schema is a set of name-type pairs that determine the name and data type of each struct element.
element_name | Name of the struct element.
value | Value of the struct element.
data_type | Data type of the element value. The element values can be of a primitive or complex data type. Each element in the struct can have a different data type.
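A schema written in the format above can be read into an ordered list of name-type pairs. The following Python sketch is illustrative: parse_struct_schema is a hypothetical helper, and because it splits on commas it assumes that every element has a primitive data type.

```python
def parse_struct_schema(schema):
    """Parse "{name1:type1,name2:type2,...}" into ordered (name, type) pairs.

    Assumption: element types are primitive, so the text can be split on commas.
    """
    body = schema.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"schema must be enclosed in braces: {schema!r}")
    pairs = []
    for item in body[1:-1].split(","):
        name, separator, data_type = item.partition(":")
        name, data_type = name.strip(), data_type.strip()
        if not separator or not name or not data_type:
            raise ValueError(f"expected element_name:data_type, got {item!r}")
        pairs.append((name, data_type))
    return pairs
```

For a made-up schema, parse_struct_schema("{id:integer,name:string}") returns [("id", "integer"), ("name", "string")]. A list is used rather than a dictionary to make it explicit that the schema fixes the order of the elements.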

Struct Example

The following schema is for struct data to store customer addresses:
address
{st_number:integer,st_name:string,city:string,state:string,zip:string}
The following example shows struct data values for the cust_address column:
cust_address
{st_number:154,st_name:Addison Ave,city:Redwood City,state:CA,zip:94065}
{st_number:204,st_name:Ellis St,city:Mountain View,state:CA,zip:94043}
{st_number:357,st_name:First St,city:Sunnyvale,state:CA,zip:94085}
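The relationship between the schema and the data values can be sketched in Python by checking each row against the schema. This is an illustration, not Informatica behavior: the conforms function is hypothetical, and the mapping from the type names integer and string to the Python types int and str is an assumption.

```python
# The address schema from the example, as ordered (element_name, data_type) pairs.
ADDRESS_SCHEMA = [
    ("st_number", "integer"),
    ("st_name", "string"),
    ("city", "string"),
    ("state", "string"),
    ("zip", "string"),
]

# Assumed mapping from schema type names to Python types.
PYTHON_TYPES = {"integer": int, "string": str}

# Rows of the cust_address column, modeled as dictionaries.
cust_address_rows = [
    {"st_number": 154, "st_name": "Addison Ave", "city": "Redwood City", "state": "CA", "zip": "94065"},
    {"st_number": 204, "st_name": "Ellis St", "city": "Mountain View", "state": "CA", "zip": "94043"},
    {"st_number": 357, "st_name": "First St", "city": "Sunnyvale", "state": "CA", "zip": "94085"},
]

def conforms(value, schema):
    """Check element names, element count, element order, and value types."""
    # Dictionaries keep insertion order, so list(value) gives the element order.
    if list(value) != [name for name, _ in schema]:
        return False
    return all(isinstance(value[name], PYTHON_TYPES[data_type])
               for name, data_type in schema)
```

All three example rows conform to ADDRESS_SCHEMA in this model. A row that stores zip as an integer, omits an element, or lists the elements in a different order does not.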

Rules and Guidelines for Complex Data Types

Consider the following rules and guidelines when you work with complex data types: