Directory-level partitioning for mappings in advanced mode
When you create a mapping in advanced mode, you can read from and write to partition columns.
You can organize tables or data sets into partitions to group the same type of data based on a column or partition key. You can select one or more partition columns in a table or data set.
To read from partition columns, select a partition directory and identify the partition columns. To write to partition columns, you can add partition columns from the list of fields and change the partition order, if required.
You can read data from or write data to partition columns for the following file formats:
•Avro
•Parquet
•ORC
•JSON
Importing partition folders
Consider the following rules and guidelines when you import partition folders:
•When you read or write data to a partition folder, you can import a directory that contains both partition folder and files. If the directory contains only files but no partition folders, a validation error occurs.
•To import a partition directory that contains only partition folders, ensure that the partition directory contains files and data. Otherwise, a validation error is encountered.
•You must import a directory that contains only partition folders and select the Is Directory option in the advanced source property.
•If you import a partition directory that has a partition folder but no files in the partition folder, a validation error is encountered.
•When you import a directory that has a partition folder, the data type for the partition column is imported as a String.
•When you import a Google Cloud Storage object that has partition columns, the partition fields are listed at the end of the list.
Reading from partition columns
You can select the directory source type and view the list of partition columns. You can also view the order in which the fields were selected for partitioning.
1Select a directory from the list of source objects.
2Select the Is Directory option in the Advanced Source Properties.
3Select the Fields tab to view the number of partitions.
The partitionOrder column appears for the list of partitioned fields.
The following image shows an example of the partitionOrder column:
The partitionOrder column specifies whether a column is partitioned.
In the above example, two partition columns are available. The partition order values 1 and 2 signify the order in which the year and month fields were selected for partitioning.
Writing to partition columns
You can add partition columns for a target and change the partition order of the columns.
1Click the icon in the Partitions tab to add the partition columns for a target.
The following image shows the Partitions tab where you can add the partition columns:
2On the Partitions tab, select the partitioning fields from the list of available fields:
3Click Select.
The Partitions tab shows the partition columns that you selected:
Note: You can change the partition order using the up and down arrows as shown in the following image:
Rules and guidelines for reading from and writing to a partition folder
Consider the following rules and guidelines when you read from and write to a partition folder:
•You can read or write data to partition folders with the Avro, Parquet, JSON, and ORC files.
•You cannot write data to partition folders with Avro, JSON, ORC, and Parquet files if the source contains hierarchical data.
•You can configure a Filter transformation on a partition column for a Google Cloud Storage source.
•When you pass a timestamp value in a partition column, the value gets encoded. For example, 03:26:01 gets encoded as 03%3A26%3A01.
•When you pass a special character in a partition column, the value gets encoded. For example, #" is encoded as %23%22
•When the partition column contains special characters, the special characters gets replaced with underscore.
•You cannot use the Edit Metadata option with partition columns.
•You cannot use the View Schema option for a partition directory at source and target side.
•You cannot use the Import from Schema File option for partition directory at source because the schema file does not have information for partition columns.
•You cannot use the Data Preview option with partition columns.
•You cannot select the partition columns in a mapping task if the target object is parameterized.
•When you create a target, you can add partition fields and arrange the partition columns in an order. You cannot add partition fields and arrange the partition columns in an order for an existing target.
•When you create a target, the Label column in the Partitions tab denotes the partition column name.
•If a partition column name contains more than 74 characters, the partition column name is truncated to only 74 characters and the Secure Agent writes NULL values or _HIVE_DEFAULT_PARTITION_ to the partition column.
•The value of the partition folder name formed using the combination of the partition column name and value must not exceed 1024 characters. Otherwise, the mapping fails.
If a partition folder name that has more than 1024 characters, you can perform one of the following task to truncate the partition folder name:
- Increase the precision of the source field to 256. The mapping runs successfully and the Secure Agent truncates the data in the target field to 255 characters.
- Specify a substr function through an Expression transformation on the partition column and truncate the value of the partition column.