Directory-level partitioning

You can read from and write to partition columns when you use mappings in advanced mode.

You can organize tables or data sets into partitions to group data of the same type together based on a column or partition key. You can select one or more partition columns in a table or data set.

To read from partition columns, select a partition directory and identify the partition columns. To write to partition columns, you can add partition columns from the list of fields and change the partition order, if required.

You can read data from or write data to partition columns for the following file formats:

•Avro

•Parquet

•ORC

•JSON

Reading from partition columns

Perform the following steps to read data from partition columns:

1. Select a directory from the list of source objects.

2. Select the Source Type as Directory in the Advanced Source Properties.

3. In the Fields tab, you can view the number of partitions. The partitionOrder column appears for the list of partitioned fields, as shown in the following image:

The image shows the partition order column.

The partitionOrder column indicates whether a field is a partition column and, if it is, its position in the partition order.

In the above image, two partition columns are present. The partition order values 1 and 2 signify the order in which the Country and State fields were selected for partitioning. The FileName field has 0 as the partition order.
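To make the partition order concrete, the following Python sketch derives the partition fields for one file in a partition directory. It is an illustration only: it assumes that partition folders follow a name=value naming pattern (for example, Country=US/State=CA), which this section does not state, and the function name read_partition_fields and the sample path are hypothetical and not part of the product.

```python
from pathlib import PurePosixPath


def read_partition_fields(relative_path):
    """Return field descriptors for a file inside a partition directory.

    Assumption: partition folders are named name=value.
    """
    parts = PurePosixPath(relative_path).parts
    fields = []
    order = 0
    for folder in parts[:-1]:  # every path component except the file name
        name, sep, value = folder.partition("=")
        if not sep:
            continue  # not a partition folder, so skip it
        order += 1
        # Partition values are kept as strings, whatever they look like.
        fields.append({"name": name, "value": value, "partitionOrder": order})
    # FileName is not a partition column, so its partition order is 0.
    fields.append({"name": "FileName", "value": parts[-1], "partitionOrder": 0})
    return fields


for field in read_partition_fields("Country=US/State=CA/part-0001.parquet"):
    print(field["name"], field["partitionOrder"], field["value"])
```

In this sketch, Country and State receive the partition orders 1 and 2 because of their nesting depth in the directory, and FileName always receives 0.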

Writing to partition columns

Perform the following steps to write to partition columns:

1. Click the Add icon in the Partitions tab to add the partition columns for a target. The following image shows how you can add the partition columns:

The image shows the Add button in the Partitions tab to add the partition columns.

2. In the Partitions tab, select the partitioning fields from the list of available fields.

Add the partitioning fields from the list of available fields.

3. Click Select.

The Partitions tab shows the partition columns that you selected:

Note: You can change the partition order using the up and down arrows as shown in the following image:

Rules and guidelines for reading from and writing to a partition folder

Consider the following rules and guidelines when you read from and write to a partition folder:

•You must import a directory that contains only partition folders and select the source type as Directory in the advanced source property.

•If you import a partition directory that does not have data, a validation error occurs.

•If you import a partition directory that contains only files but no partition folders, a validation error occurs.

•If you import a partition directory that has a partition folder but no files in the partition folder, a validation error occurs.

•You can read data from or write data to partition folders with Avro, Parquet, and ORC files.

•The FileName field has 0 as the partition order.

•The partitioned directory that you select cannot have a partition column named FileName. The name is case-insensitive.

•When you import an existing target object or create a new target object with a partition directory, the FileName field does not appear for the target objects. The FileName field appears only when you import the source objects.

•You can push down a Filter transformation on a partition column for an Amazon S3 source.

•When you pass a timestamp value in a partition column, the value gets encoded. For example, 03:26:01 gets encoded as 03%3A26%3A01.

•When you pass a value with special characters in a partition column, the value gets encoded. For example, @#$#$%%?* gets encoded as @%23$%23$%25%25%3F%2A.

•When you import a directory that has a partition folder, the data type for the partition column is imported as a String.

•You cannot edit the data type for a partition column.

•You cannot use columns of hierarchical data type as partition columns.

•You cannot use the Edit Metadata option with partition columns.

•You cannot use the View Schema option for a partition directory on the source or target side.

•You cannot use the Import from Schema File option for a partition directory on the source side because the schema file does not have information about partition columns.

•You cannot use the Data Preview option with partition columns.

•You cannot select the partition columns in a mapping task if the target object is parameterized.

•When you create a new target, you can add partition fields and arrange the partition columns in order. You cannot add partition fields or rearrange the partition columns for an existing target.

•When you create a new target, the Label column in the Partitions tab denotes the partition column name.

•When you import an Amazon S3 object that has partition columns, the partition fields are listed at the end of the list.

•If a partition column contains data that has more than 255 characters, the data is truncated and only 255 characters are written in the partition column.

•If a partition column name contains more than 74 characters, the name is truncated and only 74 characters are written in the partition column name.

•The partition directory file path, formed from the combination of the partition column name and the target file within the partition directory, must not exceed 1024 characters. Otherwise, the mapping fails.
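The encoding and length rules in this list can be sketched in Python. This is an illustration under stated assumptions, not the product's implementation: it assumes percent-encoding that leaves @ and $ unescaped (chosen only because it reproduces the two encoding examples above), assumes name=value folder names, truncates before encoding, and measures the 1024-character limit on the relative path. All function names are hypothetical.

```python
from urllib.parse import quote

MAX_VALUE_CHARS = 255   # longer partition values are truncated
MAX_NAME_CHARS = 74     # longer partition column names are truncated
MAX_PATH_CHARS = 1024   # a longer partition file path makes the mapping fail


def encode_partition_value(value):
    # Assumption: '@' and '$' stay literal and other special characters are
    # percent-encoded, which matches the two examples in the rules above.
    return quote(value, safe="@$")


def partition_path(columns, file_name):
    """Build a relative target path from (name, value) pairs in partition order.

    Assumptions: name=value folder names, truncation applied before encoding.
    """
    folders = []
    for name, value in columns:
        name = name[:MAX_NAME_CHARS]
        value = encode_partition_value(str(value)[:MAX_VALUE_CHARS])
        folders.append(f"{name}={value}")
    path = "/".join(folders + [file_name])
    if len(path) > MAX_PATH_CHARS:
        raise ValueError(
            f"partition path has {len(path)} characters, limit is {MAX_PATH_CHARS}"
        )
    return path


print(encode_partition_value("03:26:01"))
print(encode_partition_value("@#$#$%%?*"))
print(partition_path([("Country", "US"), ("LoadTime", "03:26:01")], "part-0001.parquet"))
```

A check of this kind, run on sample values before the mapping runs, can reveal partition values that would be truncated or paths that would exceed the limit.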