Data compression in Google Cloud Storage V2 sources and targets
You can decompress the data when you read data from a Google Cloud Storage V2 source and compress the data when you write data to a Google Cloud Storage V2 target.
Configure the compression format in the Compression Format option under the advanced source and target properties.
The following table lists the supported compression formats in the source for different file formats:
Compression format   Avro File   Flat File   JSON File   Parquet File
Gzip                 No          Yes         No          Yes
None                 Yes         Yes         Yes         Yes
Note: To read Avro or Parquet files that use the Deflate or Snappy compression format, select the None compression format.
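The source table and the note together decide which value to select in the Compression Format option. The following minimal Python sketch encodes that decision. It is only an illustration and is not part of the connector. The function name source_compression_setting and the string labels are hypothetical, and the codec argument stands for the compression actually applied to the file.

```python
# Source support matrix from the table above: compression format -> file formats.
SOURCE_SUPPORT = {
    "Gzip": {"Flat File", "Parquet File"},
    "None": {"Avro File", "Flat File", "JSON File", "Parquet File"},
}

# Codecs that Avro and Parquet files can carry, per the note above.
INTERNAL_CODECS = {"Deflate", "Snappy"}


def source_compression_setting(file_format: str, codec: str) -> str:
    """Return the Compression Format value to select for a source file.

    file_format: a file format from the table, for example "Avro File".
    codec: the compression applied to the file, or "None" if uncompressed.
    Raises ValueError for combinations that the table does not list.
    """
    if codec in INTERNAL_CODECS:
        if file_format in ("Avro File", "Parquet File"):
            return "None"  # the note says to select None for these files
        raise ValueError(f"{codec} is not supported for a {file_format} source")
    if file_format in SOURCE_SUPPORT.get(codec, set()):
        return codec
    raise ValueError(f"{codec} is not supported for a {file_format} source")
```

For example, a Snappy-compressed Parquet file maps to None, while a Gzip-compressed JSON file is rejected because the table marks that combination as No.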
The following table lists the supported compression formats in the target for different file formats:
Compression format   Avro File   Flat File   JSON File   Parquet File
Deflate              Yes         No          No          No
Gzip                 No          Yes         No          Yes
None                 Yes         Yes         Yes         Yes
Snappy               Yes         No          No          Yes
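Read by column, the target table gives the compression formats that are valid for each file format. The following Python sketch uses that view to list the options for a target file format and to reject unsupported combinations before a mapping runs. It is an illustrative assumption rather than connector behavior, and the names TARGET_SUPPORT, target_compression_options, and check_target are hypothetical.

```python
# Target support matrix from the table above, in table row order.
TARGET_SUPPORT = {
    "Deflate": {"Avro File"},
    "Gzip": {"Flat File", "Parquet File"},
    "None": {"Avro File", "Flat File", "JSON File", "Parquet File"},
    "Snappy": {"Avro File", "Parquet File"},
}


def target_compression_options(file_format: str) -> list[str]:
    """List the compression formats that the table allows for a target file format."""
    return [fmt for fmt, formats in TARGET_SUPPORT.items() if file_format in formats]


def check_target(file_format: str, compression_format: str) -> None:
    """Raise ValueError if the table does not list this target combination."""
    if compression_format not in target_compression_options(file_format):
        raise ValueError(
            f"{compression_format} is not supported for a {file_format} target"
        )
```

For example, target_compression_options("Avro File") returns ["Deflate", "None", "Snappy"], and check_target("Flat File", "Snappy") raises ValueError.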
To read a compressed file from Google Cloud Storage V2, the file name must end with the extension that corresponds to its compression format. If the file does not have a valid extension, the Secure Agent does not process the file. The following table lists the extension that is appended for each compression format:
Compression format   File name extension
Deflate              .deflate
Gzip                 .GZ
Snappy               .snappy
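You can check file names against the extension table before you point a source at them. The following Python sketch appends the listed extension and checks whether a file name ends with it. The helper names are hypothetical. The sketch also assumes a case-insensitive comparison, so ".gz" and ".GZ" both match. The table does not say whether the Secure Agent compares extensions case-sensitively, so confirm this in your environment.

```python
# Extensions from the table above, keyed by compression format.
EXTENSIONS = {"Deflate": ".deflate", "Gzip": ".GZ", "Snappy": ".snappy"}


def append_extension(file_name: str, compression_format: str) -> str:
    """Append the extension that corresponds to the compression format."""
    return file_name + EXTENSIONS[compression_format]


def has_valid_extension(file_name: str, compression_format: str) -> bool:
    """Check that a compressed file name ends with the expected extension.

    Assumption: the comparison is case-insensitive.
    """
    return file_name.lower().endswith(EXTENSIONS[compression_format].lower())
```

For example, append_extension("orders.csv", "Gzip") returns "orders.csv.GZ", and has_valid_extension("orders.csv", "Gzip") returns False.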
Use the following guidelines when you configure data compression:
•Data compression is supported at the file level. You cannot use data compression for a directory.
•When you select the Is Directory property in the source, the files in the directory are read sequentially.
•When you download a Gzip-compressed file from the Google Cloud Platform console, the uncompressed file is downloaded by default. To download the compressed file, manually remove the content encoding metadata from the object. Select Edit object metadata for the object and remove Gzip from the Content-Encoding field.
•When you configure the Gzip compression format in the target and the mapping fails with a Java heap space error, set the staging optimization memory in the JVMOptions property to -Xmx2048m and -Xms512m. Google Cloud Storage requires a 15 MB buffer to upload compressed files.
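By default, the console serves the decompressed object, so a downloaded file might not contain the compressed data you expected. The following Python sketch checks a local file for the two-byte gzip magic number (0x1f 0x8b, defined in RFC 1952) and decompresses the file only if it still holds gzip data. This is a local check that you run yourself and is not part of the connector. The function names and the UTF-8 text assumption are hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_compressed(path: str) -> bool:
    """Return True if the local file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_text(path: str) -> str:
    """Read a downloaded file as UTF-8 text, decompressing it only if it is gzip data."""
    opener = gzip.open if is_gzip_compressed(path) else open
    with opener(path, "rt", encoding="utf-8") as f:
        return f.read()
```

If is_gzip_compressed returns False for a file that was stored with Gzip compression, the console probably served the decompressed copy. In that case, remove the Content-Encoding metadata as described above and download the file again.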