Data compression in Amazon S3 V2 sources

You can decompress data when you read data from Amazon S3.
Configure the compression format in the Compression Format option under the advanced source properties.
Compression format doesn't apply to binary files.
The following table lists the compression formats that you can use for various file types when you read data from Amazon S3:
Compression format      File type
None                    Avro, Delta, Flat, JSON, ORC, Parquet
Bzip2                   JSON
Deflate                 Avro
Gzip                    Delta, Flat, Parquet
Lzo                     Doesn't apply to any file type.
Snappy                  Avro, Delta, ORC, Parquet
Zlib                    ORC
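
If you are not sure which codec a staged object uses, you can peek at its first bytes before you set the Compression Format option. The following sketch is an illustration only and is not part of the connector. It assumes the boto3 library, valid AWS credentials, and hypothetical bucket and object names.

    import boto3

    # Hypothetical bucket and key; replace with your own values.
    BUCKET = "my-staging-bucket"
    KEY = "input/orders.csv.GZ"

    # Read only the first few bytes of the object to inspect its magic number.
    s3 = boto3.client("s3")
    header = s3.get_object(Bucket=BUCKET, Key=KEY, Range="bytes=0-3")["Body"].read()

    if header[:2] == b"\x1f\x8b":
        print("Gzip-compressed object: set Compression Format to Gzip.")
    elif header[:3] == b"BZh":
        print("Bzip2-compressed object: set Compression Format to Bzip2.")
    else:
        print("No gzip or bzip2 signature found; the object may be uncompressed.")
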
For the Avro, ORC, and Parquet file formats, support for the following compression formats is implicit even though these compression formats do not appear in the Compression Format option under the advanced source properties:
Compression format      File type
Deflate                 Avro
Snappy                  Avro, ORC, Parquet
Zlib                    ORC
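
The implicit support follows from the file formats themselves: Avro, ORC, and Parquet record the compression codec inside the file, so a reader does not need a separate compression setting. The following sketch, which assumes the pyarrow library and is independent of the connector, writes a Snappy-compressed Parquet file and reads it back without naming the codec.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a small table with Snappy compression; the codec is stored in the file metadata.
    table = pa.table({"id": [1, 2, 3], "city": ["Austin", "Lima", "Pune"]})
    pq.write_table(table, "sample.parquet", compression="snappy")

    # Reading requires no compression hint; the codec is detected from the metadata.
    print(pq.ParquetFile("sample.parquet").metadata.row_group(0).column(0).compression)
    print(pq.read_table("sample.parquet").to_pydict())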

Reading a compressed flat file

When you read data from a compressed flat file, you must upload a schema file and select Gzip as the compression format. Use the .GZ file name extension when you use the Gzip compression format to read data from a flat file.
    1. Select the required compressed flat file.
    2. Navigate to the Formatting Options property field.
    3. Select the Import from schema file option and upload the schema.
    The following example shows a sample schema file for a flat file:
    {"Columns":[
        {"Name":"f_varchar","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_char","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_smallint","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_integer","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_bigint","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_decimal_default","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_real","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_double_precision","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_boolean","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_date","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_timestamp","Type":"string","Precision":"256","Scale":"0"}
    ]}
    4. Select Gzip as the compression format from the advanced source properties.
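
As an illustration of the expected input, the following sketch compresses a local flat file with gzip, gives it the .GZ extension, and uploads it to Amazon S3. It assumes the boto3 library, valid AWS credentials, and hypothetical file, bucket, and folder names.

    import gzip
    import shutil
    import boto3

    # Hypothetical names; replace with your own values.
    LOCAL_FILE = "orders.csv"
    COMPRESSED_FILE = "orders.csv.GZ"   # use the .GZ extension for Gzip sources
    BUCKET = "my-staging-bucket"

    # Gzip-compress the flat file.
    with open(LOCAL_FILE, "rb") as src, gzip.open(COMPRESSED_FILE, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Upload the compressed file so that it can be selected as the source object.
    boto3.client("s3").upload_file(COMPRESSED_FILE, BUCKET, f"input/{COMPRESSED_FILE}")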

Reading a compressed JSON file

When you read data from a compressed JSON file, you must upload a schema file and select Bzip2 as the compression format. Use the .BZ2 file name extension when you use the Bzip2 compression format to read a JSON file.
    1. Select the required compressed JSON file.
    2. Navigate to the Formatting Options property field.
    3. Select the Import from schema file option and upload the schema.
    The following example shows a sample schema file for a JSON file:
    {"Field1":"<string>","Field2":"<string>","Field3":<integer>}
    Use a row that has data for all the columns as the JSON schema.
    4. Select Bzip2 as the compression format from the advanced source properties.
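
As an illustration of the expected input, the following sketch writes a JSON record that has data for all the columns, compresses it with bzip2 under the .BZ2 extension, and uploads it to Amazon S3. It assumes the boto3 library, valid AWS credentials, and hypothetical field, bucket, and file names.

    import bz2
    import json
    import boto3

    # Hypothetical record that has data for all the schema fields.
    record = {"Field1": "abc", "Field2": "xyz", "Field3": 42}

    # Write the record as bzip2-compressed JSON with the .BZ2 extension.
    COMPRESSED_FILE = "events.json.BZ2"
    with bz2.open(COMPRESSED_FILE, "wt", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

    # Upload the compressed file so that it can be selected as the source object.
    boto3.client("s3").upload_file(COMPRESSED_FILE, "my-staging-bucket", f"input/{COMPRESSED_FILE}")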