Data compression in Amazon S3 V2 sources

You can decompress data when you read data from Amazon S3.
Configure the compression format in the Compression Format option under the advanced source properties.
Compression format doesn't apply to binary files.
The following table lists the compression formats that you can use for various file types when you read data from Amazon S3:
Compression format      File type
None                    Avro, Delta, Flat, JSON, ORC, Parquet
Bzip2                   JSON
Deflate                 Avro
Gzip                    Delta, Flat, Parquet
Lzo                     Doesn't apply to any file type.
Snappy                  Avro, Delta, ORC, Parquet
Zlib                    ORC
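
If you are not sure which codec a staged object uses, you can peek at its first bytes before you set the Compression Format option. The following sketch is an illustration only and is not part of the connector. It assumes the boto3 library, valid AWS credentials, and hypothetical bucket and object names.

    import boto3

    # Hypothetical bucket and key; replace with your own values.
    BUCKET = "my-staging-bucket"
    KEY = "input/orders.csv.GZ"

    # Read only the first few bytes of the object to inspect its magic number.
    s3 = boto3.client("s3")
    header = s3.get_object(Bucket=BUCKET, Key=KEY, Range="bytes=0-3")["Body"].read()

    if header[:2] == b"\x1f\x8b":
        print("Gzip-compressed object: set Compression Format to Gzip.")
    elif header[:3] == b"BZh":
        print("Bzip2-compressed object: set Compression Format to Bzip2.")
    else:
        print("No gzip or bzip2 signature found; the object may be uncompressed.")
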
For the Avro, ORC, and Parquet file formats, support for the following compression formats is implicit even though these compression formats do not appear in the Compression Format option under the advanced source properties:
Compression format      File type
Deflate                 Avro
Snappy                  Avro, ORC, Parquet
Zlib                    ORC
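
The implicit support follows from the file formats themselves: Avro, ORC, and Parquet record the compression codec inside the file, so a reader does not need a separate compression setting. The following sketch, which assumes the pyarrow library and is independent of the connector, writes a Snappy-compressed Parquet file and reads it back without naming the codec.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a small table with Snappy compression; the codec is stored in the file metadata.
    table = pa.table({"id": [1, 2, 3], "city": ["Austin", "Lima", "Pune"]})
    pq.write_table(table, "sample.parquet", compression="snappy")

    # Reading requires no compression hint; the codec is detected from the metadata.
    print(pq.ParquetFile("sample.parquet").metadata.row_group(0).column(0).compression)
    print(pq.read_table("sample.parquet").to_pydict())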

Reading a compressed flat file

When you read data from a compressed flat file, you must upload a schema file and select Gzip as the compression format. Use the .GZ file name extension when you use the Gzip compression format to read data from a flat file.
    1. Select the required compressed flat file.
    2. Navigate to the Formatting Options property field.
    3. Select the Import from schema file option and upload the schema.
    The following example shows a sample schema file for a flat file:
    {"Columns":[
        {"Name":"f_varchar","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_char","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_smallint","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_integer","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_bigint","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_decimal_default","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_real","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_double_precision","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_boolean","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_date","Type":"string","Precision":"256","Scale":"0"},
        {"Name":"f_timestamp","Type":"string","Precision":"256","Scale":"0"}
    ]}
    4. Select Gzip as the compression format from the advanced source properties.
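
As an illustration of the expected input, the following sketch compresses a local flat file with gzip, gives it the .GZ extension, and uploads it to Amazon S3. It assumes the boto3 library, valid AWS credentials, and hypothetical file, bucket, and folder names.

    import gzip
    import shutil
    import boto3

    # Hypothetical names; replace with your own values.
    LOCAL_FILE = "orders.csv"
    COMPRESSED_FILE = "orders.csv.GZ"   # use the .GZ extension for Gzip sources
    BUCKET = "my-staging-bucket"

    # Gzip-compress the flat file.
    with open(LOCAL_FILE, "rb") as src, gzip.open(COMPRESSED_FILE, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Upload the compressed file so that it can be selected as the source object.
    boto3.client("s3").upload_file(COMPRESSED_FILE, BUCKET, f"input/{COMPRESSED_FILE}")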

Reading a compressed JSON file

When you read data from a compressed JSON file, you must upload a schema file and select Bzip2 as the compression format. Use the .BZ2 file name extension when you use the Bzip2 compression format to read a JSON file.
    1. Select the required compressed JSON file.
    2. Navigate to the Formatting Options property field.
    3. Select the Import from schema file option and upload the schema.
    The following example shows a sample schema file for a JSON file:
    {"Field1":"<string>","Field2":"<string>","Field3":<integer>}
    Use a row that has data for all the columns as the JSON schema.
    4. Select Bzip2 as the compression format from the advanced source properties.
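
As an illustration of the expected input, the following sketch writes a JSON record that has data for all the columns, compresses it with bzip2 under the .BZ2 extension, and uploads it to Amazon S3. It assumes the boto3 library, valid AWS credentials, and hypothetical field, bucket, and file names.

    import bz2
    import json
    import boto3

    # Hypothetical record that has data for all the schema fields.
    record = {"Field1": "abc", "Field2": "xyz", "Field3": 42}

    # Write the record as bzip2-compressed JSON with the .BZ2 extension.
    COMPRESSED_FILE = "events.json.BZ2"
    with bz2.open(COMPRESSED_FILE, "wt", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

    # Upload the compressed file so that it can be selected as the source object.
    boto3.client("s3").upload_file(COMPRESSED_FILE, "my-staging-bucket", f"input/{COMPRESSED_FILE}")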