With Intelligent Structure Discovery, you can use the Simplify Excel mode to parse Excel workbooks.
Simplify Excel mode improves performance and consumes less heap memory and fewer CPU cores.
When you use Simplify Excel mode in a Structure Parser transformation, you can use the following output types:
•Avro
•JSON
•JSON Lines
•ORC
•Parquet
•XML
•Relational
Consider the following rules and guidelines when you simplify spreadsheets:
•Simplify Excel mode doesn't detect merged cells in a spreadsheet.
•When you parse a spreadsheet that contains more than one table, the intelligent structure model doesn't detect them as separate tables.
•When each cell in a column contains data separated by a hyphen, such as 12-654, the intelligent structure model doesn't recognize the data as two separate columns.
•When each cell in a column contains a number that contains two hyphens, such as 12-345-678, the intelligent structure model doesn't identify it as three separate rows.
•You can't write Unassigned data to a different target.
•You can't parse data in cells that use the Advanced number format.
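Because the intelligent structure model keeps hyphen-separated values such as 12-654 in a single field, any splitting has to happen after parsing. The following is a minimal sketch in Python, assuming the parsed output has already been loaded as a list of dictionaries. The function name split_hyphenated and the field names code, code_prefix, and code_suffix are hypothetical and are not part of Intelligent Structure Discovery.

```python
def split_hyphenated(value, expected_parts):
    """Split a hyphen-delimited value into a fixed number of string parts."""
    parts = str(value).split("-")
    if len(parts) != expected_parts:
        # Fail loudly rather than silently misaligning fields.
        raise ValueError(
            f"expected {expected_parts} parts in {value!r}, got {len(parts)}"
        )
    return parts


# Hypothetical parsed rows: each hyphenated value arrives as one field.
parsed_rows = [{"code": "12-654"}, {"code": "98-321"}]

split_rows = []
for row in parsed_rows:
    prefix, suffix = split_hyphenated(row["code"], 2)
    split_rows.append({"code_prefix": prefix, "code_suffix": suffix})
```

The same helper handles values with two hyphens, such as 12-345-678, when called with an expected part count of 3. The parts are kept as strings so that leading zeros are preserved.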