Creating a Column Profile on a Semi-structured Data Source
After you create a flat file data object or complex file data object from Avro, JSON, Parquet, or XML data sources, you can create and run a column profile on the data object.
1. In the Object Explorer view, select the data object for the Avro, JSON, Parquet, or XML file.
2. Click File > New > Profile.
The New dialog box appears.
3. Select Profile. Click Next.
The New Profile dialog box appears.
4. In the New Profile dialog box, add a name for the profile and an optional description.
5. Select the Process Extended File Formats option. Click Next.
The following image shows the New Profile wizard with the Process Extended File Formats option:
Note: The Process Extended File Formats option does not appear for Avro and Parquet data sources when you set the Resource Format to Avro or Parquet.
6. On the Single Data Object Profile page, select the columns and options under Column Selection and Data Domain Discovery as required. Click Finish.
Note: If the Developer tool is installed on a Linux machine and the JSON or XML physical data object is a flat file data object with a text file, then perform the following tasks:
a. On the Overview tab, update the Precision value so that it is at least the number of characters in the file path of the data source on the server.
b. After you create the profile on the flat file data object, update the file path of the data source to its location on the server. To update the file path, click Runtime: Read > Source file directory on the Advanced tab, and add the file path.
7. Right-click the profile, and select Run Profile.
The profile results appear.
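The note in step 6 asks for a Precision value that covers the number of characters in the server file path. As a rough aid, the following Python sketch counts the characters in a path so that you can compare the count with the Precision value shown on the Overview tab. The function names and the example path are hypothetical and are not part of the Developer tool. The sketch assumes that Precision is measured in characters, as the note states.

```python
def required_precision(server_path: str) -> int:
    """Return the number of characters in the server file path."""
    return len(server_path)


def precision_is_sufficient(current_precision: int, server_path: str) -> bool:
    """Check whether the current Precision value covers the whole path."""
    return current_precision >= required_precision(server_path)


if __name__ == "__main__":
    # Hypothetical server location of the JSON source file.
    example_path = "/data/profiles/customers.json"
    print("Characters in path:", required_precision(example_path))
    print("Precision of 10 is enough:", precision_is_sufficient(10, example_path))
```

If the check reports that the Precision value is too small, increase it on the Overview tab before you run the profile.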