You can create a custom profile or a default profile. When you create a custom profile, you can configure the columns, sample rows, and drill-down options. When you create a default profile, the column profile and data domain discovery run on the entire data set with all the data domains.
1. In the Discovery workspace, click Profile, or select New > Profile from the header area.
Note: You can also right-click the data object in the Library workspace and create a profile. In that case, the profile name, location name, and data object are extracted from the data object properties. You can create a default profile or customize the settings to create a custom profile.
The New Profile wizard appears.
2. The Single source option is selected by default. Click Next.
3. In the Specify General Properties screen, enter a name and an optional description for the profile. In the Location field, select the project or folder where you want to create the profile. Click Next.
4. In the Select Source screen, click Choose to select a data object, or click New to import a data object. Click Next.
- - In the Choose Data Object dialog box, select a data object. Click OK.
The Properties pane displays the properties of the selected data object. The Data Preview pane displays the columns in the data object.
- - In the New Data Object dialog box, you can choose a connection, schema, table, or view to create a profile on, select a location, and create a folder to import the data object. Click OK.
5. In the Select Source screen, select the columns that you want to run a profile on. Optionally, select Name to select all the columns. Click Next.
All the columns are selected by default. For each column, the Analyst tool lists properties such as the name, data type, precision, scale, whether the column is nullable, and whether it participates in the primary key.
6. In the Specify Settings screen, choose to run a column profile, data domain discovery, or a column profile and data domain discovery. By default, the column profile option is selected.
- - Choose Run column profile to run a column profile.
- - Choose Run data domain discovery to perform data domain discovery. In the Data domain pane, select the data domains that you want to discover and select the conformance criteria. Then select the columns for data domain discovery in the Edit columns selection for data domain discovery dialog box.
- - Choose Run column profile and Run data domain discovery to run the column profile and data domain discovery. Select the data domain options in the Data domain pane.
Note: By default, the columns that you select are used for both the column profile and data domain discovery. Click Edit to select or deselect columns for data domain discovery.
- - Choose whether to run data domain discovery on Data, Columns, or Data and Columns.
- - Choose a sampling option. You can choose All rows (complete analysis), Sample first, Random sample, Random sample (auto), Limit n, or Random percentage as a sampling option in the Run profile on pane. The sampling option applies to column profile and data domain discovery.
- - Choose a drilldown option. In the Drilldown pane, you can choose the Live or Staged drilldown option, or choose Off to disable drilldown. Optionally, click Select Columns to select columns to drill down on. You can also choose to omit data type and data domain inference for columns with an approved data type or data domain.
- - Choose Native, Blaze, or Spark as the run-time environment. If you choose Blaze or Spark, click Choose to select a Hadoop connection in the Select a Hadoop Connection dialog box.
7. Click Next.
The Specify Rules and Filters screen opens.
8. In the Specify Rules and Filters screen, you can perform the following tasks:
- - Create, edit, or delete a rule. You can apply existing rules to the profile.
- - Create, edit, or delete a filter.
Note: When you create a scorecard on this profile, you can reuse the filters that you create for the profile.
9. Click Save and Finish to create the profile, or click Save and Run to create and run the profile.
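The sampling options in step 6 differ in which rows the profile reads. The following Python sketch illustrates the general difference between taking the first n rows, taking n random rows, and taking a random percentage of rows. It is a conceptual illustration only: the function names are hypothetical, and the sampling algorithm that the Analyst tool actually uses is not described here and may differ.

```python
import random

# Hypothetical helpers that illustrate three common sampling strategies.
# They are NOT the Analyst tool's implementation.

def sample_first(rows, n):
    """Return the first n rows, in their original order."""
    return rows[:n]

def random_sample(rows, n, seed=None):
    """Return up to n rows chosen at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

def random_percentage(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100.0]

rows = list(range(1, 101))  # stand-in for 100 source rows

first_10 = sample_first(rows, 10)                 # always rows 1 through 10
random_10 = random_sample(rows, 10, seed=0)       # 10 distinct rows from anywhere
about_20_percent = random_percentage(rows, 20, seed=0)  # roughly 20 rows; count varies
```

A first-n sample is cheap but can be unrepresentative when the source is sorted. A random sample spreads the selection across the whole data set, at the cost of reading more of it.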