Use the test case wizard to create a test case that validates the data between the source and the target.
To create a test case, you perform the following tasks:
1. Define the test case details.
2. Select the two connections that contain the data to compare. For each connection, select the data source to compare. The data source can be a database table, view, or saved SQL query.
3. Select the columns to compare, and select primary keys that uniquely identify records in each data source.
4. Data Validation maps the columns in the two data sources according to the keys. If any column remains unmapped, map it manually.
5. Select a comparison method.
6. Optionally, configure other test case properties related to bad record limit, string validation, data sampling, schedule, email notification, and logging.
Note: Verify that the runtime environment that runs the test contains sufficient disk space to run the test. The size of the tables or views that the test compares, the number of columns, the size of the data in the columns, and the sampling level determine the disk space that Data Validation requires to run the test.
Step 1. Define test case details
Define the general details of the test case on the General tab of the test case wizard.
1. On the navigation bar, click New > Test Case.
The test case wizard appears.
2. On the General tab, enter the test case name.
The name can contain ASCII, Chinese, Hebrew, and Japanese characters, digits, spaces, and the following characters:
, _ -
The name can't contain any of the following characters:
3. Keep the default project or folder location for the test case, or select a custom location. To navigate to a different location, click Browse, select a new location, and click Select.
4. Optionally, enter a description of the test case.
5. From the Runtime Environment list, select the runtime environment to run the test in. At least one Secure Agent must be installed on the runtime environment.
6. From the Report Location list, select a flat file connection to the location where Data Validation stores test reports.
Note: The code page of the connection must be UTF-8.
7. Optionally, to parameterize the WHERE clause for data sampling, perform the following steps:
a. Click Download to download a sample parameter file.
By default, the sample parameter file is downloaded to the following directory:
b. Define the parameter names and values in the file. Prefix the parameter names with the $ character.
c. Save the file in a directory relative to your Secure Agent.
Default is <Secure Agent installation directory>/apps/DVProcessor/data/userparameters.
d. In the Parameterization section, enter the parameter file directory and parameter file name. For more information, see Parameterization for WHERE clause.
8. Click Next.
The Connections tab appears.
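A parameter file for the WHERE clause is a plain text file of name-value pairs, with each parameter name prefixed by the $ character. The following Python sketch parses such a file. The format shown and the parameter names ($region, $min_date) are illustrative assumptions; check the sample file that the wizard downloads for the exact format.

```python
# Sketch: a minimal parser for a WHERE-clause parameter file.
# The file layout and the parameter names are assumptions for illustration;
# the downloaded sample file shows the exact format Data Validation expects.
SAMPLE_FILE = """\
$region=NORTH
$min_date=2023-01-01
"""

def parse_parameters(text):
    """Return a dict mapping parameter names (with the $ prefix) to values."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        if name.startswith("$"):
            params[name] = value
    return params

print(parse_parameters(SAMPLE_FILE))
```

At run time, each $-prefixed parameter referenced in the WHERE clause is replaced with the value defined in the file.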
Step 2. Select connections and data sources
On the Connections tab of the test case wizard, select the connections and data sources that the test case compares.
1. From the Connection 1 list, select the connection that contains the source.
If you use an Amazon Redshift v2 connection, enter a valid path for Connection 1 in the Path field and enter a valid S3 bucket name in the S3 Bucket Name field.
If you use an Amazon S3 v2 or Microsoft Azure Data Lake Storage Gen2 connection, in the Path field, optionally, enter a relative path to the location where the file is stored for Connection 1. If you do not enter a path, Data Validation lists all the files in the folder path specified in the connection.
If you use a Google BigQuery V2 connection, enter a schema name to get a list of objects.
The details of the connection appear below the selected connection.
2. From the Connection 1 list, select the data source that you want to compare. The data source can be a database table, view, or saved SQL query.
If you use an Amazon S3 v2 connection, select a Parquet or flat file.
To use a saved SQL query, under the Object or Saved SQL Query Details section, click Saved SQL Query. Then, from the Saved SQL Query list, select a saved SQL query. The query appears in the SQL Query box. Click Validate to verify the query. If the query isn't valid, fix it and validate it again. If you change the connection or the saved SQL query, you must validate the query again.
3. From the Connection 2 list, select the connection that contains the target.
If you use an Amazon Redshift v2 connection, enter a path for Connection 2 in the Path field and enter a valid S3 bucket name in the S3 Bucket Name field.
If you use an Amazon S3 v2 or Microsoft Azure Data Lake Storage Gen2 connection, in the Path field, optionally, enter a relative path to the location where the file is stored for Connection 2. If you do not enter a path, Data Validation lists all the files in the folder path specified in the connection.
If you use a Google BigQuery V2 connection, enter a schema name to get a list of objects.
The details of the connection appear below the selected connection.
4. From the Connection 2 list, select the data source that you want to compare. The data source can be a database table, view, or saved SQL query.
If you use an Amazon S3 v2 connection, select a Parquet or flat file.
To use a saved SQL query, under the Object or Saved SQL Query Details section, click Saved SQL Query. Then, from the Saved SQL Query list, select a saved SQL query. The query appears in the SQL Query box. Click Validate to verify the query. If the query isn't valid, fix it and validate it again. If you change the connection or the saved SQL query, you must validate the query again.
5. If the connections that contain the data to compare point to flat files, specify the delimited flat file formatting options.
The following table describes the flat file formatting parameters:
Parameter
Description
Delimiter
Indicates the boundary between two columns of data.
Select one of the following options:
- Comma
- Tab
- Colon
- Semicolon
- Other. Select this option and specify the character to use as a delimiter.
If you choose an escape character or a quote character as a delimiter, or if you use the same character as both the delimiter and the qualifier, you might receive unexpected results.
Default is comma.
Text Qualifier
Character that defines the boundaries of text strings.
If you select a quote character, Data Validation ignores delimiters within quotes.
Default is double quote (").
Escape Character
Character that immediately precedes a column delimiter character embedded in an unquoted string, or immediately precedes the quote character in a quoted string.
When you specify an escape character, Data Validation reads the delimiter character as a regular character.
First Data Row
Number of the row that Data Validation starts reading the file from during import. For example, if you enter 2, Data Validation skips the first row.
Note: If you set Field Labels to Import from Row, Data Validation sets the first data row automatically. For example, if you set the field labels row number to 10, Data Validation sets the first data row to 11.
Field Labels
Determines how Data Validation displays column names in profile results. Select one of the following options:
- Auto-generate. Data Validation generates the column names.
This option doesn't apply to flat files within an Amazon S3 v2 connection.
- Import from Row. Data Validation imports the column name from the specified row number.
Row Number
Row that contains the column name.
Applies when you select Import from Row for the Field Labels option.
6. Click Next.
The Configuration tab appears.
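The delimiter, text qualifier, and escape character options interact the way standard CSV parsing does. The following Python sketch uses the standard library csv module as an analogy; Data Validation's own flat file parser is internal and may differ in detail.

```python
import csv
import io

# Sketch: how a delimiter and a text qualifier (quote character) interact.
# A delimiter that appears inside a qualified (quoted) field is read as a
# regular character, not as a column boundary. Analogy only; this is not
# Data Validation's actual parser.
raw = 'id,"name, with comma",city\n1,"Ana, B",Oslo\n'

reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
rows = list(reader)

# The comma inside the quoted field stays part of the value.
print(rows[1])  # → ['1', 'Ana, B', 'Oslo']
```

The same reasoning explains the warning above: if the delimiter and the qualifier are the same character, the parser can't tell a boundary from a quote, so results become unpredictable.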
Step 3. Select table columns and primary keys
Select the columns to compare and the primary keys of the source and target on the Configuration tab of the test case wizard.
Based on the database requirements, you can select a single primary key, or select multiple keys to create a composite key.
You must select at least one primary key for each connection. The primary key must uniquely identify each row in the database. If the primary key column contains duplicates, the report might show inaccurate results.
The length of a composite key must not exceed 1,000 characters.
1. From the Available Fields list in the Connection 1 area, select the fields to compare and use the right arrow to move them to the Selected Fields list, or use the double right arrow to move all available fields.
2. From the Primary key for the table in Connection 1 list, select primary keys for the source.
3. Optionally, select Data preview.
A preview of the Connection 1 table columns you selected appears. Optionally, modify the selection as needed.
4. From the Available Fields list in the Connection 2 area, select the fields to compare and use the right arrow to move them to the Selected Fields list, or use the double right arrow to move all available fields.
5. From the Primary key for the table in Connection 2 list, select primary keys for the target.
6. Optionally, select Data preview.
A preview of the Connection 2 table columns you selected appears. Optionally, modify the selection as needed.
7. Click Next.
The Mapping tab appears.
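Because a duplicated primary key can distort the report, it helps to check key uniqueness before building the test case. The following Python sketch checks whether a single or composite key uniquely identifies rows; the table and column names are made up for illustration.

```python
from collections import Counter

# Sketch: verify that a chosen key (single or composite) uniquely
# identifies rows. The sample rows and column names are hypothetical.
rows = [
    {"order_id": 1, "line": 1, "amount": 10.0},
    {"order_id": 1, "line": 2, "amount": 5.0},
    {"order_id": 2, "line": 1, "amount": 7.5},
]

def duplicate_keys(rows, key_columns):
    """Return key values that appear in more than one row."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in rows)
    return [key for key, n in counts.items() if n > 1]

print(duplicate_keys(rows, ["order_id"]))          # single key: duplicates exist
print(duplicate_keys(rows, ["order_id", "line"]))  # composite key: unique
```

If the single-column key returns duplicates, add columns to form a composite key, keeping its total length under the 1,000-character limit noted above.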
Step 4. Map unmapped columns
Data Validation maps the columns in the two tables or views based on the column names. The Mapping tab of the test case wizard shows the mapping status of each column including the data type, precision, and scale.
The Mapping tab shows whether the data types match in each pair of columns. If any columns remain unmapped, you must map them manually.
If Data Validation mapped all the columns successfully, go on to the next step.
You can also choose a comparison method. For more information, see Comparison methods.
1. Select one of the following comparison methods:
- Value Test - Compare Entire Table. Data Validation compares the entire table based on the actual data values. This is the default method.
- Aggregation Functions Test. Select the aggregation functions that Data Validation must use to compare the table.
2. In the Map Columns table, if the value of Column Mapped? for a column is No, from the Table Column in Connection 2 list, select a column to map to the Connection 1 column.
3. Repeat step 2 to map all unmapped columns.
4. Optionally, select Data preview.
A preview of the Connection 1 table columns and Connection 2 table columns you selected appears. Optionally, modify the selection as needed.
5. Click Next.
The Settings tab appears. The Save and Run buttons are active.
6. Click Save to save the test case, or perform the next optional steps.
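The difference between the two comparison methods can be sketched as follows: a value test compares every row, while an aggregation functions test compares only summary values of a mapped column. This Python sketch is an illustration of the concept, not Data Validation's internal computation, and the aggregate set shown (count, sum, min, max) is an assumption.

```python
# Sketch: an aggregation functions test compares column aggregates instead
# of every row. Illustration only; Data Validation computes its aggregates
# internally, and the functions used depend on what you select.
source_column = [10, 20, 30]
target_column = [10, 20, 30]

def aggregates(values):
    """Summary values for one column."""
    return {"count": len(values), "sum": sum(values),
            "min": min(values), "max": max(values)}

# Matching aggregates pass the test even without row-by-row comparison.
print(aggregates(source_column) == aggregates(target_column))  # → True
```

Note that identical aggregates don't guarantee identical rows, which is why the value test remains the default, stricter method.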
Step 5. Configure test case properties (optional)
Configure optional properties on the Settings tab of the test case wizard.
1. In the Bad Record Limit section, select the maximum number of unmatched, extra, and missing records to show in the detailed test results.
You can select one of the following values:
- 100
- 500
- 1000
2. In the String Validation section, perform the following steps:
- Select the Ignore case option to ignore casing differences in the data.
- Select the Trim white spaces in a string value option to trim leading and trailing white spaces in string values.
3. Configure data sampling for Data Validation to compare data between the source and the target. The less data that Data Validation samples, the faster the test runs.
This setting applies to all the columns you select to compare, in both tables.
In the Data Sampling section, select one of the following values:
- WHERE clause. Data Validation samples only the rows that meet the condition in the WHERE clause. To create a WHERE clause, click Add Clause and enter the WHERE clause for the connection. Optionally, click Validate. If there are no validation errors, click Continue. If there are validation errors, fix the clause and try again.
You can also parameterize the WHERE clause to provide different values at run time. Prefix the parameter name with the $ character. For more information, see Parameterization for WHERE clause.
- Sample the first. Select the number of rows at the beginning of the table that the test compares.
- Sample the last. Select the number of rows at the end of the table that the test compares.
- Percentage. Select a percentage of the rows that the test compares. The test report shows records that are missing or unmatched in the Connection 2 table, but doesn't show extra records if any exist.
4. In the Schedule Test Case Run section, select a schedule to run the test case.
You can create schedules in Administrator.
5. In the Email Notification Options section, specify whether you want to send email notifications for the test case run.
Select one of the following values:
- Do not set email notifications. Data Validation doesn't send email notifications.
- Use the default email notification options for my organization. Data Validation uses the default email notification options that are configured in Administrator.
- Use custom email notifications. Specify the email addresses that Data Validation must use for the success and failure email notifications. Separate multiple email addresses with a comma.
6. In the Logging section, select one of the following logging levels:
- Standard. The log includes error messages.
- Verbose. The log includes messages of all logging levels. This option is useful for debugging.
7. To save the Data Integration mappings and tasks that Data Validation creates when it runs the test case, select Save Data Integration assets after the test case run.
8. Click Save.
The test case is ready to be run.
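The string validation options behave like a normalization applied to both values before they are compared. The following Python sketch models the effect of Ignore case and Trim white spaces; it is an illustration of the concept, not Data Validation's internal implementation.

```python
# Sketch: modeling the Ignore case and Trim white spaces options as a
# normalization step applied before comparison. Illustration only.
def normalize(value, ignore_case=True, trim=True):
    if trim:
        value = value.strip()   # drop leading and trailing white space
    if ignore_case:
        value = value.lower()   # fold case differences
    return value

# With both options on, these values compare as equal.
print(normalize("  Alice ") == normalize("ALICE"))  # → True

# With both options off, the same pair compares as unequal.
print(normalize("  Alice ", False, False) == normalize("ALICE", False, False))  # → False
```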
Rules and guidelines for data sampling
Consider the following rules and guidelines when you configure data sampling:
• If you use an aggregation functions test, you can select only the WHERE clause option for data sampling. The other sampling options are not available.
• If you select the Sample the first or Sample the last option, in addition to the mismatched records in the sampled data, Data Validation reports the missing rows and extra rows in the sampled data set.
• Data Validation uses string-based sorting for the data. Therefore, if the primary key column contains numeric values and you select the Sample the first or Sample the last option, you might see different results in the report. For example, the following table shows the values before and after sorting:
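The effect of string-based sorting on numeric keys can be demonstrated with a short Python sketch: lexicographic ordering places "10" before "2", so the rows that fall into a Sample the first or Sample the last window differ from what a numeric sort would select.

```python
# Sketch: string-based sorting vs numeric sorting of the same key values.
# Lexicographic order compares characters left to right, so "10" < "2".
keys = ["1", "2", "10", "20", "3"]

print(sorted(keys))           # string sort: ['1', '10', '2', '20', '3']
print(sorted(keys, key=int))  # numeric sort: ['1', '2', '3', '10', '20']
```

For example, with Sample the first set to 2, a string sort of these keys samples rows 1 and 10, not rows 1 and 2.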