
Schedule and advanced options

On the Schedules tab, you can configure schedules, insights, runtime environment, email notifications, and advanced options for the profile.

Schedule details

You can use a schedule to specify when and how often you want to run a profiling task. To create a schedule, you need to have the Create permission for schedules.
The following table lists the options that you can choose in the Schedule Details area:
Option
Description
Do not run this task on a schedule
Choose this option if you want to manually run the profile.
Run this task on a schedule
Choose an existing schedule to run a data profiling task or create and save a schedule in Data Profiling.
You can also create, view, edit, and delete schedules in Administrator.
To delete a schedule for a data profiling task, you must disassociate or delete the assets linked to the schedule.
Note: When you choose to run a profile on a schedule, the profile runs after the configured schedule offset time. For example, if you configure a schedule to run every hour from 8:00 a.m. to 12:00 p.m., and the schedule offset for your organization is 15 seconds, the schedule runs at 8:00:15, 9:00:15, 10:00:15, 11:00:15, and 12:00:15. For information about schedule offset, see Administrator in the Administrator help.
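The offset arithmetic in the note above can be sketched for a simple fixed-interval schedule; this is an illustration of the behavior only, not Data Profiling code:

```python
from datetime import datetime, timedelta

def scheduled_runs(start, end, interval, offset_seconds):
    """Return the actual run times for a fixed-interval schedule,
    shifted by the organization's schedule offset."""
    runs = []
    current = start
    while current <= end:
        runs.append(current + timedelta(seconds=offset_seconds))
        current += interval
    return runs

# Every hour from 8:00 a.m. to 12:00 p.m. with a 15-second offset.
runs = scheduled_runs(
    start=datetime(2024, 1, 1, 8, 0),
    end=datetime(2024, 1, 1, 12, 0),
    interval=timedelta(hours=1),
    offset_seconds=15,
)
print([r.strftime("%H:%M:%S") for r in runs])
# ['08:00:15', '09:00:15', '10:00:15', '11:00:15', '12:00:15']
```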

Create a schedule

You can create and save a schedule in Data Profiling.
  1. From the Schedule Details area, click New.
     The New Schedule window appears.
  2. Enter a name and description for the schedule.
  3. In the Schedule Options section, specify the start date, time zone, and the frequency at which you want to run the data profiling task.
  4. Click Save to save the schedule.
     The data profiling task is triggered at the scheduled time.

Runtime environment

You can choose a runtime environment to run the task. If you do not choose a runtime environment, the profile runs on the default runtime environment configured for the connection.
You can create, view, edit, or delete runtime environments in Administrator. Data Profiling displays runtime environments based on the source object that you select. For example, if the source object that you select is Avro, Parquet, or JSON, Data Profiling lists all the runtime environments that have the Elastic Server service enabled. If you select any other source object, Data Profiling lists all the runtime environments that have the Data Integration Server service enabled.

Serverless runtime environment

A serverless runtime environment is an advanced serverless deployment solution that does not require downloading, installing, configuring, and maintaining a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a runtime environment when you configure a connection or some types of tasks in Data Profiling.
The following table lists the options that you can choose in the Serverless Usage Properties area:
Option
Description
Max Compute Units
Maximum number of serverless compute units corresponding to machine resources that the task can use. Overrides the corresponding property in the serverless runtime environment. By default, for a data profiling task, the maximum number of compute units is set to two.
Task Timeout
Amount of time in minutes to wait for the task to complete before it is terminated. The timeout ensures that serverless compute units are not unproductive when the task hangs. By default, the timeout is the value that is configured in the serverless runtime environment.
For more information, see the Runtime environments document.
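The purpose of the Task Timeout option, reclaiming compute units from a hung task, can be sketched in general terms; the helper below is hypothetical and not part of the Data Profiling product:

```python
import multiprocessing
import time

def run_with_timeout(task, timeout_minutes):
    """Run a task in a separate process and terminate it if it does not
    complete within the timeout, so resources are not held by a hung task."""
    worker = multiprocessing.Process(target=task)
    worker.start()
    worker.join(timeout=timeout_minutes * 60)
    if worker.is_alive():
        worker.terminate()
        worker.join()
        return "terminated"
    return "completed"

def quick_task():
    time.sleep(0.1)  # finishes well within the timeout

if __name__ == "__main__":
    print(run_with_timeout(quick_task, timeout_minutes=1))  # completed
```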

Advanced clusters

An advanced cluster is a Kubernetes cluster that provides a distributed processing environment on the cloud. Fully-managed and self-service clusters can run data logic using a scalable architecture, while local clusters use a single node to quickly onboard projects for advanced use cases.
To use an advanced cluster, you perform the following steps:
  1. Set up your cloud environment so that the Secure Agent can connect to and access cloud resources.
  2. In Administrator, create an advanced configuration to define the cluster and the cloud resources.
  3. In Monitor, monitor cluster health and activity while developers in your organization create and run jobs on the cloud.
To run a profile on an Avro, Parquet, or JSON file, you must configure the Amazon S3 V2 or Azure Data Lake Store connection with the respective advanced cluster.
For more information about setting up an AWS, Microsoft Azure, or local cluster, see the Advanced Clusters help.

Email notification options

When you run a profile, you can choose to send email notifications based on the profile job status. The job statuses for which you can send notifications are warning, failure, and success. You can send the notifications to default or custom email addresses.
The following table lists the email notification options that you can choose for a profile:
Option
Description
Use the default email notification options for my organization
Data Profiling sends the email notification to the default email address of the logged-in user.
You can configure the default email addresses on the Organization page of Administrator. For more information, see Administrator in the Administrator help.
Use custom email notification options for this task
Choose to send the notifications to different email addresses based on the job status.
Enter one or more valid, comma-separated email addresses to receive email notifications for the following job statuses:
  • Failure
  • Warning
  • Success
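The routing behavior that the custom option describes can be sketched as follows; the function, dictionary, and addresses are hypothetical, not part of Data Profiling:

```python
def recipients_for(status, custom=None, default="admin@example.com"):
    """Return the comma-separated recipient list for a job status,
    falling back to the organization default when no custom list exists."""
    if custom and status in custom:
        return custom[status]
    return default

# Custom routing by job status; Success falls back to the default.
custom_routing = {
    "Failure": "oncall@example.com,dataops@example.com",
    "Warning": "dataops@example.com",
}
print(recipients_for("Failure", custom_routing))
# oncall@example.com,dataops@example.com
print(recipients_for("Success", custom_routing))
# admin@example.com
```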

Advanced options

You can configure the advanced options to detect outliers, infer the date and time, and infer other profile-related parameters.
The following table lists the advanced options that you can configure for a profile:
Option
Description
Maximum Number of Value Frequency Pairs
The number of column values with the highest frequencies that appear in the profile results. Default is 500.
For example, if you set the value to 100, only the top 100 values appear in the profile results.
Note: If you do not want to save the value frequency information of a profile in the profiling warehouse, set the value to 0.
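The effect of this option can be sketched with a simple frequency count; a minimal illustration, not the product's implementation:

```python
from collections import Counter

def value_frequency_pairs(values, max_pairs=500):
    """Keep only the most frequent values, up to max_pairs.
    A max_pairs of 0 keeps no value frequency information."""
    if max_pairs == 0:
        return []
    return Counter(values).most_common(max_pairs)

column = ["NY", "CA", "NY", "TX", "NY", "CA", "WA"]
print(value_frequency_pairs(column, max_pairs=2))
# [('NY', 3), ('CA', 2)]
```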
Maximum Number of Patterns
The number of patterns with the most occurrences that appear in the profile results. The rest of the patterns appear under the Patterns > Others category in the Results area. Default is 10.
For example, if you set the value to 3, the top 3 patterns appear with their statistics, and the rest of the patterns are consolidated under the Others category.
Pattern Threshold Percentage
The minimum percentage of values that a pattern must match to appear individually in the profile results. Default is 5.
For example, if you set the value to 4, patterns that match 4% or more of the values appear individually with their statistics, and the rest of the patterns are consolidated under the Others category.
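How the Maximum Number of Patterns and Pattern Threshold Percentage options interact can be sketched as follows; an illustration only, with made-up pattern counts:

```python
def summarize_patterns(pattern_counts, max_patterns=10, threshold_pct=5):
    """Show the top patterns that meet the threshold individually;
    consolidate everything else under Others."""
    total = sum(pattern_counts.values())
    ranked = sorted(pattern_counts.items(), key=lambda kv: kv[1], reverse=True)
    shown, others = [], 0
    for pattern, count in ranked:
        pct = 100.0 * count / total
        if len(shown) < max_patterns and pct >= threshold_pct:
            shown.append((pattern, count))
        else:
            others += count
    return shown, others

counts = {"X(5)": 60, "X(9)": 25, "9(3)": 10, "X(2)": 5}
shown, others = summarize_patterns(counts, max_patterns=3, threshold_pct=6)
print(shown)   # [('X(5)', 60), ('X(9)', 25), ('9(3)', 10)]
print(others)  # 5
```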
Infer Date and Time
Infers the date and time for a column of date or time data type. Default is Yes.
Detect Outliers
Detects pattern and value frequency outliers in the source object. Default is Yes.
Minimum Number of Rows for Split Processing per Column
If the source object contains more rows than the number that you enter here, Data Profiling uses one subtask for each source column when the profile runs. Default is 100,000,000.
Maximum Number of Columns per Mapping
Number of columns for each mapping when the number of source rows is fewer than the Minimum Number of Rows for Split Processing per Column value. Default is 50.
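The two split options above determine how many mappings (subtasks) a profile uses; a rough sketch of that arithmetic, not the actual partitioning logic:

```python
import math

def profiling_subtasks(row_count, column_count,
                       min_rows_for_split=100_000_000,
                       max_columns_per_mapping=50):
    """Estimate the number of profiling mappings for a source object."""
    if row_count > min_rows_for_split:
        return column_count  # one subtask per source column
    return math.ceil(column_count / max_columns_per_mapping)

print(profiling_subtasks(row_count=500_000, column_count=120))       # 3
print(profiling_subtasks(row_count=200_000_000, column_count=120))   # 120
```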
Maximum Memory per Mapping*
Maximum amount of memory that you want to allocate for each mapping. Default is 512 MB.
Default buffer block size
Size of buffer blocks used to move data blocks from sources to targets. Default is Auto.
Enter one of the following options:
  • Auto. Uses automatic memory settings. When you use Auto, configure Maximum Memory per Mapping.
  • A numeric value. Enter the numeric value that you want to use. The default unit of measure is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB.
DTM Buffer Size
Amount of memory allocated to the task from the DTM process. Default is Auto.
By default, a minimum of 12 MB is allocated to the buffer at run time.
Use one of the following options:
  • Auto. Uses automatic memory settings. When you use Auto, configure Maximum Memory per Mapping.
  • A numeric value. Enter the numeric value that you want to use. The default unit of measure is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB.
Line Sequential Buffer Length
Number of bytes that the task reads for each row in a flat file source. Default is 1024.
* The mapping is a type of subtask that Data Profiling creates and runs for a data profiling task to process the data concurrently.
The default values for the advanced options are chosen to provide the best performance in most cases. However, you can adjust the values based on your requirements. To optimize data profiling task performance, see Tuning data profiling task performance.
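The buffer size options above accept either Auto or a numeric value with an optional KB, MB, or GB unit; parsing such a value can be sketched as follows (illustrative only):

```python
def parse_buffer_size(value):
    """Convert a setting such as 'Auto', '1024', or '512MB' into bytes,
    or None for automatic sizing. The default unit is bytes."""
    units = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}
    text = value.strip()
    if text.lower() == "auto":
        return None  # use automatic memory settings
    for suffix, factor in units.items():
        if text.upper().endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)

print(parse_buffer_size("512MB"))  # 536870912
print(parse_buffer_size("Auto"))   # None
```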
Note: For a profile with Avro or Parquet source objects, you can also configure the following advanced options.

Execution mode

You can run a profile in Standard or Verbose execution mode.
By default, profiles run in standard execution mode. If you run a profile in verbose execution mode, the profiling mapping writes additional data to the log file. Use verbose execution mode for troubleshooting only, because generating the additional log data increases mapping run times.
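The standard-versus-verbose distinction is analogous to log levels; a loose analogy in code, not how Data Profiling actually configures its modes:

```python
import logging

def configure_mapping_logger(verbose=False):
    """Verbose execution writes additional data to the log, at the cost
    of longer run times; modeled here as a DEBUG-level logger."""
    logger = logging.getLogger("profiling.mapping")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger

print(configure_mapping_logger(verbose=True).level == logging.DEBUG)  # True
```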

Session options

You can configure the session options to specify the number of non-fatal errors that a data profiling task can encounter before Data Profiling stops the session.
You can configure the following session option for a profile: