Enterprise Data Preparation Service Properties
To view the Enterprise Data Preparation Service properties, select the service in the Domain Navigator and click the Properties view. To edit the properties, click the pencil icon in the corresponding area. You can edit the properties while the service is running, but you must restart the service for the changes to take effect. You can configure the following Enterprise Data Preparation Service properties:
- General Properties
- Model Repository Service Options
- Interactive Data Preparation Service Options
- Data Integration Service Options
- Catalog Service Options
- Data Lake Security Options
- Data Lake Options
- Event Logging Options
- Upload and Download Options
- Export Options
- Data Asset Recommendation Options
- Sampling Options
- Apache Zeppelin Options
- Logging Options
- Execution Options
- Advanced Options
- Custom Options
General Properties
General properties for the Enterprise Data Preparation Service include the name, description, license, and the node in the Informatica domain that the Enterprise Data Preparation Service runs on.
To edit the general properties, click the pencil icon in the general properties area. In the Edit General Properties window, edit the required fields.
The following table describes the general properties for the service:
Property | Description |
---|---|
Name | Name of the Enterprise Data Preparation Service. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [ |
Description | Description of the Enterprise Data Preparation Service. The description cannot exceed 765 characters. |
License | License object with the data lake option that allows the use of the Enterprise Data Preparation Service. |
Node Assignment | Type of node in the Informatica domain on which the Enterprise Data Preparation Service runs. Select Single Node if a single service process runs on the node, or Primary and Backup Nodes if a service process is enabled on each node for high availability. Only one service process runs at any given time; the other processes remain on standby. The Primary and Backup Nodes option is available only if your license includes high availability. Default is Single Node. |
Node | Name of the node on which the Enterprise Data Preparation Service runs. |
Backup Nodes | If your license includes high availability, nodes on which the service can run if the primary node is unavailable. |
Model Repository Service Options
The Model Repository Service is an application service that manages the Model repository. When an analyst creates projects, the Model Repository Service connects to the Model repository to store the project metadata. When you create the Enterprise Data Preparation Service, you must associate it with a Model Repository Service using the Model Repository Service Options properties.
To edit the Model Repository Service options, click the pencil icon. In the Edit Model Repository Service Options window, edit the required fields.
The following table describes the Model Repository Service options:
Property | Description |
---|---|
Model Repository Service | Name of the Model Repository Service associated with the Enterprise Data Preparation Service. |
Model Repository Service User Name | User account to use to log in to the Model Repository Service. |
Model Repository Service Password | Password for the Model Repository Service user account. |
Modify Repository Password | Select the checkbox to modify the Model Repository Service user password. |
Security Domain | LDAP security domain for the Model repository user. The field appears when the domain contains an LDAP security domain. |
Interactive Data Preparation Service Options
The Interactive Data Preparation Service is an application service that manages data preparation within the Enterprise Data Preparation application. When you create the Enterprise Data Preparation Service, you must associate it with an Interactive Data Preparation Service using the Interactive Data Preparation Service options.
To edit the Interactive Data Preparation Service options, click the pencil icon. In the Edit Data Preparation Service Options window, edit the required fields.
The following table describes the service options:
Property | Description |
---|---|
Interactive Data Preparation Service | Name of the Interactive Data Preparation Service associated with the Enterprise Data Preparation Service. |
Data Integration Service Options
The Data Integration Service is an application service that performs data integration tasks for Enterprise Data Preparation. When you create the Enterprise Data Preparation Service, you must associate it with a Data Integration Service using the Data Integration Service options.
To edit the Data Integration Service options, click the pencil icon. In the Edit Data Integration Service Options window, edit the required fields.
The following table describes the Data Integration Service options:
Property | Description |
---|---|
Data Integration Service | Name of the Data Integration Service associated with the Enterprise Data Preparation Service. |
Catalog Service Options
The catalog represents an indexed inventory of all the configured assets in an enterprise. You can find metadata and statistical information, such as profile statistics, data asset ratings, data domains, and data relationships, in the catalog. The catalog options are based on the Catalog Service configuration that you set up when you installed Enterprise Data Catalog.
To edit the catalog service options, click the pencil icon in the Catalog Service Options area. In the Edit Catalog Service Options window, edit the required fields.
The following table describes the catalog service options:
Property | Description |
---|---|
Catalog Service | Name of the Catalog Service associated with the Enterprise Data Preparation Service. |
Catalog Service User Name | User account to use to log in to the Catalog Service. |
Catalog Service User Password | Password for the Catalog Service user account. |
Modify Catalog Service User Password | Select this checkbox to modify the Catalog Service user password. |
Security Domain | LDAP security domain for the Catalog Service user. The field appears when the domain contains an LDAP security domain. |
Data Lake Security Options
Select the security mode and specify the related details using the Data Lake Security Options.
To edit the data lake security options, click the pencil icon in the Data Lake Security Options area. In the Edit Data Lake Security Options window, edit the required fields.
The following table describes the data lake security options:
Property | Description |
---|---|
Hadoop Authentication Mode | Security mode of the Hadoop cluster for the data lake. If the Hadoop cluster uses Kerberos authentication, you must set the required Hadoop security properties for the cluster. |
Principal Name for User Impersonation | Service principal name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster. The user account for impersonation must be set in the Hadoop connection properties. Use the Administrator tool to view Hadoop connection properties. |
SPN Keytab File for User Impersonation | Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine on which the Enterprise Data Preparation Service runs. |
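If the Hadoop cluster uses Kerberos authentication, it can help to confirm that the keytab is in place on the service node before you save these options. The following sketch is an illustration only, not part of the product: it assumes the MIT Kerberos client tools (klist) are installed on the node that runs the Enterprise Data Preparation Service, and the principal name and keytab path are placeholders that you would replace with your own values.

```python
# Minimal pre-check sketch. Run it on the node that hosts the Enterprise Data
# Preparation Service. The principal and keytab path are hypothetical values.
import os
import subprocess

KEYTAB = "/opt/keytabs/eduser.keytab"      # hypothetical keytab location
PRINCIPAL = "eduser@EXAMPLE.COM"           # hypothetical impersonation principal

# The keytab must exist on this machine and be readable by the service.
assert os.path.isfile(KEYTAB), f"keytab not found: {KEYTAB}"
assert os.access(KEYTAB, os.R_OK), f"keytab not readable: {KEYTAB}"

# List the keytab entries with the MIT Kerberos klist tool and confirm that the
# impersonation principal is present.
entries = subprocess.run(
    ["klist", "-k", "-t", KEYTAB], capture_output=True, text=True, check=True
).stdout
if PRINCIPAL in entries:
    print(f"{PRINCIPAL} found in {KEYTAB}")
else:
    print(f"{PRINCIPAL} not found; check the SPN keytab configuration")
```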
Data Lake Options
The data lake options include the Hive, HDFS, and Hadoop configuration details.
To edit the data lake options, click the pencil icon in the Data Lake Options area. In the Edit Data Lake Options window, edit the required fields.
The following table describes the data lake options:
Property | Description |
---|---|
HDFS Connection | HDFS connection for the data lake. |
HDFS Working Directory | HDFS directory where the Enterprise Data Preparation Service copies temporary data and files necessary for the service to run. This directory must have permissions that allow users to upload data (see the example after this table). |
Hive Connection | Hive connection for the data lake. |
Hive Table Storage Format | Data storage format for the Hive tables. Select from the following options: DefaultFormat, Parquet, or ORC. |
Hadoop Connection | Hadoop connection for the data lake. |
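The HDFS working directory must exist and allow users to upload data before the service can use it. The following sketch shows one way to create and open up such a directory. It is an illustration under stated assumptions: the directory path is hypothetical, and the 1777 mode (world-writable with the sticky bit) is an example that you should replace with permissions that match your own security policy.

```python
# Sketch: create an HDFS working directory and set permissive access.
# The path and mode below are examples, not documented defaults.
import subprocess

WORKING_DIR = "/edp/working"         # hypothetical HDFS working directory

def hdfs(*args):
    """Run an 'hdfs dfs' command and raise if it fails."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", WORKING_DIR)    # create the directory if it does not exist
hdfs("-chmod", "1777", WORKING_DIR)  # world-writable with the sticky bit, like /tmp
hdfs("-ls", "-d", WORKING_DIR)       # confirm the directory and its permissions
```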
Event Logging Options
Use the Event Logging Options area to configure user activity event logging options.
To edit the event logging options, click the pencil icon. In the Edit Event Logging Options window, edit the required fields.
The following table describes the event logging options:
Property | Description |
---|---|
Log User Activity Events | Indicates whether the Enterprise Data Preparation Service logs user activity events for auditing. |
JDBC Port | JDBC port to use to get audit events. |
Upload and Download Options
After you publish a project to the data lake, you can export the data to a .csv file and save it to your local drive. You can also upload flat files into the application.
To edit the upload and download options, click the pencil icon in the Upload and Download Options area. In the Edit Upload and Download Options window, edit the required fields.
The following table describes the upload and download options:
Property | Description |
---|---|
Maximum File Size for Uploads (MB) | Maximum size, in megabytes, of the files that users can upload. Default is 1 GB (see the example after this table). |
Download Rows Size | Number of rows to export to a .csv file. You can specify a maximum of 2,000,000,000 rows. Enter a value of -1 to export all rows. |
MaxConcurrentUploadPlusDownloadActivities | Maximum number of threads in the thread pool executing upload and download activities. |
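Uploads that exceed the configured maximum fail, so a quick client-side size check can save a round trip. This sketch is only an illustration: the 1024 MB limit mirrors the 1 GB default expressed in megabytes, the file name is hypothetical, and you would substitute the value actually configured in Maximum File Size for Uploads (MB).

```python
# Check that a flat file fits under the configured upload limit before uploading.
from pathlib import Path

MAX_UPLOAD_MB = 1024                 # assumed limit: the 1 GB default expressed in MB
path = Path("customers.csv")         # hypothetical flat file to upload

size_mb = path.stat().st_size / (1024 * 1024)
if size_mb > MAX_UPLOAD_MB:
    print(f"{path} is {size_mb:.1f} MB, over the {MAX_UPLOAD_MB} MB limit")
else:
    print(f"{path} is {size_mb:.1f} MB, within the upload limit")
```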
Data Asset Recommendation Options
Use the data asset recommendation options to define the number of recommended data assets that the Enterprise Data Preparation application displays on the Projects page.
To edit the data asset recommendation options, click the pencil icon in the Data Asset Recommendation Options area. In the Edit Data Asset Recommendation Options window, edit the required fields.
The following table describes the data asset recommendation options:
Property | Description |
---|---|
Number of Recommendations to Display | The number of recommended data assets to display on the Projects page. You can specify a maximum of 50 recommendations. A value of 0 means no recommendations will be displayed. You can use recommended alternate or additional data assets to improve productivity. |
Sampling Options
Use the sampling options to specify the size of the data sample that the Enterprise Data Preparation application retrieves for data preparation.
To edit the sampling options, click the pencil icon in the Sampling Options area. In the Edit Sampling Options window, edit the required fields.
The following table describes the sampling options:
Property | Description |
---|---|
Maximum Data Preparation Sample Size | The maximum number of sample rows to fetch for data preparation. You can specify a maximum number of 1,000,000 rows. |
Default Data Preparation Sample Size | The default number of sample rows to fetch for data preparation. You can specify a maximum number of 1,000,000 rows and a minimum of 1,000 rows. |
Apache Zeppelin Options
You can specify the Apache Zeppelin URL in the Enterprise Data Preparation Service properties.
To edit the Zeppelin options, click the pencil icon in the Zeppelin Options area. In the Edit Zeppelin Options window, edit the required fields.
The following table describes the Zeppelin options:
Property | Description |
---|---|
Zeppelin URL | The URL to access the Zeppelin framework. The URL must be in the following format: http[s]://<Zeppelin host name>:<port> |
Note: If Apache Zeppelin uses a Spark 1.x version, you must specify the Spark version in an environment variable named sparkVersion in the Enterprise Data Preparation Service process properties.
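Before you save the Zeppelin URL, it can be useful to confirm that it follows the required format and that Zeppelin actually responds at that address. The following sketch is an assumption-based example: the host name and port are placeholders, and it only checks basic reachability over HTTP.

```python
# Validate the format of the Zeppelin URL and confirm that Zeppelin responds.
from urllib.parse import urlparse
from urllib.request import urlopen

ZEPPELIN_URL = "http://zeppelin.example.com:8080"   # hypothetical Zeppelin URL

parts = urlparse(ZEPPELIN_URL)
assert parts.scheme in ("http", "https"), "URL must start with http:// or https://"
assert parts.hostname and parts.port, "URL must include a host name and a port"

# Confirm that something is listening at the URL before saving it in the service.
with urlopen(ZEPPELIN_URL, timeout=10) as response:
    print(f"Zeppelin responded with HTTP status {response.status}")
```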
Logging Options
Logging options include properties for the severity level of service logs. Configure the Log Severity property to set the logging level.
To edit the logging options, click the pencil icon in the Logging Options area. In the Edit Logging Options window, edit the required fields.
The following table describes the logging options:
Property | Description |
---|---|
Log Severity | Severity of messages to include in the logs. Select one of the following values: FATAL (writes FATAL messages to the log; FATAL messages include nonrecoverable system failures that cause the service to shut down or become unavailable), ERROR (writes FATAL and ERROR messages; ERROR messages include connection failures, failures to save or retrieve metadata, and service errors), WARNING (writes FATAL, ERROR, and WARNING messages; WARNING messages include recoverable system failures or warnings), INFO (writes FATAL, ERROR, WARNING, and INFO messages; INFO messages include system and service change messages), TRACE (writes FATAL, ERROR, WARNING, INFO, and TRACE messages; TRACE messages log user request failures), DEBUG (writes FATAL, ERROR, WARNING, INFO, TRACE, and DEBUG messages; DEBUG messages are user request logs). |
Log Directory | Location of the directory of log files. |
Execution Options
Execution options include properties for the execution engine and the local system directory.
To edit the execution options, click the pencil icon in the Execution Options area. In the Edit Execution Options window, edit the required fields.
The following table describes the execution options:
Property | Description |
---|---|
Hive Execution Engine | Engine for running the mappings in the Hadoop environment. |
Local System Directory | Local directory that contains the files downloaded from the Enterprise Data Preparation application, such as .csv or .tde files. |
Advanced Options
Use the Advanced Options area to configure optional Solr and NFS directory properties.
To edit the advanced options, click the pencil icon. In the Edit Advanced Options window, edit the required fields.
The following table describes the advanced options:
Property | Description |
---|---|
Solr JVM Options | Solr JVM options required to connect to the specified JDBC port used to retrieve data from ZooKeeper. Set this option to connect to ZooKeeper from an external client. |
Index Directory | Location of a shared NFS directory used by primary and secondary nodes in a multiple node installation. |