Data Preparation Service
The Data Preparation Service manages data preparation within Enterprise Data Lake. When an analyst prepares data in a project, the Data Preparation Service stores worksheet metadata in the Data Preparation repository. When you create the service, you can associate other application services with it.
The following table summarizes the dependencies for products, services, and databases that are associated with the Data Preparation Service:
Dependency | Summary |
---|---|
Products | The following products use the Data Preparation Service: |
Services | If you plan to use rules during data preparation, you can provide a direct association with the following services: Model Repository Service, Data Integration Service |
Databases | The Data Preparation Service uses the following database: Data Preparation repository. Stores worksheet metadata created when users prepare data assets for publication. |
Installer | You can create the Data Preparation Service when you run the installer. |
Rules and Guidelines
Consider the following rules and guidelines for Data Preparation Service creation:
- If you use the installer to create the Enterprise Data Lake Service and the Data Preparation Service, you must create both application services on the same node.
- If you plan to use rules, you must associate a Data Integration Service and a Model Repository Service with the Data Preparation Service.
- If you want to create and enable the Data Preparation Service when you run the installer, the domain must contain connections associated with the Hadoop environment. For more information, see Prepare to Create the Enterprise Data Lake Services.
Data Preparation Repository Database Requirements
Set up a MySQL, MariaDB, or Oracle database to use as the Data Preparation repository. The Data Preparation Service stores recipe and mapping metadata in the repository.
Allow 5 GB of disk space for the repository database. Allocate more space based on the amount of metadata you want to store.
MySQL and MariaDB Database Requirements
You can use a MySQL database or a MariaDB database as the Data Preparation repository.
Set the following system variables on the database server:
- For MySQL version 5.6.26 and higher, set lower_case_table_names=1.
- For MySQL version 5.7 and higher, set explicit_defaults_for_timestamp=1.
Set the same system variable values for a MariaDB database.
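The version thresholds above can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the version is a plain dotted numeric string such as 5.7.30, without suffixes. It does not connect to a database. It only reports which of the two settings named above apply at a given MySQL version and which of them differ from the values you pass in.

```python
from typing import Dict


def required_system_variables(mysql_version: str) -> Dict[str, int]:
    """Return the settings listed above that apply at a given MySQL version."""
    # Compare versions as integer tuples, for example "5.7.30" -> (5, 7, 30).
    version = tuple(int(part) for part in mysql_version.split("."))
    required = {}
    if version >= (5, 6, 26):
        required["lower_case_table_names"] = 1
    if version >= (5, 7):
        required["explicit_defaults_for_timestamp"] = 1
    return required


def missing_settings(mysql_version: str, current: Dict[str, int]) -> Dict[str, int]:
    """Return the required settings that are absent or set to another value."""
    return {
        name: value
        for name, value in required_system_variables(mysql_version).items()
        if current.get(name) != value
    }
```

You could pass `missing_settings` the values reported by the database server and treat a non-empty result as a list of variables to correct before you create the repository.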
Ensure that the database user account for the MySQL or MariaDB database has the following permissions:
- Create tables and views.
- Drop tables and views.
- Insert, update, and delete data.
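As a rough illustration, the permissions above might map to a single MySQL GRANT statement. The sketch below only builds the statement text. The privilege mapping is an assumption based on standard MySQL privilege names, and the database name, user name, and host are hypothetical placeholders. Your environment may require additional privileges that this section does not list.

```python
# Assumed mapping of the permissions above to MySQL privilege names:
# CREATE and CREATE VIEW for creating tables and views, DROP for dropping
# them, and INSERT, UPDATE, DELETE for changing data.
MYSQL_PRIVILEGES = ["CREATE", "CREATE VIEW", "DROP", "INSERT", "UPDATE", "DELETE"]


def mysql_grant_statement(database: str, user: str, host: str = "%") -> str:
    """Build a GRANT statement for a hypothetical repository database and user."""
    privileges = ", ".join(MYSQL_PRIVILEGES)
    return f"GRANT {privileges} ON `{database}`.* TO '{user}'@'{host}';"


# Hypothetical names, for illustration only.
print(mysql_grant_statement("dataprep_repo", "dataprep_user"))
```

Review the generated statement with your database administrator before you run it.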
The MySQL connector .jar file is not included with the installer. You must download the file and copy it to the following directory before you start the installer:
$USER_INSTALL_DIR$/services/shared/jars/thirdparty/
Make sure the name of the file is in the following format:
mysql-connector-java-<versiondetails>.jar
You must also set the mysql_connector_jar_path environment variable to the location of the MySQL connector .jar file.
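Before you start the installer, you can check the connector file name and the environment variable with a short script. The following Python sketch is an assumption-laden example: the helper names are hypothetical, any non-empty text is accepted as <versiondetails>, and because the text does not say whether the variable should hold the file path or its directory, the sketch accepts either.

```python
import os
import re
from pathlib import Path

# Matches mysql-connector-java-<versiondetails>.jar, where <versiondetails>
# is assumed to be any non-empty text.
CONNECTOR_NAME = re.compile(r"^mysql-connector-java-.+\.jar$")


def connector_name_ok(file_name: str) -> bool:
    """Return True if the file name follows the expected format."""
    return CONNECTOR_NAME.match(file_name) is not None


def find_connector_jars(location: str) -> list:
    """Return matching connector file names at a file or directory path."""
    path = Path(location)
    if path.is_file():
        candidates = [path]
    elif path.is_dir():
        candidates = sorted(path.glob("*.jar"))
    else:
        candidates = []
    return [p.name for p in candidates if connector_name_ok(p.name)]


if __name__ == "__main__":
    location = os.environ.get("mysql_connector_jar_path")
    if not location:
        print("mysql_connector_jar_path is not set")
    else:
        print(find_connector_jars(location) or "no matching connector .jar found")
```

A file whose name does not start with mysql-connector-java- fails the check, so rename the downloaded file if it uses a different naming scheme.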
Oracle Database Requirements
You can use an Oracle database as the Data Preparation repository.
Ensure that the database user account has the following permissions:
- Create tables and views.
- Create sequences, sessions, and synonyms.
- Drop tables and views.
- Insert, update, and delete data.
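As a rough illustration, the create permissions above correspond to standard Oracle system privileges. The sketch below only builds the statement text. The mapping is an assumption, and the user name is a hypothetical placeholder. In Oracle, a user can normally drop and modify objects in its own schema without separate grants, so the sketch covers only the create privileges.

```python
# Assumed mapping of the permissions above to Oracle system privileges.
ORACLE_SYSTEM_PRIVILEGES = [
    "CREATE SESSION",
    "CREATE TABLE",
    "CREATE VIEW",
    "CREATE SEQUENCE",
    "CREATE SYNONYM",
]


def oracle_grant_statement(user: str) -> str:
    """Build a GRANT statement for a hypothetical repository schema user."""
    privileges = ", ".join(ORACLE_SYSTEM_PRIVILEGES)
    return f"GRANT {privileges} TO {user}"


# Hypothetical user name, for illustration only.
print(oracle_grant_statement("dataprep_user"))
```

Review the generated statement with your database administrator before you run it.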