
Create and Configure the Data Preparation Service

The Data Preparation Service manages data preparation within Enterprise Data Lake. When an analyst prepares data in a project, the Data Preparation Service stores worksheet metadata in the Data Preparation repository.
The service connects to the Hadoop cluster to read sample data from Hive tables. The service connects to the HDFS system in the Hadoop cluster to store the sample data being prepared in the worksheet.
Create the Data Preparation Service before you create the Enterprise Data Lake Service. You must associate the Enterprise Data Lake Service with a Data Preparation Service.

Creating the Data Preparation Service

Use the service creation wizard in the Administrator tool to create the service.
    1. In the Administrator tool, click the Manage tab.
    2. Click the Services and Nodes view.
    3. In the Domain Navigator, select the domain.
    4. Click Actions > New > Data Preparation Service.
    5. Enter the following properties:
    Property
    Description
    Name
    Name of the Data Preparation service.
    The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    Description
    Description of the Data Preparation service. The description cannot exceed 765 characters.
    Location
    Location of the Data Preparation Service in the Informatica domain. You can create the service within a folder in the domain.
    License
    License object with the data lake option that allows the use of the Data Preparation Service.
    Node Assignment
    Type of node in the Informatica domain on which the Data Preparation Service runs. Select Single Node if a single service process runs on the node, or Primary and Backup Nodes if a service process is enabled on each node for high availability. Only one process runs at any given time; the other processes remain on standby.
    The Primary and Backup Nodes option is available only if your license configuration includes it.
    Select Grid to run the Data Preparation Service on a grid of multiple nodes for horizontal scalability. The added capacity supports high-performance, interactive data preparation as data volumes and the number of users increase. Each user is assigned to a node in the grid in round-robin order to distribute the load across the nodes.
    Default is Single Node.
    Node
    Name of the node on which the Data Preparation Service runs.
    Backup Nodes
    If your license includes high availability, nodes on which the service can run if the primary node is unavailable.
    Select each backup node on which the service runs.
    Grid
    Select the grid that you want to use for the Data Preparation Service.
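    The naming rules in the table above can be checked before you run the wizard. The following Python sketch is illustrative only (the function name and structure are not part of the product); it encodes the length, prefix, space, and special-character rules listed under the Name property:

    ```python
    # Characters the wizard rejects in a service name, per the table above.
    FORBIDDEN = set("`~%^*+={}\\;:'\"/?.,<>|!()][")

    def is_valid_service_name(name: str) -> bool:
        """Check a candidate Data Preparation Service name against the wizard rules."""
        if not name or len(name) > 128:
            return False
        if name.startswith("@"):
            return False
        if " " in name:
            return False
        return not any(ch in FORBIDDEN for ch in name)

    print(is_valid_service_name("DPS_Prod_01"))  # prints True
    print(is_valid_service_name("@dps"))         # prints False: begins with @
    ```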
    6. Click Next.
    7. If you plan to use rules, you must associate the Data Preparation Service with the Model Repository Service that manages the Model repository that contains the rule objects and metadata. You must also associate a Data Integration Service with the Data Preparation Service that runs rules during data preparation.
    Enter the following properties for the Model Repository Service and the Data Integration Service required to enable rules:
    Property
    Description
    Model Repository Service Name
    Name of the Model Repository Service.
    The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    You cannot change the name of the service after you create it.
    Model Repository Service User Name
    User name to access the Model Repository Service.
    Model Repository Service Password
    Password to access the Model Repository Service.
    Data Integration Service Name
    Name of the Data Integration Service.
    8. Click Next.
    9. Enter the following communication properties:
    Property
    Description
    HTTP Port
    Port number for the HTTP connection to the Data Preparation Service.
    Enable Secure Communication
    Use a secure connection to connect to the Data Preparation Service. If you enable secure communication, you must set all required HTTPS properties, including the keystore and truststore properties.
    HTTPS Port
    Port number for the HTTPS connection to the Data Preparation Service.
    Keystore File
    Path and file name of the keystore file that contains the keys and certificates required for HTTPS communication.
    Keystore Password
    Password for the keystore file.
    Truststore File
    Path and file name of the truststore file that contains authentication certificates for the HTTPS connection.
    Truststore Password
    Password for the truststore file.
    10. Click Next.
    11. Enter the following Data Preparation repository database connection properties:
    Property
    Description
    Database Type
    Type of database to use for the Data Preparation repository.
    Host Name
    Host name of the machine that hosts a MySQL database.
    Port Number
    Port number for a MySQL database.
    Connection String
    Connection string used to access an Oracle database.
    Use the following connection string format:
    jdbc:informatica:oracle://<database host name>:<port>;ServiceName=<database name>
    Secure JDBC Parameters
    Secure JDBC parameters required to access a secure Oracle database.
    If the database is secure, include information such as TrustStore and TrustStorePassword in this field. The information is saved in an encrypted format. Commonly configured parameters include the following:
    EncryptionMethod=<encryption method>;HostNameInCertificate=<host name>;TrustStore=<truststore file name and path>;TrustStorePassword=<truststore password>;KeyStore=<keystore file name and path>;KeyStorePassword=<keystore password>;ValidateServerCertificate=<true|false>
    Database User Name
    Database user account to use to connect to the database.
    Database User Password
    Password for the database user account.
    Schema Name
    Schema or database name for a MySQL database.
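    For the Oracle case, the connection string and the secure JDBC parameters are plain semicolon-delimited strings in the formats shown above, so they can be assembled from their parts. A minimal Python sketch (the host, port, database, and truststore values are placeholders, and the secure-parameter example shows only a common subset of keys):

    ```python
    def oracle_connection_string(host: str, port: int, database: str) -> str:
        """Build an Oracle connection string in the format the wizard expects."""
        return f"jdbc:informatica:oracle://{host}:{port};ServiceName={database}"

    def secure_jdbc_parameters(truststore: str, password: str) -> str:
        """Build a minimal secure JDBC parameter string; add key=value pairs as needed."""
        return (f"EncryptionMethod=SSL;TrustStore={truststore};"
                f"TrustStorePassword={password};ValidateServerCertificate=true")

    print(oracle_connection_string("dbhost.example.com", 1521, "dpreco"))
    # prints jdbc:informatica:oracle://dbhost.example.com:1521;ServiceName=dpreco
    ```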
    12. Click Next.
    13. Enter the following rules execution property:
    Property
    Description
    Rules Server Port
    Port used by the rules server managed by the Data Preparation Service. Set the value to an available port on the node where the Data Preparation Service runs.
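    One way to confirm that a candidate rules server port is available is to try binding to it on the node where the Data Preparation Service will run. A hedged Python sketch (not product tooling; a successful bind only shows the port is free at that moment):

    ```python
    import socket

    def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
        """Return True if nothing is currently listening on the given TCP port."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                s.bind((host, port))
                return True
            except OSError:
                return False

    # Example: check a candidate port before entering it in the wizard.
    print(port_is_free(18600))
    ```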
    14. Click Next.
    15. Enter the following Solr property:
    Property
    Description
    Solr Port
    Port number for the Apache Solr server used to provide data preparation recommendations.
    16. Click Next.
    17. Enter the following data preparation properties:
    Property
    Description
    Local Storage Location
    Directory for data preparation file storage on the node where the Data Preparation Service runs.
    HDFS Connection
    HDFS connection for data preparation file storage.
    HDFS Storage Location
    HDFS location for data preparation file storage. If the connection to the local storage fails, the Data Preparation Service recovers data preparation files from the HDFS location.
    18. Click Next.
    19. Enter the following Hive security properties:
    Property
    Description
    Hadoop Authentication Mode
    Security mode enabled for the Hadoop cluster for data preparation storage. If the Hadoop cluster uses Kerberos authentication, you must set the required Hadoop security properties for the cluster.
    HDFS Service Principal Name
    Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format:
    user/_HOST@REALM
    Hadoop Impersonation User Name
    User name to use in Hadoop impersonation as set in the Hadoop connection properties. Use the Administrator tool to view Hadoop connection properties.
    SPN Keytab File for User Impersonation
    Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Data Preparation Service runs.
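    The SPN format above can be sanity-checked with a simple pattern before you enter it. A Python sketch (the regex reflects only the user/_HOST@REALM shape shown above, not the full Kerberos principal syntax):

    ```python
    import re

    # Matches the user/_HOST@REALM shape, e.g. "hdfs/_HOST@EXAMPLE.COM".
    SPN_PATTERN = re.compile(r"^[^/@\s]+/_HOST@[A-Z0-9.\-]+$")

    def looks_like_spn(principal: str) -> bool:
        """Loose check that a string follows the user/_HOST@REALM format."""
        return bool(SPN_PATTERN.match(principal))

    print(looks_like_spn("hdfs/_HOST@EXAMPLE.COM"))  # prints True
    print(looks_like_spn("hdfs@EXAMPLE.COM"))        # prints False: no /_HOST part
    ```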
    20. Click Next.
    21. Enter the following logging configuration property:
    Property
    Description
    Log Severity
    Severity of messages to include in the logs. Select from the following values:
    • FATAL. Writes FATAL messages to the log. FATAL messages include nonrecoverable system failures that cause the service to shut down or become unavailable.
    • ERROR. Writes FATAL and ERROR messages to the log. ERROR messages include connection failures, failures to save or retrieve metadata, and service errors.
    • WARNING. Writes FATAL, ERROR, and WARNING messages to the log. WARNING messages include recoverable system failures or warnings.
    • INFO. Writes FATAL, ERROR, WARNING, and INFO messages to the log. INFO messages include system and service change messages.
    • TRACE. Writes FATAL, ERROR, WARNING, INFO, and TRACE messages to the log. TRACE messages log user request failures.
    • DEBUG. Writes FATAL, ERROR, WARNING, INFO, TRACE, and DEBUG messages to the log. DEBUG messages are user request logs.
    Default is INFO.
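    The severity levels above are cumulative: each setting writes its own messages plus those of every less verbose level. That containment can be sketched as a small lookup in Python (illustrative only, not product code):

    ```python
    # Order from least to most verbose; each level writes its own messages
    # plus those of every level before it in this list.
    LEVELS = ["FATAL", "ERROR", "WARNING", "INFO", "TRACE", "DEBUG"]

    def messages_written(level: str) -> list:
        """Return the message severities that appear in the log at a given setting."""
        return LEVELS[: LEVELS.index(level) + 1]

    print(messages_written("WARNING"))  # prints ['FATAL', 'ERROR', 'WARNING']
    ```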
    22. Click Finish.
    23. Select the Data Preparation Service in the Domain Navigator, and then select Actions > Create Repository to create the Data Preparation repository contents.
    24. Select Actions > Enable Service to enable the Data Preparation Service.