Configure Enterprise Data Lake

This task includes the installer prompts to configure Enterprise Data Lake. You provide basic information to configure the application services and the Hadoop cluster, and to create the Enterprise Data Lake Service.
When you complete these tasks, the installation is complete.

Configure the Model Repository Database Details

Choose whether to create a Model Repository Service and a Data Integration Service to associate with the Enterprise Data Lake Service, or to associate existing application services.
If you choose to create a Model Repository Service, specify the connection details for the Model repository database.
    1. Choose whether to create Model Repository Service and Data Integration Service instances to associate with the Enterprise Data Lake Service, or to associate existing application services with Enterprise Data Lake.
    2. If you create a Model Repository Service, specify the connection details for the Model repository database.
    The following list describes the parameters that you set:
    Database Type: Database for the Model repository managed by the Model Repository Service.
    Database User ID: User name of the database user account to use to log in to the Model repository database.
    User Password: Password for the Model repository database user account.
    Tablespace: Configure for an IBM DB2 database. Name of the tablespace in which to create the tables. The tablespace must be defined on a single node, and the page size must be 32K. This option is required for a multi-partition database. If you do not set this option for a single-partition database, the installer creates the tables in the default tablespace.
    Schema Name: Configure for a Microsoft SQL Server database. Name of the schema that will contain the domain configuration tables. If you do not set this option, the installer creates the tables in the default schema.
    3. Specify the truststore details required to access a secure Model repository database.
    The following list describes the properties that you set:
    Database truststore file: Path and file name of the truststore file for the secure database.
    Database truststore password: Password for the truststore.
    4. Choose whether to configure the database connection using a JDBC URL or a custom JDBC connection string.
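    For example, if you choose a custom JDBC connection string for an Oracle Model repository database, the string might use the following format. The host name, port, and service name shown here are placeholder values that you replace with your own:
    jdbc:informatica:oracle://dbhost01.example.com:1521;ServiceName=MODELREPO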
The Application Service Details section appears.

Configure the Application Service Properties

If you create a Model Repository Service and a Data Integration Service to associate with Enterprise Data Lake during installation, specify the properties required to create each application service.
    1. Specify the name of the Model Repository Service.
    The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    2. Specify the name of the Data Integration Service.
    The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    3. Specify the HTTP protocol type for the Data Integration Service, and then enter the port for each protocol you select.
    4. If you select HTTPS or both HTTP and HTTPS, select the SSL certificate to use.
The Cluster Configuration section appears.

Create the Cluster Configuration

Create the cluster configuration, which contains configuration information about the Hadoop cluster. The cluster configuration enables the Data Integration Service to push jobs to the Hadoop environment.
You import configuration properties from the Hadoop cluster to create a cluster configuration. You can import the properties from an archive file that the Hadoop administrator creates, or you can import the properties directly from the cluster.
When you create the cluster configuration, you can also choose to create Hadoop, Hive, HBase, and HDFS connections to the Hadoop environment. If you want the installer to create and enable the Enterprise Data Lake services, you must create the connections.
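For example, an archive file that the Hadoop administrator creates is typically a compressed file that contains the cluster's site configuration XML files, such as core-site.xml, hdfs-site.xml, hive-site.xml, and yarn-site.xml. The exact contents depend on the distribution, so verify the required files with your Hadoop administrator.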
    1. Enter the name of the cluster configuration to create.
    2. Specify the Hadoop distribution for the cluster.
    The following list describes the options that you can specify:
    1: Select to create a cluster configuration for a Cloudera cluster.
    2: Select to create a cluster configuration for a Hortonworks cluster.
    3: Select to create a cluster configuration for an Azure HDInsight cluster.
    4: Select to create a cluster configuration for a MapR cluster. You must import the MapR cluster configuration properties from an archive file.
    5: Select to create a cluster configuration for an Amazon EMR cluster. You must import the Amazon EMR cluster configuration properties from an archive file.
    3. Import configuration properties from the Hadoop cluster to create the cluster configuration.
    4. If you choose to import the properties directly from the cluster, specify the connection properties.
    The following list describes the properties that you specify (see the example values after this list):
    Host: Host name or IP address of the cluster manager.
    Port: Port of the cluster manager.
    User ID: Cluster user name.
    Password: Password for the cluster user.
    Cluster Name: Name of the cluster. Use the display name if the cluster manager manages multiple clusters. If you do not provide a cluster name, the wizard imports information based on the default cluster.
    5. To create the Hadoop, Hive, HDFS, and HBase connections associated with the cluster, press 1.
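    For example, to import the properties directly from a Cloudera cluster, you might enter values similar to the following. The host, user, and cluster names are placeholder values; 7180 is the default Cloudera Manager port, and Ambari on a Hortonworks cluster typically listens on port 8080:
    Host: cm01.example.com
    Port: 7180
    User ID: admin
    Cluster Name: Cluster 1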
The Data Preparation Repository Database section appears.

Configure the Data Preparation Repository Database Details

Specify the Data Preparation repository database connection details. You can choose to use an Oracle database or a MySQL database for the Data Preparation repository database.
If you do not have the database details, you can enter placeholder values, and then create the Data Preparation Service. If you continue without specifying the database connection details, you cannot enable the Data Preparation Service.

Oracle

  1. To use an Oracle database for the Data Preparation repository, press 1.
  2. Choose whether to connect to a non-secure database or a secure database.
    a. To connect to a non-secure database, press 1, and then enter the required properties.
    The following list describes the non-secure connection properties:
      Database Host Name: Host name of the machine that hosts the Data Preparation repository database. This entry appears only if the database type is Oracle.
      Database Port Number: Port number for the database.
      JDBC Parameters: Parameters required to connect to the database.
      Custom JDBC Connection String: Valid custom JDBC connection string in the following format:
      jdbc:informatica:oracle://<host_name>:<port_number>;ServiceName=<service_name>
    b. To connect to a secure database, press 2, and then enter the required properties.
    The following list describes the secure connection properties:
      Truststore File: Path and file name for the database truststore file.
      Truststore Password: Password for the database truststore file.
      Secure JDBC Parameters: List of secure database parameters to connect to the database. Format the parameters as follows:
      EncryptionMethod=SSL;HostNameInCertificate=<host_name>;ValidateServerCertificate=<true|false>
      Custom JDBC Connection String: Valid custom JDBC connection string in the following format:
      jdbc:informatica:oracle://<host_name>:<port_number>;ServiceName=<service_name>
  3. Press Enter to continue if the database connection fails. You can use the Administrator tool to update the database details and enable the Data Preparation Service later.
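  For example, a secure Oracle connection might use parameter values similar to the following. The host name, port, and service name are placeholder values for illustration; 1521 is the default Oracle listener port:
  EncryptionMethod=SSL;HostNameInCertificate=dbhost01.example.com;ValidateServerCertificate=true
  jdbc:informatica:oracle://dbhost01.example.com:1521;ServiceName=DATAPREP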
The Data Preparation Service Details section appears.

MySQL

  1. To use a MySQL database for the Data Preparation repository, press 2.
  2. Enter the connection properties for the MySQL database.
    The following list describes the connection properties:
    Database Host Name: Host name of the machine that hosts the Data Preparation repository database. This entry appears only if the database type is MySQL.
    Database User Name: Database user account to use to connect to the Data Preparation repository.
    Database User Password: Password for the Data Preparation repository database user account.
    Database Port Number: Port number for the database.
    Database Name: Schema or database name of the Data Preparation repository database.
  3. Press Enter to continue if the database connection fails. You can use the Administrator tool to update the database details and enable the Data Preparation Service later.
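  For example, a MySQL Data Preparation repository might use connection values similar to the following. All values are placeholders for illustration; 3306 is the default MySQL port:
    Database Host Name: dbhost02.example.com
    Database User Name: dataprep
    Database Port Number: 3306
    Database Name: dataprep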
The Data Preparation Service Details section appears.

Create the Data Preparation Service

When you install Enterprise Data Lake on the master gateway node for the domain, you can create the Enterprise Data Lake Service and the Data Preparation Service during installation.
If you do not create the Enterprise Data Lake Service and the Data Preparation Service during installation, or if you install Enterprise Data Lake on another gateway node in the domain, you can use the Administrator tool to create the services after you install the Enterprise Data Lake binaries.
    1. Specify the name of the Data Preparation Service.
    The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    2. If you plan to use rules, you must associate a Model Repository Service and a Data Integration Service with the Data Preparation Service.
    3. To create the Data Preparation Service during installation, enter the name of the current node.
    If you do not want to create the service during installation, do not enter a value. You can use the Administrator tool to create the service after installation.
    If you create the Enterprise Data Lake Service and the Data Preparation Service during installation, you must create both services on the same node.
    4. Choose whether to enable secure communication for the Data Preparation Service.
    5. If you enable secure communication for the service, select the SSL certificate to use.
    6. If you enable secure communication for the service, enter the port number for the HTTPS connection. If you enable non-secure communication for the service, enter the port number for the HTTP connection.
    7. Select the Hadoop authentication mode.
    8. If you select Kerberos, enter the authentication parameters.
    The following list describes the authentication parameters that you must set if you select Kerberos (see the example values after this list):
    HDFS Principal Name: Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user/_HOST@REALM.
    Hadoop Impersonation User Name: User name to use in Hadoop impersonation, as shown in the Impersonation User Name property for the Hadoop connection in the Administrator tool. If the Hadoop cluster uses Kerberos authentication, the Hadoop impersonation user must have read, write, and execute permissions on the HDFS storage location folder.
    Kerberos Keytab File: Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Data Preparation Service runs.
    9. Specify the HDFS storage location, HDFS connection, local storage location, and Solr port number details.
    The following list describes the properties that you must set (see the example values after this list):
    HDFS Storage Location: HDFS location for data preparation file storage. If the Hadoop cluster uses Kerberos authentication, the Hadoop impersonation user must have read, write, and execute permissions on the HDFS storage location folder.
    HDFS Connection: HDFS connection for data preparation file storage.
    Local Storage Location: Directory for data preparation file storage on the node on which the Data Preparation Service runs. If the connection to the local storage fails, the Data Preparation Service recovers data preparation files from the HDFS storage location.
    Solr Port: Port number for the Apache Solr server used to provide data preparation recommendations.
    10. Choose whether to enable the Data Preparation Service.
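    For example, on a Kerberos-enabled cluster you might enter values similar to the following. All values are placeholders for illustration; use the principal, keytab file, and storage paths that your Hadoop administrator provides. Note that 8983 is the default Apache Solr port:
    HDFS Principal Name: dataprep/_HOST@EXAMPLE.COM
    Kerberos Keytab File: /etc/security/keytabs/dataprep.keytab
    HDFS Storage Location: /datalake/dataprep
    Local Storage Location: /opt/informatica/dataprep
    Solr Port: 8983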
The Enterprise Data Lake Service section appears.

Create the Enterprise Data Lake Service

When you install Enterprise Data Lake on the master gateway node for the domain, you can create the Enterprise Data Lake Service and the Data Preparation Service during installation.
If you do not create the Enterprise Data Lake Service and the Data Preparation Service during installation, or if you install Enterprise Data Lake on another gateway node in the domain, you can use the Administrator tool to create the services after you install the Enterprise Data Lake binaries.
    1. Specify the details for the Enterprise Data Lake Service.
    The following list describes the properties that you set:
    Enterprise Data Lake Service Name: Name of the Enterprise Data Lake Service. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    Data Preparation Service Name: Name of the Data Preparation Service to associate with the Enterprise Data Lake Service.
    Model Repository Service Name: Name of the Model Repository Service to associate with the Enterprise Data Lake Service.
    Data Integration Service Name: Name of the Data Integration Service to associate with the Enterprise Data Lake Service.
    Node Name: To create the Enterprise Data Lake Service during installation, enter the name of the current node. If you do not want to create the service during installation, do not enter a value. You can use the Administrator tool to create the service after installation. If you create the Enterprise Data Lake Service and the Data Preparation Service during installation, you must create both services on the same node.
    2. Choose whether to enable secure communication for the Enterprise Data Lake Service.
    3. If you enable secure communication for the service, select the SSL certificate to use.
    4. If you enable secure communication for the service, enter the port number for the HTTPS connection. If you enable non-secure communication for the service, enter the port number for the HTTP connection.
    5. Specify the data lake connection properties.
    The following list describes the properties that you set for the data lake connections:
    HDFS Connection: HDFS connection for the data lake. If you selected the option to create the connection when creating the cluster configuration, the installer sets this value to the name created for the connection.
    HDFS Working Directory: HDFS directory where the Enterprise Data Lake Service copies temporary data and files necessary for the service to run.
    Hadoop Connection: Hadoop connection for the data lake. If you selected the option to create the connection when creating the cluster configuration, the installer sets this value to the name created for the connection.
    Hive Connection: Hive connection for the data lake. If you selected the option to create the connection when creating the cluster configuration, the installer sets this value to the name created for the connection.
    Hive Table Storage Format: Data storage format for the Hive tables.
    Local System Directory: Local directory that contains the files downloaded from the Enterprise Data Lake application, such as .csv and .tde files.
    6. Choose whether to enable logging of user activity events.
    7. Select the Hadoop authentication mode.
    8. If you select Kerberos, enter the authentication parameters.
    The following list describes the authentication parameters that you set if you select Kerberos (see the example values after this list):
    Kerberos Principal: If the Hadoop cluster uses Kerberos authentication, specify the Service Principal Name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster.
    Kerberos Keytab File: If the Hadoop cluster uses Kerberos authentication, specify the path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Enterprise Data Lake Service runs.
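    For example, you might enter Kerberos values similar to the following. These are placeholder values for illustration; use the principal and keytab file that your Hadoop administrator provides:
    Kerberos Principal: datalake/_HOST@EXAMPLE.COM
    Kerberos Keytab File: /etc/security/keytabs/datalake.keytab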