Configure Enterprise Data Lake
This task includes the installer prompts to configure Enterprise Data Lake. You provide basic information to configure the application services and the Hadoop cluster, and to create the Enterprise Data Lake Service.
When you complete these tasks, the installation is complete.
Configure the Model Repository Database Details
Choose to create or associate a Model Repository Service and Data Integration Service with the Enterprise Data Lake Service.
If you choose to create a Model Repository Service, specify the connection details for the Model repository database.
1. Choose to create Model Repository Service and Data Integration Service instances to associate with the Enterprise Data Lake Service, or to associate existing application services with Enterprise Data Lake.
2. If you create a Model Repository Service, specify the connection details for the Model repository database.
The following table describes the parameters you set:
Property | Description |
---|---
Database Type | Database for the Model repository managed by the Model Repository Service. |
Database User ID | User name of the database user account to use to log in to the Model repository database. |
User Password | Password for the Model repository database user account. |
Tablespace | Configure for an IBM DB2 database. Name of the tablespace in which to create the tables. The tablespace must be defined on a single node, and the page size must be 32K. This option is required for a multi-partition database. If you do not specify a tablespace for a single-partition database, the installer creates the tables in the default tablespace. |
Schema Name | Configure for a Microsoft SQL Server database. Name of the schema that will contain domain configuration tables. If you do not specify a schema, the installer creates the tables in the default schema. |
3. Specify the truststore details required to access a secure Model repository database.
The following table describes the properties you set:
Property | Description |
---|---
Database truststore file | Path and file name of the truststore file for the secure database. |
Database truststore password | Password for the truststore. |
4. Choose whether to configure the database connection using a JDBC URL or a custom JDBC connection string.
- - Press 1 to configure the database connection using a JDBC URL.
The following table describes the properties you set:
Property | Description |
---|---
Database address | Host name and port number for the database in the format <host name>:<port>. |
Database service name | Service name for Oracle and IBM DB2 databases, or database name for Microsoft SQL Server. |
JDBC parameters | Optional parameters to include in the database connection string. Use the parameters to optimize database operations for the Model repository. You can use the default parameters, or add or modify the parameters based on your database requirements. Verify that the parameter string is valid. The installer does not validate the parameter string before it adds the string to the JDBC URL. If you do not specify parameters, the installer creates the JDBC URL without additional parameters. The following examples show the default connection strings for each database: Oracle: jdbc:Informatica:oracle://host_name:port_no;ServiceName= IBM DB2: jdbc:Informatica:db2://host_name:port_no;DatabaseName= Microsoft SQL Server: jdbc:Informatica:sqlserver://host_name:port_no;SelectMethod=cursor;DatabaseName= Azure SQL Server: jdbc:informatica:sqlserver://host_name:port_number;DatabaseName=<database_name>;SnapshotSerializable=true;EncryptionMethod=SSL;HostNameInCertificate=*.<hostnameincertificate>;ValidateServerCertificate=true |
- - Press 2 to configure the database connection using a custom JDBC connection string.
The following table describes the properties you set:
Property | Description |
---|---
EncryptionMethod | Indicates whether data is encrypted when transmitted over the network. This parameter must be set to SSL. |
ValidateServerCertificate | Indicates whether Informatica validates the certificate that is sent by the database server. If this parameter is set to true, Informatica validates the certificate that is sent by the database server. If you specify the HostNameInCertificate parameter, Informatica also validates the host name in the certificate. If this parameter is set to false, Informatica does not validate the certificate that is sent by the database server, and ignores any truststore information that you specify. |
HostNameInCertificate | Host name of the machine that hosts the secure database. If you specify a host name, Informatica validates the host name included in the connection string against the host name in the SSL certificate. |
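As a rough illustration of how the three secure parameters combine into a custom connection string, the following Python sketch assembles one. This is not part of the installer; the host, port, and service name are placeholder values, and the exact string your database requires may differ.

```python
# Illustrative sketch only: assemble a secure custom JDBC connection string
# from the EncryptionMethod, HostNameInCertificate, and
# ValidateServerCertificate parameters described above.
# All connection values below are hypothetical placeholders.

def build_secure_jdbc_url(host, port, service_name, host_in_cert):
    """Build an Oracle-style secure JDBC connection string with SSL
    encryption and server certificate validation enabled."""
    base = f"jdbc:informatica:oracle://{host}:{port};ServiceName={service_name}"
    secure = (
        "EncryptionMethod=SSL"
        f";HostNameInCertificate={host_in_cert}"
        ";ValidateServerCertificate=true"
    )
    return f"{base};{secure}"

url = build_secure_jdbc_url("db.example.com", 1521, "orclpdb", "db.example.com")
print(url)
```

Because the installer does not validate the parameter string, assembling and eyeballing the full string before you enter it is a cheap way to catch a missing semicolon or misspelled parameter.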
The Application Service Details section appears.
Configure the Application Service Properties
If you create a Model Repository Service and a Data Integration Service to associate with Enterprise Data Lake during installation, specify the properties required to create each application service.
1. Specify the name of the Model Repository Service.
The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
2. Specify the name of the Data Integration Service.
The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
3. Specify the HTTP protocol type for the Data Integration Service, and then enter the port for each protocol you select.
- - To select HTTP only, press 1.
- - To select HTTPS only, press 2.
- - To select both HTTP and HTTPS, press 3.
4. If you select HTTPS or both HTTP and HTTPS, select the SSL certificate to use.
- - To use the default Informatica SSL certificate contained in the default keystore and the default truststore, press 1.
- - To use a custom SSL certificate contained in custom keystore and truststore files, press 2, and then enter the path and file name for the keystore and truststore files. You must also enter the keystore and truststore passwords.
The Cluster Configuration section appears.
Create the Cluster Configuration
Create the cluster configuration, which contains configuration information about the Hadoop cluster. The cluster configuration enables the Data Integration Service to push jobs to the Hadoop environment.
You import configuration properties from the Hadoop cluster to create a cluster configuration. You can import the properties from an archive file that the Hadoop administrator creates, or you can import the properties directly from the cluster.
When you create the cluster configuration, you can also choose to create Hadoop, Hive, HBase, and HDFS connections to the Hadoop environment. If you want the installer to create and enable the Enterprise Data Lake services, you must create the connections.
1. Enter the name of the cluster configuration to create.
2. Specify the Hadoop distribution for the cluster.
The following table describes the options you can specify:
Option | Description |
---|---
1 | Select to create a cluster configuration for a Cloudera cluster. |
2 | Select to create a cluster configuration for a Hortonworks cluster. |
3 | Select to create a cluster configuration for an Azure HDInsight cluster. |
4 | Select to create a cluster configuration for a MapR cluster. You must import the MapR cluster configuration properties from an archive file. |
5 | Select to create a cluster configuration for an Amazon EMR cluster. You must import the Amazon EMR cluster configuration properties from an archive file. |
3. Import configuration properties from the Hadoop cluster to create the cluster configuration.
- - To import the properties from an archive file, press 1. If you create a cluster configuration for an Amazon EMR cluster or for a MapR cluster, you must import the properties from an archive file.
- - To import the properties directly from the cluster, press 2.
4. If you choose to import the properties directly from the cluster, specify the connection properties.
The following table describes the properties you specify:
Property | Description |
---|---
Host | The host name or IP address of the cluster manager. |
Port | Port of the cluster manager. |
User ID | Cluster user name. |
Password | Password for the cluster user. |
Cluster Name | Name of the cluster. Use the display name if the cluster manager manages multiple clusters. If you do not provide a cluster name, the wizard imports information based on the default cluster. |
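Before importing directly from the cluster, it can help to confirm that the cluster manager host and port are reachable from the installer machine. The following Python sketch performs a simple TCP check; the host name is a placeholder, and the port shown is a common Cloudera Manager default, so verify both values with your Hadoop administrator.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder values: replace with your cluster manager host and port.
if can_connect("clustermanager.example.com", 7180):
    print("Cluster manager is reachable")
else:
    print("Cannot reach cluster manager; check host, port, and firewall rules")
```

A reachability check like this only confirms the network path; the user name, password, and cluster name still have to be valid for the import to succeed.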
5. To create the Hadoop, Hive, HDFS, and HBase connections associated with the cluster, press 1.
The Data Preparation Repository Database section appears.
Configure the Data Preparation Repository Database Details
Specify the Data Preparation repository database connection details. You can choose to use an Oracle database or a MySQL database for the Data Preparation repository database.
If you do not have the database details, you can enter placeholder values, and then create the Data Preparation Service. If you continue without specifying the database connection details, you cannot enable the Data Preparation Service.
Oracle
- 1. To use an Oracle database for the Data Preparation repository, press 1.
- 2. Choose whether to connect to a non-secure database or a secure database.
- a. To connect to a non-secure database, press 1, and then enter the required properties.
The following table describes the non-secure connection properties:
Property | Description |
---|---
Database Host Name | Host name of the machine that hosts the Data Preparation repository database. This entry appears only if your Database Type is Oracle. |
Database Port Number | Port number for the database. |
JDBC Parameters | Parameters required to connect to the database. |
Custom JDBC Connection String | Valid custom JDBC connection string. It should include the following parameters: (jdbc:informatica:oracle://host_name:port_no;ServiceName= |
- b. To connect to a secure database, press 2, and then enter the required properties.
The following table describes the secure connection properties:
Property | Description |
---|---
Truststore File | Path and file name for the database truststore file. |
Truststore Password | Password for the database truststore file. |
Secure JDBC Parameters | List of secure database parameters to connect to the database. Format the parameters as follows: EncryptionMethod=SSL;HostNameInCertificate=;ValidateServerCertificate= |
Custom JDBC Connection String | Valid custom JDBC connection string. Format the string as follows: (jdbc:informatica:oracle://host_name:port_no;ServiceName= |
- 3. Press Enter to continue if the database connection fails. You can use the Administrator tool to update the database details and enable the Data Preparation Service later.
The Data Preparation Service Details section appears.
MySQL
- 1. To use a MySQL database for the Data Preparation repository, press 2.
- 2. Enter the connection properties for the MySQL database.
The following table describes the connection properties:
Property | Description |
---|---
Database Host Name | Host name of the machine that hosts the Data Preparation repository database. This entry appears only if your Database Type is MySQL. |
Database User Name | Database user account to use to connect to the Data Preparation repository. |
Database User Password | Password for the Data Preparation repository database user account. |
Database Port Number | Port number for the database. |
Database Name | Schema or database name of the Data Preparation repository database. |
- 3. Press Enter to continue if the database connection fails. You can use the Administrator tool to update the database details and enable the Data Preparation Service later.
The Data Preparation Service Details section appears.
Create the Data Preparation Service
When you install Enterprise Data Lake on the master gateway node for the domain, you can create the Enterprise Data Lake Service and the Data Preparation Service during installation.
If you do not create the Enterprise Data Lake Service and the Data Preparation Service during installation, or if you install Enterprise Data Lake on another gateway node in the domain, you can use the Administrator tool to create the services after you install the Enterprise Data Lake binaries.
1. Specify the name of the Data Preparation Service.
The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
2. If you plan to use rules, you must associate a Model Repository Service and a Data Integration Service with the Data Preparation Service.
- - To skip associating a Model Repository Service and a Data Integration Service with the Data Preparation Service, press 1.
- - To associate a Model Repository Service and a Data Integration Service with the Data Preparation Service, press 2, and then enter the service names.
3. To create the Data Preparation Service during installation, enter the name of the current node.
If you do not want to create the service during installation, do not enter a value. You can use the Administrator tool to create the service after installation.
If you create the Enterprise Data Lake Service and the Data Preparation Service during installation, you must create both services on the same node.
4. Choose whether to enable secure communication for the Data Preparation Service.
- - To enable secure communication for the Data Preparation Service, press 1.
- - To disable secure communication, press 2.
5. If you enable secure communication for the service, select the SSL certificate to use.
- - To use the default Informatica SSL certificate contained in the default keystore and the default truststore, press 1.
- - To use a custom SSL certificate contained in a custom keystore and truststore, press 2, and then enter the path and file name for the keystore and truststore files. You must also enter the keystore and truststore passwords.
6. If you enable secure communication for the service, enter the port number for the HTTPS connection. If you enable non-secure communication for the service, enter the port number for the HTTP connection.
7. Select the Hadoop authentication mode.
- - To select the non-secure authentication mode, press 1.
- - To select Kerberos authentication, press 2.
8. If you select Kerberos, enter the authentication parameters.
The following table describes the authentication parameters that you must set if you select Kerberos:
Property | Description |
---|---
HDFS Principal Name | Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user/_HOST@REALM. |
Hadoop Impersonation User Name | User name to use in Hadoop impersonation as shown in the Impersonation User Name property for the Hadoop connection in the Administrator tool. If the Hadoop cluster uses Kerberos authentication, the Hadoop impersonation user must have read, write, and execute permissions on the HDFS storage location folder. |
Kerberos Keytab File | Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Data Preparation Service runs. |
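Mistyped principal names are a common source of Kerberos failures, so sanity-checking the user/_HOST@REALM shape before you enter it can save a retry. The following Python sketch is an illustrative validator, not part of the installer, and the example SPNs are placeholders.

```python
import re

# Matches the user/_HOST@REALM format described above: a user part, a slash,
# the literal _HOST placeholder or a host name, then an @REALM suffix.
SPN_PATTERN = re.compile(r"^[^/@\s]+/[^/@\s]+@[A-Z0-9.-]+$")

def looks_like_spn(spn):
    """Return True if the string matches the user/_HOST@REALM shape."""
    return bool(SPN_PATTERN.match(spn))

print(looks_like_spn("edluser/_HOST@EXAMPLE.COM"))  # True
print(looks_like_spn("edluser@EXAMPLE.COM"))        # False: missing /_HOST part
```

A shape check like this cannot confirm that the principal actually exists in the KDC or that the keytab contains a matching key; it only catches obvious formatting mistakes.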
9. Specify the HDFS storage location, HDFS connection, local storage location, and Solr port number details.
The following table describes the properties you must set:
Property | Description |
---|---
HDFS Storage Location | HDFS location for data preparation file storage. If the Hadoop cluster uses Kerberos authentication, the Hadoop impersonation user must have read, write, and execute permissions on the HDFS storage location folder. |
HDFS Connection | HDFS connection for data preparation file storage. |
Local Storage Location | Directory for data preparation file storage on the node on which the Data Preparation Service runs. If the connection to the local storage fails, the Data Preparation Service recovers data preparation files from the HDFS storage location. |
Solr port | Solr port number for the Apache Solr server used to provide data preparation recommendations. |
10. Choose whether to enable the Data Preparation Service.
- - To enable the service at a later time using the Administrator tool, press 1.
- - To enable the service after you complete the installation process, press 2.
The Enterprise Data Lake Service section appears.
Create the Enterprise Data Lake Service
When you install Enterprise Data Lake on the master gateway node for the domain, you can create the Enterprise Data Lake Service and the Data Preparation Service during installation.
If you do not create the Enterprise Data Lake Service and the Data Preparation Service during installation, or if you install Enterprise Data Lake on another gateway node in the domain, you can use the Administrator tool to create the services after you install the Enterprise Data Lake binaries.
1. Specify the details for the Enterprise Data Lake Service.
The following table describes the properties that you set:
Property | Description |
---|---
Enterprise Data Lake Service Name | Name of the Enterprise Data Lake Service. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [ |
Data Preparation Service Name | Name of the Data Preparation Service to associate with the Enterprise Data Lake Service. |
Model Repository Service Name | Name of the Model Repository Service to associate with the Enterprise Data Lake Service. |
Data Integration Service Name | Name of the Data Integration Service associated with the Enterprise Data Lake Service. |
Node Name | To create the Enterprise Data Lake Service during installation, enter the name of the current node. If you do not want to create the service during installation, do not enter a value. You can use the Administrator tool to create the service after installation. If you create the Enterprise Data Lake Service and the Data Preparation Service during installation, you must create both services on the same node. |
2. Choose whether to enable secure communication for the Enterprise Data Lake Service.
- - To enable secure communication for the Enterprise Data Lake Service, press 1.
- - To disable secure communication, press 2.
3. If you enable secure communication for the service, select the SSL certificate to use.
- - To use the default Informatica SSL certificate contained in the default keystore and the default truststore, press 1.
- - To use a custom SSL certificate contained in a custom keystore and truststore, press 2, and then enter the path and file name for the keystore and truststore files. You must also enter the keystore and truststore passwords.
4. If you enable secure communication for the service, enter the port number for the HTTPS connection. If you enable non-secure communication for the service, enter the port number for the HTTP connection.
5. Specify the data lake connection properties.
The following table describes the properties that you set for the data lake connections:
Property | Description |
---|---
HDFS Connection | HDFS connection for the data lake. If you selected the option to create the connection when creating the cluster configuration, the installer sets this value to the name created for the connection. |
HDFS Working Directory | HDFS directory where the Enterprise Data Lake Service copies temporary data and files necessary for the service to run. |
Hadoop Connection | Hadoop connection for the data lake. If you selected the option to create the connection when creating the cluster configuration, the installer sets this value to the name created for the connection. |
Hive Connection | Hive connection for the data lake. If you selected the option to create the connection when creating the cluster configuration, the installer sets this value to the name created for the connection. |
Hive Table Storage Format | Data storage format for the Hive tables. |
Local System Directory | Local directory that contains the files downloaded from the Enterprise Data Lake application, such as .csv and .tde files. |
6. Choose whether to enable logging of user activity events.
- - To disable logging of user activity events, press 1.
- - To enable logging of user activity events, press 2.
7. Select the Hadoop authentication mode.
- - To select the non-secure authentication mode, press 1.
- - To select Kerberos authentication, press 2.
8. If you select Kerberos, enter the authentication parameters.
The following table describes the authentication properties that you set if you select Kerberos:
Property | Description |
---|---
Kerberos Principal | If the Hadoop cluster uses Kerberos authentication, specify the Service Principal Name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster. |
Kerberos KeyTab File | If the Hadoop cluster uses Kerberos authentication, specify the path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Enterprise Data Lake Service runs. |
9. Choose whether to enable the Enterprise Data Lake Service.
- - To enable the service at a later time using the Administrator tool, press 1.
- - To enable the service immediately after you create the service, press 2.