Install Enterprise Data Lake on a Node with Enterprise Data Catalog
You can use the installer to install Enterprise Data Lake on a domain node where Enterprise Data Catalog is installed.
You can create the Enterprise Data Lake Service and the Data Preparation Service during the installation process. If you choose to have the installer create the services, it creates both services on the same node. The installer prompts for connection objects associated with the Hadoop environment.
To create the services, the domain must be integrated with the Hadoop environment before you run the installer. For more information about integrating the domain with the Hadoop environment, see the Informatica 10.2.1 Big Data Management Hadoop Integration Guide.
Informatica recommends that you associate dedicated Model Repository Service and Data Integration Service instances with the Enterprise Data Lake Service. You can create a Model Repository Service and Data Integration Service to associate with the Enterprise Data Lake Service during installation, or you can associate existing services with the Enterprise Data Lake Service.
If you create a Model Repository Service, you must specify the details for the Model repository database used by the Model Repository Service.
Install the Enterprise Data Lake Binaries
When you install Enterprise Data Lake on a domain node on which Enterprise Data Catalog is already installed, you indicate that domain services and Enterprise Data Catalog are already installed on the node.
1. On a shell command line, run the install.sh file from the root directory.
2. Press 1 to install the Informatica Big Data products.
3. Press 3 to run the installer.
4. Press 2 to agree to the terms and conditions.
5. Press 2 to continue with the installation.
6. Press 3 to install Enterprise Data Lake.
7. Press 2 to indicate that the Informatica services are installed on the node.
8. Press 2 to indicate that Enterprise Data Catalog is installed on the node.
9. Press 2 to tune the application services for better performance based on your deployment type.
10. Enter the directory where you want to install Enterprise Data Lake.
The first time you install Enterprise Data Lake, enter the Enterprise Data Catalog installation directory.
11. Choose how to proceed if Enterprise Data Lake is already installed in the specified directory.
- - Press 1 to change the installation directory.
- - Press 2 to overwrite the existing installation.
12. Review the pre-installation summary, then press Enter.
13. Ensure the current node is shut down, then press Enter.
The Domain Details section appears.
Configure the Domain Details
Configure the domain details.
1. Press 2 if the current node is the master gateway node for the domain.
2. Enter the domain name.
3. Enter the name of the current node.
4. Enter the domain administrator user name and password.
5. Press 1 to continue with the installation.
The Associated Services section appears.
Configure the Associated Services
Configure the application services required by Enterprise Data Lake.
1. Enter the name of the Catalog Service to associate with Enterprise Data Lake.
2. Enter the name of the Model Repository Service associated with the Catalog Service.
3. Enter the name of the Data Integration Service associated with the Catalog Service.
4. Choose whether to enable the Content Management Service.
- - To skip enablement of the Content Management Service, press 1.
- - To enable the Content Management Service, press 2, and then enter the name of the service.
5. Choose whether to create a Model Repository Service and a Data Integration Service to associate with the Enterprise Data Lake Service, or to associate existing application services with Enterprise Data Lake.
The Application Services for Enterprise Data Lake section appears.
Configure the Model Repository Database Connection Details
If you choose to create a Model Repository Service to associate with Enterprise Data Lake, specify the connection details for the Model repository database.
1. Specify the connection details for the Model repository database.
The following table describes the parameters you set:
Property | Description |
---|
Database Type | Database for the Model repository managed by the Model Repository Service. |
Database User ID | User name of the database user account to use to log in to the Model repository database. |
User Password | Password for the Model repository database user account. |
Tablespace | Configure for an IBM DB2 database. Name of the tablespace in which to create the tables. The tablespace must be defined on a single node, and the page size must be 32K. This option is required for a multi-partition database. If this option is not selected for a single partition database, the installer creates the tables in the default tablespace. |
Schema Name | Configure for a Microsoft SQL Server database. Name of the schema that will contain domain configuration tables. If not selected, the installer creates the tables in the default schema. |
2. Specify the truststore details required to access a secure Model repository database.
The following table describes the properties you set:
Property | Description |
---|
Database Truststore File | Path and file name of the truststore file for the secure database. |
Database Truststore Password | Password for the truststore. |
3. Choose whether to configure the database connection using a JDBC URL or a custom JDBC connection string.
- - Press 1 to configure the database connection using a JDBC URL.
The following table describes the properties you set:
Property | Description |
---|
Database Address | Host name and port number for the database in the format <host name>:<port>. |
Database Service Name | Service name for Oracle and IBM DB2 databases, or database name for Microsoft SQL Server. |
JDBC Parameters | Parameters to include in the JDBC connection string used to connect to the Model repository database. You can use the default parameters, or add or modify parameters based on your database requirements. Verify that the parameter string is valid, because the installer does not validate the string before it adds the string to the JDBC URL. If you do not specify parameters, the installer creates the JDBC URL without additional parameters. |
Use the following JDBC connection string syntax for each supported database:
- - IBM DB2. jdbc:informatica:db2://<host name>:<port>;DatabaseName=<database name>;BatchPerformanceWorkaround=true;DynamicSections=3000
- - Microsoft SQL Server that uses the default instance. jdbc:informatica:sqlserver://<host name>:<port>;DatabaseName=<database name>;SnapshotSerializable=true
- - Microsoft SQL Server that uses a named instance. jdbc:informatica:sqlserver://<host name>\<named instance name>;DatabaseName=<database name>;SnapshotSerializable=true
- - Azure SQL Server. jdbc:informatica:sqlserver://<host name>:<port>;DatabaseName=<database name>;SnapshotSerializable=true;EncryptionMethod=SSL;HostNameInCertificate=*.<host name in certificate>;ValidateServerCertificate=true
- - Oracle. jdbc:informatica:oracle://<host name>:<port>;SID=<database name>;MaxPooledStatements=20;CatalogOptions=0;BatchPerformanceWorkaround=true
- - Press 2 to configure the database connection using a custom JDBC connection string.
The following table describes the properties you set:
Property | Description |
---|
EncryptionMethod | Required. Indicates whether data is encrypted when transmitted over the network. This parameter must be set to SSL. |
ValidateServerCertificate | Optional. Indicates whether Informatica validates the certificate that is sent by the database server. If this parameter is set to true, Informatica validates the certificate. If you also specify the HostNameInCertificate parameter, Informatica validates the host name in the certificate as well. If this parameter is set to false, Informatica does not validate the certificate and ignores any truststore information that you specify. |
HostNameInCertificate | Optional. Host name of the machine that hosts the secure database. If you specify a host name, Informatica validates the host name included in the connection string against the host name in the SSL certificate. |
The Service Parameters section appears.
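The connection string syntax listed for the Model repository database can be assembled programmatically before you type it into the installer. The following is a minimal sketch, not part of the installer: the function names, the dictionary keys, and the sample host and database values are hypothetical, and only the host-and-port forms of the syntax are covered. The secure parameters are appended in the same order that the Azure SQL Server syntax uses.

```python
# Templates copied from the syntax list in this section.
# The dictionary keys ("db2", "sqlserver", "oracle") are hypothetical labels.
JDBC_TEMPLATES = {
    "db2": (
        "jdbc:informatica:db2://{host}:{port};DatabaseName={database};"
        "BatchPerformanceWorkaround=true;DynamicSections=3000"
    ),
    "sqlserver": (
        "jdbc:informatica:sqlserver://{host}:{port};DatabaseName={database};"
        "SnapshotSerializable=true"
    ),
    "oracle": (
        "jdbc:informatica:oracle://{host}:{port};SID={database};"
        "MaxPooledStatements=20;CatalogOptions=0;BatchPerformanceWorkaround=true"
    ),
}


def build_jdbc_url(db_type, host, port, database):
    """Fill in the documented template for the given database type."""
    if db_type not in JDBC_TEMPLATES:
        raise ValueError(f"unsupported database type: {db_type!r}")
    return JDBC_TEMPLATES[db_type].format(host=host, port=port, database=database)


def append_ssl_parameters(url, host_name_in_certificate=None, validate=True):
    """Append the secure-database parameters described in the table above."""
    params = ["EncryptionMethod=SSL"]  # required, must be SSL
    if host_name_in_certificate:
        params.append(f"HostNameInCertificate={host_name_in_certificate}")
    params.append(f"ValidateServerCertificate={'true' if validate else 'false'}")
    return url + ";" + ";".join(params)


if __name__ == "__main__":
    # Sample values are placeholders, not defaults from the product.
    url = build_jdbc_url("oracle", "dbhost.example.com", 1521, "orcl")
    print(append_ssl_parameters(url, "dbhost.example.com"))
```

Because the installer does not validate the parameter string, generating it from one template per database type reduces the chance of a typing error going unnoticed until the service fails to start.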
Configure the Application Service Properties
If you create a Model Repository Service and a Data Integration Service to associate with Enterprise Data Lake during installation, specify the properties required to create each application service.
1. Specify the name of the Model Repository Service.
The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
2. Specify the name of the Data Integration Service.
The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
3. Specify the HTTP protocol type for the Data Integration Service, and then enter the port for each protocol you select.
- - To select HTTP only, press 1.
- - To select HTTPS only, press 2.
- - To select both HTTP and HTTPS, press 3.
4. If you select HTTPS or both HTTP and HTTPS, select the SSL certificate to use.
- - To use the default Informatica SSL certificate contained in the default keystore and the default truststore, press 1.
- - To use a custom SSL certificate contained in custom keystore and truststore files, press 2, and then enter the path and file name for the keystore and truststore files. You must also enter the keystore and truststore passwords.
The Data Preparation Repository Database section appears.
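The naming rules for application services repeat throughout this procedure, so it can help to check candidate names before you start the installer. The following sketch encodes only the rules stated above. The function name and the `existing_names` parameter are hypothetical, and the installer or the domain may enforce additional rules that are not modeled here.

```python
# Special characters that a service name cannot contain, as listed above.
FORBIDDEN_CHARS = set("`~%^*+={}\\;:'\"/?.,<>|!()][")

MAX_NAME_LENGTH = 128


def validate_service_name(name, existing_names=()):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"exceeds {MAX_NAME_LENGTH} characters")
    if name.startswith("@"):
        problems.append("begins with @")
    if " " in name:
        problems.append("contains spaces")
    bad = sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
    if bad:
        problems.append("contains special characters: " + " ".join(bad))
    # Names are not case sensitive, so compare in lower case.
    if name.lower() in {other.lower() for other in existing_names}:
        problems.append("is not unique within the domain")
    return problems


if __name__ == "__main__":
    # The names below are placeholders for illustration only.
    for candidate in ["EDL_MRS", "@edl dis", "Catalog.Service"]:
        print(candidate, "->", validate_service_name(candidate, ["edl_mrs"]))
```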
Configure the Data Preparation Repository Database Details
Specify the Data Preparation repository database connection details. You can choose to use an Oracle database or a MySQL database for the Data Preparation repository database.
If you do not have the database details, you can enter placeholder values, and then create the Data Preparation Service. If you continue without specifying the database connection details, you cannot enable the Data Preparation Service.
Oracle
- 1. To use an Oracle database for the Data Preparation repository, press 1.
- 2. Choose whether to connect to a non-secure database or a secure database.
- a. To connect to a non-secure database, press 1, and then enter the required properties.
The following table describes the non-secure connection properties:
Property | Description |
---|
Database Host Name | Host name of the machine that hosts the Data Preparation repository database. This entry appears only if your Database Type is Oracle. |
Database Port Number | Port number for the database. |
JDBC Connection String | JDBC connection string to connect to an Oracle database. Use the following connection string format: jdbc:informatica:oracle://<database host name>:<port>;ServiceName=<database name> |
JDBC Parameters | Additional parameters required to connect to an Oracle database. |
- b. To connect to a secure database, press 2, and then enter the required properties.
The following table describes the secure connection properties:
Property | Description |
---|
Truststore File | Path and file name for the database truststore file. |
Truststore Password | Password for the database truststore file. |
Connection String | JDBC connection string to connect to an Oracle database. Use the following connection string format: jdbc:informatica:oracle://<database host name>:<port>;ServiceName=<database name> |
Secure JDBC Parameters | Additional parameters required to connect to a secure Oracle database. Format the parameters as follows: EncryptionMethod=SSL;HostNameInCertificate=<secure database host name>;ValidateServerCertificate=true |
- 3. If the database connection fails, press Enter to continue. You can use the Administrator tool to update the database details and enable the Data Preparation Service later.
The Data Preparation Service Details section appears.
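The Oracle connection string and the secure JDBC parameters are entered as separate properties, each with a fixed format. As a rough illustration, the two values could be generated as follows. The function names and sample values are hypothetical; only the two format strings come from the tables above.

```python
def oracle_connection_string(host, port, database):
    """Format documented for the JDBC Connection String property."""
    return f"jdbc:informatica:oracle://{host}:{port};ServiceName={database}"


def secure_jdbc_parameters(secure_host):
    """Format documented for the Secure JDBC Parameters property."""
    return (
        "EncryptionMethod=SSL;"
        f"HostNameInCertificate={secure_host};"
        "ValidateServerCertificate=true"
    )


if __name__ == "__main__":
    # Placeholder host, port, and database name.
    print(oracle_connection_string("prepdb.example.com", 1521, "dprepo"))
    print(secure_jdbc_parameters("prepdb.example.com"))
```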
MySQL
- 1. To use a MySQL database for the Data Preparation repository, press 2.
- 2. Enter the connection properties for the MySQL database.
The following table describes the connection properties:
Property | Description |
---|
Database Host Name | Host name of the machine that hosts the Data Preparation repository database. The entry appears only if your database type is MySQL. |
Database User Name | Database user account to use to connect to the Data Preparation repository. |
Database User Password | Password for the Data Preparation repository database user account. |
Database Port Number | Port number for the database. |
Schema Name | Schema or database name of the Data Preparation repository database. |
- 3. If the database connection fails, press Enter to continue. You can use the Administrator tool to update the database details and enable the Data Preparation Service later.
The Data Preparation Service Details section appears.
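Both the Oracle and MySQL paths let you continue when the database connection fails, but the Data Preparation Service cannot be enabled until the details are corrected. One way to catch a wrong host name or port before you run the installer is a plain TCP reachability check from the node. This sketch is an assumption about a useful pre-check, not something the installer performs: the function name and sample values are hypothetical, and a successful result shows only that the port accepts connections, not that the credentials or schema are valid.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host and port for the Data Preparation repository database.
    host, port = "prepdb.example.com", 3306
    state = "reachable" if can_reach(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```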
Create the Data Preparation Service
Create the Data Preparation Service.
1. Specify the name of the Data Preparation Service.
The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
2. If you plan to use rules, you must associate a Data Integration Service and a Model Repository Service with the Data Preparation Service.
- - To skip associating a Model Repository Service and a Data Integration Service with the Data Preparation Service, press 1.
- - To associate a Model Repository Service and a Data Integration Service with the Data Preparation Service, press 2, and then enter the service names.
3. To create the Data Preparation Service during installation, enter the name of the current node.
If you do not want to create the service during installation, do not enter a value. You can use the Administrator tool to create the service after installation.
If you create the Enterprise Data Lake Service and the Data Preparation Service during installation, you must create both services on the same node.
4. Choose whether to enable secure communication for the Data Preparation Service.
- - To enable secure communication for the Data Preparation Service, press 1.
- - To disable secure communication, press 2.
5. If you enable secure communication for the service, select the SSL certificate to use.
- - To use the default Informatica SSL certificate contained in the default keystore and the default truststore, press 1.
- - To use a custom SSL certificate contained in a custom keystore and truststore, press 2, and then enter the path and file name for the keystore and truststore files. You must also enter the keystore and truststore passwords.
6. If you enable secure communication for the service, enter the port number for the HTTPS connection. If you disable secure communication for the service, enter the port number for the HTTP connection.
7. Select the Hadoop authentication mode.
- - To select the non-secure authentication mode, press 1.
- - To select Kerberos authentication, press 2.
8. If you select Kerberos, enter the authentication parameters.
The following table describes the authentication parameters that you set if you select Kerberos:
Property | Description |
---|
HDFS Principal Name | Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user/_HOST@REALM. |
Hadoop Impersonation User Name | User name to use in Hadoop impersonation as shown in the Impersonation User Name property for the Hadoop connection in the Administrator tool. If the Hadoop cluster uses Kerberos authentication, the Hadoop impersonation user must have read, write, and execute permissions on the HDFS storage location folder. |
Kerberos Keytab File | Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Data Preparation Service runs. |
9. Specify the HDFS storage location, HDFS connection, local storage location, and Solr port number details.
The following table describes the properties you set:
Property | Description |
---|
HDFS Storage Location | HDFS location for data preparation file storage. If the Hadoop cluster uses Kerberos authentication, the Hadoop impersonation user must have read, write, and execute permissions on the HDFS storage location folder. |
HDFS Connection | HDFS connection for data preparation file storage. |
Local Storage Location | Directory for data preparation file storage on the node on which the Data Preparation Service runs. If the connection to the local storage fails, the Data Preparation Service recovers data preparation files from the HDFS storage location. |
Solr port | Solr port number for the Apache Solr server used to provide data preparation recommendations. |
10. Choose whether to enable the Data Preparation Service.
- - To enable the service at a later time using the Administrator tool, press 1.
- - To enable the service after you complete the installation process, press 2.
The Enterprise Data Lake Service Details section appears.
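The HDFS Principal Name must follow the format user/_HOST@REALM. A small format check can catch a missing host component or realm before the value is entered. The following is a sketch under that single assumption about the format: the function name and the sample principal are hypothetical, and the check does not confirm that the principal exists in the Kerberos realm.

```python
import re

# Three non-empty parts: user, host (for example _HOST), and realm.
SPN_PATTERN = re.compile(
    r"^(?P<user>[^/@\s]+)/(?P<host>[^/@\s]+)@(?P<realm>[^/@\s]+)$"
)


def parse_spn(principal):
    """Split a principal of the form user/_HOST@REALM into its parts."""
    match = SPN_PATTERN.match(principal)
    if match is None:
        raise ValueError(f"not in user/_HOST@REALM format: {principal!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Placeholder principal for illustration only.
    print(parse_spn("hdfs/_HOST@EXAMPLE.COM"))
```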
Create the Enterprise Data Lake Service
Create the Enterprise Data Lake Service.
1. Specify the details for the Enterprise Data Lake Service.
The following table describes the properties that you set:
Property | Description |
---|
Enterprise Data Lake Service Name | Name of the Enterprise Data Lake Service. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [ |
Data Preparation Service Name | Name of the Data Preparation Service to associate with the Enterprise Data Lake Service. |
Model Repository Service Name | Name of the Model Repository Service to associate with the Enterprise Data Lake Service. |
Data Integration Service Name | Name of the Data Integration Service to associate with the Enterprise Data Lake Service. |
Node Name | To create the Enterprise Data Lake Service during installation, enter the name of the current node. If you do not want to create the service during installation, do not enter a value. You can use the Administrator tool to create the service after installation. If you create the Enterprise Data Lake Service and the Data Preparation Service during installation, you must create both services on the same node. |
2. Choose whether to enable secure communication for the Enterprise Data Lake Service.
- - To enable secure communication for the Enterprise Data Lake Service, press 1.
- - To disable secure communication, press 2.
3. If you enable secure communication for the service, select the SSL certificate to use.
- - To use the default Informatica SSL certificate contained in the default keystore and the default truststore, press 1.
- - To use a custom SSL certificate contained in a custom keystore and truststore, press 2, and then enter the path and file name for the keystore and truststore files. You must also enter the keystore and truststore passwords.
4. If you enable secure communication for the service, enter the port number for the HTTPS connection. If you disable secure communication for the service, enter the port number for the HTTP connection.
5. Specify the data lake connection options.
The following table describes the properties that you set for the data lake connections:
Property | Description |
---|
HDFS Connection | HDFS connection for the data lake. |
HDFS Working Directory | HDFS directory where the Enterprise Data Lake Service copies temporary data and files necessary for the service to run. |
Hadoop Connection | Hadoop connection for the data lake. |
Hive Connection | Hive connection for the data lake. |
Hive Table Storage Format | Data storage format for the Hive tables. |
Local System Directory | Local directory that contains the files downloaded from the Enterprise Data Lake application, such as .csv and .tde files. The default directory is /home/toolprod. |
6. Choose whether to enable logging of user activity events.
- - To disable logging of user activity events, press 1.
- - To enable logging of user activity events, press 2.
7. Select the Hadoop authentication mode.
- - To select the non-secure authentication mode, press 1.
- - To select Kerberos authentication, press 2.
8. If you select Kerberos, enter the authentication parameters.
The following table describes the authentication properties that you must set if you select Kerberos:
Property | Description |
---|
Kerberos Principal | If the Hadoop cluster uses Kerberos authentication, specify the Service Principal Name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster. |
Kerberos KeyTab File | If the Hadoop cluster uses Kerberos authentication, specify the path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Enterprise Data Lake Service runs. |
9. Choose whether to enable the Enterprise Data Lake Service immediately after you create the service.
- - To enable the service at a later time using the Administrator tool, press 1.
- - To enable the service immediately after you create the service, press 2.
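The Kerberos keytab file must be in a directory on the machine where the service runs. Before you enter the path in the installer, you can confirm on that node that the file is present and readable. This is a minimal sketch: the function name and the sample path are hypothetical, and the check does not verify that the keytab contains the expected principal.

```python
import os


def check_keytab(path):
    """Return a list of problems with the keytab path; empty means it looks usable."""
    if not os.path.isfile(path):
        return ["file does not exist or is not a regular file"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append("file is not readable by the current user")
    if os.path.getsize(path) == 0:
        problems.append("file is empty")
    return problems


if __name__ == "__main__":
    # Placeholder path; run this as the user that runs the Informatica services.
    keytab = "/path/to/service.keytab"
    issues = check_keytab(keytab)
    print("OK" if not issues else "; ".join(issues))
```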