Create and Configure the Enterprise Data Lake Service

The Enterprise Data Lake Service runs the Enterprise Data Lake application in the Informatica domain. Enterprise Data Lake requires the Enterprise Data Lake Service to complete operations.
When an analyst uploads data, the Enterprise Data Lake Service connects to the HDFS system in the Hadoop cluster to temporarily stage the data. When an analyst previews data, the Enterprise Data Lake Service connects to the Hadoop cluster to read the data.

Creating the Enterprise Data Lake Service

Use the service creation wizard in the Administrator tool to create the service.
    1. In the Administrator tool, click the Manage tab.
    2. Click the Services and Nodes view.
    3. In the Domain Navigator, select the domain.
    4. Click Actions > New > Enterprise Data Lake Service.
    5. Enter the following properties:
    - Name: Name of the Enterprise Data Lake Service. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [
    - Description: Description of the Enterprise Data Lake Service. The description cannot exceed 765 characters.
    - Location: Location of the Enterprise Data Lake Service in the Informatica domain. You can create the service within a folder in the domain.
    - License: License object that allows the use of the Enterprise Data Lake Service.
    - Node Assignment: Type of node in the Informatica domain on which the Enterprise Data Lake Service runs. Select Single Node if a single service process runs on the node, or Primary and Backup Nodes if a service process is enabled on each node for high availability. Only a single process runs at any given time; the other processes remain on standby. The Primary and Backup Nodes option is available based on the license configuration. Default is Single Node.
    - Node: Name of the node on which the Enterprise Data Lake Service runs.
    - Backup Nodes: If your license includes high availability, the nodes on which the service can run if the primary node is unavailable.
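    If you want to validate a proposed service name against these naming rules before you run the wizard, a minimal Python sketch such as the following can help. The check_name function and the sample name are hypothetical illustrations; the wizard enforces the same rules itself.

        # Characters that a service name cannot contain, per the rules above
        # (the trailing space covers the "no spaces" rule).
        INVALID_CHARS = set("`~%^*+={}\\;:'\"/?.,<>|!()[] ")

        def check_name(name: str) -> list:
            """Return a list of naming-rule violations for a proposed name."""
            problems = []
            if not name:
                problems.append("name is empty")
            if len(name) > 128:
                problems.append("name exceeds 128 characters")
            if name.startswith("@"):
                problems.append("name begins with @")
            bad = sorted({c for c in name if c in INVALID_CHARS})
            if bad:
                problems.append("name contains invalid characters: " + " ".join(bad))
            return problems

        # Hypothetical example name.
        print(check_name("EDL_Service_01") or "name is valid")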
    6. Click Next.
    7. Enter the following properties for the Model Repository Service:
    - Model Repository Service: Name of the Model Repository Service associated with the Enterprise Data Lake Service.
    - Model Repository Service User Name: User account to use to log in to the Model Repository Service.
    - Model Repository Service User Password: Password for the Model Repository Service user account.
    8. Click Next.
    9. Enter the following properties for the Data Preparation Service, Data Integration Service, and Catalog Service:
    - Data Preparation Service: Name of the Data Preparation Service associated with the Enterprise Data Lake Service.
    - Data Integration Service: Name of the Data Integration Service associated with the Enterprise Data Lake Service.
    - Catalog Service: Name of the Catalog Service associated with the Enterprise Data Lake Service.
    - Catalog Service User Name: User account to use to log in to the Catalog Service.
    - Catalog Service User Password: Password for the Catalog Service user account.
    10. Click Next.
    11. Enter the following data lake security properties:
    - Hadoop Authentication Mode: Security mode of the Hadoop cluster for the data lake. If the Hadoop cluster uses Kerberos authentication, you must set the required Hadoop security properties for the cluster.
    - Principal Name for User Impersonation: Service principal name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster. The user account for impersonation must be set in the Hadoop connection properties. Use the Administrator tool to view Hadoop connection properties.
    - SPN Keytab File for User Impersonation: Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Enterprise Data Lake Service runs.
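    As a quick pre-check before you enter these values, you can confirm that the SPN follows the usual Kerberos primary/instance@REALM format and that the keytab file is readable on the service machine. The following is a hedged sketch; the principal and keytab path shown are placeholders, not values from your environment.

        import os
        import re

        # Typical Kerberos SPN format: primary/instance@REALM (instance optional).
        SPN_PATTERN = re.compile(r"^[^/@\s]+(/[^/@\s]+)?@[A-Z0-9.\-]+$")

        def precheck(spn: str, keytab_path: str) -> None:
            if not SPN_PATTERN.match(spn):
                print(f"warning: {spn!r} does not look like primary/instance@REALM")
            if not os.path.isfile(keytab_path):
                print(f"warning: keytab not found at {keytab_path}")
            elif not os.access(keytab_path, os.R_OK):
                print(f"warning: keytab at {keytab_path} is not readable")
            else:
                print("SPN format and keytab location look plausible")

        # Placeholder values for illustration only.
        precheck("edl_user/host.example.com@EXAMPLE.COM",
                 "/etc/security/keytabs/edl_user.keytab")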
    12. Click Next.
    13. Enter the following connection properties:
    - HDFS Connection: HDFS connection for the data lake.
    - HDFS Working Directory: HDFS directory where the Enterprise Data Lake Service copies temporary data and files necessary for the service to run.
    - Hive Connection: Hive connection for the data lake.
    - Hive Table Storage Format: Data storage format for the Hive tables. Select one of the following options: DefaultFormat, Parquet, or ORC.
    - Hadoop Connection: Hadoop connection for the data lake.
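    Before you point the service at an HDFS working directory, you may want to confirm that the directory exists, or create it, as a user with rights on the cluster. The sketch below shells out to the standard hdfs dfs commands from Python; the /edl/working path is a placeholder, and the script assumes it runs on a machine with the Hadoop client configured.

        import subprocess

        # Placeholder HDFS path for the service working directory.
        WORKING_DIR = "/edl/working"

        def ensure_hdfs_dir(path: str) -> None:
            # 'hdfs dfs -test -d' exits 0 if the directory exists.
            exists = subprocess.run(
                ["hdfs", "dfs", "-test", "-d", path]).returncode == 0
            if not exists:
                # Create the directory, including any missing parents.
                subprocess.run(["hdfs", "dfs", "-mkdir", "-p", path], check=True)
            # List the directory to confirm access and show its permissions.
            subprocess.run(["hdfs", "dfs", "-ls", "-d", path], check=True)

        ensure_hdfs_dir(WORKING_DIR)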
    14. Click Next.
    15. Enter the following user event logging properties:
    - Log User Activity Events: Indicates whether the Enterprise Data Lake Service logs user activity events.
    - JDBC Port: JDBC port to use to retrieve user activity event data.
    16. Click Next.
    17. Enter the following data asset upload and download properties:
    - Maximum File Size for Uploads (MB): Maximum size of the files that can be uploaded.
    - Maximum Number of Rows to Download: Number of rows to export to a .csv file. You can specify a maximum of 2,000,000,000 rows. Enter a value of -1 to export all rows.
    18. Click Next.
    19. Enter the following asset recommendation property, which configures how Enterprise Data Lake displays recommendations during project creation:
    - Number of Recommendations to Display: Number of recommended data assets to display on the Projects page. You can specify a maximum of 50 recommendations. A value of 0 means no recommendations are displayed.
    20. Click Next.
    21. Enter the following sampling properties, which configure how Enterprise Data Lake displays sampling data during data preparation:
    - Maximum Data Preparation Sample Size: Maximum number of sample rows to fetch for data preparation. You can specify a maximum of 1,000,000 rows.
    - Default Data Preparation Sample Size: Number of sample rows to fetch for data preparation. You can specify a maximum of 1,000,000 rows and a minimum of 1,000 rows.
    22. Click Next.
    23. Enter the following Apache Zeppelin property:
    - Zeppelin URL: URL to access the Zeppelin framework, in the following format: http[s]://<zeppelin host name>:<port>
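    A quick way to confirm that the URL you plan to enter is reachable is a plain HTTP request from the service machine. This is a minimal sketch; the host and port below are placeholders for illustration.

        import urllib.request

        # Placeholder Zeppelin URL in the http[s]://<host>:<port> format above.
        ZEPPELIN_URL = "https://zeppelin.example.com:8443"

        try:
            with urllib.request.urlopen(ZEPPELIN_URL, timeout=10) as response:
                print(f"Zeppelin responded with HTTP {response.status}")
        except OSError as exc:
            print(f"could not reach {ZEPPELIN_URL}: {exc}")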
    24. Click Next.
    25. Enter the following logging configuration property:
    - Log Severity: Severity of messages to include in the logs. Select one of the following values:
      - FATAL. Writes FATAL messages to the log. FATAL messages include nonrecoverable system failures that cause the service to shut down or become unavailable.
      - ERROR. Writes FATAL and ERROR messages to the log. ERROR messages include connection failures, failures to save or retrieve metadata, and service errors.
      - WARNING. Writes FATAL, ERROR, and WARNING messages to the log. WARNING messages include recoverable system failures or warnings.
      - INFO. Writes FATAL, ERROR, WARNING, and INFO messages to the log. INFO messages include system and service change messages.
      - TRACE. Writes FATAL, ERROR, WARNING, INFO, and TRACE messages to the log. TRACE messages log user request failures.
      - DEBUG. Writes FATAL, ERROR, WARNING, INFO, TRACE, and DEBUG messages to the log. DEBUG messages are user request logs.
      Default is INFO.
    - Log Directory: Location of the directory where the service saves the log files.
    26. Click Next.
    27. Enter the following Hive execution engine and local system directory properties:
    - Hive Execution Engine: Hive execution engine that the Enterprise Data Lake Service uses to run mappings in the Hadoop environment.
    - Local System Directory: Local directory that contains the files downloaded from Enterprise Data Lake, such as .csv or .tde files.
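    You can verify ahead of time that the local system directory exists and is writable by the user that runs the service. This is a sketch only; the path below is a placeholder.

        import os
        import tempfile

        # Placeholder local directory for downloaded .csv and .tde files.
        LOCAL_DIR = "/opt/informatica/edl/downloads"

        os.makedirs(LOCAL_DIR, exist_ok=True)

        # Write and remove a scratch file to prove the directory is writable.
        with tempfile.NamedTemporaryFile(dir=LOCAL_DIR, delete=True) as scratch:
            scratch.write(b"writability check")
        print(f"{LOCAL_DIR} exists and is writable")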
    28. Click Next.
    29. Enter the following advanced properties:
    - Solr JVM Options: Solr JVM options required to connect to the specified JDBC port used to retrieve user event logs. Set this property to connect from an external client.
    - Index Directory: Location of a shared NFS directory used by the primary and secondary nodes in a multiple-node Enterprise Data Lake installation.
    30. Click Next.
    31. Enter the following properties:
    - HTTP Port: Port number for the HTTP connection to the Enterprise Data Lake Service.
    - Enable Secure Communication: Use a secure connection to connect to the Enterprise Data Lake Service. If you enable secure communication, you must enter all required HTTPS options.
    - HTTPS Port: Port number for the HTTPS connection to the Enterprise Data Lake Service.
    - Keystore File: Path and file name of the keystore file that contains the keys and certificates required for the HTTPS connection.
    - Keystore Password: Password for the keystore file.
    - Truststore File: Path and file name of the truststore file that contains the authentication certificates for the HTTPS connection.
    - Truststore Password: Password for the truststore file.
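    If you do not already have a keystore for the HTTPS connection, a self-signed one is often enough for testing. The sketch below drives the standard Java keytool utility from Python; the alias, distinguished name, path, and password are placeholders, and a production deployment would normally use a CA-signed certificate instead.

        import subprocess

        # Placeholder values for illustration only; substitute your own.
        KEYSTORE = "/opt/informatica/edl/edl_keystore.jks"
        PASSWORD = "changeit"

        # Generate a self-signed key pair with the Java keytool utility.
        subprocess.run(
            [
                "keytool", "-genkeypair",
                "-alias", "edl_https",
                "-keyalg", "RSA",
                "-keysize", "2048",
                "-validity", "365",
                "-dname", "CN=edl.example.com, OU=IT, O=Example, C=US",
                "-keystore", KEYSTORE,
                "-storepass", PASSWORD,
                "-keypass", PASSWORD,
            ],
            check=True,
        )
        print(f"created {KEYSTORE}")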
    32. Select Enable Service if you want to enable the service immediately after you create the service.
    If you want to enable the service at a later time, in the Domain Navigator, select the service and then select Actions > Enable Service.
    33. Click Finish.