Configure Enterprise Data Catalog

In this task, you respond to the installer prompts that configure Enterprise Data Catalog. You provide basic information to configure the application services and the Hadoop cluster.
When you complete the tasks in this section, you continue with the installer prompts that configure Enterprise Data Lake.

Configure Profiling Warehouse Database Details

If you chose to configure the service parameters, enter the profiling warehouse database information.
    1. Select the database type for the profiling warehouse.
       The following table describes the prompt for the database type:
       Prompt | Description
       Database type | Type of database for the profiling warehouse connection. Select one of the following options:
           1 - Oracle
           2 - Microsoft SQL Server
           3 - IBM DB2
    2. Enter the properties for the database user account.
       The following table describes the properties for the database user account:
       Property | Description
       Database user ID | Name of the profiling warehouse database user account.
       User password | Password for the profiling warehouse database user account.
    3. Based on the database type that you selected, enter the parameters for the database.
       a. If you select IBM DB2, select whether to configure a tablespace and enter the tablespace name.
          The following table describes the properties that you must configure for the IBM DB2 database:
          Property | Description
          Configure tablespace | Select whether to specify a tablespace: 1 - No, 2 - Yes. In a single-partition database, if you select No, the installer creates the tables in the default tablespace. In a multi-partition database, you must select Yes.
          Tablespace | Name of the tablespace in which to create the tables. Specify a tablespace that meets the pageSize requirement of 32768 bytes. In a single-partition database, if you select Yes to configure the tablespace, enter the name of the tablespace in which to create the tables. In a multi-partition database, specify the name of the tablespace that resides in the catalog partition of the database.
       b. If you select Microsoft SQL Server, enter the schema name for the database.
          The following table describes the property that you must configure for the Microsoft SQL Server database:
          Property | Description
          Schema name | Name of the schema that will contain the profiling warehouse tables. If this parameter is blank, the installer creates the tables in the default schema.
       c. To enter the JDBC connection information using the JDBC URL information, press 1. To enter the JDBC connection information using a custom JDBC connection string, press 2.
       d. Enter the JDBC connection information.
       e. Enter the data access connection string.
The Content Management Service Parameters and Database section appears.
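Steps c through e ask for connection values that follow driver-specific formats. The following Python sketch shows one way to assemble them from a host, port, and database name. It assumes DataDirect-style `jdbc:informatica:<type>://<host>:<port>;<property>=<value>` URLs and the native connection string patterns `<database>.world` (Oracle), `<server>@<database>` (Microsoft SQL Server), and `<database>` (IBM DB2). These formats, the function names, and the sample values are assumptions for illustration; compare the output with the defaults that the installer proposes before you use it.

```python
# Hypothetical helpers for the profiling warehouse JDBC and data access prompts.
# The URL and connection string formats are assumptions; confirm them against
# the defaults that the installer proposes.

# Installer menu choice -> (JDBC subprotocol, property that names the database).
JDBC_FORMATS = {
    1: ("oracle", "ServiceName"),      # 1 - Oracle
    2: ("sqlserver", "DatabaseName"),  # 2 - Microsoft SQL Server
    3: ("db2", "DatabaseName"),        # 3 - IBM DB2
}


def build_jdbc_url(db_choice: int, host: str, port: int, database: str) -> str:
    """Return a JDBC URL for the selected database type (assumed format)."""
    if db_choice not in JDBC_FORMATS:
        raise ValueError(f"unknown database type choice: {db_choice}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    subprotocol, db_property = JDBC_FORMATS[db_choice]
    return f"jdbc:informatica:{subprotocol}://{host}:{port};{db_property}={database}"


def build_data_access_string(db_choice: int, database: str, server: str = "") -> str:
    """Return a native data access connection string (assumed formats)."""
    if db_choice == 1:
        return f"{database}.world"  # Oracle: assumes a <database>.world TNS name
    if db_choice == 2:
        if not server:
            raise ValueError("Microsoft SQL Server needs a server name")
        return f"{server}@{database}"  # Microsoft SQL Server: <server>@<database>
    if db_choice == 3:
        return database  # IBM DB2: database alias
    raise ValueError(f"unknown database type choice: {db_choice}")


if __name__ == "__main__":
    # Hypothetical host and database names.
    print(build_jdbc_url(1, "dbhost.example.com", 1521, "PROFWH"))
    print(build_data_access_string(2, "profwh", server="sqlhost"))
```

Keeping the menu choice as the dictionary key mirrors the numbered options of the Database type prompt, so the same value drives both the JDBC URL and the data access string.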

Configure the Content Management Service Parameters and Database

After you configure the profiling warehouse, you can configure the Content Management Service parameters and the reference data warehouse database properties.
    1. Enter the configuration parameters for the Content Management Service.
       The following table describes the parameters for the Content Management Service:
       Prompt | Description
       Content Management Service name | Name of the Content Management Service.
    2. Enter the following service parameter information:
       Prompt | Description
       HTTP protocol type | Type of connection to the Data Integration Service. Select one of the following options:
           - HTTP. Requests to the service use an HTTP connection.
           - HTTPS. Requests to the service use a secure HTTP connection.
       HTTP port | Port number to use for the Data Integration Service. Default is 9085.
       HTTPS port | Port number to use for the Data Integration Service. Default is 9085.
    3. Enter the database information for the reference data warehouse.
       The following table describes the database information for the reference data warehouse:
       Prompt | Description
       Database type | Type of database for the reference data warehouse. Select one of the following options:
           1 - Oracle
           2 - Microsoft SQL Server
           3 - IBM DB2
    4. Enter the properties for the database user account.
       The following table describes the properties for the database user account:
       Property | Description
       Database user ID | Name of the reference data warehouse database user account.
       User password | Password for the reference data warehouse database user account.
    5. Based on the database type that you selected, enter the parameters for the database.
       a. If you select IBM DB2, select whether to configure a tablespace and enter the tablespace name.
          The following table describes the properties that you must configure for the IBM DB2 database:
          Property | Description
          Configure tablespace | Select whether to specify a tablespace: 1 - No, 2 - Yes. In a single-partition database, if you select No, the installer creates the tables in the default tablespace. In a multi-partition database, you must select Yes.
          Tablespace | Name of the tablespace in which to create the tables. Specify a tablespace that meets the pageSize requirement of 32768 bytes. In a single-partition database, if you select Yes to configure the tablespace, enter the name of the tablespace in which to create the tables. In a multi-partition database, specify the name of the tablespace that resides in the catalog partition of the database.
       b. If you select Microsoft SQL Server, enter the schema name for the database.
          The following table describes the property that you must configure for the Microsoft SQL Server database:
          Property | Description
          Schema name | Name of the schema that will contain the reference data warehouse tables. If this parameter is blank, the installer creates the tables in the default schema.
       c. To enter the JDBC connection information using the JDBC URL information, press 1. To enter the JDBC connection information using a custom JDBC connection string, press 2.
       d. Enter the JDBC connection information.
       e. Enter the data access connection string.
The Cluster and Application Service Options section appears.
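The IBM DB2 tablespace rules are the same for the profiling warehouse and the reference data warehouse, and they reduce to a small decision. The following Python sketch encodes them: a single-partition database can answer No and use the default tablespace, a multi-partition database must answer Yes, and a named tablespace must have a 32768-byte page size. The function name and its inputs are hypothetical. The sketch assumes that you look up the page size yourself, for example from the PAGESIZE column of the DB2 SYSCAT.TABLESPACES catalog view, and it cannot check that the tablespace resides in the catalog partition.

```python
from typing import Optional, Tuple

# Page size that the Tablespace prompt requires, in bytes.
REQUIRED_PAGE_SIZE = 32768


def db2_tablespace_answers(
    multi_partition: bool,
    tablespace: Optional[str] = None,
    page_size: Optional[int] = None,
) -> Tuple[str, Optional[str]]:
    """Return hypothetical answers for the Configure tablespace and Tablespace prompts.

    The first element is the menu choice: "1" (No) or "2" (Yes).
    The caller must ensure that, in a multi-partition database, the tablespace
    resides in the catalog partition; this function does not check that.
    """
    if tablespace is None:
        if multi_partition:
            # In a multi-partition database, you must select Yes.
            raise ValueError(
                "multi-partition database: name a tablespace in the catalog partition"
            )
        # Single partition and No: the installer uses the default tablespace.
        return "1", None
    if page_size != REQUIRED_PAGE_SIZE:
        raise ValueError(
            f"tablespace {tablespace} has page size {page_size}; "
            f"{REQUIRED_PAGE_SIZE} bytes is required"
        )
    return "2", tablespace


if __name__ == "__main__":
    print(db2_tablespace_answers(multi_partition=False))
    # Hypothetical tablespace name.
    print(db2_tablespace_answers(True, tablespace="REFDATA32K", page_size=32768))
```

Failing early on a page size mismatch is the main point of the sketch: the installer only asks for the tablespace name, so a tablespace with the wrong page size would otherwise surface later as a table creation error.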

Configure External Cluster Details

After you configure the parameters for the Content Management Service, you can configure the cluster and application service options.
    1. Select the cluster type to configure.
       The following table describes the options that you can select:
       Option | Description
       Cloudera | Select to create a cluster configuration for a Cloudera cluster.
       Hortonworks | Select to create a cluster configuration for a Hortonworks cluster.
       Azure HDInsight | Select to create a cluster configuration for an Azure HDInsight cluster.
    2. Specify whether the cluster uses Kerberos authentication:
       a. Press 1 if the cluster does not use Kerberos authentication.
       b. Press 2 if the cluster uses Kerberos authentication.
The Catalog Service Parameters for the Existing Cluster section appears.

Configure the Catalog Service for the External Cluster

After you configure the external cluster, you can configure the Catalog Service parameters for the existing cluster.
    1. If the cluster uses Kerberos authentication, enter the information to configure the Catalog Service parameters for the existing cluster.
       The following table describes the properties that you need to set to configure the Catalog Service parameters for the existing cluster:
       Property | Description
       Catalog Service name | Name of the Catalog Service.
       Catalog Service port | Port number of the Catalog Service.
       Cluster Hadoop distribution URL | URL to access the Hadoop cluster.
       Cluster Hadoop distribution URL user | User name to access the Hadoop cluster.
       Cluster Hadoop distribution URL password | Password to access the Hadoop cluster.
       Service cluster name | Name of the service cluster.
       KDC domain name | Domain name of the Kerberos Key Distribution Center (KDC).
       Keytab location | Location of the keytab file.
       Fully qualified path to the Kerberos configuration file | Fully qualified path to the Kerberos configuration file.
       YARN Queue Name | Name of the YARN scheduler queue that the Blaze engine uses. The queue specifies the resources available on the cluster.
    2. If you selected Others as the cluster type, enter the information to configure the Catalog Service parameters for the existing cluster.
       The following table describes the properties that you need to set to configure the Catalog Service parameters for the existing cluster:
       Property | Description
       Catalog Service name | Name of the Catalog Service.
       Catalog Service port | Port number of the Catalog Service.
       Yarn resource manager URI | Applies to external cluster. The service within Hadoop that submits the MapReduce tasks to specific nodes in the cluster. Use the following format: <host name>:<port number>
           where:
           - <host name> is the name or IP address of the YARN resource manager.
           - <port number> is the port number on which the YARN resource manager listens for Remote Procedure Calls (RPC).
       Yarn resource manager HTTPS or HTTP URI | Applies to external cluster. HTTPS or HTTP URI of the YARN resource manager.
       Yarn resource manager scheduler URI | Applies to external cluster. Scheduler URI of the YARN resource manager.
       Zookeeper Addresses | Multiple ZooKeeper addresses in a comma-separated list.
       HDFS Namenode URI | Applies to external cluster. The URI to access HDFS. Use the following format to specify the NameNode URI in the Cloudera distribution: <host name>:<port number>
           where:
           - <host name> is the host name or IP address of the NameNode.
           - <port number> is the port number on which the NameNode listens for Remote Procedure Calls (RPC).
       History Server HTTP URI | Applies to external cluster. Specify a value to generate YARN allocation log files for scanners. Catalog Administrator displays the log URL as part of task monitoring.
       Service cluster name | Applies to both internal and external clusters. Name of the service cluster. Ensure that you have a directory /Informatica/LDM/<ServiceClusterName> in HDFS.
           Note: If you do not specify a service cluster name, Enterprise Data Catalog considers DomainName_CatalogServiceName as the default value. You must then have the /Informatica/LDM/<DomainName>_<CatalogServiceName> directory in HDFS. Otherwise, the Catalog Service might fail.
       HDFS Service Name for High Availability | Applies to highly available external cluster. Specify the HDFS service name.
       HDFS Service Principal Name | Applies to Kerberos authentication. Principal name for the HDFS service.
       YARN Service Principal Name | Applies to Kerberos authentication. Principal name for the YARN service.
       KDC domain name | Domain name of the Kerberos Key Distribution Center (KDC).
       Keytab location | Location of the keytab file.
       Fully qualified path to the Kerberos configuration file | Fully qualified path to the Kerberos configuration file.
       YARN Queue Name | Name of the YARN scheduler queue that the Blaze engine uses. The queue specifies the resources available on the cluster.
    3. Select the load type.
       The following table describes the options that you can select:
       Option | Description
       Demo | Represents a single datastore. Use for demonstration purposes.
       Low | Represents one million assets or 30-40 datastores.
       Medium | Represents 20 million assets or 200-400 datastores.
       High | Represents 50 million assets or 500-1000 datastores.
The Model Repository Database section appears.
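Several external cluster prompts share the <host name>:<port number> format, the service cluster name determines an HDFS directory that must already exist, and the load type depends on catalog size. The following Python sketch checks these values before you enter them. The function names and sample hosts and ports are hypothetical. It assumes that each ZooKeeper address also uses the <host name>:<port number> form and that host names contain no colons (so IPv6 literals are not handled). It also treats the asset figures in the load type table as upper bounds, which is an interpretation rather than a documented rule.

```python
from typing import List, Optional, Tuple


def parse_host_port(value: str) -> Tuple[str, int]:
    """Split a '<host name>:<port number>' value, such as the YARN resource
    manager URI or the HDFS Namenode URI, into its two parts."""
    host, sep, port_text = value.strip().rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected <host name>:<port number>, got {value!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in {value!r}")
    return host, port


def parse_zookeeper_addresses(value: str) -> List[Tuple[str, int]]:
    """Parse the comma-separated Zookeeper Addresses value (assumes host:port items)."""
    return [parse_host_port(item) for item in value.split(",") if item.strip()]


def service_cluster_hdfs_dir(
    domain: str, catalog_service: str, service_cluster_name: Optional[str] = None
) -> str:
    """Return the /Informatica/LDM directory that must exist in HDFS."""
    # Without a service cluster name, DomainName_CatalogServiceName is the default.
    name = service_cluster_name or f"{domain}_{catalog_service}"
    return f"/Informatica/LDM/{name}"


# Asset figures from the load type table, treated here as upper bounds.
LOAD_TYPE_CAPACITY = [("Low", 1_000_000), ("Medium", 20_000_000), ("High", 50_000_000)]


def choose_load_type(asset_count: int) -> str:
    """Pick the smallest listed load type whose asset figure covers asset_count."""
    for load_type, capacity in LOAD_TYPE_CAPACITY:
        if asset_count <= capacity:
            return load_type
    raise ValueError(f"{asset_count} assets exceeds the sizes listed for High")


if __name__ == "__main__":
    # Hypothetical host names and ports.
    print(parse_host_port("rm.example.com:8032"))
    print(parse_zookeeper_addresses("zk1.example.com:2181,zk2.example.com:2181"))
    print(service_cluster_hdfs_dir("InfaDomain", "CatalogService"))
    print(choose_load_type(5_000_000))
```

Using `rpartition(":")` splits on the last colon only, which keeps the parser simple for host names and IPv4 addresses.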

Configure the Catalog Service for the Embedded Cluster

If you chose to use the embedded cluster, you can configure the Catalog Service for the embedded cluster.
    - Configure the Hadoop cluster properties in the dialog box.
      The following table describes the properties:
      Property | Description
      Gateway User | User name for the Apache Ambari server.
      Informatica Cluster Service Name | Name of the Informatica Cluster Service for the internal cluster.
      Informatica Cluster Service Port | Port number for the Informatica Cluster Service.
      Informatica Hadoop Gateway Host | Host where the Apache Ambari server runs.
      Informatica Hadoop Nodes | Hosts where the Apache Ambari agents run.
      Informatica Hadoop Gateway Port | Web port for the Apache Ambari server.
      Override default password | Select this option if you want to change the default password for the cluster.
      New Hadoop Ambari Password | Password for the Ambari Hadoop cluster.
      Confirm Hadoop Ambari Password | Confirm the password for the Ambari Hadoop cluster.
      KDC Type | If you selected the Enable Kerberos Authentication option, select one of the following Kerberos Key Distribution Center (KDC) types:
          - MIT KDC. Select this option if you want to use MIT KDC.
          - Active Directory. Select this option if you want to use Active Directory KDC.
          After you select the KDC type, specify the following options:
          - KDC Host. Name of the KDC host machine.
          - Administrator Server Host. Name of the administrator server machine that hosts the KDC server.
          - Realm. Name of the Kerberos realm on the machine that hosts the KDC server.
          - Administrator Principal. The Kerberos administrator principal.
          - Administrator Password. The Kerberos administrator password.
          - LDAP URL. Applies to Microsoft Active Directory. URL of the LDAP server directory.
          - Container DN. Applies to Microsoft Active Directory. Distinguished Name of the container to which the user belongs.
          - KDC Certificate Path. Path to the KDC certificate on the Informatica domain machine.
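Which KDC options you must supply depends on the KDC type: LDAP URL and Container DN apply only to Microsoft Active Directory. The following Python sketch lists the options to collect for each KDC type and mirrors the New and Confirm Hadoop Ambari Password prompts. The function names are hypothetical, and treating every option that lacks an Active Directory qualifier (including KDC Certificate Path) as required for both KDC types is an assumption, not a documented rule.

```python
from typing import Dict, List

# Options that the prompts list without an Active Directory qualifier.
# Treating them as common to both KDC types is an assumption.
COMMON_KDC_OPTIONS = [
    "KDC Host",
    "Administrator Server Host",
    "Realm",
    "Administrator Principal",
    "Administrator Password",
    "KDC Certificate Path",
]
# Options that the prompts describe as applying to Microsoft Active Directory.
ACTIVE_DIRECTORY_OPTIONS = ["LDAP URL", "Container DN"]


def required_kdc_options(kdc_type: str) -> List[str]:
    """Return the option names to supply for the selected KDC type."""
    if kdc_type == "MIT KDC":
        return list(COMMON_KDC_OPTIONS)
    if kdc_type == "Active Directory":
        return COMMON_KDC_OPTIONS + ACTIVE_DIRECTORY_OPTIONS
    raise ValueError(f"unknown KDC type: {kdc_type!r}")


def missing_kdc_options(kdc_type: str, answers: Dict[str, str]) -> List[str]:
    """List the required options that are absent or empty in answers."""
    return [name for name in required_kdc_options(kdc_type) if not answers.get(name)]


def check_ambari_password(new_password: str, confirm_password: str) -> None:
    """Mirror the New and Confirm Hadoop Ambari Password prompts."""
    if not new_password:
        raise ValueError("New Hadoop Ambari Password is empty")
    if new_password != confirm_password:
        raise ValueError("the Ambari passwords do not match")


if __name__ == "__main__":
    # Hypothetical KDC host and realm.
    answers = {"KDC Host": "kdc.example.com", "Realm": "EXAMPLE.COM"}
    print(missing_kdc_options("Active Directory", answers))
```

Collecting the answers in one dictionary before you run the installer makes it easy to spot a missing Active Directory option before the prompt asks for it.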