
Configure the Data Engineering Integration collector

Configure the Data Engineering Integration collector to collect and upload statistics on Hadoop clusters used by the domain, including statistics on jobs run on the clusters. You must select Data Engineering Integration in the list of products used by the domain on the Domain Connection panel to configure the collector.
By default, the collector runs once every hour. You can create a custom schedule to suit your requirements.
The collector is enabled by default. Clear the Enabled checkbox to disable the collector.
Click Finish when you finish configuring the collector.

Collecting historical data

You can configure the collector to populate Operational Insights with up to 60 days of historical data.
By default, historical data is collected for the previous 30 days. However, you can specify any number of days between 1 and 60.
Data collection begins when the domain is added to Operational Insights. Roughly 24 hours' worth of data is collected every hour, so approximately 30 hours are required to populate Operational Insights with data for the previous 30 days.
Historical data collection is enabled by default. Clear the Collect Historical Data checkbox to disable historical data collection.
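
The collection rate described above implies a simple estimate: each day of history takes roughly one hour to load. The following Python sketch only illustrates that arithmetic. The function name and constants are hypothetical and are not part of Operational Insights, and the 24-hours-per-hour rate is the approximate figure stated above.

```python
import math

# Approximate rate stated in this topic: about 24 hours' worth of
# historical data is collected during each hour of collection.
HOURS_OF_DATA_PER_HOUR = 24

# Allowed range for the number of days of historical data.
MIN_DAYS, MAX_DAYS = 1, 60


def estimate_backfill_hours(history_days: int = 30) -> int:
    """Estimate how many hours are needed to load the requested history.

    Hypothetical helper for illustration only.
    """
    if not MIN_DAYS <= history_days <= MAX_DAYS:
        raise ValueError(
            f"history_days must be between {MIN_DAYS} and {MAX_DAYS}"
        )
    hours_of_data = history_days * 24
    return math.ceil(hours_of_data / HOURS_OF_DATA_PER_HOUR)


if __name__ == "__main__":
    for days in (1, 30, 60):
        print(f"{days} days of history: about {estimate_backfill_hours(days)} hours")
```

Under this assumption, the default of 30 days takes about 30 hours, and the maximum of 60 days takes about 60 hours.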

Selecting the cluster configuration

If the domain is a Data Engineering Integration domain, select the cluster configuration used by the domain to connect to the Hadoop cluster. The Data Engineering Integration collector uses the cluster configuration to gather job execution statistics and operational metrics for the cluster.
You can view the cluster configurations created in the domain in the Connections tab in Informatica Administrator (the Administrator tool).
    1. Click Select Cluster Configuration.
    2. Select the cluster configuration to use to connect to the Hadoop cluster from the menu.
    3. Select the Enable Cluster Configuration checkbox to enable the collector to collect data from the cluster.
    4. To connect to a secure cluster, click TLS Enabled, and then specify the path and password for the cluster truststore file.
    5. Click Save to save the configuration.

Configuring the collector schedule

You can configure a custom schedule for the collector. The schedule you create overrides the default collector schedule.
Enter the following properties:
Property: Repeats
Description: The interval at which to repeat collection, for example Hourly.

Property: Repeats Frequency
Description: The frequency at which to perform collection. The frequency is based on the repetition value you select. For example, to collect data every two hours, select Hourly as the repetition, and then set the frequency value to 2.

Property: Starts on
Description: The date and time the custom schedule takes effect.

Property: Timezone
Description: The time zone the schedule is based on.
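
To illustrate how the repetition and frequency values combine, the following Python sketch computes the first few collection times for a schedule. It is an assumption-laden illustration, not the collector's implementation: the function and the REPEAT_UNITS mapping are hypothetical, only the Hourly repetition mentioned above is modeled, and the first collection is assumed to run at the Starts on time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from a repetition value to its base interval.
# Only "Hourly" is named in this topic; other values are not modeled.
REPEAT_UNITS = {"Hourly": timedelta(hours=1)}


def collection_times(repeats: str, frequency: int,
                     starts_on: datetime, count: int = 3) -> list:
    """Return the first `count` collection times for a custom schedule.

    Assumes the first collection runs at `starts_on` and that each later
    run follows after `frequency` repetition units.
    """
    if repeats not in REPEAT_UNITS:
        raise ValueError(f"unsupported repetition value: {repeats!r}")
    if frequency < 1:
        raise ValueError("frequency must be at least 1")
    if starts_on.tzinfo is None:
        raise ValueError("starts_on must include a time zone")
    step = REPEAT_UNITS[repeats] * frequency
    return [starts_on + i * step for i in range(count)]


if __name__ == "__main__":
    # "Every two hours": Hourly repetition with a frequency value of 2.
    start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    for run in collection_times("Hourly", 2, start):
        print(run.isoformat())
```

The example uses UTC to keep the arithmetic unambiguous. For a named time zone, the standard-library zoneinfo module (Python 3.9 and later) can supply the tzinfo value.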

Connecting to a cluster secured using Kerberos authentication

If the Data Engineering Integration collector collects analytics from a cluster secured using Kerberos authentication, you must add custom Kerberos properties to the configuration for the Secure Agent the Data Engineering Integration domain uses.
To find the Secure Agent the collector uses and add the properties, perform the following steps:
    1. Log in to Operational Insights.
    2. Select the domain, and then click the Details tab.
    3. Locate the name of the Secure Agent the domain uses in the Secure Agent Group property.
    4. Click Secure Agents in the left-hand navigation bar.
    5. Select the Secure Agent, and then click Manage.
    The Details page for the Secure Agent opens in the Administrator application.
    6. Click Edit.
    7. Click the + symbol next to a property in the Custom Configuration section of the page to add a new custom property.
    8. For each property, select OpsInsights Data Collector from the Service menu, and then select OpsInsights from the Type menu.
    9. Enter the following custom properties:
    Name: kerberosPrincipal
    Value: The Service Principal Name (SPN) assigned in Active Directory to the user that runs Data Integration Service jobs on the cluster.

    Name: kerberosKeyTabFile
    Value: The path and file name of the keytab file on the node where the Secure Agent runs.
    On both Linux and Windows hosts, specify the value as follows:
    /<Secure Agent installation directory>/<file name>.keytab

    Name: kerberosConfFile
    Value: The path to the krb5.conf file on the node where the Secure Agent runs.
    On both Linux and Windows hosts, specify the value as follows:
    /<Secure Agent installation directory>/krb5.conf
    10. Click Save.
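
Before you enter the custom properties, it can help to confirm on the Secure Agent node that all three values are present and that the keytab and krb5.conf files exist at the paths you plan to use. The following Python sketch shows one way to do that check. It is a hypothetical helper, not part of the product, and the example values in it are placeholders to replace with your own.

```python
from pathlib import Path
from typing import List, Mapping

# Custom property names described in this topic.
REQUIRED_PROPERTIES = ("kerberosPrincipal", "kerberosKeyTabFile", "kerberosConfFile")

# Properties whose values must point to files on the Secure Agent node.
FILE_PROPERTIES = ("kerberosKeyTabFile", "kerberosConfFile")


def check_kerberos_properties(props: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the planned property values.

    Hypothetical pre-flight check; an empty list means no problems found.
    """
    problems = []
    for name in REQUIRED_PROPERTIES:
        if not props.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    for name in FILE_PROPERTIES:
        value = props.get(name, "").strip()
        if value and not Path(value).is_file():
            problems.append(f"{name} does not point to an existing file: {value}")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own
    # principal and the real paths under the Secure Agent installation.
    planned = {
        "kerberosPrincipal": "user@EXAMPLE.COM",
        "kerberosKeyTabFile": "/path/to/secure-agent/example.keytab",
        "kerberosConfFile": "/path/to/secure-agent/krb5.conf",
    }
    for problem in check_kerberos_properties(planned):
        print(problem)
```

The check only verifies that the values are set and that the files exist. It does not validate the principal or the keytab contents against the Kerberos KDC.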