
Apache Kafka

Apache Kafka is an open-source, distributed data streaming platform that provides a unified, high-throughput, low-latency environment for handling real-time data.

Objects Extracted

The Apache Kafka resource extracts metadata from the schema details of messages published to Kafka topics in an Apache Kafka data source.
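For illustration, the schema details typically come from the schemas that a Schema Registry holds for the topics. The following sample Avro record schema shows the kind of structure the scanner reads; the record name and fields are invented for this example:
  {
    "type": "record",
    "name": "CustomerEvent",
    "namespace": "com.example.events",
    "fields": [
      {"name": "customerId", "type": "long"},
      {"name": "eventType", "type": "string"},
      {"name": "eventTime", "type": {"type": "long", "logicalType": "timestamp-millis"}}
    ]
  }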

Connect to an Apache Kafka Data Source Enabled for SSL

To connect to an Apache Kafka data source with an SSL-enabled Schema Registry, perform the following steps:
  1. Download the SSL certificates from the Schema Registry for Apache Kafka using a web browser.
     Note: Make sure that you import the SSL-enabled certificate of the Schema Registry for Apache Kafka into the Certificates directory.
  2. Copy the certificates to the <INFA_HOME>/services/shared/security/ directory.
  3. Go to the <INFA_HOME>/source/java/jre/bin directory and then run the following keytool command to import each copied certificate as a trusted certificate into the Informatica domain keystore:
     keytool -import -file <INFA_HOME>/services/shared/security/<certificate>.cer -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>
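If the openssl command-line tool is available, you can also download the certificate from the command line instead of a web browser. The following sketch assumes a Schema Registry reachable at <schema registry host>:<port>; the alias and output file name are illustrative:
  openssl s_client -connect <schema registry host>:<port> -showcerts </dev/null | openssl x509 -outform PEM > <INFA_HOME>/services/shared/security/schema_registry.cer
  keytool -import -file <INFA_HOME>/services/shared/security/schema_registry.cer -alias schemaregistry -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>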

Basic Information

The General tab includes the following basic information about the resource:
Name
The name of the resource.
Description
The description of the resource.
Resource type
The type of the resource.
Execute On
You can choose to run the scan on the default catalog server or offline.

Resource Connection Properties

The General tab includes the following properties:
Confluent Platform
Indicates whether the selected platform is a stream data platform.
Schema Registry URL
The URL to access the Schema Registry. The URL syntax is: http://host1:port1
Note: If the Schema Registry is enabled for SSL, the public certificate of the Confluent Schema Registry must be imported into the Informatica truststore.
Broker List
The list of brokers in the following format: <host name>:<port number>. To specify multiple brokers, use a comma as the delimiter.
Security Protocol
The security protocol for communication with brokers. Valid values include PLAINTEXT, SSL, SASL_PLAINTEXT, and SASL_SSL.
SSL Truststore File Path
The path, including the file name, of the default infa_truststore file on the domain node. Default path: <INFA_HOME>/services/shared/security/infa_truststore.jks
SSL Truststore Password
The password for the default SSL truststore.
SSL KeyStore File Path
The path, including the file name, of the default infa_keystore file on the domain node. Default path: <INFA_HOME>/services/shared/security/infa_keystore.jks
SSL KeyStore Password
The password for the default SSL keystore.
SSL Key Password
The password for the SSL key.
SASL Authentication Mechanism
The SASL mechanism used for client connections. Specify GSSAPI for Kerberos authentication or PLAIN for simple user authentication.
SASL JAAS Configuration
The JAAS login context parameters for SASL connections, in the format supported by the JAAS configuration file: loginModuleClass controlFlag (optionName=optionValue)*;
For brokers, the JAAS configuration must be prefixed with the listener prefix, and the name of the SASL mechanism must be in lowercase. See the sample values after this list.
Kerberos Configuration File Path
The fully qualified path of the Kerberos configuration file, if you use Kerberos authentication for the resource.
API Timeout in milliseconds
The timeout duration for the Apache Kafka API, in milliseconds.
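For example, the connection properties for a cluster secured with SASL_SSL and the PLAIN mechanism might resemble the following values. The host names, port numbers, and credentials are illustrative placeholders; PlainLoginModule is the standard Kafka login module for the PLAIN mechanism:
  Schema Registry URL: http://registry.example.com:8081
  Broker List: broker1.example.com:9092,broker2.example.com:9092
  Security Protocol: SASL_SSL
  SSL Truststore File Path: <INFA_HOME>/services/shared/security/infa_truststore.jks
  SASL Authentication Mechanism: PLAIN
  SASL JAAS Configuration: org.apache.kafka.common.security.plain.PlainLoginModule required username="<user>" password="<password>";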
You can configure the following properties in the Source Metadata section of the Metadata Load Settings tab:
Enable Source Metadata
Enables metadata extraction.
Topic Names or Patterns
The list of topic names or wildcard patterns to include in or exclude from the metadata scan. An empty list indicates that all available topics are included in the metadata scan. Use a comma to separate topic names.
For example, you can use the following wildcard patterns or regular expressions to include or exclude topics from the metadata scan:
  - ^(abc).* Selects the topics that start with abc.
  - <Topic name>_[^de]<Topic name>. Selects the topics that do not contain the characters d or e at that position in the example string.
  - <Topic name>[iJK]<Topic name>\d. Selects the topics that contain the character i, J, or K and end with a digit in the example string.
Polling Strategy
Select one of the following polling strategies:
  - Poll from Beginning
  - Poll from End
Memory
The memory value required to run a scanner job. Specify one of the following memory values:
  - Low
  - Medium
  - High
Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Doc Portal.
Custom Options
JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters, as shown in the sample value after this list:
  - -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the scanner log level to DEBUG, INFO, or ERROR. Default value is INFO.
  - -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number.
  - -Dscanner.yarn.app.environment=<key=value>. Key-value pairs to set in the YARN environment. Use a comma to separate multiple key-value pairs.
  - -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default value is 1.
  - -Djava.security.krb5.conf=<path of the Kerberos configuration file>. Extracts metadata from a data source enabled for Kerberos when the security protocol is SASL.
Track Data Source Changes
View metadata source change notifications in Enterprise Data Catalog.
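For example, to run the scanner with debug logging, two container cores, and a specific Kerberos configuration file, you might set Custom Options to a value such as the following; the file path is a placeholder:
  -Dscannerloglevel=DEBUG -Dscanner.container.core=2 -Djava.security.krb5.conf=/etc/krb5.conf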

Running an Apache Kafka Resource on a Kerberos-Enabled Cluster

If you want to run an Apache Kafka resource on a Kerberos-enabled cluster, perform the following steps:
  1. Copy the krb5.conf file to the following location: <Install Directory>/data/ldmbcmev/Informatica/LDM20_309/source/services/shared/security/krb5.conf
  2. Copy the krb5.conf file to the /etc directory on all the clusters where the Catalog Service is running.
  3. Copy the keytab file to the /opt directory on the domain machine and the cluster machines where the Catalog Service is running.
  4. Add the machine details of the KDC host to the /etc/hosts file on the domain machine and the cluster machine where the Catalog Service is running.
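For example, an entry for the KDC host in the /etc/hosts file might look like the following line; the IP address and host names are illustrative placeholders:
  192.0.2.10   kdc.example.com   kdc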