Apache Kafka
Apache Kafka is an open-source distributed data streaming platform that provides a unified, high-throughput, low-latency environment for handling real-time data.
Objects Extracted
The Apache Kafka resource extracts metadata from the schema details of the messages published to Kafka topics in an Apache Kafka data source.
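For example, assuming a topic whose messages are registered in the Schema Registry with the following Avro schema (an illustrative schema, not one shipped with the product), the resource would extract schema details such as the Customer record and its id and name fields:
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}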
Connect to an Apache Kafka Data Source Enabled for SSL
To connect to an Apache Kafka data source with an SSL-enabled Schema Registry, perform the following steps:
- 1. Download the SSL certificates from the Schema Registry for Apache Kafka using a web browser.
Note: Make sure that you download the SSL certificate of the Schema Registry for Apache Kafka to the Certificates directory.
- 2. Copy the certificates to the <INFA_HOME>/services/shared/security/ directory.
- 3. Go to the <INFA_HOME>/source/java/jre/bin directory and then run the following keytool command to import each copied certificate as a trusted certificate into the Informatica domain truststore:
keytool -import -file <INFA_HOME>/services/shared/security/<certificate>.cer -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>
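To confirm that a certificate was imported, you can list it by its alias. This is an optional check, and the alias is the one you chose in the import command:
keytool -list -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>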
Resource Connection Properties
The General tab includes the following properties:
Property | Description
---|---
Confluent Platform | Indicates whether the data source runs on the Confluent Platform. |
Schema Registry URL | URL to access the Schema Registry, in the syntax http://host1:port1. Note: If the Schema Registry is enabled for SSL, the public certificate of the Confluent Schema Registry must be imported into the Informatica truststore. |
Broker List | List of brokers in the format <host name>:<port number>. Use a comma as the delimiter to specify multiple brokers. |
Security Protocol | Security protocol for communication with brokers. Valid values are PLAINTEXT, SSL, SASL_PLAINTEXT, and SASL_SSL. |
SSL Truststore File Path | Location of the SSL-enabled truststore file. |
SSL Truststore Password | Password for the SSL truststore. |
SSL KeyStore File Path | Location of the SSL-enabled keystore file. |
SSL KeyStore Password | Password for the SSL keystore. |
SSL Key Password | SSL key password. |
SASL Authentication Mechanism | SASL mechanism used for client connections. Specify GSSAPI for Kerberos authentication or PLAIN for simple user name and password authentication. |
SASL JAAS Configuration | JAAS login context parameters for SASL connections, in the format supported by the JAAS configuration file: loginModuleClass controlFlag (optionName=optionValue)*; For brokers, the JAAS configuration must be prefixed with the listener prefix, and the name of the SASL mechanism must be in lowercase. See the example after this table. |
Kerberos Configuration File Path | Fully qualified path of the Kerberos configuration file if you use Kerberos authentication for the resource. |
API Timeout in milliseconds | Timeout duration for the Apache Kafka API. Specify the timeout in milliseconds. |
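For example, the following values show one way to fill in these properties for a SASL_SSL connection with the PLAIN mechanism. The host names, path, user name, and password are illustrative placeholders, not defaults:
Broker List: kafkabroker1:9093,kafkabroker2:9093
Security Protocol: SASL_SSL
SSL Truststore File Path: /opt/kafka/security/client_truststore.jks
SASL Authentication Mechanism: PLAIN
SASL JAAS Configuration: org.apache.kafka.common.security.plain.PlainLoginModule required username="client" password="client-secret";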
The following table describes the properties that you can configure in the Source Metadata section of the Metadata Load Settings tab:
Property | Description
---|---
Enable Source Metadata | Enables metadata extraction. |
Topic Names or Patterns | List of topic names or wildcard patterns to include in or exclude from the metadata scan. An empty list indicates that all available topics are included in the metadata scan. Use a comma to separate the topic names. For example, you can use the following wildcard patterns or regular expressions to include or exclude topics from the metadata scan: - - ^(abc).* Selects the topics that start with abc.
- - <Topic name>_[^de]<Topic name>. Selects the topics that do not have the characters d and e in the example string.
- - <Topic name>[iJK]<Topic name>\d. Selects the topics that contain the characters i, J, or K and end with a digit in the example string.
|
Polling Strategy | Select one of the polling strategies: - - Poll from Beginning
- - Poll from End
|
Memory | The memory value required to run a scanner job. Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How To-Library Articles tab in the Informatica Doc Portal. |
JVM Options | JVM parameters that you can set to configure the scanner container. See the example after this table. Use the following arguments to configure the parameters: - - -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the scanner log level to DEBUG, INFO, or ERROR. Default is INFO.
- - -Dscanner.container.core=<Number of cores>. Increases the number of cores for the scanner container. The value must be a number.
- - -Dscanner.yarn.app.environment=<key=value>. Key-value pair to set in the YARN environment. Use a comma to separate multiple key-value pairs.
- - -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default is 1.
- - -Djava.security.krb5.conf=<Path of the Kerberos configuration file>. Extracts metadata from a data source enabled for Kerberos when the security protocol is SASL.
|
Track Data Source Changes | View metadata source change notifications in Enterprise Data Catalog. |
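For example, the following illustrative JVM Options value raises the log level, allocates four cores to the scanner container, and points to a Kerberos configuration file. The values are placeholders, not recommendations:
-Dscannerloglevel=DEBUG -Dscanner.container.core=4 -Djava.security.krb5.conf=/etc/krb5.conf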
Running an Apache Kafka Resource on a Kerberos-Enabled Cluster
If you want to run an Apache Kafka resource on a Kerberos-enabled cluster, perform the following steps:
- 1. Copy the krb5.conf file to the following location: <Install Directory>/data/ldmbcmev/Informatica/LDM20_309/source/services/shared/security/krb5.conf A sample krb5.conf appears after these steps.
- 2. Copy the krb5.conf file to the /etc directory on all the clusters where the Catalog Service is running.
- 3. Copy the keytab file to the /opt directory in the following locations:
- - A common location on all the clusters where the Catalog Service is running.
- - The domain machine.
- - The Kerberos cluster machine.
- 4. Add the machine details of the KDC host to the /etc/hosts file on the domain machine and on the cluster machines where the Catalog Service is running.
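As a reference, a minimal krb5.conf might look like the following sketch. The realm EXAMPLE.COM, the host kdc.example.com, and the IP address are illustrative placeholders for your environment:
[libdefaults]
default_realm = EXAMPLE.COM
[realms]
EXAMPLE.COM = {
kdc = kdc.example.com
admin_server = kdc.example.com
}
A matching /etc/hosts entry on each machine might read: 192.0.2.10 kdc.example.com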