Python Transformation Setup Requirements
Before you use the Python transformation, you must prepare the Spark engine to process it.
Complete the following tasks:
- Install Python and Jep on the Data Integration Service machine.
- Configure Spark execution parameters in the Hadoop connection.
Install Python, Jep, and Third-Party Libraries
Install Python to run the Python code in the Python transformation. When you install Python, you must install the Jep package. Optionally, you can install additional third-party libraries.
Install Python with the --enable-shared option to ensure that shared libraries are accessible by Jep.
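To check whether an existing Python build was configured with --enable-shared, you can query the build configuration from the interpreter itself. The following is a minimal sketch that uses the standard sysconfig module. The function name python_shared_library_info is illustrative and is not part of the product.

```python
import sysconfig


def python_shared_library_info():
    """Return (enabled, libdir, library_name) for the running interpreter.

    enabled is True when the interpreter was built with --enable-shared.
    libdir and library_name come from the build configuration and can be
    None on platforms that do not record them.
    """
    enabled = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    libdir = sysconfig.get_config_var("LIBDIR")
    library_name = sysconfig.get_config_var("LDLIBRARY")
    return enabled, libdir, library_name


if __name__ == "__main__":
    enabled, libdir, library_name = python_shared_library_info()
    print("Built with --enable-shared:", enabled)
    print("Library directory:", libdir)
    print("Library file name:", library_name)
```

Run the script with the Python interpreter that you plan to copy to the Data Integration Service machine. If the first line reports False, rebuild Python with the --enable-shared option.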
The Python transformation supports the following Python versions:
To install Jep, consider the following installation options:
- Run pip install jep. Use this option if Python is installed with the pip package.
- Configure the Jep binaries. Ensure that jep.jar can be accessed by Java classloaders, the shared Jep library can be accessed by Java, and Jep Python files can be accessed by Python.
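After either installation option, you might want to confirm where the Jep package was installed and which JAR and shared library files it contains. The following sketch assumes that these files sit in the Jep package directory, which can vary by Jep version and platform. The helper names find_package_dir and list_native_artifacts are hypothetical.

```python
import importlib.util
from pathlib import Path


def find_package_dir(name):
    """Return the directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).parent


def list_native_artifacts(package_dir):
    """Return the sorted names of .jar and .so files in a directory."""
    suffixes = {".jar", ".so"}
    return sorted(
        entry.name
        for entry in Path(package_dir).iterdir()
        if entry.suffix in suffixes
    )


if __name__ == "__main__":
    jep_dir = find_package_dir("jep")
    if jep_dir is None:
        print("Jep is not importable by this Python interpreter.")
    else:
        print("Jep package directory:", jep_dir)
        print("JAR and shared library files:", list_native_artifacts(jep_dir))
```

The reported package directory is the kind of location that the JEP_HOME execution parameter points to after you copy the Python installation folder.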
Optionally, you can install third-party libraries such as numpy, scikit-learn, and cv2. You can access the third-party libraries in the Python transformation.
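Before you copy the installation folder, it can help to confirm that each library can be found by the interpreter. The following sketch reports the modules that cannot be located. Note that the import name can differ from the package name: scikit-learn is imported as sklearn. The function name missing_modules is illustrative.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not package names: scikit-learn is imported as sklearn.
    required = ["jep", "numpy", "sklearn", "cv2"]
    missing = missing_modules(required)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```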
After you install Python, Jep, and any third-party libraries, copy the Python installation folder to the following location on the Data Integration Service machine:
<Informatica installation directory>/services/shared/spark/python
Changes take effect after you restart the Data Integration Service.
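The copy step can also be scripted. The following is a sketch only: the function name copy_python_install and both example paths are assumptions, and the script must run as a user with write access to the Informatica installation directory. Symbolic links are preserved because the Python shared library is often installed as a link to a versioned file.

```python
import shutil
from pathlib import Path


def copy_python_install(python_prefix, infa_home):
    """Copy a Python installation folder to services/shared/spark/python.

    python_prefix: root folder of the Python installation to copy.
    infa_home: the Informatica installation directory.
    Raises FileExistsError if the target folder already exists.
    """
    target = Path(infa_home) / "services" / "shared" / "spark" / "python"
    # symlinks=True copies links as links instead of following them.
    shutil.copytree(python_prefix, target, symlinks=True)
    return target


if __name__ == "__main__":
    # Hypothetical paths. Replace them with the paths in your environment.
    copied_to = copy_python_install("/opt/python3.6", "/opt/informatica")
    print("Copied Python installation to", copied_to)
```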
Configure Spark Execution Parameters
To run the Python transformation on the Spark engine, configure the following Spark execution parameters in the Hadoop connection:
- infaspark.pythontx.executorEnv.LD_PRELOAD
The location of the Python shared library in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.executorEnv.LD_PRELOAD=<Informatica installation directory>/services/shared/spark/python/lib/libpython3.6m.so
- infaspark.pythontx.submit.lib.JEP_HOME
The location of the Jep package in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.submit.lib.JEP_HOME=<Informatica installation directory>/services/shared/spark/python/lib/python3.6/site-packages/jep/
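Both parameter values follow the same pattern under the copied Python installation folder, so you can derive them from the Informatica installation directory and the Python version. The following sketch builds the two values shown in the examples above. The function name python_tx_spark_params and the path /opt/informatica are hypothetical. The abiflags argument is "m" for Python 3.6 and is empty for Python 3.8 and later, where the suffix was removed from the library name.

```python
from pathlib import PurePosixPath


def python_tx_spark_params(infa_home, version="3.6", abiflags="m"):
    """Build the Spark execution parameters for the Python transformation.

    infa_home: the Informatica installation directory.
    version: the major.minor Python version, for example "3.6".
    abiflags: the ABI suffix in the shared library name ("m" for 3.6).
    """
    python_home = PurePosixPath(infa_home) / "services/shared/spark/python"
    shared_library = python_home / "lib" / f"libpython{version}{abiflags}.so"
    jep_home = python_home / "lib" / f"python{version}" / "site-packages" / "jep"
    return {
        "infaspark.pythontx.executorEnv.LD_PRELOAD": str(shared_library),
        "infaspark.pythontx.submit.lib.JEP_HOME": str(jep_home) + "/",
    }


if __name__ == "__main__":
    # Hypothetical installation directory. Replace it with your own.
    for key, value in python_tx_spark_params("/opt/informatica").items():
        print(f"{key}={value}")
```

Verify that both paths exist on the Data Integration Service machine before you enter the values in the Hadoop connection.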