Python Transformation Overview
Use the Python transformation to execute Python code in a mapping that runs on the Spark engine.
The Python transformation is a passive transformation that provides an interface to define transformation functionality using Python code. You reference the Python code and the resource files that you use in the Python code within the Python transformation.
You can use a Python transformation to apply a machine learning model to the data that you pass to the transformation. For example, you can use the Python transformation to write Python code that loads a pre-trained model. You can then use the pre-trained model to classify input data or create predictions.
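The following is a minimal sketch of the load-once, classify-per-row pattern described above, written as plain Python rather than as actual transformation code. The model format (logistic-regression coefficients stored as JSON), the names `MODEL_JSON`, `load_model`, and `classify`, and the sample features are all hypothetical and chosen only for illustration.

```python
import json
import math

# Hypothetical pre-trained model: logistic-regression coefficients stored as JSON.
# In a real mapping this text would typically come from a resource file.
MODEL_JSON = '{"weights": [0.8, -0.4], "bias": -0.1}'


def load_model(text):
    """Parse the stored coefficients once, before any rows are processed."""
    return json.loads(text)


def classify(model, features):
    """Return 1 if the predicted probability is at least 0.5, otherwise 0."""
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    probability = 1.0 / (1.0 + math.exp(-score))
    return 1 if probability >= 0.5 else 0


model = load_model(MODEL_JSON)        # load the model a single time
label = classify(model, [1.0, 0.5])   # classify one row of input values
```

Loading the model once and reusing it for every row avoids repeating the parsing cost per row. A real model would more likely be loaded with the library that produced it.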
Before you can use the Python transformation, you must install Python and Jep and prepare the Spark engine to run the Python transformation.
Note: The Python transformation is available for technical preview. Technical preview functionality is supported but is not production-ready. Informatica recommends that you use it in non-production environments only.
Data Type Conversion
A Python transformation converts Developer tool data types to Python data types, based on the Python transformation port type.
When a Python transformation reads input rows, it converts input port data types to Python data types. When a Python transformation writes output rows, it converts Python data types to output port data types.
For example, the following processing occurs for an input port with the decimal data type in a Python transformation:
- The Python transformation converts the decimal data type in the input port to the Python float data type.
- The transformation uses the value in the input port as the value for the Python float data type.
- To generate the output row, the Python transformation converts the Python float data type to the decimal data type.
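The round trip in the steps above can be imitated in plain Python. This sketch uses the standard `decimal` module as a stand-in for a decimal port. The variable names are hypothetical, and the way the float is converted back to a decimal (through `str`) is an assumption, not necessarily what the transformation does internally.

```python
from decimal import Decimal

port_value = Decimal("19.99")        # stand-in for a decimal input port value
py_value = float(port_value)         # decimal -> Python float on input
out_value = Decimal(str(py_value))   # float -> decimal on output (assumed conversion)

# A float holds roughly 15 to 17 significant digits, so a decimal value with
# more digits than that does not survive the conversion to float unchanged.
high_precision = Decimal("12345678901234567890.123456789")
survives_round_trip = Decimal(float(high_precision)) == high_precision
```

For values like `19.99` the round trip is lossless, while `survives_round_trip` is `False` for the high-precision value. This is worth keeping in mind when a decimal port carries many significant digits.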
The following table shows how the Python transformation maps Developer tool data types to Python data types:
Developer Tool Data Type | Python Data Type |
---|---|
Integer | int |
Decimal | float |
Double | float |
Binary | PyJArray |
Timestamp | datetime |
String | str |

The Python transformation does not support data types that are not listed in this table.
When you write code in the Python transformation, the data types of the output ports in the Python transformation must be compatible with the data types in the Python code. Thus, if you configure an output port in the Python transformation to be a decimal data type, the corresponding variable in the Python code must be a float.
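As an illustration of this compatibility rule, the sketch below checks a Python value against the type expected for an output port. The dictionary `PORT_TYPE_TO_PYTHON` and the function `is_compatible` are hypothetical helpers, not part of the Python transformation. The sketch assumes that the table's datetime type is `datetime.datetime`, and it omits the binary type because PyJArray exists only inside Jep.

```python
from datetime import datetime

# Hypothetical lookup that mirrors the data type table (binary omitted).
PORT_TYPE_TO_PYTHON = {
    "integer": int,
    "decimal": float,
    "double": float,
    "timestamp": datetime,  # assumed to be datetime.datetime
    "string": str,
}


def is_compatible(port_type, value):
    """Return True if value has the Python type expected for port_type."""
    expected = PORT_TYPE_TO_PYTHON[port_type]
    return isinstance(value, expected)


ok = is_compatible("decimal", 12.5)        # a float matches a decimal port
mismatch = is_compatible("decimal", "12.5")  # a str does not
```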
Data Types in Input and Output Ports
The data types in corresponding input and output ports in the Python transformation must be the same. If the data types are not the same, convert the data type in the Python code.
For example, you create an input port with the integer data type and an output port with the string data type. You define the Python code to process the data in the input port and write the data to the output port. In the Python code, you can use the Python function str() to convert the integer data type in the input port and write the output as a string data type in the output port.
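A minimal sketch of that conversion follows. The names `order_id_in` and `order_id_out` are hypothetical stand-ins for the integer input port and the string output port.

```python
order_id_in = 1042                 # stand-in for the integer input port value
order_id_out = str(order_id_in)    # convert to str so it matches the string output port
```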
Note: When you pass a binary data type to the Python transformation, the Python transformation converts the binary data type to a PyJArray. In the Python code, you can convert the PyJArray to a different Python data type, such as bytes, a bytearray, or a struct, that you can use in the code. When you define the output variable, you must convert the Python data type to a data type that the Python transformation supports.
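The sketch below shows one way such a conversion might look. It assumes that the PyJArray can be iterated and yields signed Java byte values (-128 to 127). Because Jep is not available outside the transformation, a plain list stands in for the PyJArray. The helper `java_bytes_to_bytes` and the variable names are hypothetical.

```python
import struct


def java_bytes_to_bytes(signed_values):
    """Convert an iterable of signed Java byte values (-128..127) to Python bytes."""
    return bytes(v & 0xFF for v in signed_values)


incoming = [0, 0, 1, -1]                # plain list standing in for the PyJArray
raw = java_bytes_to_bytes(incoming)     # immutable bytes object
editable = bytearray(raw)               # mutable copy, if the code needs to modify it
(number,) = struct.unpack(">i", raw)    # read the 4 bytes as a big-endian 32-bit integer

# Convert to a supported type, such as str, before writing to an output port.
as_text = raw.hex()
```

Masking each value with `0xFF` maps negative Java bytes to the 0 to 255 range that `bytes` requires.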