Python Transformation Components
The Python transformation includes the components that allow you to run a Python script on the data that you pass to the transformation.
The Python transformation contains the following components:
- Resource File
A file that contains the resources that you access in the Python code.
The file can be a pre-trained model that was trained on a larger data set outside the Developer tool. You can access the pre-trained model in the Python code to classify data or make predictions based on the data that you pass to the Python transformation.
- Python Code
The Python script that the Python transformation uses to process the data that you pass to it. When you write Python code, you might reconstruct input variables, load a pre-trained model, and define output variables.
Resource File
A resource file is a file that contains the resources that you use in the Python code. If you use a pre-trained model, you specify the pre-trained model as a resource file in the Python transformation.
In the Python transformation, list the resource file path on the Data Integration Service machine. Separate multiple resource file paths using a comma.
When you access the resource file in the Python code, you reference the list resourceJepFile. To reference the list, you must specify a resource file in the Python transformation. When you reference the list, you specify an index to locate the resource file according to the order that the resource file appears in the Python transformation.
For example, you specify several resource files in the Python transformation. To reference the first resource file in the Python code, you can use the variable resourceJepFile[0].
The following image shows how you can specify a resource file in the Python transformation and access the resource file in the Python code:
The resource files foxgloveDataMLmodel.pkl and irisDataMLmodel.pkl are listed using the resource file paths on the Data Integration Service machine. The Python code accesses the resource files to reconstruct the respective Python objects. To access the first resource file, foxgloveDataMLmodel.pkl, the Python code references resourceJepFile[0]. To access the second resource file, irisDataMLmodel.pkl, the Python code references resourceJepFile[1].
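The index-based access described above can be sketched as follows. This is a minimal illustration, not the code from the image: it assumes that the .pkl files are standard Python pickle files, and it creates a stand-in resourceJepFile list with placeholder contents so that the sketch runs outside the transformation. The helper load_resource is hypothetical. Inside the Python transformation, the list is provided for you and only the loading step applies.

```python
import os
import pickle
import tempfile

# Stand-in for the list that the Python transformation provides at run time.
# Inside the transformation, do not define resourceJepFile yourself.
_tmp_dir = tempfile.mkdtemp()
resourceJepFile = [
    os.path.join(_tmp_dir, "foxgloveDataMLmodel.pkl"),
    os.path.join(_tmp_dir, "irisDataMLmodel.pkl"),
]

# Placeholder objects written to disk so the sketch is runnable (assumed contents).
for path, placeholder in zip(resourceJepFile, [{"name": "foxglove"}, {"name": "iris"}]):
    with open(path, "wb") as f:
        pickle.dump(placeholder, f)


def load_resource(index):
    """Reconstruct the Python object stored in the resource file at this index."""
    with open(resourceJepFile[index], "rb") as f:
        return pickle.load(f)


# The index follows the order in which the files are listed in the transformation.
foxglove_model = load_resource(0)
iris_model = load_resource(1)
```

Only unpickle files from a source that you trust, because loading a pickle file can execute arbitrary code.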
Python Code
The Python code is the Python script that you write in the Python transformation to define how the transformation processes data. When you write Python code, you might reconstruct input variables, load a pre-trained model, and define output variables.
Use the following rules to write Python code:
- To access input ports, call the input port name.
- To set output ports, set the output port to a value. You must set the output port to a value for each output port that you define in the Python transformation.
- To define how the transformation writes data from the input ports to output ports, set the output port to the value of the input port.
For example, write output_port = input_port to write the data from the input port input_port to the output port output_port.
- To access the resource file path, use the variable resourceJepFile. Specify the resource file using an index such as resourceJepFile[0].
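The port rules above might look like the following in practice. This is a sketch under assumptions: the port names sepal_length, sepal_width, sepal_length_out, and sepal_area_out are hypothetical, and the first two assignments are stand-ins for the values that the input ports supply inside the transformation.

```python
# Stand-ins for input port values. Inside the transformation, the input
# ports supply these values, so these two lines are not needed there.
sepal_length = 5.1
sepal_width = 3.5

# Pass-through: write the input port value to an output port unchanged.
sepal_length_out = sepal_length

# Derived value: every output port defined in the transformation must be
# set, so the second (hypothetical) output port is assigned as well.
sepal_area_out = sepal_length * sepal_width
```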
When you run the Python transformation, the Data Integration Service does not validate the Python code.
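Because the code is not validated for you, one option is to check the script for syntax errors before you paste it into the transformation. The sketch below uses the built-in compile function for this. The helper name check_syntax and the sample scripts are hypothetical, and the check catches only syntax errors, not undefined ports or runtime failures.

```python
def check_syntax(source):
    """Return None if the script compiles, otherwise a short error message."""
    try:
        # compile() parses the script without executing it.
        compile(source, "<python_transformation>", "exec")
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


good_script = "output_port = input_port\n"
bad_script = "output_port = = input_port\n"

good_result = check_syntax(good_script)  # no syntax error expected
bad_result = check_syntax(bad_script)    # reports the offending line
```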