Quickstart

Read and write end-to-end example

Import the INFACore Python SDK

To import the INFACore Python SDK, run the following code:

>>> import informatica.infacore as ic

You can invoke all the INFACore APIs using the alias ic.

Verify the installation

Use the help() function to verify the installation. If the SDK is installed correctly, the function prints the SDK version, Python version, operating system name, and INFACore log file path, followed by short descriptions of the available APIs.

>>> ic.help()
...
...
INFACORE SDK Version: 1.0.0
PYTHON Version: 3.7.1
OS Name: Windows
Log File Path: c:\Users\infacore\infacore.log


INFACore available methods:
--------------------------
help(): Lists the available INFACore methods with their short descriptions.
register(): Redirects to the INFACore registration page.
...
...

Create an INFACore account

To create an INFACore account, register using the URL provided by the register() API:

>>> ic.register()
https://now.informatica.com/Build-Robust-Data-Applications-With-INFACore-RegPage.html

After you register and activate your account, you will receive an email with the details required to set up your INFACore environment.

Set up the INFACore environment

To set up your INFACore environment, run the following code:

>>> user_config = {
        "login": {"username": "my-test-user", "password": "my-test-password"},
        "compute_engine": {"type": "local", "install": True}
    }
>>> ic.setup(config=user_config)
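
Hard-coded credentials are easy to leak. As a minimal sketch, the configuration dictionary above can instead be assembled from environment variables. The variable names INFACORE_USERNAME and INFACORE_PASSWORD and the helper build_user_config() are assumptions made for illustration; they are not part of the SDK.

```python
import os


def build_user_config(env=None):
    """Build the setup() config dict from environment variables.

    INFACORE_USERNAME and INFACORE_PASSWORD are hypothetical variable
    names chosen for this sketch; the SDK does not define them.
    """
    env = os.environ if env is None else env
    required = ("INFACORE_USERNAME", "INFACORE_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "login": {
            "username": env["INFACORE_USERNAME"],
            "password": env["INFACORE_PASSWORD"],
        },
        # Same compute engine settings as in the example above.
        "compute_engine": {"type": "local", "install": True},
    }
```

The result could then be passed on as ic.setup(config=build_user_config()).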

List all data sources

To list all the data sources that you can read data from or write data to, run the following code:

>>> ic.list_data_sources()
['Amazon S3', 'Databricks Delta', ...'Snowflake'...]

To create a DataSource object, pass a data source name obtained from list_data_sources().
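
Because the name has to match an entry in that list, it can help to validate it before calling get_data_source(). The following is a minimal sketch: resolve_data_source_name() is a hypothetical helper, not an SDK function, and the sample list contains only the three names visible in the truncated output above.

```python
def resolve_data_source_name(requested, available):
    """Return the entry of `available` that matches `requested`,
    ignoring case and surrounding whitespace."""
    wanted = requested.strip().casefold()
    for name in available:
        if name.casefold() == wanted:
            return name
    raise ValueError(
        f"unknown data source {requested!r}; choose one of {sorted(available)}"
    )


# Stand-in for the result of ic.list_data_sources().
available = ["Amazon S3", "Databricks Delta", "Snowflake"]
name = resolve_data_source_name(" snowflake ", available)
print(name)
```

The resolved name is the exact spelling from the list, which is the value to hand to get_data_source().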

Fetch the data source object

To fetch the DataSource object, pass the data source name:

>>> orcl_ds = ic.get_data_source(datasource_name="Oracle")
>>> print(orcl_ds)
DataSource object of Oracle

You can call methods such as config(), connect(), list_connections(), and get_connection() for a DataSource object.

Fetch the data source configurations

To fetch the connection attributes required to create a new connection to the data source instance, call the config() method on the DataSource object:

>>> orcl_ds.config()
{
    "oracleSubType": "",
    "username": "",
    "password": "",
    "host": "",
    "port": "",
    "database": "",
    ...
    ...
}

Create a connection to the data source

To create and test the connection with the data source instance, call the connect() method on the DataSource object:

>>> orcl_params = {
        "oracleSubType": "oracleonpremise",
        "username": "<Username of Oracle database>",
        "password": "<Password of Oracle database>",
        "host": "<Hostname of Oracle database>",
        "port": "1521",
        "database": "<Database name>",
        "codePage": "UTF-8",
        "encryptionMethod": "NoEncryption",
        "CryptoProtocolVersion": "TLSv1",
        "ValidateServerCertificate": "False"
    }
>>> orcl_cnx = orcl_ds.connect(connection_name="Oracle Sandbox", connection_parameters=orcl_params)

You can call methods such as test(), config(), list_data_objects(), and get_data_object() on a Connection object.
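
One way to prepare connection_parameters is to start from the template returned by config() and fill in only the keys it lists, so that a misspelled key is caught before connect() is called. This is a sketch under assumptions: fill_connection_template() is a hypothetical helper, the template below copies only the keys shown in this guide, and treating every empty value as an error is a choice made for this sketch (the real template may contain optional attributes).

```python
def fill_connection_template(template, values):
    """Return a copy of `template` updated with `values`.

    Raises KeyError for keys the template does not list and
    ValueError if any attribute is still empty afterwards.
    """
    unknown = sorted(set(values) - set(template))
    if unknown:
        raise KeyError("not in template: " + ", ".join(unknown))
    params = {**template, **values}
    empty = sorted(key for key, value in params.items() if value == "")
    if empty:
        raise ValueError("no value for: " + ", ".join(empty))
    return params


# Stand-in for orcl_ds.config(), limited to the keys shown in this guide.
template = {"oracleSubType": "", "username": "", "password": "",
            "host": "", "port": "", "database": ""}
params = fill_connection_template(template, {
    "oracleSubType": "oracleonpremise",
    "username": "example-user",      # placeholder value
    "password": "example-password",  # placeholder value
    "host": "db.example.com",        # placeholder value
    "port": "1521",
    "database": "example-db",        # placeholder value
})
```

The returned dictionary has the same shape as orcl_params and could be passed to connect() as connection_parameters.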

List all data objects

To fetch a list of data objects for the specified connection instance, run the following code:

>>> orcl_cnx.list_data_objects()

You need to pass one of the data object names obtained from the list_data_objects() method to create a DataObject instance.

Fetch the data object instance

To fetch the DataObject instance, pass the data object name:

>>> orcl_do = orcl_cnx.get_data_object("Customers")

You can call methods such as read(), write(), and schema() on a DataObject instance.

Read data from the data source

To extract data from a data source, call the read() method on the DataObject instance:

>>> table = orcl_do.read().collect()

Note

Chain the action method collect() after read() to extract the actual data. On its own, read() returns only the INFACore DataFrame object.

Convert to Pandas DataFrame

To apply Pandas functions and perform data analysis, convert the Ecosystem DataFrame to a Pandas DataFrame:

>>> df_reader = ic.DataFrameReader(table)
>>> pandas_df = df_reader.to_pandas()

Write data to the data source

Use the write() method to load data into a data source. The following example reads the Customers data object from Oracle and writes it to Snowflake:

>>> orcl_do = ic.get_data_source("Oracle").get_connection("Oracle Prod").get_data_object("Customers")
>>> snow_do = ic.get_data_source("Snowflake").get_connection("Snowflake US").get_data_object("Prod/Sales/Customers")
>>> i_df = orcl_do.read()
>>> stats = snow_do.write(i_df)
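
The read-then-write pattern above can be wrapped in a small function. This is a minimal sketch: copy_data_object() is a hypothetical helper that relies only on the read() and write() methods described in this guide, and the two stub classes merely stand in for DataObject instances so that the example runs without a connection. This guide does not specify what write() returns, so the stub returns a plain dictionary of its own invention.

```python
def copy_data_object(source, target):
    """Read from `source` and write the result to `target`.

    `source` only needs a read() method and `target` a write() method.
    """
    frame = source.read()       # no collect(): the object is passed on as-is
    return target.write(frame)  # return whatever write() reports


class StubSource:
    """Hypothetical stand-in for a source DataObject."""

    def read(self):
        return [("PA", 163), ("SC", 15)]


class StubTarget:
    """Hypothetical stand-in for a target DataObject."""

    def __init__(self):
        self.received = None

    def write(self, frame):
        self.received = frame
        return {"rows": len(frame)}  # invented return shape, stub only


target = StubTarget()
stats = copy_data_object(StubSource(), target)
```

With real objects, the call would take the form copy_data_object(orcl_do, snow_do).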

Parsing semi-structured and unstructured data

To parse unstructured data in JSON format, you can specify the path to the JSON input file and the sample schema file.

The following sample Python code runs the parser function with json_input.json as the input file and sample_log.txt as the sample log file:

>>> import informatica.infacore as ic
>>> pf = ic.ParserFunctions()
>>> test_df = pf.parse_unstructured_data("C:/Users/json_input.json", "C:/Users/sample_log.txt")
>>> df_reader = ic.DataFrameReader(test_df)
>>> p_df = df_reader.to_pandas()
>>> p_df.head()

State  Account Length  Area Code  Phone     Int'l Plan  VMail Plan  VMail Message  token  Mins       Calls  Charge    CustServ Calls  Churn
PA     163             806        403-2562  no          yes         300            Day    8.162204   3      7.579174  3               True.
PA     163             806        403-2562  no          yes         300            Eve    3.933035   4      6.508639  3               True.
PA     163             806        403-2562  no          yes         300            Night  4.065759   100    5.111624  3               True.
PA     163             806        403-2562  no          yes         300            Intl   4.92816    6      5.673203  3               True.
SC     15              836        158-8416  yes         no          0              Day    10.018993  4      4.226289  8               False.

Next Steps

To explore the available methods for each class and their parameters, see the API Reference.

The API Reference also includes additional Python examples.