Quickstart
Read and write end-to-end example
Import the INFACore Python SDK
To import the INFACore Python SDK, run the following code:
>>> import informatica.infacore as ic
You can invoke all the INFACore APIs using the alias ic.
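If you run the quickstart as a script rather than typing it into the interpreter, you may want to confirm that the SDK is importable before calling any API. The following is a minimal sketch that uses only the Python standard library. The helper name infacore_available() is hypothetical and is not part of INFACore.

```python
import importlib.util


def infacore_available() -> bool:
    """Return True if the informatica.infacore module can be located.

    Hypothetical helper for this quickstart; not part of the INFACore SDK.
    """
    try:
        # find_spec() imports the parent package "informatica" first and
        # raises ModuleNotFoundError if that parent package is missing.
        return importlib.util.find_spec("informatica.infacore") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    if infacore_available():
        import informatica.infacore as ic  # the alias used throughout this guide
        print("INFACore SDK found")
    else:
        print("INFACore SDK not found; install it before running this quickstart")
```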
Verify the installation
Use the help() function to verify that the SDK is installed correctly. If it is, help() prints the SDK version, Python version, operating system name, and INFACore log file path, followed by short descriptions of the available APIs:
>>> ic.help()
...
...
INFACORE SDK Version: 1.0.0
PYTHON Version: 3.7.1
OS Name: Windows
Log File Path: c:\Users\infacore\infacore.log
INFACore available methods:
--------------------------
help(): Lists the available INFACore methods with their short descriptions.
register(): Redirects to the INFACore registration page.
...
...
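If you automate the installation check, you might want to keep the header fields that help() prints, such as the log file path. The sketch below parses those fields from captured text. It assumes that ic.help() writes to standard output, so that contextlib.redirect_stdout can capture it; parse_help_header() and version_tuple() are hypothetical helpers, and the minimum version in the usage comment is only an example.

```python
# Header field names as they appear in the ic.help() output shown above.
HEADER_KEYS = ("INFACORE SDK Version", "PYTHON Version", "OS Name", "Log File Path")

# Sample text copied from the help() output above (raw string keeps the backslashes).
SAMPLE_HELP_OUTPUT = r"""INFACORE SDK Version: 1.0.0
PYTHON Version: 3.7.1
OS Name: Windows
Log File Path: c:\Users\infacore\infacore.log
INFACore available methods:
--------------------------
help(): Lists the available INFACore methods with their short descriptions.
"""


def parse_help_header(text: str) -> dict:
    """Extract the 'Key: value' header fields from captured help() text."""
    fields = {}
    for line in text.splitlines():
        # partition() splits on the first colon only, so a Windows path
        # such as c:\Users\... keeps its drive-letter colon in the value.
        key, sep, value = line.partition(":")
        if sep and key.strip() in HEADER_KEYS:
            fields[key.strip()] = value.strip()
    return fields


def version_tuple(version: str) -> tuple:
    """Turn a dotted version such as '1.0.0' into (1, 0, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


# Usage with the SDK, assuming help() prints to standard output:
#   import io
#   from contextlib import redirect_stdout
#   buffer = io.StringIO()
#   with redirect_stdout(buffer):
#       ic.help()
#   fields = parse_help_header(buffer.getvalue())
#   assert version_tuple(fields["INFACORE SDK Version"]) >= (1, 0, 0)

if __name__ == "__main__":
    print(parse_help_header(SAMPLE_HELP_OUTPUT))
```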
Create an INFACore account
To create an INFACore account, register using the URL provided by the register() API:
>>> ic.register()
https://now.informatica.com/Build-Robust-Data-Applications-With-INFACore-RegPage.html
After you register and activate your account, you will receive an email with the details required to set up your INFACore environment.
Set up the INFACore environment
To set up your INFACore environment, run the following code:
>>> user_config = {
"login": {"username": "my-test-user", "password": "my-test-password"},
"compute_engine": {"type": "local", "install": True}
}
>>> ic.setup(config=user_config)
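The user_config dictionary above contains your password in plain text. If you keep the setup code in a file, one option is to read the credentials from environment variables instead. This is a minimal sketch: the variable names INFACORE_USERNAME and INFACORE_PASSWORD are hypothetical choices for this example, and INFACore does not read them itself.

```python
import os


def build_user_config(install_engine: bool = True) -> dict:
    """Build the ic.setup() config from environment variables.

    INFACORE_USERNAME and INFACORE_PASSWORD are hypothetical names chosen
    for this sketch; set them yourself before running the code.
    """
    username = os.environ.get("INFACORE_USERNAME")
    password = os.environ.get("INFACORE_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "Set INFACORE_USERNAME and INFACORE_PASSWORD before calling ic.setup()"
        )
    # Same structure as the user_config dictionary in the quickstart.
    return {
        "login": {"username": username, "password": password},
        "compute_engine": {"type": "local", "install": install_engine},
    }


# Usage with the SDK:
#   ic.setup(config=build_user_config())
```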
List all data sources
To list all the data sources that you can read data from or write data to, run the following code:
>>> ic.list_data_sources()
['Amazon S3', 'Databricks Delta', ...'Snowflake'...]
To create a DataSource object, use one of the data source names returned by list_data_sources().
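Because the data source name must match an entry in that list, a small check before calling get_data_source() can turn a typo into a clear error message. This is a sketch that assumes list_data_sources() returns a list of strings, as in the output above; resolve_data_source() is a hypothetical helper, not an INFACore API.

```python
import difflib


def resolve_data_source(name: str, available: list) -> str:
    """Return the entry in available that matches name.

    Tries an exact match, then a case-insensitive match; otherwise raises
    ValueError with up to three close matches. Hypothetical helper.
    """
    if name in available:
        return name
    by_lower = {source.lower(): source for source in available}
    if name.lower() in by_lower:
        return by_lower[name.lower()]
    suggestions = difflib.get_close_matches(name, available, n=3)
    raise ValueError(f"Unknown data source {name!r}; close matches: {suggestions}")


# Usage with the SDK:
#   name = resolve_data_source("oracle", ic.list_data_sources())
#   orcl_ds = ic.get_data_source(datasource_name=name)
```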
Fetch the data source object
To fetch the DataSource object, pass the data source name:
>>> orcl_ds = ic.get_data_source(datasource_name="Oracle")
>>> print(orcl_ds)
DataSource object of Oracle
You can call methods such as config(), connect(), list_connections(), and get_connection() on a DataSource object.
Fetch the data source configurations
To fetch the connection attributes that you need to create a new connection to the data source, call the config() method on the DataSource object:
>>> orcl_ds.config()
{
"oracleSubType": "",
"username": "",
"password": "",
"host": "",
"port": "",
"database": "",
...
...
}
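The config() output is a template of attribute names with empty values. One way to build the connection parameters is to merge your values into that template and list the attributes that are still empty, so that you can review them before calling connect(). This sketch assumes that config() returns either a dictionary or a JSON string shaped like the output above; fill_connection_params() is a hypothetical helper.

```python
import json


def fill_connection_params(template, values: dict):
    """Merge user values into a config() template.

    Returns (params, missing), where missing lists the attributes that are
    still empty. Accepts the template as a dict or as a JSON string.
    """
    if isinstance(template, str):
        template = json.loads(template)
    # Reject keys that the data source does not define, to catch typos.
    unknown = sorted(set(values) - set(template))
    if unknown:
        raise ValueError(f"Not connection attributes of this data source: {unknown}")
    params = {key: values.get(key, default) for key, default in template.items()}
    missing = [key for key, value in params.items() if value == ""]
    return params, missing


# Usage with the SDK:
#   orcl_params, missing = fill_connection_params(orcl_ds.config(), {"port": "1521"})
#   print("Still empty:", missing)
```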
Create a connection to the data source
To create and test the connection with the data source instance, call the connect() method on the DataSource object:
>>> orcl_params = {
"oracleSubType": "oracleonpremise",
"username": "<Username of Oracle database>",
"password": "<Password of Oracle database>",
"host": "<Hostname of Oracle database>",
"port": "1521",
"database": "<Database name>",
"codePage": "UTF-8",
"encryptionMethod": "NoEncryption",
"CryptoProtocolVersion": "TLSv1",
"ValidateServerCertificate": "False"
}
>>> orcl_cnx = orcl_ds.connect(connection_name="Oracle Sandbox", connection_parameters=orcl_params)
You can call methods such as test(), config(), list_data_objects(), and get_data_object() on a Connection object.
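Because connection_parameters carries the database password in plain text, you may want to mask it before printing or logging the dictionary, for example while debugging a failed connect() call. This is a minimal sketch; redact() is a hypothetical helper, and the set of sensitive keys is an assumption that you should extend for your data source.

```python
# Keys treated as secrets in this sketch; extend the set for your data source.
SENSITIVE_KEYS = frozenset({"password"})


def redact(params: dict, sensitive=SENSITIVE_KEYS) -> dict:
    """Return a copy of params with non-empty secret values masked."""
    return {
        key: "****" if key.lower() in sensitive and value else value
        for key, value in params.items()
    }


# Usage: print(redact(orcl_params)) instead of print(orcl_params)
```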
List all data objects
To list the data objects that are available through the connection, run the following code:
>>> orcl_cnx.list_data_objects()
To fetch a DataObject instance, pass one of the data object names returned by list_data_objects() to the get_data_object() method.
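A connection can expose many data objects. The sketch below narrows the list with a shell-style pattern before you pick a name. It assumes that list_data_objects() returns an iterable of name strings, which the quickstart does not show; find_data_objects() is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def find_data_objects(names, pattern: str) -> list:
    """Return the names that match a shell-style pattern, ignoring case."""
    pattern = pattern.lower()
    # Note: in fnmatch, "*" also matches "/" in names such as "Prod/Sales/Customers".
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


# Usage with the SDK:
#   find_data_objects(orcl_cnx.list_data_objects(), "cust*")
```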
Fetch the data object instance
To fetch the DataObject instance, pass the data object name:
>>> orcl_do = orcl_cnx.get_data_object("Customers")
You can call methods such as read(), write(), and schema() on a DataObject instance.
Read data from the data source
To extract data from a data source, call the read() method on the DataObject instance:
>>> table = orcl_do.read().collect()
Note
Call the action method collect() on the result of read() to extract the actual data. The read() method on its own returns an INFACore DataFrame object, not the extracted data.
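The toy model below illustrates the split between read() and the action method collect(). It assumes that read() defers extraction until an action runs, as the term "action method" suggests in Spark-style APIs. ToyDataFrame and toy_read() are hypothetical stand-ins and do not reflect how INFACore implements its DataFrame.

```python
class ToyDataFrame:
    """Toy stand-in for a deferred read; not INFACore's DataFrame class."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the actual extraction
        self.extractions = 0   # number of times data was actually pulled

    def collect(self):
        """Action method: run the extraction and return the rows."""
        self.extractions += 1
        return list(self._loader())


def toy_read(source_rows):
    """Like read() in this model: returns a frame at once, extracts nothing yet."""
    return ToyDataFrame(lambda: iter(source_rows))
```

In this model, passing the frame object along (as the write example later in this guide does with read()) costs nothing until something calls collect().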
Convert to Pandas DataFrame
To analyze the extracted data with pandas functions, convert it to a pandas DataFrame:
>>> df_reader = ic.DataFrameReader(table)
>>> pandas_df = df_reader.to_pandas()
Write data to the data source
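To load data into a data source, call the write() method on the target DataObject instance. The following example reads the Customers data object from an Oracle connection and writes it to a Snowflake data object:
>>> orcl_do = ic.get_data_source("Oracle").get_connection("Oracle Prod").get_data_object("Customers")
>>> snow_do = ic.get_data_source("Snowflake").get_connection("Snowflake US").get_data_object("Prod/Sales/Customers")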
>>> i_df = orcl_do.read()
>>> stats = snow_do.write(i_df)
Parsing semi-structured and unstructured data
To parse semi-structured or unstructured data, such as JSON, specify the path to the input file and the path to a sample file.
The following sample Python code runs the structure parser function on the input file json_input.json with the sample file sample_log.txt, and then displays the first rows of the result as a pandas DataFrame:
>>> import informatica.infacore as ic
>>> pf = ic.ParserFunctions()
>>> test_df = pf.parse_unstructured_data("C:/Users/json_input.json", "C:/Users/sample_log.txt")
>>> df_reader = ic.DataFrameReader(test_df)
>>> p_df = df_reader.to_pandas()
>>> p_df.head()
State  Account Length  Area Code  Phone     Int'l Plan  VMail Plan  VMail Message  token  Mins       Calls  Charge    CustServ Calls  Churn
PA     163             806        403-2562  no          yes         300            Day    8.162204   3      7.579174  3               True.
PA     163             806        403-2562  no          yes         300            Eve    3.933035   4      6.508639  3               True.
PA     163             806        403-2562  no          yes         300            Night  4.065759   100    5.111624  3               True.
PA     163             806        403-2562  no          yes         300            Intl   4.92816    6      5.673203  3               True.
SC     15              836        158-8416  yes         no          0              Day    10.018993  4      4.226289  8               False.
Next Steps
For the available methods of each class, their parameters, and additional Python examples, see the API Reference.