Example - Administration Process
You are an administrator in a multinational retail organization. Your organization has configured a data lake on a Hadoop cluster to store operational data, transactional data, log file data, and social media data from stores worldwide. You use Informatica Big Data Management to read data from the disparate enterprise systems and write the data to Hive in the Hadoop data lake.
The data analysts in the Marketing department need to discover the social media data stored in the data lake and prepare it for further analysis. The data analysts in the Sales department need to discover and prepare the transactional data. As the analysts search for data in the lake, they want to view the lineage of the data across the various enterprise systems. They also want to view how the data is related to other assets in the enterprise catalog. After finding the data they are interested in, the analysts want to combine and transform the data on their own, without requiring much involvement from the administrators on your team. They need to prepare the data so that it can be analyzed further using a more advanced third-party business intelligence tool.
As the administrator, you perform the following tasks to enable data analysts to discover data across all enterprise systems and to prepare data stored in the Hadoop data lake for further analysis:
- Use Live Data Map Administrator to create the following resources for the catalog:
  - Hive resource for the data lake
  - Domain User resource
  - Additional resources for enterprise systems that are outside the data lake

  Verify that Live Data Map successfully extracts metadata from these resources so that the metadata exists in the catalog. Create schedules for the resources so that Live Data Map regularly scans the resources.
- Use the Administrator tool to configure the Intelligent Data Lake Service and the Data Preparation Service.
- Use the Administrator tool to import LDAP users from an LDAP directory service. Assign the user accounts in the Marketing department to a Marketing group, and the user accounts in the Sales department to a Sales group. Assign each group the privilege to access the Intelligent Data Lake application.
- Use Hadoop tools to grant the user accounts in the Marketing and Sales groups access to the Hive tables in the data lake.
- Use the Administrator tool to regularly monitor the Hadoop jobs that run when analysts upload and publish data in the Intelligent Data Lake application.
- Use Live Data Map Administrator to regularly monitor the metadata extraction from the resources for the Hadoop data lake, the domain users, and the enterprise systems outside the data lake.
- When an analyst requests that you operationalize an Informatica mapping converted from the preparation recipe during the publication process, use the Developer tool to review and deploy the mapping. Use the Administrator tool to schedule and run the deployed mapping to regularly load data with the new structure into the data lake.
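
The Hive access task in the list above could be scripted in several ways, depending on how the cluster is secured. The following is a minimal sketch, not a prescribed procedure. It assumes that the cluster uses Hive SQL standards-based authorization and that roles named marketing and sales already exist and contain the corresponding user accounts. The database and table names are hypothetical placeholders. The script only generates the GRANT statements; you would run them with a Hive client such as Beeline under an account that has the authority to grant privileges.

```python
def build_grants(role_tables, privilege="SELECT"):
    """Return one Hive GRANT statement for each (role, table) pair."""
    statements = []
    # Sort roles so that the output order is stable between runs.
    for role, tables in sorted(role_tables.items()):
        for table in tables:
            statements.append(
                f"GRANT {privilege} ON TABLE {table} TO ROLE {role};"
            )
    return statements


# Hypothetical mapping of Hive roles to data lake tables (assumed names).
ROLE_TABLES = {
    "marketing": ["datalake.social_media_posts"],
    "sales": ["datalake.store_transactions"],
}

if __name__ == "__main__":
    for statement in build_grants(ROLE_TABLES):
        print(statement)
```

If the cluster relies on a different authorization mechanism, such as storage-based authorization or a separate policy manager, the equivalent grants are defined in that tool instead.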