Intelligent Data Lake Overview
Intelligent Data Lake is part of the Informatica Big Data Management product portfolio, which enables organizations to turn big data into business value by exploiting relationships between data, machines, and people.
Data professionals, such as data analysts and data scientists, have traditionally relied on IT to access the data that they need for analysis and decision making. IT, in turn, has relied on manual, labor-intensive approaches to integrate, govern, and secure big data in a fragmented technology ecosystem.
In response, data analysts have turned to self-service tools to perform their analysis. However, self-service tools can inadvertently create security and compliance risks because of data proliferation and lack of governance. The tools can also be hard to use and might not facilitate collaboration.
Intelligent Data Lake balances self-service with governance to achieve sustainable business value for organizations. Intelligent Data Lake uses a data lake to bridge the gap between the business and IT by providing managed self-service for business with IT governance.
A data lake is a centralized repository of large volumes of structured and unstructured data from various enterprise systems within and outside the organization. Data in the lake can include raw and refined data, master data and transactional data, log files, and machine data. In Intelligent Data Lake, the data lake is a Hadoop cluster.
A data lake provides analysts with the ability to find, explore, and access any data centrally and to discover data relationships. The Intelligent Data Lake application has an intuitive spreadsheet-like interface in which data analysts can prepare and blend data. Analysts can also publish the data back to the data lake and share data with other analysts who are performing analytics on similar or related data. The ease of use and collaboration enable analysts to spend more time analyzing data and less time finding and preparing data.
In addition, IT can operationalize the work done by data analysts with little additional effort. IT also has complete visibility into the activities in the lake for governance purposes.
Intelligent Data Lake uses a data catalog to provide users with the ability to find and explore data assets in the organization, whether the assets reside inside or outside the data lake. The data catalog is powered by the universal metadata services in the Informatica Live Data Map product. As an Intelligent Data Lake administrator, you manage the data catalog and run scans on enterprise systems to populate the data catalog.
Analysts use Intelligent Data Lake to search the catalog for data that resides inside and outside the lake. Analysts can discover lineage and relationships between data in different enterprise systems. They can prepare data that is stored in the data lake. Data preparation includes combining, cleansing, transforming, and structuring data so that it is ready for analysis. Analysts publish the prepared data to the data lake for other analysts to reuse. Analysts can also use third-party business intelligence tools or analytical tools such as SAS to further analyze the data.
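To make the four data preparation activities concrete, the following minimal sketch shows cleansing, combining, transforming, and structuring on two small in-memory data sets. It uses plain Python only and does not represent the Intelligent Data Lake interface or any Informatica API. The sample records, column names, and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records standing in for two data assets in the lake.
customers = [
    {"customer_id": 1, "name": " Alice ", "country": "us"},
    {"customer_id": 2, "name": "Bob", "country": "DE"},
    {"customer_id": 3, "name": None, "country": "us"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": "120.50"},
    {"order_id": 11, "customer_id": 1, "amount": "30.00"},
    {"order_id": 12, "customer_id": 2, "amount": "75.25"},
]


def cleanse(rows):
    """Drop rows without a name; trim whitespace and normalize country case."""
    cleaned = []
    for row in rows:
        if not row["name"]:
            continue
        cleaned.append({
            "customer_id": row["customer_id"],
            "name": row["name"].strip(),
            "country": row["country"].upper(),
        })
    return cleaned


def combine(customer_rows, order_rows):
    """Inner-join orders to customers on customer_id."""
    by_id = {c["customer_id"]: c for c in customer_rows}
    return [
        {**by_id[o["customer_id"]], **o}
        for o in order_rows
        if o["customer_id"] in by_id
    ]


def transform(rows):
    """Convert the amount column from text to a number."""
    return [{**r, "amount": float(r["amount"])} for r in rows]


def structure(rows):
    """Reshape to one row per customer with the total order amount."""
    totals = defaultdict(float)
    details = {}
    for r in rows:
        totals[r["customer_id"]] += r["amount"]
        details[r["customer_id"]] = (r["name"], r["country"])
    return [
        {
            "customer_id": cid,
            "name": details[cid][0],
            "country": details[cid][1],
            "total_amount": totals[cid],
        }
        for cid in sorted(totals)
    ]


prepared = structure(transform(combine(cleanse(customers), orders)))
```

In this sketch, `prepared` holds one row per customer that has both a valid name and at least one order. It is the kind of analysis-ready result that an analyst would publish back to the lake.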
When analysts publish data, Intelligent Data Lake converts the preparation steps into an Informatica mapping. As an administrator, you can deploy and schedule the mapping to regularly load data with the new structure into the data lake. If necessary, developers can open the mapping in Informatica Developer to customize it with additional business logic or to optimize it for better performance.
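The idea of turning interactive preparation steps into something that can be deployed and run on a schedule can be illustrated with a small conceptual sketch. The following Python example records each step as data and replays the recorded steps against a later data load. It is an analogy only: it does not show the Informatica mapping format or any product API, and the `Recipe` class, the operation names, and the sample columns are all hypothetical.

```python
import json

# Hypothetical registry of step operations; the names are illustrative only.
OPERATIONS = {
    "rename_column": lambda rows, old, new: [
        {(new if k == old else k): v for k, v in r.items()} for r in rows
    ],
    "uppercase": lambda rows, column: [
        {**r, column: r[column].upper()} for r in rows
    ],
    "filter_equals": lambda rows, column, value: [
        r for r in rows if r[column] == value
    ],
}


class Recipe:
    """Records preparation steps so that they can be replayed on new data."""

    def __init__(self):
        self.steps = []

    def add(self, operation, **params):
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        self.steps.append({"operation": operation, "params": params})
        return self

    def run(self, rows):
        # Apply each recorded step in order to the incoming rows.
        for step in self.steps:
            rows = OPERATIONS[step["operation"]](rows, **step["params"])
        return rows

    def to_json(self):
        # Serialize the steps so that they can be stored and scheduled.
        return json.dumps(self.steps, indent=2)

    @classmethod
    def from_json(cls, text):
        recipe = cls()
        for step in json.loads(text):
            recipe.add(step["operation"], **step["params"])
        return recipe


# Steps an analyst might perform interactively on a first data load.
recipe = (
    Recipe()
    .add("rename_column", old="ctry", new="country")
    .add("uppercase", column="country")
    .add("filter_equals", column="country", value="US")
)

first_load = [{"id": 1, "ctry": "us"}, {"id": 2, "ctry": "de"}]
later_load = [{"id": 3, "ctry": "Us"}, {"id": 4, "ctry": "fr"}]

# The stored recipe can be reloaded and applied to a later load unchanged.
replayed = Recipe.from_json(recipe.to_json())
```

Because the steps are stored as data rather than as one-off manual actions, the same logic can be applied to every subsequent load, and a developer can inspect or extend the step list before it runs.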
IT administrators can use the user activity data that Intelligent Data Lake generates to monitor users, data assets, and project-related activities.
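As a rough illustration of this kind of monitoring, the following sketch summarizes a list of activity events by user, by data asset, and by action type. The event structure and field names are assumptions made for this example and do not reflect the actual format of the activity data that the product records.

```python
from collections import Counter

# Hypothetical activity events; the field names are illustrative only.
events = [
    {"user": "asmith", "action": "search", "asset": None},
    {"user": "asmith", "action": "prepare", "asset": "sales_2016"},
    {"user": "bjones", "action": "prepare", "asset": "sales_2016"},
    {"user": "bjones", "action": "publish", "asset": "sales_2016_clean"},
    {"user": "asmith", "action": "publish", "asset": "customers_eu"},
]


def summarize(activity):
    """Count events per user, per data asset, and per action type."""
    return {
        "by_user": Counter(e["user"] for e in activity),
        "by_asset": Counter(e["asset"] for e in activity if e["asset"]),
        "by_action": Counter(e["action"] for e in activity),
    }


summary = summarize(events)
most_used_asset = summary["by_asset"].most_common(1)[0][0]
```

A summary like this shows which users are most active and which data assets are used most often.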