Data Preparation Service Overview

The Data Preparation Service is an application service that manages data preparation within the Intelligent Data Lake application.
When an analyst prepares data in a project, the Data Preparation Service connects to the Data Preparation repository to store worksheet metadata. Depending on the size of the data, the service connects to HiveServer2 in the Hadoop cluster to read either sample data or all data from the Hive table. The service also connects to HDFS in the Hadoop cluster to store the sample data being prepared in the worksheet.
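The choice between reading sample data and reading all data can be pictured with a short Python sketch. The function name `choose_read_mode` and the `sample_limit` parameter are hypothetical. This guide does not state the actual threshold or how the service measures data size, so the sketch only illustrates the general idea that a small table may be read in full while a larger one is sampled.

```python
def choose_read_mode(table_rows: int, sample_limit: int) -> str:
    """Return "all" when the table fits within the sample limit, else "sample".

    Both arguments are illustrative assumptions, not product settings.
    """
    if table_rows < 0 or sample_limit <= 0:
        raise ValueError("table_rows must be >= 0 and sample_limit must be > 0")
    # A table no larger than the limit is read completely; otherwise sample it.
    return "all" if table_rows <= sample_limit else "sample"
```

For example, under these assumptions `choose_read_mode(100, 1000)` selects a full read, while `choose_read_mode(5000, 1000)` selects a sample.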
The Data Preparation Service uses a MySQL database for the Data Preparation repository. You must configure a local storage location for data preparation file storage on the node where the Data Preparation Service runs. The service uses Apache Solr indexing capabilities to recommend related data assets. This Solr instance is managed by the Data Preparation Service and does not run on the Hadoop cluster.
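As a planning aid, the components named in this overview can be summarized in a small Python structure. This is only an illustrative sketch: the `Dependency` class, its fields, and the `on_cluster` helper are hypothetical names and do not correspond to actual service properties. Where this overview does not say where a component runs, the location is recorded as not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str      # component the service relies on
    purpose: str   # what the service uses it for
    location: str  # where the component runs, as far as this overview says


HADOOP = "Hadoop cluster"

# Summary of the components described in this overview (illustrative only).
DEPENDENCIES = [
    Dependency("MySQL database", "Data Preparation repository for worksheet metadata",
               "not stated in this overview"),
    Dependency("HiveServer2", "read sample data or all data from Hive tables", HADOOP),
    Dependency("HDFS", "store the sample data being prepared", HADOOP),
    Dependency("Local storage", "data preparation file storage", "service node"),
    Dependency("Apache Solr", "recommendations of related data assets",
               "managed by the service, outside the Hadoop cluster"),
]


def on_cluster(deps):
    """Return the names of the components that run in the Hadoop cluster."""
    return [d.name for d in deps if d.location == HADOOP]
```

Calling `on_cluster(DEPENDENCIES)` lists the two Hadoop-side connections, HiveServer2 and HDFS, which is a quick way to see which network paths from the service node to the cluster must be available.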
When you create the Intelligent Data Lake Service, you must associate it with a Data Preparation Service.