Create Catalog Resources
Use Informatica Catalog Administrator to create Hive and HDFS resources in Enterprise Data Catalog.
A resource is a repository object that represents a data source, a metadata repository, or an HDFS location in the data lake. Scanners attached to a resource extract metadata from the resource and store the metadata in Enterprise Data Catalog.
You must create an HDFS resource for each HDFS location in the data lake into which Enterprise Data Preparation users import, upload, or publish assets.
For more information about creating resources and scanners, see "Creating a Resource" in the Informatica Catalog Administrator Guide.
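Because each HDFS location that users import into, upload to, or publish to needs its own HDFS resource, it can help to check which locations still lack one before you start. The following Python sketch assumes you keep those locations and the paths of existing HDFS resources as plain HDFS path strings. The function name `locations_missing_resources` and the example paths are hypothetical, and the sketch does not call any Enterprise Data Catalog API.

```python
from pathlib import PurePosixPath


def locations_missing_resources(data_lake_locations, hdfs_resource_paths):
    """Return the data lake locations that have no matching HDFS resource.

    Hypothetical planning helper: both arguments are plain HDFS path strings
    (not hdfs:// URIs), and nothing here talks to Enterprise Data Catalog.
    """
    # PurePosixPath drops trailing slashes, so "/a/b/" and "/a/b" compare equal.
    covered = {PurePosixPath(p) for p in hdfs_resource_paths}
    missing = []
    for location in data_lake_locations:
        if PurePosixPath(location) not in covered:
            missing.append(location)
    return missing
```

For example, with locations `/data/lake/import`, `/data/lake/upload/`, and `/data/lake/publish` and resources for `/data/lake/import/` and `/data/lake/upload`, the helper reports that `/data/lake/publish` still needs an HDFS resource.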
1. Create a Hive resource that Enterprise Data Catalog uses to extract metadata from the Hive tables in the data lake. Configure the Hive resource with the following settings:
   - In the URL property on the General > Connection Properties panel, specify the Fully Qualified Domain Name (FQDN) of the Hive server in the JDBC connection URL.
   - If you are using operating system profiles, the Hive user name that you specify as the value for the User property must be a Hive superuser. For more information about operating system profiles, see Using Operating System Profiles.
   - Import the relevant connectors to extract metadata from Hive sources.
   For more information about Hive resource properties, see "Hive Resource Prerequisites and Connection Properties" in the Informatica Catalog Administrator Guide.
2. Create an HDFS resource for each HDFS location in the data lake.
   For more information about HDFS resource properties, see "HDFS Resource Connection Properties" in the Informatica Catalog Administrator Guide.
3. Run a scan on the resources to load metadata into the catalog.
4. Create schedules for the resources so that Enterprise Data Catalog scans them regularly. As a best practice, schedule the resource scans to run during non-business hours.
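The Hive resource URL in step 1 must identify the Hive server by its FQDN. The following Python sketch builds a HiveServer2 JDBC URL of the form `jdbc:hive2://<host>:<port>/<database>` and rejects short host names and IPv4 addresses. The function name `hive_jdbc_url`, the host name, the default port 10000 (the usual HiveServer2 binary-transport port), and the database `default` are all assumptions. Your cluster might use a different port or need extra URL parameters, such as a Kerberos principal.

```python
def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL, refusing hosts that do not look like FQDNs.

    Hypothetical helper. The check is a naming heuristic only (no DNS lookup):
    it requires at least two non-empty dot-separated labels and rejects
    all-numeric hosts such as IPv4 addresses.
    """
    labels = host.rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"expected a fully qualified domain name, got {host!r}")
    if all(label.isdigit() for label in labels):
        raise ValueError(f"expected a host name, not an IP address: {host!r}")
    return f"jdbc:hive2://{host}:{port}/{database}"
```

For example, `hive_jdbc_url("hive01.example.com")` returns `jdbc:hive2://hive01.example.com:10000/default`, and `hive_jdbc_url("hive01")` raises `ValueError` because the short name is not fully qualified.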
Tools to complete this step:
- Informatica Catalog Administrator