Introduction to the CDGC Catalog Curation Agent recipe
The CDGC Catalog Curation Agent recipe automates the retrieval and enrichment of metadata descriptions for a catalog source in Cloud Data Governance & Catalog (CDGC). It combines metadata extraction and update tools with a PDF document extraction tool. The agent follows a step-by-step workflow: it fetches metadata assets, identifies those with missing descriptions, extracts relevant text from document files, and updates the metadata with that text to improve catalog quality.
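The workflow above can be sketched in a few lines of Python. This is only an illustration of the filter-and-match logic, not the recipe's actual implementation: the asset dictionaries, the `name: description` document layout, and the function names `needs_description`, `find_description`, and `curate` are all assumptions made for the example, and no CDGC API is called.

```python
from typing import Dict, List, Optional


def needs_description(asset: dict) -> bool:
    """An asset needs curation when its description is absent or blank."""
    description = asset.get("description")
    return description is None or not description.strip()


def find_description(asset_name: str, document_text: str) -> Optional[str]:
    """Return the text after 'asset_name:' on the first matching line.

    Assumes (hypothetically) that the extracted document text lists
    descriptions one per line in the form 'name: description'.
    """
    prefix = asset_name.lower() + ":"
    for line in document_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(prefix):
            candidate = stripped[len(prefix):].strip()
            if candidate:
                return candidate
    return None


def curate(assets: List[dict], document_text: str) -> Dict[str, str]:
    """Map asset id to a proposed description, for assets missing one.

    Assets that already have a description are left untouched, and assets
    with no match in the document are skipped rather than guessed at.
    """
    updates: Dict[str, str] = {}
    for asset in assets:
        if not needs_description(asset):
            continue
        found = find_description(asset["name"], document_text)
        if found is not None:
            updates[asset["id"]] = found
    return updates
```

In this sketch, `curate` only proposes updates. Writing them back to the catalog would be a separate step performed by the metadata update tool.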
If API connection values have not been set as defaults, the agent prompts the user to provide them. It then requests the path of a folder that contains the documents to extract data from, and validates that path before processing. When the folder path is valid, the agent extracts data from all files within that folder. Validating the path first prevents errors during data loading, so that extraction completes reliably before users can query the data through the Azure OpenAI model.
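The folder check described above could look something like the following minimal sketch. The function name `validate_folder` and the optional `suffix` filter are hypothetical, and the specific checks (path exists, is a directory, contains at least one file) are assumptions about what validation involves rather than the agent's documented behavior.

```python
from pathlib import Path
from typing import List, Optional


def validate_folder(folder_path: str, suffix: Optional[str] = None) -> List[Path]:
    """Check a folder path and return the files that would be processed.

    Raises an exception with a descriptive message if the path does not
    exist, is not a directory, or contains no matching files.
    """
    folder = Path(folder_path).expanduser()
    if not folder.exists():
        raise FileNotFoundError(f"Folder does not exist: {folder}")
    if not folder.is_dir():
        raise NotADirectoryError(f"Path is not a folder: {folder}")

    # Collect regular files only; subfolders are ignored in this sketch.
    files = sorted(p for p in folder.iterdir() if p.is_file())
    if suffix is not None:
        # Optional filter, for example ".pdf" (an assumption, not a rule
        # stated by the recipe).
        files = [p for p in files if p.suffix.lower() == suffix.lower()]
    if not files:
        raise ValueError(f"No files to process in: {folder}")
    return files
```

Failing early with a specific exception lets the agent ask the user for a corrected path before any extraction starts.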