Introduction to Self Healing AI Agent recipe

The Self Healing AI Agent autonomously identifies failed and suspended taskflows in data pipelines, diagnoses the underlying issues, and applies corrective actions to resolve them. If the agent can't fix the problem, it creates a ServiceNow incident and sends an email containing the error message and relevant logs to the appropriate teams.
The agent performs the following functions:
Using a variety of integrated tools, the agent executes each necessary step without requiring manual confirmation unless essential input is missing. It starts by authenticating with the GetLoginToken tool. It then either retrieves failed or suspended taskflows using TaskflowStatusRetriever or, when job details are provided, gathers detailed error information using ErrorAnalyserFlow.
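The branching described above can be sketched as follows. This is a minimal illustration, not the recipe's implementation: the tool names come from the recipe, but the `JobDetails` type, the `first_steps` function, and the call signatures of the tools are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobDetails:
    # Hypothetical shape for the job details a user might supply.
    taskflow_name: str
    run_id: str


def first_steps(job: Optional[JobDetails], tools: dict[str, Callable]) -> dict:
    """Authenticate, then choose the retrieval tool based on the input.

    `tools` maps tool names to callables; their signatures are assumed here.
    """
    token = tools["GetLoginToken"]()
    if job is not None:
        # Job details were provided, so gather detailed error information.
        result = tools["ErrorAnalyserFlow"](token, job)
        return {"tool": "ErrorAnalyserFlow", "result": result}
    # No job details: list the failed or suspended taskflows instead.
    result = tools["TaskflowStatusRetriever"](token)
    return {"tool": "TaskflowStatusRetriever", "result": result}
```

Passing the tools in as a dictionary of callables keeps the sketch testable with stubs and avoids guessing at how the real tools are invoked.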
Before proceeding with remediation, the agent runs the CheckSuccessfulRun tool to verify whether the taskflow has run successfully since the failure, which avoids redundant actions. It then consults the SOPAnalyst agent to find the resolution steps and contact information specific to each failed taskflow and strictly follows those procedures.
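The idea behind the successful-run check can be illustrated with a short sketch. The function name `needs_remediation`, the `"SUCCESS"` status string, and the run-history format are hypothetical; the recipe does not document how CheckSuccessfulRun represents its data.

```python
from datetime import datetime


def needs_remediation(
    failure_time: datetime,
    run_history: list[tuple[datetime, str]],
) -> bool:
    """Return False when any run succeeded after the failure.

    `run_history` is assumed to be a list of (finish_time, status) pairs.
    """
    recovered = any(
        status == "SUCCESS" and finished > failure_time
        for finished, status in run_history
    )
    return not recovered
```

For example, a failure at 09:00 followed by a successful run at 10:00 would return `False`, so the agent would skip remediation for that taskflow.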
The ValidationAgent performs required pre-checks, such as file existence verification or database connection tests described in the SOP. After validation succeeds, the agent uses ExtCommunicationAgent to notify stakeholders through email and RemediationAgent to restart taskflows or perform other corrective steps.
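As a rough illustration of the kind of pre-checks mentioned above, the sketch below verifies that a file exists and that a database accepts a connection. The function names are hypothetical, and SQLite stands in for whatever database the SOP actually names; the ValidationAgent's real checks are not described in the recipe.

```python
import sqlite3
from pathlib import Path
from typing import Callable


def file_exists(path: str) -> bool:
    """Check that the expected input file is present."""
    return Path(path).is_file()


def database_reachable(db_path: str) -> bool:
    """Open a connection and run a trivial query (SQLite as a stand-in)."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("SELECT 1")
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False


def run_prechecks(checks: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, dict]:
    """Run every named check and report overall and per-check results."""
    results = {name: check() for name, check in checks}
    return all(results.values()), results
```

Returning the per-check results alongside the overall outcome lets a caller report exactly which pre-check blocked remediation.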
When SOP instructions specify escalation without automatic remediation, the agent raises a ServiceNow incident using ServiceNowAgent, providing all necessary details, such as assignment group SYSIDs, error messages, and job names. It retrieves and shares the incident number and link in its final report.
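The incident details listed above might be assembled along these lines. This is a sketch under assumptions: `short_description`, `description`, and `assignment_group` are standard fields on the ServiceNow incident table, and the Table API wraps a created record in a `result` object containing `number` and `sys_id`, but the function names, the wording of the description, and the link format are illustrative and may differ from what ServiceNowAgent actually sends.

```python
def build_incident_payload(
    job_name: str,
    error_message: str,
    assignment_group_sys_id: str,
) -> dict:
    """Build the body for an incident record (field choice is an assumption)."""
    return {
        "short_description": f"Taskflow failure: {job_name}",
        "description": error_message,
        "assignment_group": assignment_group_sys_id,
    }


def incident_summary(instance_url: str, response: dict) -> dict:
    """Extract the incident number and a link for the final report.

    `response` is assumed to follow the Table API shape {"result": {...}}.
    The link format below is an assumption, not taken from the recipe.
    """
    result = response["result"]
    base = instance_url.rstrip("/")
    return {
        "number": result["number"],
        "link": f"{base}/incident.do?sys_id={result['sys_id']}",
    }
```

Keeping payload construction separate from the HTTP call makes it straightforward to review what would be sent before any incident is created.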
This structured process allows the agent to manage taskflow failures proactively, securely, and in compliance with SOPs. It provides a detailed summary of all actions taken and their outcomes to ensure effective monitoring and resolution.