Monitor an individual advanced cluster to view CLAIRE insights for the cluster, the infrastructure costs to run the cluster, and a list of cluster instances.
The following image shows the details that you can view for an individual cluster:
You can view the following details:
CLAIRE insights
If the advanced cluster uses a CLAIRE-powered configuration, you can view CLAIRE insights for the cluster. You can also view a summary of the CLAIRE recommendations that are available. To view the recommendations, navigate to the advanced configuration for the cluster in Administrator.
Infrastructure costs over time
If the advanced cluster uses a CLAIRE-powered configuration, you can view a graph of the infrastructure costs over time. For each time interval, you can view the actual infrastructure costs and compare them to your budget.
Cluster instances
You can view a list of cluster instances that are associated with the advanced cluster. Drill down on a cluster instance to view its activity log, lifecycle graph, and configuration, as well as the jobs that were submitted to the cluster instance.
If a cluster is running, you can stop the cluster by hovering over it and clicking the Stop icon. To stop a cluster, you need at least the update privilege for advanced configurations.
The status message at the top of the page indicates whether the information on the page is up-to-date. If the information is out-of-date, the status message displays "Updates Available." To refresh the page, you can click the "Updates Available" message or the Refresh icon. You can also enable Auto Refresh to automatically refresh the information every 20 seconds.
Information can become out-of-date when the state of a cluster changes, such as when you run a job to start a cluster or worker nodes are added to the cluster.
Cluster statuses
For each cluster that you view on the Advanced Clusters page, you can view the status of the cluster.
The following table describes cluster statuses:
Status
Description
Starting
The cluster is starting. A cluster starts as soon as you run a job.
Running
The cluster is running and processing jobs.
Stopping
The cluster is stopping. The jobs that were running on the cluster have completed and the cluster has reached the idle timeout in the advanced configuration, or you recently stopped the cluster in Monitor.
The time that it takes to stop a cluster depends on the cloud platform. If you run a job while the cluster is stopping, the cluster does not start and the job fails.
Stopped
The cluster has stopped.
Error
The cluster has an error. During an error, the Secure Agent attempts to recover the cluster.
User action might be necessary, such as when you receive a fail-to-start or fail-to-stop exception.
Unknown
The status of the cluster is unknown.
If the status is unknown, verify that the Secure Agent is running. If the agent is not running, enable the agent and check whether the cluster starts running.
If the cluster does not start running, an administrator can run the command to list clusters. If the command output returns the cluster state as partial or in-use, the administrator can run the command to delete the cluster.
For more information about the commands, see the Administrator help.
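The statuses in the table above can be summarized as a small lookup, which might be useful in operational notes or runbooks. The following Python sketch is illustrative only: `ClusterStatus`, `NEXT_STEP`, and `next_step` are hypothetical names, not part of any product API, and the text for each status paraphrases the table.

```python
from enum import Enum


class ClusterStatus(Enum):
    """Cluster statuses as listed in the table (hypothetical type)."""
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPING = "Stopping"
    STOPPED = "Stopped"
    ERROR = "Error"
    UNKNOWN = "Unknown"


# Hypothetical lookup that paraphrases the table; not a product API.
NEXT_STEP = {
    ClusterStatus.STARTING: "Wait for the cluster to finish starting.",
    ClusterStatus.RUNNING: "No action needed; the cluster is processing jobs.",
    ClusterStatus.STOPPING: "Wait for the cluster to stop; a job that you run now fails.",
    ClusterStatus.STOPPED: "Run a job to start the cluster.",
    ClusterStatus.ERROR: "The Secure Agent attempts recovery; user action might be necessary.",
    ClusterStatus.UNKNOWN: "Verify that the Secure Agent is running.",
}


def next_step(status_text: str) -> str:
    """Return the suggested next step for a status label.

    Labels that do not match a known status are treated as Unknown.
    """
    try:
        status = ClusterStatus(status_text.strip().title())
    except ValueError:
        status = ClusterStatus.UNKNOWN
    return NEXT_STEP[status]
```

For example, `next_step("Stopping")` returns the reminder that a job run while the cluster is stopping fails.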
Monitor might not reflect the current cluster status if the following conditions are true:
• The Secure Agent machine is shut down.
• You update the advanced configuration and choose to disable the advanced cluster when you save the configuration.
There is also a delay between the time that the cluster status changes and the time that the agent receives information about the cluster status. As a result, the agent might submit a job to the cluster while the cluster is stopping or stopped. The job fails, and you must run the job again to restart the cluster.
For example, if the agent is notified that the cluster is running and the cluster reaches its idle timeout immediately afterwards, the agent submits the job to the cluster and the job fails.
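If you launch jobs from your own scripts, the rerun described above can be sketched as a small retry wrapper. This is a minimal illustration under stated assumptions: `JobFailedError`, `run_with_rerun`, and `make_stale_submitter` are hypothetical names, and the simulated submitter stands in for whatever mechanism you use to run a job. It is not how the Secure Agent is implemented.

```python
class JobFailedError(Exception):
    """Hypothetical error raised when a submitted job fails."""


def run_with_rerun(submit_job, max_attempts=2):
    """Call submit_job, rerunning it after a failure, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return submit_job()
        except JobFailedError as error:
            last_error = error  # remember the failure and run the job again
    raise last_error


def make_stale_submitter():
    """Simulate an agent whose view of the cluster status is out of date."""
    state = {"agent_view": "Running", "actual": "Stopped"}

    def submit_job():
        if state["agent_view"] == "Running" and state["actual"] != "Running":
            # The agent submits to a cluster that already reached its idle
            # timeout, so the job fails and the agent learns the real status.
            state["agent_view"] = state["actual"]
            raise JobFailedError("cluster was stopping or stopped")
        # Running the job again restarts the cluster (simplified here).
        state["actual"] = state["agent_view"] = "Running"
        return "succeeded"

    return submit_job
```

With these assumptions, `run_with_rerun(make_stale_submitter())` fails on the first attempt and succeeds on the rerun. The simulation is simplified: if the cluster is still stopping when you rerun the job, the rerun can also fail, so a real script might wait before retrying.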