
Monitoring an advanced cluster instance

Monitor an advanced cluster instance to view its activity log, lifecycle graph, advanced configuration, and the jobs that were submitted to the cluster instance.
You can drill down on each cluster instance to view the following tabs:
Activity Log
A list of cluster events, such as when the cluster is started, scaled up or down, stopped, or modified based on updates to the advanced configuration. The list also includes events that make the cluster unusable and cluster recovery events.
Lifecycle Graph
A visual representation of the number of worker nodes on the cluster over time.
Configuration
The advanced configuration that a Secure Agent uses to create the advanced cluster. To edit the advanced configuration, you must use Administrator.
Jobs
A list of all jobs that were submitted to the cluster. You can stop and restart jobs and download log files.
When you monitor a cluster instance, the unique cluster ID appears in parentheses in the header of each page. You can use the cluster ID to identify the cluster instance in log files.
Note: The cluster ID in Monitor might not match the cluster ID that appears as the value for the KubernetesCluster tag that is assigned to cloud resources.
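If you download log files, you can search them for the cluster ID that appears in the Monitor page header. The following Python sketch scans a local directory of log files and returns every line that contains a given cluster ID. The directory layout, the `*.log` file pattern, and the `lines_for_cluster` function are assumptions for illustration, not part of the product. Because the cluster ID in Monitor might not match the KubernetesCluster tag value, search for the ID that Monitor displays.

```python
from pathlib import Path
from typing import Iterator, Tuple


def lines_for_cluster(log_dir: Path, cluster_id: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each log line that mentions cluster_id.

    Assumption: the downloaded logs are plain-text files with a .log extension
    somewhere under log_dir. Adjust the glob pattern to match your files.
    """
    for path in sorted(log_dir.rglob("*.log")):
        # errors="replace" keeps the scan going if a file contains unexpected bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if cluster_id in line:
                    yield path.name, line_number, line.rstrip("\n")
```

For example, `list(lines_for_cluster(Path("downloaded_logs"), "<cluster ID>"))` collects the matching lines from a hypothetical `downloaded_logs` directory.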

Monitoring the activity log

You can monitor the activity log for a cluster instance on the Activity Log tab after you drill down to a cluster instance from the Advanced Clusters page.
Use the activity log to monitor the events on a cluster. The events mark the time that a cluster is started, scaled up or down, stopped, or the time that it is modified based on updates to the advanced configuration. The columns that you can view depend on the cloud platform.
The following image shows the Activity Log tab:
The Activity Log tab lists events for a specific advanced cluster. By default, the page lists the time stamp, activity, node instance types, number of nodes, and number of total nodes for each event. The status message at the top of the page indicates that the information on the page is up-to-date.
  1. Cluster name
  2. Cluster ID
  3. Status message that indicates whether information on the page is up-to-date or needs to be refreshed
  4. Refresh icon
  5. Download icon
The status message at the top of the page indicates whether the information on the page is up-to-date. If the information is out-of-date, the status message displays "Updates Available." To refresh the page, click the "Updates Available" message or the Refresh icon.
Information can become out-of-date when a new cluster event occurs, such as when you run a job to start the cluster or the cluster is scaled up to increase the number of worker nodes.
To download the activity log, click the Download icon.

Cluster events

When you monitor the activity log for a cluster instance, you view a list of cluster events. Events occur on a cluster at a specific point in time.
The following table describes the events that can occur on a cluster:
| Cluster event | Description |
| --- | --- |
| Starting | The cluster is starting. |
| Start | The cluster started. |
| Stopping | The cluster is stopping. |
| Stop | The cluster stopped. |
| Scale Up | The number of worker nodes on the cluster increased. |
| Scale Up Failed | The cluster failed to scale up. The cluster might fail to scale up if an initialization script fails on a worker node that is added to the cluster. |
| Scale Down | The number of worker nodes on the cluster decreased. |
| Configuration Change | The advanced configuration was changed. The cluster is stopped at the time of a configuration change. The changes in the configuration take effect the next time that the cluster starts. |
| Unusable | The cluster entered an error status. |
| Recovery | The cluster was recovered after encountering an error. If the Secure Agent stops unexpectedly and restarts on a different machine, the agent can recover the cluster only if the version of the Elastic Server on the new agent is the same as the version of the Elastic Server on the previous agent. |
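If you download the activity log, you can reason about the event sequence programmatically. The following Python sketch assumes that you have already extracted the event names, in chronological order, from the downloaded log; the extraction step and the function names are hypothetical. It treats the cluster as needing attention when the most recent Unusable event is not followed by a Recovery event, which is an interpretation of the event list rather than documented product behavior. It also expresses the recovery rule for a Secure Agent that restarts on a different machine as a version comparison.

```python
from typing import Sequence


def cluster_needs_attention(events: Sequence[str]) -> bool:
    """Return True if the latest Unusable event has no later Recovery event.

    events: cluster event names, such as "Start" or "Unusable", in
    chronological order. This reading of the event list is an assumption.
    """
    for event in reversed(events):
        if event == "Recovery":
            return False
        if event == "Unusable":
            return True
    return False


def agent_can_recover(previous_elastic_server_version: str,
                      new_elastic_server_version: str) -> bool:
    """After an unexpected restart on a different machine, the agent can recover
    the cluster only if both agents run the same Elastic Server version."""
    return previous_elastic_server_version == new_elastic_server_version
```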

Viewing the lifecycle graph

You can view the lifecycle graph for a cluster instance on the Lifecycle Graph tab after you drill down to a cluster instance from the Advanced Clusters page.
The lifecycle graph is a visual representation of the number of worker nodes on the cluster over time. You can change the time range to view more or less granular detail about changes to the number of worker nodes.
The following image shows the Lifecycle Graph tab:
The Lifecycle Graph tab shows a graph of the number of worker nodes on the cluster over time. The vertical axis shows the number of nodes, and the horizontal axis shows the units of time.
  1. Cluster name
  2. Cluster ID
  3. Time range
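The lifecycle graph plots a step-shaped series: the worker node count stays constant between scaling events. The following Python sketch shows how such a series could be sampled from a list of (timestamp, worker node count) pairs, for example pairs taken from a downloaded activity log. The input shape, the `sample_worker_nodes` function, and the assumption that the count is 0 before the first recorded event are all hypothetical. Choosing a smaller step is analogous to viewing the graph with more granular detail.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Tuple


def sample_worker_nodes(events: List[Tuple[datetime, int]], start: datetime,
                        end: datetime, step: timedelta) -> List[Tuple[datetime, int]]:
    """Sample the worker node count at regular intervals from start to end.

    events: (timestamp, worker node count after the event), sorted by time.
    Assumption: the count is 0 before the first recorded event.
    """
    times = [timestamp for timestamp, _ in events]
    samples = []
    current = start
    while current <= end:
        # bisect_right returns one past the last event at or before `current`
        index = bisect_right(times, current)
        count = events[index - 1][1] if index > 0 else 0
        samples.append((current, count))
        current += step
    return samples
```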

Viewing the configuration

You can view the configuration for a cluster instance on the Configuration tab after you drill down to a cluster instance from the Advanced Clusters page. The configuration that you view is the advanced configuration that you use to provision resources for an advanced cluster.
Use the Configuration tab to reference the configuration. The properties that you can view depend on the cloud platform. To edit the configuration, use Administrator.
The following image shows the Configuration tab:
The Configuration tab shows a read-only view of the advanced configuration that is used to create the cluster. You can review the basic, platform, advanced, and runtime properties.
  1. Cluster name
  2. Cluster ID

Monitoring jobs on a cluster

You can monitor all jobs that were submitted to a cluster on the Jobs tab after you drill down on the cluster instance from the Advanced Clusters page. The Jobs tab lists the jobs that are currently running and the jobs that have completed.
Use the Jobs tab to analyze job failures and debug both jobs and the advanced cluster.
To avoid unnecessary failures, check the status of the advanced cluster before you run a job on the cluster. Run the job only if the cluster does not exist, is running, or is stopped.
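The status check above can be expressed as a small predicate. The following Python sketch is illustrative only: the `ready_for_job` function and the status labels in `READY_STATUSES` are hypothetical, so substitute the labels that Monitor shows for your cluster. A status of `None` stands for a cluster that does not exist yet; running a job in that case starts the cluster, so the sketch treats it as ready. Transitional states such as Starting or Stopping are rejected.

```python
from typing import Optional

# Hypothetical status labels; use the labels that Monitor shows for your cluster.
READY_STATUSES = {"Running", "Stopped"}


def ready_for_job(cluster_status: Optional[str]) -> bool:
    """Return True if the cluster is in a state where a job can be submitted.

    cluster_status is None when the cluster does not exist yet.
    """
    if cluster_status is None:
        return True
    return cluster_status in READY_STATUSES
```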
The following image shows the Jobs tab:
The Jobs tab lists the jobs on the cluster. By default, the page lists the instance name, location, start time, end time, and status for each job. The status message at the top of the page indicates that the information on the page is up-to-date.
  1. Cluster name
  2. Cluster ID
  3. Status message that indicates whether information on the page is up-to-date or needs to be refreshed
  4. Refresh icon
The Jobs tab lists the jobs that were run within the last three days, plus the 1000 most recent jobs that are more than three days old.
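The retention rule for the Jobs tab combines a time window with a count limit. The following Python sketch applies that rule to a list of jobs. The `Job` type, the `jobs_listed` function, and the assumption that a job's age is measured from its start time are hypothetical and are not taken from the product.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Job(NamedTuple):
    instance_name: str
    start_time: datetime


def jobs_listed(jobs: List[Job], now: datetime,
                recent_window: timedelta = timedelta(days=3),
                older_limit: int = 1000) -> List[Job]:
    """Return every job from the recent window plus the older_limit most
    recent jobs that are older than the window, newest first.

    Assumption: a job's age is measured from its start time.
    """
    cutoff = now - recent_window
    newest_first = sorted(jobs, key=lambda job: job.start_time, reverse=True)
    recent = [job for job in newest_first if job.start_time >= cutoff]
    older = [job for job in newest_first if job.start_time < cutoff]
    return recent + older[:older_limit]
```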
The status message at the top of the page indicates whether the information on the page is up-to-date. If the information is out-of-date, the status message displays "Updates Available." To refresh the page, click the "Updates Available" message or the Refresh icon.
Information can become out-of-date when a job status changes or when a user starts a job.
When a job completes, you can drill down on the job to view the job details. To drill down on a job, click the instance name.
For information about the job details, see Monitoring advanced cluster subtasks.