Start Cluster Health Check

sagemaker_start_cluster_health_check R Documentation

Start deep health checks for a SageMaker HyperPod cluster

Description

Start deep health checks for a SageMaker HyperPod cluster. Use the describe_cluster_node API to track the progress of the deep health checks. Unhealthy nodes are automatically rebooted or replaced. For details, see "Resilience-related Kubernetes labels by SageMaker HyperPod".
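
One way to track progress is to poll describe_cluster_node for each node under test. The sketch below is illustrative only: `wait_for_health_check` is a hypothetical helper, not part of paws, and it assumes that the node status is found at `NodeDetails$InstanceStatus$Status` and that `"DeepHealthCheckInProgress"` is the value reported while a check is running. Verify both against the describe_cluster_node documentation before relying on them.

```r
# Hypothetical helper (not part of paws): poll a node until its status
# is no longer the assumed in-progress value, then return that status.
# `describe_node` is any function with the describe_cluster_node
# signature, for example svc$describe_cluster_node.
wait_for_health_check <- function(describe_node, cluster_name, node_id,
                                  in_progress = "DeepHealthCheckInProgress",
                                  max_attempts = 20, delay_seconds = 30) {
  for (attempt in seq_len(max_attempts)) {
    resp <- describe_node(ClusterName = cluster_name, NodeId = node_id)
    # Assumed response shape: NodeDetails -> InstanceStatus -> Status
    status <- resp$NodeDetails$InstanceStatus$Status
    if (!identical(status, in_progress)) {
      return(status)
    }
    # Wait between attempts, but not after the final one
    if (attempt < max_attempts) {
      Sys.sleep(delay_seconds)
    }
  }
  stop("Timed out waiting for the deep health check on node ", node_id)
}

# Usage (requires AWS credentials; cluster and node names are placeholders):
# svc <- paws::sagemaker()
# wait_for_health_check(svc$describe_cluster_node,
#                       "my-hyperpod-cluster", "i-0123456789abcdef0")
```

Passing the describe function as an argument keeps the polling logic independent of a live AWS session, so it can be exercised with a stub.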

Usage

sagemaker_start_cluster_health_check(ClusterName,
  DeepHealthCheckConfigurations)

Arguments

ClusterName

[required] The string name or the Amazon Resource Name (ARN) of the SageMaker HyperPod cluster.

DeepHealthCheckConfigurations

[required] A list of configurations containing instance group names, EC2 instance IDs, and deep health checks to perform.

Value

A list with the following syntax:

list(
  ClusterArn = "string"
)

Request syntax

svc$start_cluster_health_check(
  ClusterName = "string",
  DeepHealthCheckConfigurations = list(
    list(
      InstanceGroupName = "string",
      InstanceIds = list(
        "string"
      ),
      DeepHealthChecks = list(
        "InstanceStress"|"InstanceConnectivity"
      )
    )
  )
)
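
Examples

The request syntax above can be assembled from ordinary R vectors. The following is a minimal sketch, not an official example: `make_health_check_config` is a hypothetical helper, and the cluster name, instance group name, and instance ID are placeholders. The live call is shown only as a comment because it needs AWS credentials and an existing HyperPod cluster.

```r
# Hypothetical helper (not part of paws): build one entry of
# DeepHealthCheckConfigurations in the nested-list shape shown in
# the request syntax.
make_health_check_config <- function(instance_group, instance_ids,
                                     checks = c("InstanceStress",
                                                "InstanceConnectivity")) {
  allowed <- c("InstanceStress", "InstanceConnectivity")
  stopifnot(is.character(instance_group), length(instance_group) == 1)
  stopifnot(is.character(instance_ids), length(instance_ids) > 0)
  stopifnot(length(checks) > 0, all(checks %in% allowed))
  list(
    InstanceGroupName = instance_group,
    InstanceIds = as.list(instance_ids),
    DeepHealthChecks = as.list(checks)
  )
}

# One configuration: run only the stress check on a single instance.
configs <- list(
  make_health_check_config(
    instance_group = "worker-group-1",    # placeholder name
    instance_ids = "i-0123456789abcdef0", # placeholder ID
    checks = "InstanceStress"
  )
)

# Live call (placeholder cluster name; requires AWS credentials):
# svc <- paws::sagemaker()
# resp <- svc$start_cluster_health_check(
#   ClusterName = "my-hyperpod-cluster",
#   DeepHealthCheckConfigurations = configs
# )
# resp$ClusterArn
```

Validating the check names before sending the request surfaces typos locally instead of as a service-side error.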