Batch Reboot Cluster Nodes
| sagemaker_batch_reboot_cluster_nodes | R Documentation |
Reboots specific nodes within a SageMaker HyperPod cluster using a soft recovery mechanism
Description
Reboots specific nodes within a SageMaker HyperPod cluster using a soft
recovery mechanism. batch_reboot_cluster_nodes performs a graceful
reboot of the specified nodes by calling the Amazon Elastic Compute
Cloud RebootInstances API, which attempts to cleanly shut down the
operating system before restarting the instance.
This operation is useful for recovering from transient issues or applying certain configuration changes that require a restart.
- Rebooting a node may cause temporary service interruption for workloads running on that node. Ensure your workloads can handle node restarts or use appropriate scheduling to minimize impact.
- You can reboot up to 25 nodes in a single request.
- For SageMaker HyperPod clusters using the Slurm workload manager, ensure rebooting nodes will not disrupt critical cluster operations.
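Because a single request accepts at most 25 nodes, a larger set of nodes has to be split into several requests. The following is a minimal sketch of that batching step only; `chunk_node_ids` is a hypothetical helper that is not part of the package, and the instance IDs are made up for illustration.

```r
# Hypothetical helper (not part of paws): split a vector of node IDs
# into groups of at most `max_per_request` elements.
chunk_node_ids <- function(node_ids, max_per_request = 25L) {
  stopifnot(max_per_request >= 1L)
  if (length(node_ids) == 0L) {
    return(list())
  }
  # ceiling(position / size) gives each ID its batch number.
  batch_index <- ceiling(seq_along(node_ids) / max_per_request)
  unname(split(node_ids, batch_index))
}

# Made-up instance IDs, for illustration only.
node_ids <- sprintf("i-%017d", 1:60)
batches <- chunk_node_ids(node_ids)
lengths(batches)  # 25 25 10
```

Each element of `batches` could then be passed as the `NodeIds` argument of a separate request.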
Usage
sagemaker_batch_reboot_cluster_nodes(ClusterName, NodeIds,
  NodeLogicalIds)
Arguments
| ClusterName | [required] The name or Amazon Resource Name (ARN) of the SageMaker HyperPod cluster containing the nodes to reboot. |
| NodeIds | A list of EC2 instance IDs to reboot using soft recovery. You can specify between 1 and 25 instance IDs. |
| NodeLogicalIds | A list of logical node IDs to reboot using soft recovery. You can specify between 1 and 25 logical node IDs. |
Value
A list with the following syntax:
list(
  Successful = list(
    "string"
  ),
  Failed = list(
    list(
      NodeId = "string",
      ErrorCode = "InstanceIdNotFound"|"InvalidInstanceStatus"|"InstanceIdInUse"|"InternalServerError",
      Message = "string"
    )
  ),
  FailedNodeLogicalIds = list(
    list(
      NodeLogicalId = "string",
      ErrorCode = "InstanceIdNotFound"|"InvalidInstanceStatus"|"InstanceIdInUse"|"InternalServerError",
      Message = "string"
    )
  ),
  SuccessfulNodeLogicalIds = list(
    "string"
  )
)
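The return value separates successes from failures and reports failures under two keys, depending on whether the node was addressed by instance ID or by logical ID. The sketch below shows one way to flatten both failure lists into a single data frame. `failed_nodes` is a hypothetical helper, the response is hand-built mock data in the shape shown above, and the code assumes every failure entry has all three fields populated.

```r
# Hypothetical helper (not part of paws): collect the entries of
# `Failed` and `FailedNodeLogicalIds` into one data frame.
failed_nodes <- function(resp) {
  to_row <- function(entry, id_field) {
    data.frame(
      id = entry[[id_field]],
      id_type = id_field,
      error_code = entry$ErrorCode,
      message = entry$Message,
      stringsAsFactors = FALSE
    )
  }
  rows <- c(
    lapply(resp$Failed, to_row, id_field = "NodeId"),
    lapply(resp$FailedNodeLogicalIds, to_row, id_field = "NodeLogicalId")
  )
  if (length(rows) == 0L) {
    # No failures: return an empty data frame with the same columns.
    return(data.frame(
      id = character(), id_type = character(),
      error_code = character(), message = character(),
      stringsAsFactors = FALSE
    ))
  }
  do.call(rbind, rows)
}

# Mock response for illustration; IDs and message text are made up.
mock_resp <- list(
  Successful = list("i-0123456789abcdef0"),
  Failed = list(
    list(
      NodeId = "i-0fedcba9876543210",
      ErrorCode = "InvalidInstanceStatus",
      Message = "example message"
    )
  ),
  FailedNodeLogicalIds = list(),
  SuccessfulNodeLogicalIds = list()
)

failures <- failed_nodes(mock_resp)
```

A tabular view like this makes it straightforward to filter on `error_code`, for example to retry only the nodes that failed with `InternalServerError`.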
Request syntax
svc$batch_reboot_cluster_nodes(
  ClusterName = "string",
  NodeIds = list(
    "string"
  ),
  NodeLogicalIds = list(
    "string"
  )
)
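As an illustration of the request syntax, the sketch below wraps the call in a small function that checks the documented 1 to 25 limit before sending the request. `reboot_nodes` is a hypothetical wrapper, and the cluster name and instance IDs are placeholders. In real use `svc` would presumably be a client created with `paws::sagemaker()`; here a stub stands in for it so the example runs without AWS credentials.

```r
# Hypothetical wrapper (not part of paws): validate the number of IDs,
# then issue a single reboot request through the client `svc`.
reboot_nodes <- function(svc, cluster_name, node_ids) {
  stopifnot(length(node_ids) >= 1L, length(node_ids) <= 25L)
  svc$batch_reboot_cluster_nodes(
    ClusterName = cluster_name,
    NodeIds = as.list(node_ids)
  )
}

# With AWS credentials configured, the client would be created like this:
# svc <- paws::sagemaker()

# Stub client for illustration only: it reports every node as successful.
stub_svc <- list(
  batch_reboot_cluster_nodes = function(ClusterName, NodeIds,
                                        NodeLogicalIds = NULL) {
    list(
      Successful = NodeIds,
      Failed = list(),
      FailedNodeLogicalIds = list(),
      SuccessfulNodeLogicalIds = list()
    )
  }
)

reboot_resp <- reboot_nodes(
  stub_svc,
  "my-cluster",
  c("i-0123456789abcdef0", "i-0fedcba9876543210")
)
```

Passing the client as an argument keeps the wrapper easy to test, since a stub like the one above can replace the real service.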