Create AI Recommendation Job

sagemaker_create_ai_recommendation_job

R Documentation

Creates a recommendation job that generates intelligent optimization recommendations for generative AI inference deployments

Description

Creates a recommendation job that generates intelligent optimization recommendations for generative AI inference deployments. The job analyzes your model, workload configuration, and performance targets to recommend optimal instance types, model optimization techniques (such as quantization and speculative decoding), and deployment configurations.

Usage

sagemaker_create_ai_recommendation_job(AIRecommendationJobName,
  ModelSource, OutputConfig, AIWorkloadConfigIdentifier,
  PerformanceTarget, RoleArn, InferenceSpecification, OptimizeModel,
  ComputeSpec, Tags)

Arguments

AIRecommendationJobName

[required] The name of the AI recommendation job. The name must be unique within your Amazon Web Services account in the current Amazon Web Services Region.

ModelSource

[required] The source of the model to optimize. Specify the Amazon S3 location of the model artifacts.

OutputConfig

[required] The output configuration for the recommendation job, including the Amazon S3 location for results and an optional model package group where the optimized model is registered.

AIWorkloadConfigIdentifier

[required] The name or Amazon Resource Name (ARN) of the AI workload configuration to use for this recommendation job.

PerformanceTarget

[required] The performance targets for the recommendation job. Specify constraints on metrics such as time to first token (ttft-ms), throughput, or cost.

RoleArn

[required] The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker AI to perform tasks on your behalf.

InferenceSpecification

The inference framework configuration. Specify the framework for the recommendation job: LMI or vLLM (passed as "LMI" or "VLLM" in the request).

OptimizeModel

Whether to allow model optimization techniques such as quantization, speculative decoding, and kernel tuning. The default is TRUE.

ComputeSpec

The compute resource specification for the recommendation job. You can specify up to three instance types to consider and, optionally, a capacity reservation configuration.

Tags

The metadata that you apply to Amazon Web Services resources to help you categorize and organize them.
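
Some of the limits described above can be checked locally before a request is sent. The following is a minimal sketch, not part of the package: check_ai_recommendation_args is a hypothetical helper, and the allowed values are copied from the request syntax on this page. It only catches obvious mistakes; the service remains the authority on what is valid.

```r
# Hypothetical helper (not part of paws): checks a few documented limits
# locally and returns a character vector of problems (empty if none found).
check_ai_recommendation_args <- function(metrics,
                                         framework = NULL,
                                         instance_types = NULL) {
  # Allowed values as listed in the request syntax on this page.
  allowed_metrics <- c("ttft-ms", "throughput", "cost")
  allowed_frameworks <- c("LMI", "VLLM")
  problems <- character(0)

  # Report every metric name that is not in the documented set.
  bad_metrics <- setdiff(metrics, allowed_metrics)
  if (length(bad_metrics) > 0) {
    problems <- c(problems, paste0("unknown metric: ", bad_metrics))
  }

  # Framework is optional; the comparison is case-sensitive.
  if (!is.null(framework) && !(framework %in% allowed_frameworks)) {
    problems <- c(problems, paste0("unknown framework: ", framework))
  }

  # ComputeSpec accepts up to 3 instance types.
  if (length(instance_types) > 3) {
    problems <- c(problems, "ComputeSpec accepts at most 3 instance types")
  }

  problems
}

# Example: a valid combination produces no problems.
check_ai_recommendation_args("ttft-ms", "VLLM", c("ml.g5.12xlarge"))
```

Returning the problems as a vector, rather than stopping at the first one, lets a caller report every issue in a single pass.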

Value

A list with the following syntax:

list(
  AIRecommendationJobArn = "string"
)

Request syntax

svc$create_ai_recommendation_job(
  AIRecommendationJobName = "string",
  ModelSource = list(
    S3 = list(
      S3Uri = "string"
    )
  ),
  OutputConfig = list(
    S3OutputLocation = "string",
    ModelPackageGroupIdentifier = "string"
  ),
  AIWorkloadConfigIdentifier = "string",
  PerformanceTarget = list(
    Constraints = list(
      list(
        Metric = "ttft-ms"|"throughput"|"cost"
      )
    )
  ),
  RoleArn = "string",
  InferenceSpecification = list(
    Framework = "LMI"|"VLLM"
  ),
  OptimizeModel = TRUE|FALSE,
  ComputeSpec = list(
    InstanceTypes = list(
      "ml.g5.xlarge"|"ml.g5.2xlarge"|"ml.g5.4xlarge"|"ml.g5.8xlarge"|"ml.g5.12xlarge"|"ml.g5.16xlarge"|"ml.g5.24xlarge"|"ml.g5.48xlarge"|"ml.g6.xlarge"|"ml.g6.2xlarge"|"ml.g6.4xlarge"|"ml.g6.8xlarge"|"ml.g6.12xlarge"|"ml.g6.16xlarge"|"ml.g6.24xlarge"|"ml.g6.48xlarge"|"ml.g6e.xlarge"|"ml.g6e.2xlarge"|"ml.g6e.4xlarge"|"ml.g6e.8xlarge"|"ml.g6e.12xlarge"|"ml.g6e.16xlarge"|"ml.g6e.24xlarge"|"ml.g6e.48xlarge"|"ml.g7e.2xlarge"|"ml.g7e.4xlarge"|"ml.g7e.8xlarge"|"ml.g7e.12xlarge"|"ml.g7e.24xlarge"|"ml.g7e.48xlarge"|"ml.p3.2xlarge"|"ml.p3.8xlarge"|"ml.p3.16xlarge"|"ml.p4d.24xlarge"|"ml.p4de.24xlarge"|"ml.p5.4xlarge"|"ml.p5.48xlarge"|"ml.p5e.48xlarge"|"ml.p5en.48xlarge"|"ml.p6-b200.48xlarge"
    ),
    CapacityReservationConfig = list(
      CapacityReservationPreference = "capacity-reservations-only",
      MlReservationArns = list(
        "string"
      )
    )
  ),
  Tags = list(
    list(
      Key = "string",
      Value = "string"
    )
  )
)
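
As an illustration of the request syntax above, the sketch below fills in the required arguments and a few optional ones. All names, S3 URIs, and the role ARN are placeholders, not real resources, and RUN_AWS_EXAMPLE is a hypothetical environment variable used here only to keep the example from calling AWS unless you opt in. It assumes the paws package is installed and credentials are configured when the call is enabled.

```r
# Build the request as a plain list so it can be inspected before sending.
# Every value below is a placeholder chosen for illustration.
request <- list(
  AIRecommendationJobName = "example-ai-recommendation-job",
  ModelSource = list(
    S3 = list(S3Uri = "s3://amzn-s3-demo-bucket/models/example/")
  ),
  OutputConfig = list(
    S3OutputLocation = "s3://amzn-s3-demo-bucket/recommendations/"
  ),
  AIWorkloadConfigIdentifier = "example-workload-config",
  PerformanceTarget = list(
    Constraints = list(
      list(Metric = "ttft-ms")
    )
  ),
  RoleArn = "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
  InferenceSpecification = list(Framework = "VLLM"),
  OptimizeModel = TRUE,
  ComputeSpec = list(
    # Up to 3 instance types may be listed.
    InstanceTypes = list("ml.g5.12xlarge", "ml.g6e.12xlarge")
  ),
  Tags = list(
    list(Key = "project", Value = "example")
  )
)

response <- NULL

# Send the request only when explicitly enabled, so the example can be
# sourced without credentials or network access.
if (identical(Sys.getenv("RUN_AWS_EXAMPLE"), "true") &&
    requireNamespace("paws", quietly = TRUE)) {
  svc <- paws::sagemaker()
  response <- do.call(svc$create_ai_recommendation_job, request)
  # The returned list carries the ARN of the new job.
  print(response$AIRecommendationJobArn)
}
```

Keeping the arguments in a list and passing them with do.call makes it easy to log or validate the request before it leaves your session; calling svc$create_ai_recommendation_job(...) with named arguments directly is equivalent.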