Start Batch Evaluation

bedrockagentcore_start_batch_evaluation R Documentation

Starts a batch evaluation job that evaluates agent performance across multiple sessions

Description

Starts a batch evaluation job that evaluates agent performance across multiple sessions. Batch evaluations pull agent traces from CloudWatch Logs or an existing online evaluation configuration and run specified evaluators and insights against them.

Usage

bedrockagentcore_start_batch_evaluation(batchEvaluationName, evaluators,
  dataSourceConfig, clientToken, evaluationMetadata, description)

Arguments

batchEvaluationName

[required] The name of the batch evaluation. Must be unique within your account.

evaluators

The list of evaluators to apply during the batch evaluation. Can include both built-in evaluators and custom evaluators. Maximum of 10 evaluators.
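As a rough illustration, the `evaluators` argument can be assembled from a character vector of evaluator IDs, with the 10-evaluator limit checked on the client side before the request is sent. The helper name `make_evaluators` and the IDs below are hypothetical placeholders, not part of the package.

```r
# Hypothetical helper: turn evaluator IDs into the nested list shape
# shown in the request syntax, i.e. list(list(evaluatorId = "...")).
make_evaluators <- function(evaluator_ids, max_evaluators = 10L) {
  stopifnot(is.character(evaluator_ids))
  if (length(evaluator_ids) > max_evaluators) {
    stop("At most ", max_evaluators, " evaluators are allowed, got ",
         length(evaluator_ids))
  }
  # One list(evaluatorId = <id>) element per ID.
  lapply(unname(evaluator_ids), function(id) list(evaluatorId = id))
}

# Placeholder IDs for illustration only.
evaluators <- make_evaluators(c("my-custom-evaluator", "another-evaluator"))
```

Checking the limit locally only gives an earlier, clearer error; the service remains the authority on what it accepts.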

dataSourceConfig

[required] The data source configuration that specifies where to pull agent session traces from for evaluation.

clientToken

A unique, case-sensitive identifier that ensures the API request completes no more than once. If this token matches that of a previous request, the service ignores the new request and does not return an error.
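One possible way to produce a token with base R alone is sketched below. The helper name `new_client_token` and the 32-character hexadecimal format are assumptions made for illustration; this page does not state any length or format requirements for the token.

```r
# Hypothetical helper: build a random hexadecimal string with base R.
# sample() uses R's regular RNG, so this is not cryptographically strong
# and will repeat if the same seed is set before each call.
new_client_token <- function(n_chars = 32L) {
  hex_digits <- c(0:9, letters[1:6])  # coerced to "0".."9", "a".."f"
  paste(sample(hex_digits, n_chars, replace = TRUE), collapse = "")
}

token <- new_client_token()
# When retrying the same logical request, pass this same `token` again
# rather than generating a new one.
```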

evaluationMetadata

Optional metadata for the evaluation, including session-specific ground truth data and test scenario identifiers.
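The nested shape of `evaluationMetadata` can be easier to build through a small helper. The sketch below follows the field names in the request syntax on this page; the helper name `make_session_metadata` and all session, scenario, and tool values are hypothetical. Whether the `inline` ground-truth fields can be combined freely is not stated here, so treat the combination shown as an assumption.

```r
# Hypothetical helper: one sessionMetadata entry with inline ground truth.
make_session_metadata <- function(session_id, scenario_id, assertions, tool_names) {
  list(
    sessionId = session_id,
    testScenarioId = scenario_id,
    groundTruth = list(
      inline = list(
        # Each assertion becomes list(text = "...").
        assertions = lapply(assertions, function(txt) list(text = txt)),
        # Tool names are passed as a list of strings.
        expectedTrajectory = list(toolNames = as.list(tool_names))
      )
    )
  )
}

# Placeholder values for illustration only.
evaluation_metadata <- list(
  sessionMetadata = list(
    make_session_metadata(
      session_id = "session-001",
      scenario_id = "refund-flow",
      assertions = c("The agent confirms the refund amount."),
      tool_names = c("lookup_order", "issue_refund")
    )
  )
)
```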

description

The description of the batch evaluation.

Value

A list with the following syntax:

list(
  batchEvaluationId = "string",
  batchEvaluationArn = "string",
  batchEvaluationName = "string",
  evaluators = list(
    list(
      evaluatorId = "string"
    )
  ),
  status = "PENDING"|"IN_PROGRESS"|"COMPLETED"|"COMPLETED_WITH_ERRORS"|"FAILED"|"STOPPING"|"STOPPED"|"DELETING",
  createdAt = as.POSIXct(
    "2015-01-01"
  ),
  outputConfig = list(
    cloudWatchConfig = list(
      logGroupName = "string",
      logStreamName = "string"
    )
  ),
  description = "string"
)
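
When handling the returned list, it can help to separate statuses that look final from those that are still in flight. The grouping below is an assumption based only on the status names listed above, not on documented service behaviour, and `resp` is a mock value rather than a real response.

```r
# Assumed grouping: statuses that appear to be final states.
terminal_statuses <- c("COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "STOPPED")

is_terminal <- function(status) {
  status %in% terminal_statuses
}

# Mock response with the same field names as the Value section.
resp <- list(batchEvaluationId = "example-id", status = "PENDING")

if (!is_terminal(resp$status)) {
  message("Batch evaluation ", resp$batchEvaluationId, " is still running")
}
```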

Request syntax

svc$start_batch_evaluation(
  batchEvaluationName = "string",
  evaluators = list(
    list(
      evaluatorId = "string"
    )
  ),
  dataSourceConfig = list(
    cloudWatchLogs = list(
      serviceNames = list(
        "string"
      ),
      logGroupNames = list(
        "string"
      ),
      filterConfig = list(
        sessionIds = list(
          "string"
        ),
        timeRange = list(
          startTime = as.POSIXct(
            "2015-01-01"
          ),
          endTime = as.POSIXct(
            "2015-01-01"
          )
        )
      )
    )
  ),
  clientToken = "string",
  evaluationMetadata = list(
    sessionMetadata = list(
      list(
        sessionId = "string",
        testScenarioId = "string",
        groundTruth = list(
          inline = list(
            assertions = list(
              list(
                text = "string"
              )
            ),
            expectedTrajectory = list(
              toolNames = list(
                "string"
              )
            ),
            turns = list(
              list(
                input = list(
                  prompt = "string"
                ),
                expectedResponse = list(
                  text = "string"
                )
              )
            )
          )
        ),
        metadata = list(
          "string"
        )
      )
    )
  ),
  description = "string"
)
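
As a minimal sketch of a filled-in request, the arguments can be built as a plain list and inspected before anything is sent. All names, log groups, and IDs below are hypothetical placeholders, and `svc` is assumed to be an already configured client for this service. This page does not say which fields inside `cloudWatchLogs` are mandatory, so the selection shown is illustrative.

```r
# Evaluate sessions from a fixed 24-hour window (UTC).
end_time <- as.POSIXct("2025-01-02 00:00:00", tz = "UTC")
start_time <- end_time - 24 * 60 * 60  # POSIXct arithmetic is in seconds

# Placeholder values for illustration only.
args <- list(
  batchEvaluationName = "nightly-eval-example",
  evaluators = list(list(evaluatorId = "my-custom-evaluator")),
  dataSourceConfig = list(
    cloudWatchLogs = list(
      serviceNames = list("example-agent-service"),
      logGroupNames = list("/example/agent/log-group"),
      filterConfig = list(
        timeRange = list(startTime = start_time, endTime = end_time)
      )
    )
  ),
  description = "Example batch evaluation over the previous day"
)

# With a configured client `svc`, the request could then be sent as:
# resp <- do.call(svc$start_batch_evaluation, args)
```

Building the list first keeps the request easy to log or unit-test without AWS credentials.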