Start Batch Evaluation

bedrockagentcore_start_batch_evaluation

R Documentation

Starts a batch evaluation job that evaluates agent performance across multiple sessions

Description

Starts a batch evaluation job that evaluates agent performance across multiple sessions. A batch evaluation pulls agent traces from CloudWatch Logs or from an existing online evaluation configuration, then runs the specified evaluators and insights against them.

Usage

bedrockagentcore_start_batch_evaluation(batchEvaluationName, evaluators,
  dataSourceConfig, clientToken, evaluationMetadata, description)
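As a minimal sketch, the code below assembles the arguments as a named list and forwards them to a client with `do.call()`. It assumes a paws client created with `paws::bedrockagentcore()` and configured AWS credentials. `start_evaluation()` is a hypothetical wrapper, and every name, ID, and log group is a placeholder. Checking the two arguments marked `[required]` locally gives a clearer error than a rejected API call.

```r
# Hypothetical wrapper: checks the arguments documented as [required], then
# forwards the named list to the client's start_batch_evaluation() method.
start_evaluation <- function(svc, args) {
  required <- c("batchEvaluationName", "dataSourceConfig")
  missing_args <- setdiff(required, names(args))
  if (length(missing_args) > 0) {
    stop("Missing required argument(s): ",
         paste(missing_args, collapse = ", "))
  }
  do.call(svc$start_batch_evaluation, args)
}

# Placeholder values; replace them with your own names, IDs, and log groups.
args <- list(
  batchEvaluationName = "nightly_regression_20250115",
  evaluators = list(list(evaluatorId = "example-evaluator-id")),
  dataSourceConfig = list(
    cloudWatchLogs = list(
      serviceNames = list("example-agent-service"),
      logGroupNames = list("/example/agent/logs"),
      filterConfig = list(
        timeRange = list(
          startTime = as.POSIXct("2025-01-14", tz = "UTC"),
          endTime = as.POSIXct("2025-01-15", tz = "UTC")
        )
      )
    )
  ),
  description = "Nightly regression run over one day of sessions"
)

# With a real client (assumes the paws package and AWS credentials):
# svc <- paws::bedrockagentcore()
# resp <- start_evaluation(svc, args)
# resp$batchEvaluationId
```

Passing a stand-in list that contains a `start_batch_evaluation` function as `svc` lets you test the argument assembly without calling AWS.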

Arguments

batchEvaluationName

[required] The name of the batch evaluation. Must be unique within your account.

evaluators

The list of evaluators to apply during the batch evaluation. It can include both built-in and custom evaluators, up to a maximum of 10.

dataSourceConfig

[required] The data source configuration that specifies where to pull agent session traces from for evaluation.

clientToken

A unique, case-sensitive identifier that ensures the API request completes no more than once. If this token matches a previous request, the service ignores the request but does not return an error.

evaluationMetadata

Optional metadata for the evaluation, including session-specific ground truth data and test scenario identifiers.

description

The description of the batch evaluation.
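The helpers below sketch two of these arguments in base R. `build_evaluators()` turns a character vector of evaluator IDs into the list-of-lists shape of `evaluators` and enforces the documented limit of 10. `new_client_token()` and `with_retries()` show the idempotency pattern for `clientToken`: generate the token once, then resend the same value on every retry. All three functions are hypothetical, and the assumption that a 36-character UUID-style string is an acceptable token should be checked against the service's constraints.

```r
# Hypothetical helper: wraps each evaluator ID as list(evaluatorId = id) and
# rejects more than the documented maximum of 10 evaluators.
build_evaluators <- function(evaluator_ids) {
  stopifnot(is.character(evaluator_ids))
  if (length(evaluator_ids) > 10) {
    stop("At most 10 evaluators are allowed, got ", length(evaluator_ids))
  }
  lapply(unname(evaluator_ids), function(id) list(evaluatorId = id))
}

# Hypothetical helper: random UUID-style token (version 4 layout), base R only.
# Assumption: the service accepts a 36-character token of this form.
new_client_token <- function() {
  hex <- function(n) {
    paste(sample(c(0:9, letters[1:6]), n, replace = TRUE), collapse = "")
  }
  paste(hex(8), hex(4), paste0("4", hex(3)),
        paste0(sample(c("8", "9", "a", "b"), 1), hex(3)), hex(12),
        sep = "-")
}

# Hypothetical retry loop: call_fn receives the SAME clientToken on every
# attempt, so retrying a request that already went through is ignored by the
# service instead of starting a second batch evaluation. For simplicity it
# retries on any error; a real client would retry only transient failures.
with_retries <- function(call_fn, token, attempts = 3) {
  result <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(call_fn(clientToken = token),
                       error = function(e) e)
    if (!inherits(result, "error")) return(result)
  }
  stop(result)
}

evaluators <- build_evaluators(c("example-evaluator-1", "example-evaluator-2"))
token <- new_client_token()
```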

Value

A list with the following syntax:

list(
  batchEvaluationId = "string",
  batchEvaluationArn = "string",
  batchEvaluationName = "string",
  evaluators = list(
    list(
      evaluatorId = "string"
    )
  ),
  status = "PENDING"|"IN_PROGRESS"|"COMPLETED"|"COMPLETED_WITH_ERRORS"|"FAILED"|"STOPPING"|"STOPPED"|"DELETING",
  createdAt = as.POSIXct(
    "2015-01-01"
  ),
  outputConfig = list(
    cloudWatchConfig = list(
      logGroupName = "string",
      logStreamName = "string"
    )
  ),
  description = "string"
)
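A minimal sketch of reading this response in R. `batch_evaluation_finished()` groups the documented `status` values into finished and still-running states; treating `PENDING`, `IN_PROGRESS`, `STOPPING`, and `DELETING` as not finished is an assumption based on the status names, not documented service behavior. `batch_evaluation_output()` pulls the CloudWatch log location from `outputConfig`. Both functions are hypothetical, and the response below is made-up sample data with the documented shape.

```r
# Hypothetical helper: TRUE once the job has reached a status that, judging by
# its name, will not change further.
batch_evaluation_finished <- function(response) {
  terminal <- c("COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "STOPPED")
  response$status %in% terminal
}

# Hypothetical helper: returns the CloudWatch log group and stream named in
# outputConfig, or NULL if outputConfig is absent.
batch_evaluation_output <- function(response) {
  cw <- response$outputConfig$cloudWatchConfig
  if (is.null(cw)) return(NULL)
  c(logGroupName = cw$logGroupName, logStreamName = cw$logStreamName)
}

# Made-up sample response.
resp <- list(
  batchEvaluationId = "example-id",
  batchEvaluationName = "nightly_regression_20250115",
  evaluators = list(list(evaluatorId = "example-evaluator-id")),
  status = "PENDING",
  outputConfig = list(
    cloudWatchConfig = list(
      logGroupName = "/example/evaluation/results",
      logStreamName = "example-stream"
    )
  )
)

batch_evaluation_finished(resp)  # FALSE for the sample status "PENDING"
batch_evaluation_output(resp)
```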

Request syntax

svc$start_batch_evaluation(
  batchEvaluationName = "string",
  evaluators = list(
    list(
      evaluatorId = "string"
    )
  ),
  dataSourceConfig = list(
    cloudWatchLogs = list(
      serviceNames = list(
        "string"
      ),
      logGroupNames = list(
        "string"
      ),
      filterConfig = list(
        sessionIds = list(
          "string"
        ),
        timeRange = list(
          startTime = as.POSIXct(
            "2015-01-01"
          ),
          endTime = as.POSIXct(
            "2015-01-01"
          )
        )
      )
    )
  ),
  clientToken = "string",
  evaluationMetadata = list(
    sessionMetadata = list(
      list(
        sessionId = "string",
        testScenarioId = "string",
        groundTruth = list(
          inline = list(
            assertions = list(
              list(
                text = "string"
              )
            ),
            expectedTrajectory = list(
              toolNames = list(
                "string"
              )
            ),
            turns = list(
              list(
                input = list(
                  prompt = "string"
                ),
                expectedResponse = list(
                  text = "string"
                )
              )
            )
          )
        ),
        metadata = list(
          "string"
        )
      )
    )
  ),
  description = "string"
)
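The nested structure above can be produced by small builder functions, so callers pass plain vectors instead of hand-written nested lists. The sketch below builds `dataSourceConfig` (CloudWatch Logs with a time range and optional session IDs) and one `sessionMetadata` entry with inline ground truth. The function names are hypothetical, the field names follow the request syntax, and omitting empty optional fields rather than sending empty lists is a design assumption.

```r
# Hypothetical builder for dataSourceConfig$cloudWatchLogs.
build_cloudwatch_source <- function(service_names, log_group_names,
                                    start_time, end_time, session_ids = NULL) {
  start_time <- as.POSIXct(start_time, tz = "UTC")
  end_time <- as.POSIXct(end_time, tz = "UTC")
  if (end_time <= start_time) stop("end_time must be later than start_time")

  filter_config <- list(
    timeRange = list(startTime = start_time, endTime = end_time)
  )
  # Optional: restrict the evaluation to specific sessions.
  if (length(session_ids) > 0) filter_config$sessionIds <- as.list(session_ids)

  list(cloudWatchLogs = list(
    serviceNames = as.list(service_names),
    logGroupNames = as.list(log_group_names),
    filterConfig = filter_config
  ))
}

# Hypothetical builder for one evaluationMetadata$sessionMetadata entry.
# `turns` is a list of list(prompt = ..., expected = ...) pairs.
build_session_metadata <- function(session_id, test_scenario_id,
                                   assertions = character(),
                                   expected_tools = character(),
                                   turns = list()) {
  inline <- list()
  if (length(assertions) > 0) {
    inline$assertions <- lapply(assertions, function(a) list(text = a))
  }
  if (length(expected_tools) > 0) {
    inline$expectedTrajectory <- list(toolNames = as.list(expected_tools))
  }
  if (length(turns) > 0) {
    inline$turns <- lapply(turns, function(turn) list(
      input = list(prompt = turn$prompt),
      expectedResponse = list(text = turn$expected)
    ))
  }
  list(sessionId = session_id, testScenarioId = test_scenario_id,
       groundTruth = list(inline = inline))
}

# Placeholder values for illustration.
data_source <- build_cloudwatch_source(
  service_names = "example-agent-service",
  log_group_names = "/example/agent/logs",
  start_time = "2025-01-14", end_time = "2025-01-15",
  session_ids = "example-session-001"
)
evaluation_metadata <- list(sessionMetadata = list(
  build_session_metadata(
    session_id = "example-session-001",
    test_scenario_id = "refund_request",
    assertions = "The agent confirms the order number before refunding.",
    expected_tools = c("lookup_order", "issue_refund"),
    turns = list(list(prompt = "Please refund order 123.",
                      expected = "The refund for order 123 has been issued."))
  )
))
```

The results can be passed as the `dataSourceConfig` and `evaluationMetadata` arguments of `svc$start_batch_evaluation()`.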