---
title: Benchmarks
description: >-
  Evaluate Agent output quality, track improvement, and use benchmark feedback
  to refine repeatable AI processes.
ogTitle: Agent Benchmarks
ogDescription: >-
  Learn how benchmarks evaluate Agent outputs and support quality control for
  repeatable AI processes.
ogImage: /assets/images/og/agents-benchmarks.webp
navigation:
  icon: fasl fa-bullseye-arrow
---

# Benchmarks

Benchmarks evaluate the final output of an Agent run. Use them to track quality, guide iteration, and improve repeatable AI processes before you scale those processes to scheduled or external-facing work.

## What Benchmarks Provide

Benchmarks can provide:

- scores for individual quality dimensions
- visual dashboards
- feedback about strengths and weaknesses
- recommendations for improvement
- quality tracking across runs

## Default Benchmark

The default benchmark evaluates broad output quality. It can assess dimensions such as readability, clarity, cohesion, tone, engagement, repetition, vocabulary, and complexity.

Use the default benchmark when you want a general quality signal without writing your own evaluation prompt.

## Custom Benchmark

Custom benchmarks use your own evaluation prompt. Use custom benchmarks when outputs need to meet domain-specific criteria, compliance standards, brand requirements, or internal review rubrics.

Your evaluation prompt can reference Agent output variables using template syntax.
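The exact placeholder syntax is defined by the platform and is not specified on this page. As a minimal sketch, assuming a hypothetical `{{variableName}}` placeholder style, the TypeScript below shows how an evaluation prompt might be filled with one run's output variables before scoring. The `renderPrompt` function and the `summary` and `audience` variables are illustrative only, not part of the product.

```typescript
// Hypothetical: output variable names mapped to their values for one Agent run.
type OutputVariables = Record<string, string>;

// Replace each {{name}} placeholder (whitespace inside the braces allowed) with
// the matching output variable. This assumes the {{name}} syntax. Unknown
// placeholders are left as-is so a missing variable stays visible in the prompt.
function renderPrompt(template: string, vars: OutputVariables): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}

const evaluationPrompt = [
  "Score the summary below from 1 to 10 for adherence to our brand voice.",
  "Summary:",
  "{{ summary }}",
  "Target audience: {{audience}}",
].join("\n");

console.log(
  renderPrompt(evaluationPrompt, {
    summary: "Example summary text.",
    audience: "existing customers",
  }),
);
```

Leaving unmatched placeholders untouched, rather than substituting an empty string, makes a misspelled variable name easy to spot when you review the rendered prompt.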

## Run on Completion

Enable **Run on Completion** when every completed run should be evaluated automatically. This is useful for scheduled Agents, production reporting, and processes where you need to catch quality drift between runs.
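When every run is benchmarked automatically, you end up with a series of overall scores that you can check for drift. The sketch below is one simple approach, not the product's built-in behavior. It compares the mean of the most recent runs against the mean of earlier runs and flags drift when the drop exceeds a tolerance. The `hasQualityDrift` function, the window size, and the tolerance are hypothetical choices for illustration.

```typescript
// Hypothetical drift check over overall benchmark scores ordered oldest to newest.
// Returns true when the mean of the last `window` scores is more than
// `tolerance` below the mean of all earlier scores.
function hasQualityDrift(scores: number[], window: number, tolerance: number): boolean {
  // Require at least `window` earlier runs so the baseline is meaningful.
  if (window <= 0 || scores.length < window * 2) {
    return false;
  }
  const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const recent = scores.slice(-window);
  const baseline = scores.slice(0, -window);
  return mean(baseline) - mean(recent) > tolerance;
}

// Illustrative scores from a scheduled Agent, oldest first.
const overallScores = [8.1, 8.0, 8.2, 7.9, 7.2, 7.0, 6.9];
console.log(hasQualityDrift(overallScores, 3, 0.5));
```

Averaging over a window rather than reacting to a single low score avoids flagging one unusual input as drift.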

## Review Benchmark Results

Benchmark results can include:

- overall score
- per-dimension metric bars
- benchmark ranges
- feedback notes
- recommendations

Use low-scoring dimensions to decide whether to refine prompts, adjust variables, change source context, add validation steps, or use a different model.
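To decide where to start, it can help to list the dimensions that fall below a target score, weakest first. The sketch below assumes a hypothetical result shape with an overall score and a map of per-dimension scores on a 0 to 10 scale. The field names, the scale, and the threshold of 7 are assumptions, not the product's actual result format.

```typescript
// Hypothetical shape of a benchmark result; field names are illustrative.
interface BenchmarkResult {
  overall: number;
  dimensions: Record<string, number>; // dimension name -> score (assumed 0 to 10)
}

// Return the dimensions scoring below `threshold`, lowest score first,
// so the weakest areas are reviewed before the rest.
function lowScoringDimensions(result: BenchmarkResult, threshold: number): Array<[string, number]> {
  return Object.entries(result.dimensions)
    .filter(([, score]) => score < threshold)
    .sort((a, b) => a[1] - b[1]);
}

const result: BenchmarkResult = {
  overall: 7.1,
  dimensions: { clarity: 8.2, tone: 6.4, repetition: 5.9, readability: 7.8 },
};

for (const [name, score] of lowScoringDimensions(result, 7)) {
  console.log(`${name}: ${score}`);
}
```

With these illustrative scores, repetition and tone are listed first, which might point toward tightening the prompt's style instructions before changing models.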

## Iterate Safely

A practical improvement loop:

1. Run the Agent with representative inputs.
2. Review output and benchmark feedback.
3. Update prompts, variables, source policy, or output templates.
4. Run again with the same or comparable inputs.
5. Compare results.
6. Repeat until output quality is acceptable.
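Steps 4 and 5 amount to comparing per-dimension scores between a baseline run and a revised run. The sketch below shows one way to compute those changes so regressions stand out alongside improvements. The `compareRuns` function, the `DimensionChange` type, and the example scores are hypothetical and only illustrate the comparison.

```typescript
type DimensionScores = Record<string, number>;

interface DimensionChange {
  dimension: string;
  before: number;
  after: number;
  delta: number; // positive = improved, negative = regressed
}

// Compare per-dimension scores from a baseline run and a revised run.
// Only dimensions present in both runs are compared.
function compareRuns(before: DimensionScores, after: DimensionScores): DimensionChange[] {
  return Object.keys(before)
    .filter((dimension) => Object.prototype.hasOwnProperty.call(after, dimension))
    .map((dimension) => ({
      dimension,
      before: before[dimension],
      after: after[dimension],
      delta: after[dimension] - before[dimension],
    }));
}

// Illustrative scores for the same inputs before and after a prompt change.
const baseline: DimensionScores = { clarity: 7.0, tone: 6.0, repetition: 5.5 };
const revised: DimensionScores = { clarity: 7.5, tone: 5.5, repetition: 7.0 };

for (const change of compareRuns(baseline, revised)) {
  const label = change.delta < 0 ? "regressed" : change.delta > 0 ? "improved" : "unchanged";
  console.log(`${change.dimension}: ${change.before} -> ${change.after} (${label})`);
}
```

Reporting every shared dimension, not just the overall score, makes it visible when a change that fixes one dimension (here, repetition) costs another (tone).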

Keep key comparison runs available when you need a reviewable record of how an Agent's output changed over time.
