Workflow Benchmarks

Benchmarks evaluate the final output of an AI Workflow. Use them to track quality, guide iteration, and estimate cost before scaling to scheduled jobs.

Default Benchmark Dashboard

The default benchmark analyzes output across multiple dimensions and renders an interactive dashboard:

  • Overall Score: A single 0–100 score with a benchmark range.
  • Metric Bars: Horizontal bars showing per‑dimension scores against benchmark ranges (shaded bands).
  • Feedback: Actionable notes highlighting strengths and areas to improve.

Common dimensions include Readability Ease/Grade, Clarity, Cohesion, Tone Appropriateness, Engagement, Repetition, Lexical Diversity, and Complexity.
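The exact schema behind the dashboard is not documented here; as a rough mental model only, the rendered data can be pictured as a structure like the following Python sketch. The type and field names are illustrative assumptions, not the platform's actual API.

    from dataclasses import dataclass, field

    @dataclass
    class MetricScore:
        name: str          # e.g. "Clarity" or "Lexical Diversity"
        score: float       # 0-100 score for this dimension
        band_low: float    # lower edge of the shaded benchmark band
        band_high: float   # upper edge of the shaded benchmark band

    @dataclass
    class BenchmarkResult:
        overall_score: float               # single 0-100 score
        overall_band: tuple[float, float]  # benchmark range for the overall score
        metrics: list[MetricScore] = field(default_factory=list)
        feedback: list[str] = field(default_factory=list)  # actionable notes

    # Illustrative values only.
    result = BenchmarkResult(
        overall_score=81.0,
        overall_band=(70.0, 85.0),
        metrics=[MetricScore("Clarity", 78.0, 70.0, 85.0)],
        feedback=["Tighten the opening paragraph to reduce repetition."],
    )

Seen this way, each metric bar needs both a score and a band, which is why the dashboard shades a benchmark range behind every bar.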

Running Benchmarks

  • From a Workflow, open the Benchmark section and choose:
    Use default benchmark
    Edit benchmark (custom prompt; an example prompt follows this list)
  • Enable Run on completion to auto‑evaluate after changes.
  • From a Job, use Benchmark Output to run and review results for that execution.
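Choosing Edit benchmark means supplying your own evaluation prompt. The following is a hypothetical example of such a prompt, held in a Python constant purely for illustration; the rubric wording and dimensions are assumptions, not a built-in template.

    # A hypothetical custom benchmark prompt; adjust the dimensions and
    # criteria to match what matters for your workflow's output.
    CUSTOM_BENCHMARK_PROMPT = """\
    You are evaluating the output of an AI workflow that drafts release notes.
    Score the text from 0 to 100 on each dimension:
    - Clarity: every sentence should be unambiguous.
    - Tone Appropriateness: neutral, customer-facing voice.
    - Repetition: penalize restated points and filler.
    Return the per-dimension scores, an overall 0-100 score, and three
    actionable feedback notes ranked by impact.
    """

Keeping the rubric explicit about its dimensions and the 0–100 scale makes custom results easier to compare against the default benchmark.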

Interpreting Results

  • Scores near or above the upper benchmark suggest strong performance.
  • Mid‑range scores indicate acceptable quality with room to improve.
  • Scores below the lower benchmark highlight areas to prioritize.

Use the feedback list to refine prompts, variables, or output templates, then re‑run to confirm improvements; the sketch below shows this check in miniature.
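The interpretation rules and the refine-and-re-run loop reduce to comparing each score against its benchmark band. A minimal sketch, assuming the illustrative result shape above and made-up numbers (none of them come from the product):

    def interpret(score: float, band_low: float, band_high: float) -> str:
        """Classify a score against its benchmark band (thresholds assumed)."""
        if score >= band_high:
            return "strong: at or above the upper benchmark"
        if score >= band_low:
            return "acceptable: within the benchmark range, room to improve"
        return "priority: below the lower benchmark"

    # Made-up scores and bands; flag the dimensions to work on first,
    # refine the prompt or template, then re-run the benchmark.
    metrics = {"Clarity": (62.0, 70.0, 85.0), "Repetition": (88.0, 70.0, 85.0)}
    for name, (score, low, high) in metrics.items():
        print(f"{name}: {interpret(score, low, high)}")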