---
title: Workflow Benchmarks
description: >-
  Understand Workflow Benchmarks in Clear Ideas. Evaluate AI Workflow outputs
  for quality, track performance, and optimize workflow effectiveness.
ogTitle: Workflow Benchmarks Guide
ogDescription: >-
  Learn to use Workflow Benchmarks in Clear Ideas. Evaluate AI output quality,
  track performance metrics, and optimize workflow effectiveness.
ogImage: /assets/images/og/ai-workflow-benchmarks.webp
navigation:
  icon: fasl fa-bullseye-arrow
---

# Workflow Benchmarks

Benchmarks evaluate the final output of an AI Workflow, scoring it across quality dimensions and flagging areas for improvement. Use them to track quality, guide iteration, and estimate cost before scaling to scheduled jobs.

## What Are Workflow Benchmarks?

Workflow Benchmarks use an AI model to evaluate the quality of AI Workflow outputs. They provide:

- **Quantitative Metrics**: Scores across multiple quality dimensions
- **Visual Dashboards**: Interactive displays of those scores
- **Actionable Feedback**: Specific recommendations for improvement
- **Quality Tracking**: A way to track how quality changes over time

## Default Benchmark Dashboard

The default benchmark analyzes output across multiple dimensions and renders an interactive dashboard with three parts: an overall score, metric bars, and a feedback section.

### Overall Score

**Overall Score**: Single 0–100 score with a benchmark range.

**Display**: The overall score appears prominently at the top of the dashboard with:
- **Score Value**: Your workflow's overall quality score (0-100)
- **Benchmark Range**: Typical score range for comparison
- **Color Coding**: Visual indicators (success, info, warning, error) based on performance

**Interpretation**:
- **Above Upper Benchmark**: Excellent performance
- **Within Benchmark Range**: Good performance
- **Below Lower Benchmark**: Needs improvement
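These three bands reduce to a simple comparison against the benchmark range. The Python sketch below is illustrative only: the function and enum names are hypothetical (not part of Clear Ideas), and it assumes the range is inclusive, so a score exactly on a boundary counts as within it.

```python
from enum import Enum


class Performance(Enum):
    EXCELLENT = "Excellent performance"
    GOOD = "Good performance"
    NEEDS_IMPROVEMENT = "Needs improvement"


def classify_overall(score: float, lower: float, upper: float) -> Performance:
    """Place a 0-100 overall score relative to its benchmark range.

    Assumption: the range is inclusive at both ends, so a score equal to
    `lower` or `upper` is treated as within the range.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if score > upper:
        return Performance.EXCELLENT
    if score >= lower:
        return Performance.GOOD
    return Performance.NEEDS_IMPROVEMENT


print(classify_overall(82, lower=65, upper=80).value)  # Excellent performance
print(classify_overall(70, lower=65, upper=80).value)  # Good performance
```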

### Metric Bars

**Metric Bars**: Horizontal bars show per‑dimension scores against benchmark ranges (shaded bands).

**Display**: Each metric appears as:
- **Metric Name**: Dimension being evaluated (e.g., "Clarity", "Readability")
- **Score**: Your score (0-100) for that dimension
- **Visual Bar**: Horizontal bar showing score position
- **Benchmark Range**: Shaded band showing typical score range
- **Rating Chip**: Color-coded rating (Excellent, Good, Fair, Poor)

**Common Dimensions**:
- **Readability Ease/Grade**: How easy the text is to read
- **Clarity**: How clear and understandable the content is
- **Cohesion**: How well ideas flow together
- **Tone Appropriateness**: Whether the tone matches the intended audience
- **Engagement**: How engaging and interesting the content is
- **Repetition**: Level of unnecessary repetition
- **Lexical Diversity**: Variety of vocabulary used
- **Complexity**: Whether the level of complexity suits the content and audience
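This page does not specify how each dimension is scored, and the default benchmark evaluates output with an AI prompt. For intuition about **Lexical Diversity** only, the sketch below computes the type-token ratio (TTR), a conventional textbook measure of vocabulary variety. It is a stand-in for the concept, not the benchmark's method, and the function name is hypothetical.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words, ignoring case.

    Returns 0.0 when the text contains no words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


print(type_token_ratio("The cat saw the dog"))  # 0.8
```

TTR falls as text gets longer, so compare only passages of similar length.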

### Feedback Section

**Feedback**: Actionable notes highlighting strengths and areas to improve.

**Display**: Feedback appears as a list of:
- **Strengths**: What the workflow does well
- **Improvements**: Specific areas to focus on
- **Recommendations**: Suggestions for enhancing output quality

**Icons**: Feedback items include icons indicating:
- **Success**: Strengths and positive aspects
- **Info**: Informational notes
- **Warning**: Areas needing attention

## Running Benchmarks

### From a Workflow

To configure benchmarks for a workflow:

1. Open your **AI Workflow**
2. Navigate to the **Benchmark** section
3. Choose your benchmark option:

   **Use Default Benchmark**: Use the built-in default benchmark that analyzes multiple quality dimensions
   
   **Edit Benchmark**: Create a custom benchmark with your own evaluation prompt

4. Configure benchmark settings:
   - **Run on Completion**: Enable to automatically run benchmarks after workflow execution
   - **Use Default Prompt**: Toggle between the default evaluation prompt and your own custom prompt

5. Save your workflow

### Default Benchmark

**What It Does**: The default benchmark uses a built-in evaluation prompt that produces:
- An overall quality score
- Scores for multiple quality dimensions
- An inferred document type
- Actionable feedback

**When to Use**: Use the default benchmark for general-purpose quality assessment, when you want a broad evaluation across many quality dimensions at once.

### Custom Benchmark

**What It Does**: Custom benchmarks use your own evaluation prompt to assess specific aspects of workflow output.

**When to Use**: Use custom benchmarks when you need to evaluate:
- Specific quality criteria
- Domain-specific requirements
- Custom evaluation metrics
- Specialized assessment needs

**Creating Custom Benchmarks**:
1. Select **Edit Benchmark**
2. Write your evaluation prompt
3. Use Handlebars variables to reference workflow output
4. Save the benchmark configuration
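As a rough illustration of steps 2 and 3, the sketch below pairs a custom evaluation prompt with a tiny Python renderer that fills `{{name}}` placeholders the way a basic Handlebars expression does. The `{{output}}` variable name, the prompt wording, and the `render` helper are all hypothetical; check the Benchmark editor for the variables your workflow actually exposes.

```python
import re

# Hypothetical custom evaluation prompt. The {{output}} variable name is an
# assumption; use the variables your Benchmark editor actually offers.
CUSTOM_PROMPT = """Evaluate the support reply below for a retail customer.
Score it from 0 to 100 on accuracy, empathy, and policy compliance,
then list one strength and one improvement.

Reply to evaluate:
{{output}}
"""


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with variables[name].

    A minimal stand-in for Handlebars: it does not HTML-escape values or
    support helpers, blocks, or {{{triple-stash}}} expressions.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return variables[name]

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)


print(render(CUSTOM_PROMPT, {"output": "Thanks for reaching out! Your refund is on its way."}))
```

Failing loudly on a missing variable makes a mistyped placeholder obvious while you draft the prompt.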

### Run on Completion

**Run on Completion**: Enable this option to automatically evaluate workflow output after each execution.

**Benefits**:
- Automatic quality tracking
- Immediate feedback on output quality
- Quality monitoring over time
- No manual benchmark execution needed

**Configuration**: Check the "Run on completion" checkbox in the Benchmark section.

### From a Job

To run benchmarks for a specific workflow job:

1. Open the **Workflow Job** you want to evaluate
2. Navigate to the **Benchmark Output** section
3. If the benchmark has already run, review its results
4. If it has not run yet, run the benchmark

**Use Cases**:
- Evaluate specific job outputs
- Compare quality across different jobs
- Test workflow improvements
- Quality assurance for important outputs

## Interpreting Results

### Score Interpretation

**Scores Near or Above the Upper Benchmark (Strong Performance)**: Quality meets or exceeds typical standards.

**Action**: Keep the current workflow configuration and treat it as your quality baseline.

**Scores Within the Benchmark Range (Acceptable Quality)**: Quality is good, with room for improvement.

**Action**: Review the feedback for specific improvement areas, and consider refining prompts or variables.

**Scores Below the Lower Benchmark (Needs Improvement)**: The output falls short of typical standards, and the low-scoring dimensions show where to focus.

**Action**: Prioritize the low-scoring dimensions. Revise prompts, adjust variables, or modify output templates.

### Metric Analysis

**Individual Metrics**: Review each metric bar to understand:
- Which dimensions perform well
- Which dimensions need improvement
- Relative strengths and weaknesses

**Benchmark Comparison**: Compare your scores to benchmark ranges to understand:
- How your output compares to typical quality
- Whether improvements are needed
- Whether quality meets expectations
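To decide where to focus, compare each score with the lower edge of its shaded band and rank the shortfalls. A minimal sketch, assuming you have copied each metric's score and band from the dashboard; the `MetricResult` type, the function name, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    score: float  # 0-100
    lower: float  # lower edge of the shaded benchmark band
    upper: float  # upper edge of the shaded benchmark band


def weakest_dimensions(metrics: list[MetricResult]) -> list[tuple[str, float]]:
    """Return (name, shortfall) for metrics below their band, largest gap first."""
    below = [(m.name, m.lower - m.score) for m in metrics if m.score < m.lower]
    return sorted(below, key=lambda item: item[1], reverse=True)


# Made-up sample values for illustration.
results = [
    MetricResult("Clarity", 72, 60, 85),
    MetricResult("Cohesion", 48, 55, 80),
    MetricResult("Engagement", 40, 50, 75),
]
print(weakest_dimensions(results))  # [('Engagement', 10), ('Cohesion', 7)]
```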

### Feedback Utilization

**Strengths**: Identify what your workflow does well to:
- Maintain successful aspects
- Understand what works
- Build on strengths

**Improvements**: Focus on feedback recommendations to:
- Address specific quality issues
- Enhance weak areas
- Refine workflow configuration

## Using Benchmarks for Iteration

### Iterative Improvement Process

1. **Run Benchmark**: Run the workflow and review the benchmark results
2. **Identify Issues**: Review low-scoring metrics and feedback
3. **Refine Workflow**: Update prompts, variables, or output templates
4. **Re-run**: Run the updated workflow again
5. **Compare**: Compare the new benchmark results with the previous ones
6. **Iterate**: Repeat until quality meets your standards
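For the comparison step, a per-dimension delta makes changes between runs concrete. This sketch assumes you record each run's scores as name-to-score pairs; the function name, dimension names, and values are made up for illustration.

```python
def compare_runs(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Per-dimension score change (current minus previous) for dimensions in both runs."""
    shared = previous.keys() & current.keys()
    return {name: current[name] - previous[name] for name in sorted(shared)}


before = {"Overall": 68, "Clarity": 70, "Engagement": 55}
after = {"Overall": 74, "Clarity": 78, "Engagement": 52}
for name, delta in compare_runs(before, after).items():
    print(f"{name}: {delta:+}")
# Clarity: +8
# Engagement: -3
# Overall: +6
```

Because scores come from an AI model, treat small deltas with caution and look for changes that hold across several runs.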

### Refining Based on Benchmarks

**Prompt Refinement**: Use feedback to improve workflow prompts:
- Add clarity to instructions
- Specify tone requirements
- Include quality criteria

**Variable Adjustment**: Modify variables based on benchmark insights:
- Adjust content parameters
- Refine context variables
- Optimize input data

**Output Template Updates**: Enhance output templates:
- Improve structure
- Add formatting requirements
- Specify style guidelines

## Best Practices

### Regular Benchmarking

**Consistent Evaluation**: Run benchmarks regularly to track quality over time

**Quality Monitoring**: Monitor benchmark scores to detect quality degradation

**Improvement Tracking**: Track score improvements to measure workflow enhancements
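One way to detect degradation is to compare the latest score with a baseline built from recent runs. In this sketch the function name, the 5-run window, and the 5-point tolerance are hypothetical defaults; tune them to how much your scores normally vary.

```python
from statistics import mean


def detect_degradation(history: list[float], window: int = 5, tolerance: float = 5.0) -> bool:
    """Flag when the latest score falls more than `tolerance` points below
    the average of the `window` scores that precede it."""
    if len(history) < window + 1:
        return False  # not enough history to form a baseline yet
    baseline = mean(history[-window - 1:-1])
    return history[-1] < baseline - tolerance


scores = [78, 80, 79, 81, 77, 70]  # made-up overall scores, oldest first
print(detect_degradation(scores))  # True (baseline 79, latest 70)
```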

### Benchmark Configuration

**Appropriate Benchmarks**: Choose default or custom benchmarks based on your needs

**Run on Completion**: Enable automatic benchmarking for continuous quality monitoring

**Custom Prompts**: Create custom benchmarks for specialized evaluation requirements

### Result Analysis

**Comprehensive Review**: Review all metrics, not just overall score

**Feedback Focus**: Pay attention to actionable feedback recommendations

**Comparative Analysis**: Compare benchmarks across different workflow versions

### Cost Estimation

**Before Scaling**: Run benchmarks on test executions to confirm output quality and gauge per-run credit usage before scaling to scheduled jobs

**Quality Assurance**: Ensure quality meets standards before production use

**Resource Planning**: Factor benchmark credits into your plans; with Run on Completion enabled, every execution also runs a benchmark
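Because Run on Completion runs a benchmark after every execution, a scheduled job's credit cost covers both the workflow and the benchmark. The arithmetic is sketched below; the function name and per-run credit figures are hypothetical placeholders, so substitute the usage you observe in test runs.

```python
def estimate_monthly_credits(
    workflow_credits_per_run: float,
    benchmark_credits_per_run: float,
    runs_per_month: int,
    run_on_completion: bool,
) -> float:
    """Estimate credits for a scheduled job over one month.

    The benchmark cost is added to every run only when Run on Completion
    is enabled.
    """
    per_run = workflow_credits_per_run
    if run_on_completion:
        per_run += benchmark_credits_per_run
    return per_run * runs_per_month


# Hypothetical figures: 12 credits per workflow run, 3 per benchmark, daily schedule.
print(estimate_monthly_credits(12, 3, 30, run_on_completion=True))   # 450
print(estimate_monthly_credits(12, 3, 30, run_on_completion=False))  # 360
```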

## Limitations

### Benchmark Scope

**Output Evaluation**: Benchmarks evaluate final workflow output, not intermediate steps

**Quality Dimensions**: Benchmarks focus on content quality, not functional correctness

**Subjective Aspects**: Some quality aspects may be subjective and vary by use case

### Benchmark Accuracy

**Model Dependency**: Benchmark accuracy depends on the AI model used for evaluation

**Prompt Dependency**: Custom benchmark quality depends on evaluation prompt quality

**Context Limitations**: Benchmarks may not capture all contextual requirements

## Related Documentation

- [AI Workflows](/ai/ai-workflows) - Learn about creating and managing workflows
- [Prompt Best Practices](/ai/prompt-best-practices) - Improve workflow prompts
- [AI Credits](/ai/ai-credits) - Understand credit usage for benchmarks

