Extraction Workflows


Extraction Workflows let you run an AI Workflow automatically on file upload, extract structured metadata, and save it to the file for search and review.

Overview

Extraction Workflows are configured centrally at the site level and resolved per upload:

  • Configure rules in Site Settings > Extraction Workflows
  • Triggered internally when upload processing completes
  • Rule resolution order: nearest matching folder rule, then site rule
  • One effective rule runs per upload
  • Extracted output is stored in the file's metadata attributes

This is useful for document intelligence scenarios like ID/passport extraction, invoice fields, contract key dates, and compliance tags.

Where to Configure

Navigate to Site Settings > Extraction Workflows.

Each rule includes:

  • Enabled
  • Rule Name
  • Scope: Site or Folder
  • Target Workflow
  • Last Run Status

Rule Scope and Precedence

Rules support two scopes:

  • Site scope: applies to uploads anywhere in the site
  • Folder scope: applies to uploads in a specific folder subtree

When multiple rules could apply:

  1. The nearest matching folder rule wins (the rule attached to the closest enclosing folder of the upload)
  2. Otherwise, the site-level rule is used

Only one rule is dispatched per uploaded file.
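The precedence rules above can be sketched as a small resolver. This is a minimal illustration, not the product's implementation. It assumes folders are identified by slash-separated paths (the real system may use folder IDs such as uploadFolderId), that disabled rules are skipped, and that "nearest" means the deepest folder rule whose subtree contains the upload. The `Rule` class, `resolve_rule` function, and workflow names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical model of an Extraction Workflows rule."""
    name: str
    enabled: bool
    scope: str                          # "site" or "folder"
    folder_path: Optional[str] = None   # assumed path form, e.g. "/legal/msa"
    workflow: str = ""


def _in_subtree(folder: str, path: str) -> bool:
    """True if `path` is `folder` itself or any folder beneath it."""
    folder = folder.rstrip("/")
    return path == folder or path.startswith(folder + "/")


def _depth(folder: str) -> int:
    """Number of path segments; deeper folders are 'nearer' to the upload."""
    return len([part for part in folder.split("/") if part])


def resolve_rule(rules: List[Rule], upload_folder: str) -> Optional[Rule]:
    """Return the single effective rule for an upload, or None if none applies."""
    enabled = [r for r in rules if r.enabled]
    folder_matches = [
        r for r in enabled
        if r.scope == "folder"
        and r.folder_path is not None
        and _in_subtree(r.folder_path, upload_folder)
    ]
    if folder_matches:
        # Nearest matching folder rule wins: pick the deepest matching folder.
        return max(folder_matches, key=lambda r: _depth(r.folder_path))
    # Otherwise fall back to a site-scoped rule (assumes at most one exists).
    site_rules = [r for r in enabled if r.scope == "site"]
    return site_rules[0] if site_rules else None


rules = [
    Rule("Site default", True, "site", workflow="generic-tags"),
    Rule("Legal", True, "folder", "/legal", "contract-dates"),
    Rule("MSAs", True, "folder", "/legal/msa", "msa-fields"),
]
print(resolve_rule(rules, "/legal/msa/2026").name)  # MSAs
print(resolve_rule(rules, "/legal/nda").name)       # Legal
print(resolve_rule(rules, "/finance").name)         # Site default
```

Because the resolver returns exactly one rule (or None), it mirrors the one-rule-per-upload behavior; in this sketch, an upload with no folder match and no enabled site rule runs nothing.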

Workflow Requirements

To use a workflow with Extraction Workflows:

  1. In the workflow editor, enable Upload Triggers
  2. Configure Upload variable mappings (payload key to workflow variable)
  3. Ensure mapped workflow variables use compatible types (typically fileRef or imageRef)

Upload variable mappings define how system upload fields map into your workflow variables.

Common upload payload keys include:

  • uploadFileRef
  • uploadFileId
  • uploadSiteId
  • uploadFolderId
  • uploadContentType
  • uploadFileName
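To make the mapping step concrete, here is a sketch that turns an upload payload into workflow inputs and rejects type mismatches. The compatibility table, the `apply_upload_mappings` function, the variable names, and the "string" type for non-file keys are assumptions for illustration; only the payload key names and the fileRef/imageRef types come from this page.

```python
from typing import Any, Dict, Set

# Hypothetical compatibility table: which declared workflow variable types
# each upload payload key may feed. Only fileRef/imageRef come from the docs;
# "string" for the other keys is an assumption.
COMPATIBLE_TYPES: Dict[str, Set[str]] = {
    "uploadFileRef": {"fileRef", "imageRef"},
    "uploadFileId": {"string"},
    "uploadSiteId": {"string"},
    "uploadFolderId": {"string"},
    "uploadContentType": {"string"},
    "uploadFileName": {"string"},
}


def apply_upload_mappings(
    payload: Dict[str, Any],
    mappings: Dict[str, str],        # payload key -> workflow variable name
    variable_types: Dict[str, str],  # workflow variable name -> declared type
) -> Dict[str, Any]:
    """Build workflow inputs from an upload payload, checking type compatibility."""
    inputs: Dict[str, Any] = {}
    for payload_key, var_name in mappings.items():
        if payload_key not in payload:
            raise KeyError(f"upload payload has no key {payload_key!r}")
        declared = variable_types.get(var_name)
        allowed = COMPATIBLE_TYPES.get(payload_key, set())
        if declared not in allowed:
            raise TypeError(
                f"variable {var_name!r} is {declared!r}; "
                f"{payload_key!r} needs one of {sorted(allowed)}"
            )
        inputs[var_name] = payload[payload_key]
    return inputs


payload = {
    "uploadFileRef": "ref-123",          # placeholder reference value
    "uploadFileName": "passport.jpg",
    "uploadContentType": "image/jpeg",
}
inputs = apply_upload_mappings(
    payload,
    mappings={"uploadFileRef": "document", "uploadFileName": "originalName"},
    variable_types={"document": "imageRef", "originalName": "string"},
)
print(inputs)  # {'document': 'ref-123', 'originalName': 'passport.jpg'}
```

Only keys listed in the mapping reach the workflow, which is why keeping mappings minimal and explicit makes the workflow's inputs easy to reason about.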

Trigger Behavior

The trigger is shown to users as Upload.

At runtime, dispatch happens after upload or text-processing completion events are emitted, so extraction workflows run both in standard upload pipelines and when text extraction processing is bypassed.

Metadata Output

Workflow output is persisted in the file's attributes and can be used in:

  • File View > Metadata tab
  • Content info icons (which jump directly to the Metadata tab)
  • Search metadata filters (see Advanced Search Settings)

This means Extraction Workflows do more than populate metadata for review. They also create a structured layer you can query later with Metadata Search.

Search Syntax

Use metadata tokens of the form @field:value in search:

  • @expiryDate:2026
  • @fullName:"Jane Doe"
  • @tag:invoice
  • @tags:invoice

You can also query these built-in fields:

  • @contentType:image/png
  • @createdAt:2026
  • @updatedAt:2026
  • @name:passport
  • @size:1024

Metadata-only queries are supported. When a query combines open text with metadata tokens, search uses the open text for full-text/hybrid retrieval and applies the metadata filters to the results.

@tag:<value> and @tags:<value> are equivalent for content tags.
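The following sketch shows one way a query string decomposes into open text and metadata filters under the syntax above. It is an illustrative parser, not the search engine's own: the regular expression, the `parse_query` name, and the rule that field names are word characters are assumptions. It does fold @tags into @tag, matching the equivalence described here.

```python
import re
from typing import List, Tuple

# @field:value or @field:"quoted value". The token must start at the beginning
# of the string or after whitespace, so e-mail addresses are left as open text.
TOKEN_RE = re.compile(r'(?<!\S)@(\w+):(?:"([^"]*)"|(\S+))')

# @tags is documented as equivalent to @tag for content tags.
FIELD_ALIASES = {"tags": "tag"}


def parse_query(query: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a search string into open text and (field, value) metadata filters."""
    filters: List[Tuple[str, str]] = []
    for match in TOKEN_RE.finditer(query):
        field = FIELD_ALIASES.get(match.group(1), match.group(1))
        quoted, bare = match.group(2), match.group(3)
        filters.append((field, quoted if quoted is not None else bare))
    # Whatever is left after removing the tokens is the open text.
    open_text = " ".join(TOKEN_RE.sub(" ", query).split())
    return open_text, filters


print(parse_query('@fullName:"Jane Doe"'))
# ('', [('fullName', 'Jane Doe')])
print(parse_query('msa renewal @counterparty:"Northwind" @expiryDate:2027'))
# ('msa renewal', [('counterparty', 'Northwind'), ('expiryDate', '2027')])
```

A metadata-only query yields empty open text, which corresponds to the metadata-only case: there is nothing to retrieve by text, so only the filters apply.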

Extraction Workflows make Metadata Search practical at scale by applying consistent structure during upload. Instead of depending only on filenames or free text inside documents, teams can search directly against extracted fields.

Examples:

  • @vendorName:"Acme Corp"
  • @invoiceNumber:INV-1042
  • @expiryDate:2027
  • @governingLaw:"New York"
  • @tags:priority

This is especially useful when:

  • documents use inconsistent wording but contain the same underlying fields
  • reviewers need to narrow large sites quickly
  • downstream workflows depend on precise document classification
  • teams want to combine semantic retrieval with deterministic filters

Example combined query:

  • msa renewal @counterparty:"Northwind" @expiryDate:2027

In this example, the open text retrieves relevant content while the extracted metadata filters narrow the result set.
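A sketch of the filtering half of a combined query follows. Retrieval is stubbed out as a list of already-retrieved results, and the matching rule (case-insensitive prefix match, every filter must match, list values match if any element matches) is a hypothetical choice for illustration; the actual search service may compare values differently. The `apply_filters` and `value_matches` names and the sample files are invented.

```python
from typing import Any, Dict, List, Tuple


def value_matches(stored: Any, wanted: str) -> bool:
    """Hypothetical rule: case-insensitive prefix match; lists match if any item does."""
    if isinstance(stored, list):
        return any(value_matches(item, wanted) for item in stored)
    return str(stored).lower().startswith(wanted.lower())


def apply_filters(
    results: List[Dict[str, Any]], filters: List[Tuple[str, str]]
) -> List[Dict[str, Any]]:
    """Keep only results whose metadata satisfies every (field, value) filter."""
    return [
        r for r in results
        if all(
            field in r["metadata"] and value_matches(r["metadata"][field], value)
            for field, value in filters
        )
    ]


# Stand-in for what full-text/hybrid retrieval returned for "msa renewal".
retrieved = [
    {"name": "northwind-msa.pdf",
     "metadata": {"counterparty": "Northwind", "expiryDate": "2027-06-30"}},
    {"name": "contoso-msa.pdf",
     "metadata": {"counterparty": "Contoso", "expiryDate": "2027-01-15"}},
    {"name": "northwind-msa-old.pdf",
     "metadata": {"counterparty": "Northwind", "expiryDate": "2025-12-31"}},
]
filters = [("counterparty", "Northwind"), ("expiryDate", "2027")]
print([r["name"] for r in apply_filters(retrieved, filters)])  # ['northwind-msa.pdf']
```

Separating retrieval from filtering this way keeps the filters deterministic: the same metadata always passes or fails regardless of how the open text ranked the document.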

Operational Guidance

  • Design workflow outputs with stable field names so search syntax stays predictable.
  • Use quoted values for fields that commonly contain spaces.
  • Add tags during or after extraction when you want quick categorical filters like @tag:finance or @tags:legal.
  • Test a few real search queries after rollout to confirm the workflow output matches the search fields users expect.
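Two of the guidelines above, stable field names and quoted values, are easy to check in a small script while testing a rollout. Both helpers below are hypothetical: `check_output_fields` compares a sample workflow output with the field names users expect to search, and `to_token` formats a value as a search token, adding quotes when it contains whitespace. The sketch does not escape embedded double quotes, since this page does not document an escape syntax.

```python
from typing import Any, Dict, Set, Tuple


def to_token(field: str, value: Any) -> str:
    """Format a metadata search token, quoting values that contain whitespace."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        return f'@{field}:"{text}"'
    return f"@{field}:{text}"


def check_output_fields(
    output: Dict[str, Any], expected: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Compare workflow output keys with the fields users expect to search.

    Returns (missing, unexpected) so drift in field names is easy to spot.
    """
    keys = set(output)
    return expected - keys, keys - expected


# Sample output captured from a test upload (illustrative values).
output = {"fullName": "Jane Doe", "expiryDate": "2026-09-01", "documentType": "passport"}
missing, unexpected = check_output_fields(output, {"fullName", "expiryDate", "passportNumber"})
print(missing, unexpected)  # {'passportNumber'} {'documentType'}
print([to_token(k, v) for k, v in output.items()])
# ['@fullName:"Jane Doe"', '@expiryDate:2026-09-01', '@documentType:passport']
```

The generated tokens can be pasted into search to confirm that what appears on the Metadata tab is also what the search filters match.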

Best Practices

  • Start with folder-scoped rules for high-confidence document classes
  • Keep mappings minimal and explicit
  • Use quoted values when metadata values contain spaces
  • Validate outputs by opening the file Metadata tab and testing search filters