Extraction Workflows

Extraction Workflows let you run an AI Workflow automatically on file upload, extract structured metadata, and save it to the file for search and review.

Overview

Extraction Workflows are configured centrally at the site level and resolved per upload:

Rules are configured in Site Settings > Extraction Workflows
Dispatch is triggered internally when upload processing completes
Rules resolve in order: nearest matching folder rule first, then the site rule
One effective rule runs per upload
Extracted output is stored in the file's metadata attributes

This is useful for document intelligence scenarios such as ID and passport extraction, invoice field capture, contract key dates, and compliance tagging.

Where to Configure

Navigate to Site Settings > Extraction Workflows.

Each rule includes:

Enabled
Rule Name
Scope: Site or Folder
Target Workflow
Last Run Status

Rule Scope and Precedence

Rules support two scopes:

Site scope: applies to uploads anywhere in the site
Folder scope: applies to uploads in a specific folder subtree

When multiple rules could apply:

The nearest matching folder rule wins
Otherwise, the site-level rule is used

Only one rule is dispatched per uploaded file.
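
The precedence described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the `Rule` class, its `folder_path` field, and the idea of comparing slash-separated folder paths are all assumptions made for the example, as is skipping rules whose Enabled flag is off.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    # Hypothetical shape; field names are illustrative only.
    name: str
    scope: str                          # "site" or "folder"
    folder_path: Optional[str] = None   # set only for folder-scoped rules
    enabled: bool = True


def _is_within(folder: str, target: str) -> bool:
    """True if target is the folder itself or anywhere in its subtree."""
    folder = folder.rstrip("/")
    target = target.rstrip("/")
    return target == folder or target.startswith(folder + "/")


def resolve_rule(rules: List[Rule], upload_folder_path: str) -> Optional[Rule]:
    """Return the single effective rule for an upload, or None if none applies."""
    folder_matches = [
        r for r in rules
        if r.enabled
        and r.scope == "folder"
        and r.folder_path is not None
        and _is_within(r.folder_path, upload_folder_path)
    ]
    if folder_matches:
        # Every match is an ancestor of the upload folder, so the longest
        # path is the deepest (nearest) one.
        return max(folder_matches, key=lambda r: len(r.folder_path.rstrip("/")))

    # No folder rule matched: fall back to the site-level rule, if any.
    site_rules = [r for r in rules if r.enabled and r.scope == "site"]
    return site_rules[0] if site_rules else None


rules = [
    Rule("site-default", "site"),
    Rule("finance", "folder", "/finance"),
    Rule("invoices", "folder", "/finance/invoices"),
]
effective = resolve_rule(rules, "/finance/invoices/2026")
print(effective.name)
```

With these sample rules, an upload under `/finance/invoices/2026` resolves to the `invoices` rule, an upload under `/finance/reports` resolves to `finance`, and an upload anywhere else falls back to `site-default`.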

Workflow Requirements

To use a workflow with Extraction Workflows:

In the workflow editor, enable Upload Triggers
Configure Upload variable mappings (payload key to workflow variable)
Ensure mapped workflow variables use compatible types (typically fileRef or imageRef)

Upload variable mappings define how system upload fields map into your workflow variables.

Common upload payload keys include:

uploadFileRef
uploadFileId
uploadSiteId
uploadFolderId
uploadContentType
uploadFileName
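
As a rough mental model, an upload variable mapping behaves like a dictionary from payload key to workflow variable name. The sketch below shows that idea in Python. The payload values, the workflow variable names (`document`, `originalName`, `mimeType`), and the choice to raise an error on a missing key are placeholders invented for the example; only the payload key names come from the list above.

```python
def map_upload_payload(payload: dict, mappings: dict) -> dict:
    """Build workflow variables from an upload payload.

    mappings maps a payload key to the workflow variable that receives it.
    """
    variables = {}
    for payload_key, variable_name in mappings.items():
        if payload_key not in payload:
            # Illustrative choice: fail loudly rather than pass a missing value.
            raise KeyError(f"upload payload has no key {payload_key!r}")
        variables[variable_name] = payload[payload_key]
    return variables


# Placeholder payload; real value formats are not specified here.
payload = {
    "uploadFileRef": "file-ref-placeholder",
    "uploadFileId": "file-id-placeholder",
    "uploadSiteId": "site-id-placeholder",
    "uploadFolderId": "folder-id-placeholder",
    "uploadContentType": "application/pdf",
    "uploadFileName": "passport-scan.pdf",
}

# Keep the mapping minimal: only the fields the workflow actually reads.
mappings = {
    "uploadFileRef": "document",
    "uploadFileName": "originalName",
    "uploadContentType": "mimeType",
}

workflow_variables = map_upload_payload(payload, mappings)
print(workflow_variables)
```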

At runtime, dispatch happens after the upload or text-processing completion event is emitted. As a result, Extraction Workflows run both in standard upload pipelines and in cases where text extraction is bypassed.

Metadata Output

Workflow output is persisted on file attributes and can be used in:

File View > Metadata tab
Content info icons (direct jump to Metadata tab)
Search metadata filters (see Advanced Search Settings)

This means Extraction Workflows do more than populate metadata for review. They also create a structured layer you can query later with Metadata Search.

Search Syntax

Use metadata tokens in search:

@expiryDate:2026
@fullName:"Jane Doe"
@tag:invoice
@tags:invoice

You can also query built-in simple fields:

@contentType:image/png
@createdAt:2026
@updatedAt:2026
@name:passport
@size:1024

Metadata-only queries are supported. When a query includes open text plus metadata tokens, search uses the open text for full-text/hybrid retrieval and applies metadata filters to results.

@tag:<value> and @tags:<value> are equivalent for content tags.
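
To make the token syntax concrete, here is a minimal Python sketch that splits a query into open text and metadata filters. It is not the search engine's actual parser: the regular expression, the `parse_query` name, and the decision to normalize `tag` to `tags` are assumptions for illustration, and the real grammar may treat edge cases (escaped quotes, repeated fields) differently.

```python
import re

# A token starts at the beginning of the query or after whitespace and has the
# form @field:value, where value is "quoted text" or a run of non-space characters.
TOKEN_RE = re.compile(r'(?<!\S)@(\w+):(?:"([^"]*)"|(\S+))')


def parse_query(query: str):
    """Split a search query into (open_text, [(field, value), ...])."""
    filters = []
    for match in TOKEN_RE.finditer(query):
        field = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        value = quoted if quoted is not None else bare
        if field == "tag":
            # @tag and @tags are equivalent; pick one name (arbitrary choice).
            field = "tags"
        filters.append((field, value))

    # Whatever remains after removing the tokens is the open text.
    open_text = " ".join(TOKEN_RE.sub(" ", query).split())
    return open_text, filters


print(parse_query('passport scan @fullName:"Jane Doe" @tag:invoice'))
print(parse_query("@contentType:image/png @createdAt:2026"))
```

The first call yields open text plus two filters, and the second is a metadata-only query, so its open text is empty.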

How Extraction Workflows Improve Metadata Search

Extraction Workflows make Metadata Search practical at scale by applying consistent structure during upload. Instead of depending only on filenames or free text inside documents, teams can search directly against extracted fields.

Examples:

@vendorName:"Acme Corp"
@invoiceNumber:INV-1042
@expiryDate:2027
@governingLaw:"New York"
@tags:priority

This is especially useful when:

documents use inconsistent wording but contain the same underlying fields
reviewers need to narrow large sites quickly
downstream workflows depend on precise document classification
teams want to combine semantic retrieval with deterministic filters

Example combined query:

msa renewal @counterparty:"Northwind" @expiryDate:2027

In that example, open text helps retrieve relevant content while the extracted metadata filters narrow the result set.

Operational Guidance

Design workflow outputs with stable field names so search syntax stays predictable.
Use quoted values for fields that commonly contain spaces.
Add tags during or after extraction when you want quick categorical filters like @tag:finance or @tags:legal.
Test a few real search queries after rollout to confirm the workflow output matches the search fields users expect.
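
If queries are assembled programmatically, for example in a rollout test script, the quoting guidance can be applied by a small helper. The Python sketch below is an assumption-laden illustration: `metadata_token` and `build_query` are hypothetical names, and it does not handle values that themselves contain double quotes, since no escaping rule is defined here.

```python
def metadata_token(field: str, value) -> str:
    """Format one @field:value token, quoting values that contain whitespace."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        return f'@{field}:"{text}"'
    return f"@{field}:{text}"


def build_query(open_text: str = "", **filters) -> str:
    """Join optional open text with one metadata token per keyword argument."""
    parts = [open_text.strip()] if open_text.strip() else []
    parts.extend(metadata_token(field, value) for field, value in filters.items())
    return " ".join(parts)


print(metadata_token("vendorName", "Acme Corp"))
print(build_query("msa renewal", counterparty="Northwind", expiryDate=2027))
```

Generating tokens from one place like this helps keep field names stable across test queries, which makes mismatches between workflow output and expected search fields easier to spot.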

Best Practices

Start with folder-scoped rules for high-confidence document classes
Keep mappings minimal and explicit
Use quoted values when metadata values contain spaces
Validate outputs by opening the file Metadata tab and testing search filters