PDF Redaction

PDF redaction in Clear Ideas helps administrators remove sensitive information from PDF documents while preserving controlled access to the original for authorized roles.

This feature is designed for situations where some users should continue to work from the original document, while others should see only a redacted representation across:

  • file viewing and downloads
  • extracted text
  • AI summaries
  • AI Chat
  • search results

How PDF Redaction Works

Clear Ideas uses a review, draft, and finalize workflow for PDF redaction:

  1. Open the PDF and choose Redact PDF
  2. Add redaction terms manually, or choose Identify PII Terms with AI to suggest personally identifiable information terms from the extracted document text
  3. Choose whether each term should match:
    • Exact text only
    • Include variants such as punctuation-adjacent tokens, possessive forms, and common structured-token variants inside emails, URLs, and similar identifiers
  4. Analyze and review detected regions in the document preview and region list
  5. Override individual detected instances when a match should remain identified but not be redacted
  6. Draw manual regions for anything that should be removed but was not detected automatically
  7. Choose Save Draft if you want to preserve the setup without generating the redacted representation yet
  8. Click Finalize to generate the redacted representation

After finalization, Clear Ideas generates a separate, hidden redacted PDF representation for restricted roles, along with corresponding redacted extracted text, search data, and an AI summary.

Automatic and Manual Redaction

Clear Ideas supports two complementary redaction methods, term-based redaction and manual regions, with optional AI assistance for building the term list:

Term-Based Redaction

Administrators can enter one or more terms to redact automatically.

This is useful for:

  • names
  • account or reference numbers
  • email addresses
  • URLs
  • case identifiers
  • repeated phrases

Each term can be configured independently as:

  • Exact: matches only the entered term
  • Include variants: also matches supported deterministic variants, including punctuation-adjacent terms, possessive forms, and common formatting, wrapping, email, and URL cases for structured identifiers
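To make the difference between the two modes concrete, here is a minimal Python sketch. It is not Clear Ideas' matcher: the function name find_matches, the whitespace rule used for Exact, and the choice to make Include variants case-insensitive (so an email such as legal@acme.com still matches "Acme") are assumptions for illustration only.

```python
import re


def find_matches(text: str, term: str, include_variants: bool) -> list[tuple[int, int]]:
    """Return (start, end) character spans where `term` should be redacted.

    Hypothetical rules, for illustration only:
    - Exact: case-sensitive, and the term must be bounded by whitespace or the
      start/end of the text, so "Acme," and "Acme's" do NOT match "Acme".
    - Include variants: case-insensitive, the term may touch punctuation, an
      optional possessive suffix ('s or ’s) is included in the span, and the
      term may sit inside a structured token such as an email or URL.
    """
    escaped = re.escape(term)
    if include_variants:
        # Only reject matches glued to other letters or digits ("Acmes", "XAcme").
        pattern = rf"(?<![A-Za-z0-9]){escaped}(?:['’]s)?(?![A-Za-z0-9])"
        flags = re.IGNORECASE
    else:
        pattern = rf"(?<!\S){escaped}(?!\S)"
        flags = 0
    return [(m.start(), m.end()) for m in re.finditer(pattern, text, flags)]


text = "Acme signed. Acme's counsel emailed legal@acme.com about Acme Corp."
print(find_matches(text, "Acme", include_variants=False))  # [(0, 4), (57, 61)]
print(find_matches(text, "Acme", include_variants=True))   # [(0, 4), (13, 19), (42, 46), (57, 61)]
```

The example shows why Exact suits narrow identifiers: it skips the possessive and the email, while Include variants catches both.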

AI-Assisted PII Identification

Administrators can also ask Clear Ideas to identify PII terms with AI before review.

That workflow:

  • uses extracted document text as the source
  • suggests terms that can be merged into the editable redaction term list
  • still requires administrator review before finalization
  • works alongside exact matching, include-variants behavior, and manual regions
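A minimal Python sketch of how AI suggestions might be merged without bypassing review. The TermEntry type, its source and approved fields, and both helper functions are hypothetical; they only illustrate the rule that suggested terms are de-duplicated against existing terms and added as unapproved entries that block finalization until an administrator acts on them.

```python
from dataclasses import dataclass


@dataclass
class TermEntry:
    """Hypothetical shape of one row in the editable redaction term list."""
    text: str
    include_variants: bool = False
    source: str = "manual"  # "manual" or "ai"
    approved: bool = True   # AI-suggested rows start unapproved


def merge_suggestions(terms: list[TermEntry], suggestions: list[str]) -> list[TermEntry]:
    """Append AI-suggested terms that are not already present (case-insensitive)."""
    seen = {t.text.strip().casefold() for t in terms}
    merged = list(terms)
    for raw in suggestions:
        text = raw.strip()
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            merged.append(TermEntry(text=text, source="ai", approved=False))
    return merged


def ready_to_finalize(terms: list[TermEntry]) -> bool:
    """Blocked until every AI-suggested term has been approved (or removed)."""
    return all(t.approved for t in terms)


terms = merge_suggestions([TermEntry("Jane Doe")], ["jane doe", "555-0100", " 555-0100 "])
print([t.text for t in terms])    # ['Jane Doe', '555-0100']
print(ready_to_finalize(terms))   # False
```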

Manual Redaction Regions

You can also draw redaction boxes directly on the PDF preview for:

  • signatures
  • stamps
  • handwritten notes
  • logos or visual marks
  • content that OCR or text matching should not be trusted to identify automatically

Manual regions are reviewed alongside term-based regions before finalization.

Text PDFs and OCR PDFs

PDF redaction supports both:

  • Textual PDFs with a native text layer
  • Scanned or image-based PDFs that require OCR

During PDF preparation, Clear Ideas extracts document text and layout metadata. That layout data is then reused to identify redaction regions quickly during the review workflow.

For scanned or OCR-heavy PDFs, Clear Ideas uses OCR-backed positioning so matches can still be detected and reviewed on the page.
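Because both a native text layer and OCR produce word-level positions, the same lookup can serve both kinds of PDF. The Python sketch below is hypothetical (the Word type, regions_for_term, and the line tolerance are assumptions, not Clear Ideas APIs): it finds a multi-word term in a page's word list and returns one box per text line, so a match that wraps onto the next line yields two tight boxes instead of one oversized rectangle.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass(frozen=True)
class Word:
    """One word with its position, from a native text layer or from OCR."""
    text: str
    box: Box


def _normalize(token: str) -> str:
    # Ignore case and surrounding punctuation such as "Doe," or "(Doe)".
    return token.strip(".,;:!?()[]\"'").casefold()


def _union(words: list[Word]) -> Box:
    return (
        min(w.box[0] for w in words),
        min(w.box[1] for w in words),
        max(w.box[2] for w in words),
        max(w.box[3] for w in words),
    )


def regions_for_term(words: list[Word], term: str, line_tol: float = 2.0) -> list[Box]:
    """Return redaction boxes for each occurrence of `term`, one box per text line."""
    target = [_normalize(t) for t in term.split()]
    n = len(target)
    if n == 0:
        return []
    regions: list[Box] = []
    for i in range(len(words) - n + 1):
        window = words[i : i + n]
        if [_normalize(w.text) for w in window] != target:
            continue
        # Split the match into same-line runs (similar top edge) so a wrapped
        # term gets one box per line rather than one box spanning both lines.
        run = [window[0]]
        for w in window[1:]:
            if abs(w.box[1] - run[-1].box[1]) <= line_tol:
                run.append(w)
            else:
                regions.append(_union(run))
                run = [w]
        regions.append(_union(run))
    return regions


words = [
    Word("Signed", (10, 100, 50, 110)),
    Word("by", (55, 100, 70, 110)),
    Word("Jane", (75, 100, 105, 110)),
    Word("Doe,", (110, 100, 135, 110)),
    Word("witness", (140, 100, 190, 110)),
    Word("Jane", (195, 100, 225, 110)),
    Word("Doe", (10, 114, 35, 124)),  # wrapped onto the next line
]
print(regions_for_term(words, "Jane Doe"))
# [(75, 100, 135, 110), (195, 100, 225, 110), (10, 114, 35, 124)]
```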

Destructive Redaction

Clear Ideas redaction is designed to be stronger than drawing a visible black box over content.

When a redaction is finalized, the platform generates a redacted representation intended to remove the underlying content for restricted users, rather than simply obscuring it visually in the interface.

This distinction matters for security and compliance workflows where restricted users, AI features, and search should all operate on the redacted representation rather than on the original text.
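The difference can be shown on a page's word list. In this hypothetical Python sketch, an overlay leaves the word list, and therefore the extracted text, unchanged, while the destructive pass replaces every word that intersects a redaction region with a placeholder. A real implementation would also have to remove the underlying glyphs and, for scanned pages, the image pixels; the function names here are assumptions, not Clear Ideas APIs.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)
PLACEHOLDER = "[REDACTED]"


def intersects(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def extract_text(words: list[tuple[str, Box]]) -> str:
    return " ".join(text for text, _ in words)


def redact_text_layer(words: list[tuple[str, Box]], regions: list[Box]) -> list[tuple[str, Box]]:
    """Rebuild the word list with covered words removed, not merely hidden."""
    result: list[tuple[str, Box]] = []
    for text, box in words:
        if any(intersects(box, r) for r in regions):
            # The original text is dropped; consecutive covered words collapse
            # into a single placeholder.
            if not result or result[-1][0] != PLACEHOLDER:
                result.append((PLACEHOLDER, box))
        else:
            result.append((text, box))
    return result


words = [("Invoice", (10, 10, 60, 20)), ("for", (65, 10, 85, 20)),
         ("Jane", (90, 10, 120, 20)), ("Doe", (125, 10, 150, 20))]
regions = [(88, 8, 152, 22)]

# Overlay only: a black box is drawn, but the text layer is untouched.
print(extract_text(words))                              # Invoice for Jane Doe
# Destructive: the redacted representation no longer contains the words.
print(extract_text(redact_text_layer(words, regions)))  # Invoice for [REDACTED]
```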

Role-Aware Delivery

Redaction in Clear Ideas is role-aware.

By default:

  • authorized administrative roles can continue to access the original document
  • restricted roles receive the redacted representation instead

That role-aware behavior applies not only to the PDF viewer, but also to:

  • file downloads
  • extracted text
  • AI summaries
  • AI Chat retrieval
  • search and retrieval results

This allows administrators to manage redaction rules while ensuring that users who should only see redacted content do not receive the original through AI or search features.
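One way to picture role-aware delivery is a single decision point that every channel goes through. The Python sketch below assumes hypothetical role names, channel keys, and artifact identifiers; it illustrates the rule rather than Clear Ideas' actual access-control code. Roles not explicitly granted original access fall through to the redacted representation, so the check fails closed.

```python
from dataclasses import dataclass

# Hypothetical role names and channel keys, for illustration only.
ORIGINAL_ACCESS_ROLES = frozenset({"owner", "admin"})
CHANNELS = ("viewer", "download", "extracted_text", "ai_summary", "ai_chat", "search")


@dataclass(frozen=True)
class FileArtifacts:
    original: dict[str, str]  # channel -> artifact id built from the original PDF
    redacted: dict[str, str]  # channel -> artifact id built from the redacted PDF


def artifact_for(role: str, channel: str, files: FileArtifacts) -> str:
    """Pick the artifact a role receives on a channel.

    Every channel goes through this one check, so AI Chat or search cannot
    reach original text that the viewer would have withheld. Any role not
    explicitly granted original access gets the redacted artifact (fail closed).
    """
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    source = files.original if role in ORIGINAL_ACCESS_ROLES else files.redacted
    return source[channel]


files = FileArtifacts(
    original={c: f"orig:{c}" for c in CHANNELS},
    redacted={c: f"red:{c}" for c in CHANNELS},
)
print(artifact_for("admin", "download", files))   # orig:download
print(artifact_for("member", "ai_chat", files))   # red:ai_chat
```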

Preview as Role

After redaction is configured, administrators can also use preview-as-role controls from the file actions area to verify how a restricted role will receive the document.

This is useful for validating:

  • watermarking and redaction behavior together
  • role-aware document delivery before external sharing
  • whether the governed viewer experience matches the intended audience

Redaction and Content Indexing

When a redaction is finalized, Clear Ideas regenerates a separate hidden redacted representation and then processes that artifact through the normal extraction pipeline.

That means restricted-role users interact with:

  • redacted extracted text
  • redacted search chunks
  • redacted AI summaries

while authorized users can still work from the original representation.

See also Content Indexing and AI Chat.

Best Practices

  • Review all automatically detected regions before finalizing
  • Review AI-suggested PII terms before finalizing rather than treating them as automatic approval
  • Use manual regions for signatures, stamps, or non-text visual content
  • Prefer Exact matching for narrow structured identifiers when over-redaction would be risky
  • Use Include variants when formatting differences, punctuation, or wrapped identifiers are common
  • Use Save Draft when the redaction setup needs to be reviewed or resumed later before generation
  • Re-finalize the redaction if terms or manual regions change so the redacted representation, extracted text, summary, and AI retrieval data stay aligned
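Re-finalizing after changes can be reasoned about as comparing a fingerprint of the current configuration with the one recorded at finalization. This Python sketch is hypothetical: config_fingerprint and needs_refinalize are illustrative names, terms are modeled as (text, include_variants) pairs, and manual regions as (page, box) pairs.

```python
import hashlib
import json
from typing import Optional

Term = tuple[str, bool]                                 # (text, include_variants)
Region = tuple[int, tuple[float, float, float, float]]  # (page, (x0, y0, x1, y1))


def config_fingerprint(terms: list[Term], regions: list[Region]) -> str:
    """Stable hash of the redaction setup; list order does not matter."""
    payload = {
        "terms": sorted([text, bool(variants)] for text, variants in terms),
        # Coordinates are cast to float so 10 and 10.0 hash the same.
        "regions": sorted([page, [float(v) for v in box]] for page, box in regions),
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def needs_refinalize(current: str, finalized: Optional[str]) -> bool:
    """True if nothing was finalized yet or the setup changed since finalization."""
    return finalized is None or current != finalized


at_finalize = config_fingerprint([("Jane Doe", False)], [(1, (10, 10, 50, 20))])
edited = config_fingerprint([("Jane Doe", True)], [(1, (10, 10, 50, 20))])
print(needs_refinalize(at_finalize, at_finalize))  # False
print(needs_refinalize(edited, at_finalize))       # True
```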

Current Limitations

Finalized redacted PDFs currently support secure delivery and redacted AI and search behavior.

However:

  • finalized redacted PDFs are not yet guaranteed to preserve searchable embedded OCR or text metadata inside the exported PDF itself
  • AI-assisted PII term identification suggests terms for administrator review, but it is not a no-review automatic redaction workflow

Those are planned follow-up areas, but they should not be relied on as current product behavior.