Content Indexing

The Clear Ideas App includes an integrated indexing system for the text content of supported document formats. When indexing is enabled in the site settings, any uploaded file in a supported format is indexed automatically.

Supported File Formats

  • JSON - .json (application/json)
  • Word Document - .docx (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
  • OpenDocument Text - .odt (application/vnd.oasis.opendocument.text)
  • EPUB - .epub (application/epub+zip)
  • PowerPoint Presentation - .pptx (application/vnd.openxmlformats-officedocument.presentationml.presentation)
  • PowerPoint - .ppt (application/vnd.ms-powerpoint)
  • RTF - .rtf (application/rtf)
  • Texinfo - .texinfo (application/x-texinfo)
  • LaTeX - .latex (application/x-latex)
  • PDF - .pdf (application/pdf)
  • reStructuredText - .rst (text/x-rst)
  • Textile - .textile (text/x-textile)
  • MediaWiki - .mediawiki (text/x-mediawiki)
  • DocBook - .docbook (application/docbook+xml)
  • JATS - .jats (application/jats+xml)
  • Org mode - .org (text/x-org)
  • Jupyter Notebook - .ipynb (application/x-ipynb+json)
  • CSV - .csv (text/csv)
  • AsciiDoc - .asciidoc (text/asciidoc)
  • CommonMark - .commonmark (text/commonmark)
  • Creole - .creole (text/x-creole)
  • OPML - .opml (application/x-opml)
  • ICML - .icml (application/vnd.adobe.icml)
  • Wiki - .wiki (text/x-wiki)
  • Jira - .jira (text/x-jira)
  • HTML - .html, .htm (text/html)
  • Markdown - .md, .markdown (text/markdown)
  • Excel - .xls (application/vnd.ms-excel)
  • Excel - .xlsx (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
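
As an illustration only, the list above can be expressed as a lookup from file extension to MIME type. The sketch below is not how the Clear Ideas App performs its check; the names SUPPORTED_FORMATS and indexable_mime_type are hypothetical, and matching on the file extension alone is an assumption made for this example. It may be useful for checking, before upload, whether a file would be eligible for indexing.

```python
from pathlib import PurePath
from typing import Optional

# Extension -> MIME type, copied from the supported formats list.
# Hypothetical helper data; not part of the Clear Ideas App itself.
SUPPORTED_FORMATS = {
    ".json": "application/json",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".odt": "application/vnd.oasis.opendocument.text",
    ".epub": "application/epub+zip",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".ppt": "application/vnd.ms-powerpoint",
    ".rtf": "application/rtf",
    ".texinfo": "application/x-texinfo",
    ".latex": "application/x-latex",
    ".pdf": "application/pdf",
    ".rst": "text/x-rst",
    ".textile": "text/x-textile",
    ".mediawiki": "text/x-mediawiki",
    ".docbook": "application/docbook+xml",
    ".jats": "application/jats+xml",
    ".org": "text/x-org",
    ".ipynb": "application/x-ipynb+json",
    ".csv": "text/csv",
    ".asciidoc": "text/asciidoc",
    ".commonmark": "text/commonmark",
    ".creole": "text/x-creole",
    ".opml": "application/x-opml",
    ".icml": "application/vnd.adobe.icml",
    ".wiki": "text/x-wiki",
    ".jira": "text/x-jira",
    ".html": "text/html",
    ".htm": "text/html",
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def indexable_mime_type(filename: str) -> Optional[str]:
    """Return the listed MIME type for a filename, or None if unsupported.

    Assumption: the decision is made on the (case-insensitive) file
    extension only; the real application may inspect the file differently.
    """
    suffix = PurePath(filename).suffix.lower()
    return SUPPORTED_FORMATS.get(suffix)


if __name__ == "__main__":
    for name in ("report.PDF", "notes.markdown", "photo.png"):
        print(name, "->", indexable_mime_type(name))
```

A file whose extension is not in the table, such as photo.png, returns None, which corresponds to a file that would not be indexed on upload.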

Optical Character Recognition (OCR)

Many supported formats are inherently text-based, but others, such as PDFs, may require Optical Character Recognition (OCR) before their text can be extracted. Once extracted, the text is processed and made available to our AI-enhanced search and AI Chat features. All extracted text is kept encrypted throughout its lifecycle in the Clear Ideas App; the sole exception is when the text is used to fulfill an AI Chat request.

For further details on the security of extracted text, refer to our Encryption & Privacy guide.

Content Indexing Status

Administrators can monitor the indexing status of each file through a dedicated icon displayed next to it. For more information, see our Information Icons section.