What file types are indexed by GFI Archiver?


GFI Archiver indexes and searches the content of the following attachment types by default:
  • Adobe Acrobat (*.pdf)
  • ANSI Text (*.txt)
  • EML (emails saved by Outlook Express) (*.eml)
  • HTML (*.htm, *.html)
  • MHT archives (HTML archives saved by Internet Explorer) (*.mht)
  • Microsoft Excel (*.xls)
  • Microsoft Excel 2007, 2010, and 2013 (*.xlsx)
  • Microsoft PowerPoint (*.ppt)
  • Microsoft PowerPoint 2007, 2010, and 2013 (*.pptx)
  • Microsoft Rich Text Format (*.rtf)
  • Microsoft Word (*.doc)
  • Microsoft Word 2007, 2010, and 2013 (*.docx)
  • MSG (emails saved by Outlook) (*.msg)
  • OpenOffice versions 1, 2, and 3 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)
  • WordPerfect (5.0 and later) (*.wpd, *.wpf)
  • XML (*.xml)
  • XML Paper Specification (*.xps)


GFI Archiver also indexes supported files found within the following archive types:
  • RAR (*.rar)
  • ZIP (*.zip)


The indexing engine also supports the following additional file types, which are not indexed by default:
  • Adobe Framemaker MIF (*.mif)
  • Ami Pro (*.sam)
  • ASF media files (metadata only) (*.asf)
  • CSV (Comma-separated values) (*.csv)
  • DBF (*.dbf)
  • Enhanced Metafile Format (*.emf)
  • Eudora MBX message files (*.mbx)
  • Flash (*.swf)
  • Ichitaro (versions 5 and later) (*.jtd, *.jbw)
  • JPEG (*.jpg)
  • Lotus 1-2-3 (*.123, *.wk?)
  • MBOX email archives such as Thunderbird, including attachments (*.mbx)
  • Microsoft Document Imaging (*.mdi)
  • Microsoft Searchable TIFF (*.tiff)
  • Microsoft Works (*.wks)
  • MP3 (metadata only) (*.mp3)
  • Multimate Advantage II (*.dox)
  • Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)
  • QuickTime (*.mov, *.m4a, *.m4v)
  • Visio XML files (*.vdx)
  • Windows Metafile Format (*.wmf)
  • WMA media files (metadata only) (*.wma)
  • WMV video files (metadata only) (*.wmv)
  • WordStar (*.ws)
  • Write (*.wri)
  • XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)


To include any of the file types from the previous list:
  1. Stop the GFI Archiver Search service
  2. Open the folder ..\GFI\Archiver\Search\Data
  3. Create a subfolder named Backup
  4. Copy the file search.core.xml into the Backup subfolder
  5. Open search.core.xml in notepad.exe
  6. Add one line per extension within the <whitelist> section
    • For example, to add *.csv files, add the line: <item ext=".csv" />
  7. Save the file
  8. Start the GFI Archiver Search service
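Steps 3 to 7 can also be scripted. The following is only a minimal sketch in Python, not a GFI-provided tool. It assumes that search.core.xml is well-formed XML containing a single <whitelist> element whose entries look like <item ext=".csv" />, as described in step 6; the function names add_whitelist_extensions and update_config are hypothetical. Note that Python's ElementTree discards comments and the XML declaration when rewriting a file, so compare the result against the backup before starting the service.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def add_whitelist_extensions(xml_text: str, extensions: list[str]) -> str:
    """Return xml_text with one <item ext="..."/> added per missing extension."""
    root = ET.fromstring(xml_text)
    # Assumption: exactly one <whitelist> element exists in the document.
    whitelist = root if root.tag == "whitelist" else root.find(".//whitelist")
    if whitelist is None:
        raise ValueError("no <whitelist> section found")
    # Collect extensions already listed so no duplicate lines are added.
    existing = {item.get("ext", "").lower() for item in whitelist.findall("item")}
    for ext in extensions:
        ext = ext.lower()
        if not ext.startswith("."):
            ext = "." + ext
        if ext not in existing:
            ET.SubElement(whitelist, "item", {"ext": ext})
            existing.add(ext)
    return ET.tostring(root, encoding="unicode")


def update_config(config_path: Path, extensions: list[str]) -> None:
    """Copy config_path into a Backup subfolder, then rewrite it in place."""
    backup_dir = config_path.parent / "Backup"
    backup_dir.mkdir(exist_ok=True)
    shutil.copy2(config_path, backup_dir / config_path.name)
    # utf-8-sig tolerates a byte order mark if an editor has added one.
    original = config_path.read_text(encoding="utf-8-sig")
    updated = add_whitelist_extensions(original, extensions)
    config_path.write_text(updated, encoding="utf-8")
```

To use it, stop the GFI Archiver Search service, call update_config with the full path to search.core.xml in your installation and a list such as [".csv"], then start the service again. The script does not stop or start the service itself.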


Notes:
  • This procedure requires editing files manually. If a file is edited incorrectly, the server can be left in a non-operational state. Keep a backup of every file edited in this article before saving any changes to it.
  • When upgrading to a newer version of GFI Archiver, the installation files mentioned in this procedure are overwritten with default versions, which discards the changes. Keep a record of this procedure and repeat it directly after upgrading to keep this functionality intact.
  • GFI Archiver does not perform optical character recognition (OCR) on image data. Only text-based content is indexed. For example, if a PDF contains an image showing text, that text will not be searchable.
  • If additional file types are added to the index engine, attachments that were indexed before the change are not automatically searchable. To search for such attachments within older emails, rebuild the indexes of the corresponding Archive Stores (via Configuration > Archive Stores).