DocuFilter is a document text extraction SDK whose reliability and technical quality have been tested and proven.
It supports most document formats, including MS Office, HancomOffice, Open Office, PDF, EML, MSG, and compressed archives (10 types), and it can also extract images embedded in documents.
Filters documents many times faster than existing commercial products
Stable performance backed by years of research and analysis experience
Filters large files over 2 GB
No memory leaks, with exception handling for stability
Extract text from various document format types
Extract image data embedded within documents
Detect encrypted document files
Identifies DRM-protected files (10 types)
Filters compressed files (10 types, including Alz and Egg)
Supports Windows and Linux, 32-bit and 64-bit
Available for mobile environments (Android, iOS)
Provides interfaces for C/C++, Java, Python, C#, and other languages
Provides libraries and executable files suitable for your environment
Supports memory and file interfaces
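To make the memory and file interfaces and the encrypted-file detection above more concrete, here is a minimal Python sketch that uses only the standard library. It is not DocuFilter's API: the function name list_zip_entries is hypothetical, and it covers only ZIP archives, where bit 0 of each entry's general-purpose flags marks encryption.

```python
import io
import zipfile
from typing import Dict, List, Union


def list_zip_entries(source: Union[str, bytes]) -> List[Dict[str, object]]:
    """List ZIP entries from a file path or from raw bytes held in memory.

    Hypothetical helper for illustration only; not part of DocuFilter.
    """
    # Wrap in-memory data in a file-like object; a path is passed through.
    stream = io.BytesIO(source) if isinstance(source, (bytes, bytearray)) else source
    with zipfile.ZipFile(stream) as archive:
        return [
            {
                "name": info.filename,
                "size": info.file_size,
                # Bit 0 of the general-purpose flags is set for encrypted entries.
                "encrypted": bool(info.flag_bits & 0x1),
            }
            for info in archive.infolist()
        ]
```

The same function serves both cases: pass a path such as "archive.zip", or pass bytes received from a network stream or a database column.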
DocuFilter fits anywhere a preview of document content is needed, such as internal privacy protection, search, and mail. Examples of supported formats follow.
Document editors
MS Word (97, 2003, 2007, 2010, 2013, 2016)
OpenOffice Writer Document (ODT)
Hancom HWP (2007, 2010, 2014), including distribution documents
Ichitaro
Spreadsheet
MS Excel (97, 2003, 2007, 2010, 2013, 2016), including xlsb and xlsm
OpenOffice Calc Spreadsheet (ODS)
Hancom CELL (2007, 2010, 2014)
Presentation
MS PowerPoint (97, 2003, 2007, 2010, 2013, 2016)
OpenOffice Impress Presentation (ODP)
Hancom SHOW (2007, 2010, 2014)
Compression
Zip, Egg, Alz, gzip, Tar, 7z, gz, rar, tbz, jar
Viewers
Portable Document Format (PDF)
Electronic Publication Format (EPUB)
Text
Other
Open Office ODF files
Documents embedded as OLE objects
HTML documents, with tag filtering
eml, rtf, msg, mp3, mime, chm
Files whose file format is unknown but whose internal strings can be extracted
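For files of unknown format, the general idea of pulling out internal strings can be sketched much like the Unix strings utility does: scan the raw bytes for runs of printable characters. The Python below is an illustration under that assumption, not DocuFilter's implementation. The name extract_strings and the minimum length of 4 are hypothetical choices, and the sketch handles ASCII only.

```python
import re
from typing import List


def extract_strings(data: bytes, min_length: int = 4) -> List[str]:
    """Return runs of at least min_length printable ASCII bytes found in data.

    Hypothetical helper for illustration only; not part of DocuFilter.
    """
    # Printable ASCII spans the bytes from space (0x20) through tilde (0x7e).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Text stored as UTF-16 or in a legacy Korean or Japanese encoding would need additional patterns or a decoding pass, which this sketch leaves out.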
Image extraction formats
HWP, DOC, DOCX, XLS, XLSX, PPT, PPTX, PDF
ODT, ODS, ODP, MP3
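As an illustration of image extraction for three of the formats listed above: DOCX, XLSX, and PPTX files are ZIP packages that conventionally keep embedded images under word/media/, xl/media/, and ppt/media/. The Python sketch below relies on that convention and on the standard library only. It is not DocuFilter's API, the name extract_ooxml_images is hypothetical, and the older binary formats (HWP, DOC, XLS, PPT) and PDF need format-specific parsing that is not shown.

```python
import zipfile
from typing import BinaryIO, Dict, Union

# Conventional media folders inside DOCX, XLSX, and PPTX packages.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/")


def extract_ooxml_images(source: Union[str, BinaryIO]) -> Dict[str, bytes]:
    """Map each embedded media entry name to its raw bytes.

    source is a file path or a binary file-like object.
    Hypothetical helper for illustration only; not part of DocuFilter.
    """
    with zipfile.ZipFile(source) as package:
        return {
            name: package.read(name)
            for name in package.namelist()
            if name.startswith(MEDIA_PREFIXES)
        }
```

Each returned value holds the image bytes exactly as stored in the package, so it can be written to disk under the entry's base name.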