Hi all, I am happy to announce the release of Docfd 11.0.0.
What Docfd is
Think interactive grep for text files, PDFs, DOCXs, etc., but word/token-based instead of regex- and line-based, so you can easily search across lines.
Docfd aims to provide good UX via integration with common text editors and PDF viewers, so you can jump directly to a search result with a single key press.
If you have used Recoll or another local document search engine before, you can roughly think of this as Recoll-lite with a TUI.
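To make the "token based instead of line based" point concrete, here is a minimal Python sketch. It is not Docfd's actual search algorithm (Docfd's search is fuzzy, and it is not written in Python); the names `tokenize`, `token_search` and `line_search` are made up for illustration. It only shows why matching on a token stream finds a phrase that is split across a line break, while a line-by-line substring search does not.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens; line breaks are just whitespace."""
    return re.findall(r"\w+", text.lower())

def token_search(text, query):
    """Return token positions where the query tokens appear consecutively."""
    tokens = tokenize(text)
    wanted = tokenize(query)
    if not wanted:
        return []
    n = len(wanted)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == wanted]

def line_search(text, query):
    """Line-based, case-insensitive substring search, roughly like plain grep."""
    q = query.lower()
    return [i for i, line in enumerate(text.splitlines()) if q in line.lower()]

# The phrase "SQLite database" is split across two lines here.
doc = "The index is stored in a SQLite\ndatabase on disk."
token_hits = token_search(doc, "SQLite database")
line_hits = line_search(doc, "SQLite database")
```

The token search reports one hit, at the position of "sqlite" in the token stream, while the line search reports none because neither line contains the whole phrase.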
Interactive use
Non-interactive use
Features
- Multithreaded indexing and searching
- Multiline fuzzy search of multiple files
- Content view pane that shows the snippet surrounding the selected search result
- Text editor and PDF viewer integration
- Editable command history - rewrite/plan your actions in a text editor
- Search scope narrowing - limit the scope of the next search based on the current search results
- Clipboard integration
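As a rough illustration of the search scope narrowing feature listed above, the Python sketch below runs a second search only over the documents matched by the first one. The `search` function, its `scope` parameter and the sample file names are hypothetical and only model the idea; they do not reflect how Docfd implements it.

```python
def search(docs, term, scope=None):
    """Return the names of documents containing term.

    If scope is given, only documents named in scope are considered.
    """
    names = docs.keys() if scope is None else scope
    return {name for name in names if term.lower() in docs[name].lower()}

# Hypothetical sample documents.
docs = {
    "notes.txt": "meeting notes about the budget",
    "report.txt": "annual budget report for 2023",
    "todo.txt": "buy milk",
}

first = search(docs, "budget")                 # search everything
second = search(docs, "report", scope=first)   # search only the previous results
```

Here `first` contains `notes.txt` and `report.txt`, and `second` contains only `report.txt`, since `todo.txt` was never considered in the second search.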
Changes since 3.0.0
The last version announced here was 3.0.0. Docfd has undergone many improvements since then.
Major changes:
- Asynchronous UI
  - You can type and interact with the UI without blocking even if a search is slow, and an active search is cancelled when appropriate
- Scripting functionality in the form of a commands file
  - One-to-one correspondence to most UI interactions, allowing you to interact as normal, and save your interaction into a file to repeat the search steps later via --commands-from
- Switched to SQLite as the index DB format, which lowers the memory footprint significantly
  - For the sample of 1.4GB of PDFs used, earlier versions used around 1.9GB of memory to hold the index in memory, while versions since 9.0.0 use only 39MB
- Document indexing was reworked into a multistage pipeline so that I/O tasks and computational tasks run concurrently, which usually makes indexing a few times faster than in older versions
- Searching was also reworked into a pipeline for better work distribution across domains, improving search speed by 30% on the sample set of PDFs
- Added --open-with to allow customising the command used to open a file based on file extension
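For readers curious about the multistage pipeline idea mentioned in the list above, here is a small Python sketch of the general structure: one stage does the I/O (loading documents) while another stage does the computation (tokenizing and indexing), connected by a bounded queue. This is only an illustration under my own assumptions; Docfd is not written in Python, and `read_stage`, `index_stage`, `build_index` and the in-memory "loaders" are invented for the example.

```python
import queue
import threading

def read_stage(sources, out_q):
    """I/O stage: load each document and hand it to the next stage."""
    for name, load in sources:
        out_q.put((name, load()))
    out_q.put(None)  # sentinel: no more documents

def index_stage(in_q, index):
    """Compute stage: tokenize documents as they arrive and record word positions."""
    while True:
        item = in_q.get()
        if item is None:
            break
        name, text = item
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, []).append((name, pos))

def build_index(sources):
    """Run both stages concurrently and return the finished index."""
    q = queue.Queue(maxsize=4)  # bounded, so the reader cannot run far ahead
    index = {}
    reader = threading.Thread(target=read_stage, args=(sources, q))
    indexer = threading.Thread(target=index_stage, args=(q, index))
    reader.start()
    indexer.start()
    reader.join()
    indexer.join()
    return index

# Hypothetical sources: each loader stands in for reading a file from disk.
sources = [
    ("a.txt", lambda: "hello world"),
    ("b.txt", lambda: "hello again"),
]
index = build_index(sources)
```

While the reader is waiting on the next document, the indexer can already work on the previous one, which is the overlap that a staged design is after. In Python the GIL limits the gain for pure computation, so treat this as a picture of the structure rather than a benchmark.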