I would like to do some document management in an OCaml program.
It can start basic and become progressively more complex, mainly because of the structure of the repository.
The requirements are to be able to:
- define a document repository (name, size, subjects, mime type, and various keywords).
- save a document with required values (document_name, original_file_name, author, document_responsible, upload_date, source, comment, version, etc.), given or computed.
- delete a document.
- search for a document:
  1/ with criteria (keyword, size, mime type) that do not depend on the file content,
  2/ AND with keywords related to the file content: this requirement refers to indexing capabilities (a plugin is needed to read the file and store key/values or something more useful).
- edit a document, with its relevant editor, with versioning capability.
- log each event on documents.
- not rely on a document management software but only on required libraries.
- handle performance requirements from the beginning, given a large number of documents (possibly of large size) that may be edited, searched and served over the internet.
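
To make the content-independent part of these requirements concrete, here is a minimal in-memory sketch of save, delete, search on metadata, and event logging. All type, field, and function names are hypothetical, only a subset of the fields listed above is included, and nothing is persisted; it only illustrates the shape of the interface, not a storage design.

```ocaml
(* Hypothetical names throughout; an in-memory sketch, not a storage design. *)
type document = {
  document_name : string;
  original_file_name : string;
  author : string;
  mime_type : string;
  size : int;            (* bytes *)
  keywords : string list;
  version : int;
  upload_date : float;   (* seconds since the epoch, supplied by the caller *)
}

type event = Saved of string | Deleted of string

type repository = {
  mutable documents : document list;
  mutable log : event list;   (* most recent event first *)
}

let create () = { documents = []; log = [] }

(* Saving replaces any document with the same name and logs the event. *)
let save repo doc =
  let others =
    List.filter (fun d -> d.document_name <> doc.document_name) repo.documents
  in
  repo.documents <- doc :: others;
  repo.log <- Saved doc.document_name :: repo.log

let delete repo name =
  repo.documents <-
    List.filter (fun d -> d.document_name <> name) repo.documents;
  repo.log <- Deleted name :: repo.log

(* Search on criteria that do not depend on the file content.
   Every criterion is optional; an omitted criterion matches everything. *)
let search ?keyword ?mime_type ?max_size repo =
  let matches opt pred = match opt with None -> true | Some v -> pred v in
  List.filter
    (fun d ->
      matches keyword (fun k -> List.mem k d.keywords)
      && matches mime_type (fun m -> String.equal m d.mime_type)
      && matches max_size (fun s -> d.size <= s))
    repo.documents
```

For example, `search ~mime_type:"application/pdf" ~max_size:1_000_000 repo` would return the PDF documents of at most one megabyte. A real implementation would replace the list with whatever store is chosen (file system, database, etc.).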
Solutions?
Of course, I could save the documents as files in some directories.
But I would like to avoid a rigid directory naming scheme or structure, because the repository structure will evolve.
A database could also be a solution. Storing (possibly large) binary objects (PDF, etc.) seems possible. However, there are many discussions in various forums where opinions conflict:
- no, you should not, because it will use too much CPU/RAM,
- no, you should just store paths to files in a dedicated file system,
- no, you should not, because a network file system will be more efficient,
- yes, you can, as I have been doing for years with some millions of docs,
- yes, you can, and documents can be streamed without using too much RAM,
- yes, you really should, because your DB's transactional mechanisms will ensure integrity,
- etc.
Indexing file content requires a plugin for each kind of file, and an index store (key/value or a better approach).
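
As a rough illustration of what I mean by plugins plus an index store, here is a small sketch using only the standard library: one extractor function per mime type, and an in-memory inverted index from word to document names. The names (`register_extractor`, `index_document`, `lookup`) are hypothetical, and the plain-text extractor is deliberately naive; a real plugin for PDF or office formats would need a dedicated library.

```ocaml
(* An extractor (the "plugin") turns raw file content into a list of words. *)
type extractor = string -> string list

(* Registry of extractors, keyed by mime type. *)
let extractors : (string, extractor) Hashtbl.t = Hashtbl.create 8

let register_extractor mime_type f = Hashtbl.replace extractors mime_type f

(* Naive plain-text plugin: lowercase, then split on spaces, tabs, newlines. *)
let plain_text_extractor content =
  String.lowercase_ascii content
  |> String.map (fun c -> if c = '\n' || c = '\t' then ' ' else c)
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

let () = register_extractor "text/plain" plain_text_extractor

(* Inverted index: word -> names of the documents that contain it. *)
let index : (string, string list) Hashtbl.t = Hashtbl.create 64

(* Returns false when no plugin is registered for the mime type. *)
let index_document ~name ~mime_type content =
  match Hashtbl.find_opt extractors mime_type with
  | None -> false
  | Some extract ->
    List.iter
      (fun word ->
        let names = Option.value (Hashtbl.find_opt index word) ~default:[] in
        if not (List.mem name names) then
          Hashtbl.replace index word (name :: names))
      (extract content);
    true

let lookup word =
  Option.value (Hashtbl.find_opt index (String.lowercase_ascii word))
    ~default:[]
```

The in-memory `Hashtbl` stands in for whatever persistent index store would actually be used; the point is only that the search code never needs to know which file formats exist.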
Based on these requirements and possible solutions, what would be the best way to do this in OCaml?
Has anyone successfully used a DB to store files/binary data, especially in PostgreSQL? (I see bytea and oid, which may be usable for that.)
Could recent packages such as irmin do the job? (I am thinking of its native git-like versioning capabilities.)
What are the difficulties I may not have seen?
Thanks.