Service to convert documents

ced · May 31, 2024, 11:16pm

Rationale

It is common to have to convert OpenDocument reports to PDF. For now Tryton launches a LibreOffice instance on each request.
In a concurrent setup, this can launch as many LibreOffice instances at the same time as there are workers, and LibreOffice is quite resource-hungry.
Also, when scaling Tryton by duplicating instances, each instance must reserve enough resources to launch LibreOffice even if it is rarely used.

Proposal

I propose to create a service “document-converter” (docker image).
The API is a single HTTP POST that accepts a document and a target format as multipart/form-data and responds with the document converted to that format.
multipart/form-data is used because it is designed to send large amounts of data, such as files, efficiently.
The server stores the uploaded file in a temporary directory (to avoid keeping the content in memory) and also serves the response from a file in that temporary directory.
The server uses unoserver for the conversion, which avoids the delay of starting LibreOffice on each request and makes it about 4 times faster.
The service can be protected by an API key that must be passed in the Authorization header, and the size of the content is limited.
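
To make the flow concrete, here is a minimal sketch of how the server side could handle one upload. It is not the actual implementation: the names (convert_upload, store_upload, check_key), the 10 MiB default limit and the way the key is compared are all assumptions for illustration. It assumes a unoserver instance is already running and that its unoconvert command is on the PATH.

```python
import hmac
import shutil
import subprocess
import tempfile
from pathlib import Path

CHUNK_SIZE = 64 * 1024


class Unauthorized(Exception):
    """The Authorization header does not match the configured key."""


class TooLarge(Exception):
    """The uploaded document exceeds the configured size limit."""


def check_key(authorization, api_key):
    # Constant-time comparison; if no key is configured the service is open.
    if api_key and not hmac.compare_digest(
            (authorization or "").encode(), api_key.encode()):
        raise Unauthorized()


def store_upload(stream, path, max_size):
    # Copy the upload to disk chunk by chunk so it never sits fully in memory.
    written = 0
    with open(path, "wb") as target:
        while chunk := stream.read(CHUNK_SIZE):
            written += len(chunk)
            if written > max_size:
                raise TooLarge()
            target.write(chunk)


def unoconvert(input_path, output_path, fmt):
    # Assumption: unoserver is running on its default interface and port.
    subprocess.run(
        ["unoconvert", "--convert-to", fmt,
         str(input_path), str(output_path)],
        check=True)


def convert_upload(stream, filename, fmt, authorization, *, api_key=None,
                   max_size=10 * 1024 * 1024, convert=unoconvert):
    """Return the path of the converted file inside a fresh temporary
    directory. The caller streams it back and removes the directory."""
    check_key(authorization, api_key)
    if not fmt.isalnum():
        # The format ends up in a file name, so keep it to plain extensions.
        raise ValueError("invalid format")
    directory = Path(tempfile.mkdtemp(prefix="document-converter-"))
    try:
        input_path = directory / ("input" + Path(filename).suffix)
        output_path = directory / ("output." + fmt)
        store_upload(stream, input_path, max_size)
        convert(input_path, output_path, fmt)
        return output_path
    except BaseException:
        shutil.rmtree(directory, ignore_errors=True)
        raise
```

The convert argument is injectable only so that the flow can be exercised without LibreOffice; a real HTTP layer would map Unauthorized and TooLarge to the matching status codes.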

With the issue “Allow to configure report convert command” (#13292), it will be possible to use a command like:
curl -s -X POST -F "document=@%(input_path)s" -F "format=%(output_extension)s" -H "Authorization: ${DOCUMENT_CONVERTER_KEY}" -o "%(output_path)s" ${DOCUMENT_CONVERTER_URL}
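
For callers that cannot shell out to curl, the same request could be sketched in Python with only the standard library. The field names document and format and the Authorization header come from the command above; the function names, the timeout and the application/octet-stream content type are assumptions. Unlike the server, this sketch reads the whole document into memory, which is acceptable for a small client example.

```python
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(document_path, fmt):
    """Build a multipart/form-data body with the two fields sent by curl."""
    path = Path(document_path)
    boundary = uuid.uuid4().hex
    # Sketch only: the file name is assumed not to contain double quotes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="format"\r\n'
        f"\r\n"
        f"{fmt}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; '
        f'filename="{path.name}"\r\n'
        f"Content-Type: application/octet-stream\r\n"
        f"\r\n").encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    body = head + path.read_bytes() + tail
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(url, document_path, fmt, api_key=None):
    body, content_type = encode_multipart(document_path, fmt)
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", content_type)
    if api_key:
        # Same as: -H "Authorization: ${DOCUMENT_CONVERTER_KEY}"
        request.add_header("Authorization", api_key)
    return request


def convert(url, input_path, output_path, fmt, api_key=None, timeout=60):
    """POST the document and write the converted response to output_path."""
    request = build_request(url, input_path, fmt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        Path(output_path).write_bytes(response.read())
```

A call would then look like convert(url, "report.odt", "report.pdf", "pdf", api_key=key), with url and key taken from the same settings as DOCUMENT_CONVERTER_URL and DOCUMENT_CONVERTER_KEY.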

Implementation

ced · May 31, 2024, 11:24pm

I benchmarked the conversion of a 12-page, 21K OpenDocument Text file using 1 vCPU (4000 bogomips) and 1GiB of memory.
The conversion, including the upload and download (over a home internet connection), takes 400ms, while launching LibreOffice on the local host takes 1s.

Using the same hosting service I could launch 500 requests in parallel on the same service.

ced · June 1, 2024, 5:37pm

I have added a README file and tests with 100% coverage.

htgoebel · June 2, 2024, 11:02am

I’m curious why such a service is needed and why you are reinventing the wheel.

Back in 2008 we already had support for communicating with OpenOffice using the UNO interface. The only requirement was starting OpenOffice with --headless, which was done by the module if OpenOffice was not running. See https://pypi.org/project/openoffice-python/0.1-20110209/. While this package might need some care (and to be ported to Python 3), it’s already here and working.

No need for Docker, no need for any additional service, no need for configuration.

ced · June 6, 2024, 7:12am

I’m not. I just use maintained tools and develop the missing pieces, which are authentication and working without a shared filesystem.

Of course you need nothing if you repeat the configuration on each deployment with unoserver. But as I explained, this results in a waste of resources (except for small deployments).
I’m talking about people who use a deployment platform.

htgoebel · June 6, 2024, 8:16am

Ah, your solution makes it possible to provide a dedicated conversion server / service. I missed this, and it is indeed a plus. Sorry for being harsh.

sergyo · June 6, 2024, 10:21am

FYI the link is broken.

ced · June 6, 2024, 10:22am

No, it is just private.