Rationale
Some companies spend a lot of time managing incoming documents (like supplier invoices).
It is quite common for those documents to be received in electronic format (like PDF).
It would be great if we could automate a large part of their management using OCR tools (like typless). So we need a place to drop electronic documents into Tryton, with the possibility to add some metadata (like a type (invoice, delivery order, etc.) or a source (like the “From” email address)), after which Tryton tries to create or update documents (like creating a supplier invoice or receiving a shipment).
Proposal
In a module named document_incoming:
We add a model document.incoming with:
- name
- company: optional
- data: binary field with filestore id
- type: selection, required in state done
- source: optional char
- parent: Many2One to document.incoming
- results: One2Many with just a Reference field
- state: draft, exception, done, cancelled
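To make the field list concrete, here is a plain-Python sketch that mirrors the proposed fields outside of Tryton. It is not trytond code: the class name `IncomingDocument`, the `validate()` method and the use of raw bytes for `data` are assumptions for illustration. The only rules it encodes come from the list above: the four allowed states, and `type` being required once the state is done. The `results` entries use Tryton's `model,id` string form for Reference values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

STATES = ("draft", "exception", "done", "cancelled")


@dataclass
class IncomingDocument:
    """Plain-Python mirror of the proposed document.incoming fields (not Tryton code)."""

    name: str
    data: bytes = b""               # stands in for the Binary field kept in the filestore
    company: Optional[int] = None   # optional
    type: Optional[str] = None      # selection, required in state done
    source: Optional[str] = None    # optional char, e.g. the "From" address
    parent: Optional["IncomingDocument"] = None
    results: List[str] = field(default_factory=list)  # e.g. "account.invoice,42"
    state: str = "draft"

    def validate(self) -> None:
        """Check the constraints stated in the proposal."""
        if self.state not in STATES:
            raise ValueError(f"unknown state {self.state!r}")
        if self.state == "done" and not self.type:
            raise ValueError("type is required in state done")
```

In Tryton itself these would be declared with `fields.Char`, `fields.Binary`, `fields.Many2One` and so on, and the "required in state done" rule would typically be expressed with field states rather than a hand-written check.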
The workflow executes the process for the specified type and sets the state to done.
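One way to picture "execute the process for the specified type" is a registry of processors keyed by type, which type-specific modules fill in. The sketch below is a hypothetical stand-in (the names `Doc`, `PROCESSORS`, `processor` and `process` are illustrative, not Tryton API); allowing a document in the exception state to be processed again is also an assumption.

```python
from typing import Callable, Dict, List, Optional


class Doc:
    """Minimal stand-in for a document.incoming record."""

    def __init__(self, type_: Optional[str]):
        self.type = type_
        self.state = "draft"
        self.results: List[str] = []


# Hypothetical registry: each type-specific module adds a processor that
# returns references ("model,id") to the records it created or updated.
PROCESSORS: Dict[str, Callable[[Doc], List[str]]] = {}


def processor(type_: str):
    """Decorator registering a processor for one document type."""
    def register(func: Callable[[Doc], List[str]]):
        PROCESSORS[type_] = func
        return func
    return register


def process(doc: Doc) -> None:
    """Run the processor registered for doc.type, then mark the document done."""
    # Assumption: a document left in exception may be processed again
    if doc.state not in ("draft", "exception"):
        raise ValueError(f"cannot process a document in state {doc.state!r}")
    func = PROCESSORS.get(doc.type)
    if func is None:
        raise ValueError(f"no processor registered for type {doc.type!r}")
    doc.results.extend(func(doc))
    doc.state = "done"


@processor("supplier_invoice")
def create_supplier_invoice(doc: Doc) -> List[str]:
    # Placeholder: a real module would create the draft invoice here
    return ["account.invoice,1"]
```

In Tryton, the same extension point would more naturally be a method that each module overrides on the model, but the dispatch-by-type idea is the same.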
A wizard eases splitting a PDF document by pages into other records linked to the original via the parent field. The simplest way would be a simple Char field containing a semicolon-separated list of page ranges like 1-2;3;5-7 (to make one record with pages 1 and 2, another with page 3 and a last one with pages 5 to 7; page 4 is skipped, which means that we create a cancelled record for it).
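The page-range syntax can be parsed as in the following sketch, which turns a spec such as `1-2;3;5-7` into page groups (one per new record) plus the list of skipped pages that would end up in a cancelled record. The function name is hypothetical, and rejecting overlapping or out-of-bounds ranges is an assumption; extracting the actual PDF pages (with a PDF library) is not shown.

```python
from typing import List, Tuple


def parse_page_ranges(spec: str, page_count: int) -> Tuple[List[List[int]], List[int]]:
    """Split a spec like "1-2;3;5-7" into 1-based page groups.

    Returns (groups, skipped): one list of pages per new record, and the
    pages not covered by any range.
    """
    groups: List[List[int]] = []
    used = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range {part!r}")
        pages = list(range(start, end + 1))
        # Assumption: a page may belong to only one new record
        if used.intersection(pages):
            raise ValueError(f"page range {part!r} overlaps a previous one")
        used.update(pages)
        groups.append(pages)
    skipped = [page for page in range(1, page_count + 1) if page not in used]
    return groups, skipped
```

For a 7-page document, `parse_page_ranges("1-2;3;5-7", 7)` returns `([[1, 2], [3], [5, 6, 7]], [4])`.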
It is not allowed to process a record which is used as a parent (to avoid processing the same part of a document multiple times).
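The guard against processing a split document can be as simple as checking whether any record points to it as its parent. This is a hypothetical standalone sketch (`Doc` and `check_processable` are illustrative names); in Tryton the check would rather search for records whose parent is the current one.

```python
class Doc:
    """Minimal stand-in for a document.incoming record with its split parts."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def check_processable(doc):
    """Refuse to process a document that other records use as their parent."""
    if doc.children:
        raise ValueError(
            f"{doc.name!r} has been split into {len(doc.children)} part(s); "
            "process the parts instead")
```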
A route is defined to ease automating the creation of incoming documents; it optionally allows defining some metadata.
Another route is also defined for the same purpose, but it extracts the attachments from an email (and keeps the email as an attachment).
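The email route essentially has to turn each attachment of a received message into one incoming document, using the sender as source. The sketch below shows that extraction with Python's standard `email` package only; the function name and the dictionary keys (chosen to mirror the proposed fields) are assumptions, and the HTTP route itself is not shown.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List


def documents_from_email(raw: bytes) -> List[dict]:
    """Return one set of incoming-document values per email attachment.

    Hypothetical sketch: the keys mirror the proposed document.incoming
    fields, and the raw email is returned alongside so that it can be
    stored as an attachment of each created record.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    source = str(msg.get("From", ""))
    documents = []
    # iter_attachments() yields the sub-parts that are not the message body
    for part in msg.iter_attachments():
        documents.append({
            "name": part.get_filename() or "attachment",
            "data": part.get_content(),  # decoded bytes for a PDF part
            "source": source,
            "email": raw,
        })
    return documents


if __name__ == "__main__":
    message = EmailMessage()
    message["From"] = "supplier@example.com"
    message["Subject"] = "Invoice"
    message.set_content("Please find our invoice attached.")
    message.add_attachment(
        b"%PDF-1.4 ...", maintype="application", subtype="pdf",
        filename="invoice.pdf")
    for values in documents_from_email(message.as_bytes()):
        print(values["name"], values["source"])
```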
In a module named document_incoming_invoice:
We extend document.incoming to add the type supplier_invoice.
On account.invoice we add a Many2One field to document.incoming.
The processing creates a draft supplier invoice for the document using the extracted data (which can come from the OCR dict or from other sources like Factur-X) and adds the document as an attachment (reusing the same filestore id).
The creation should never fail, so we need default fallback values for the required fields like party, account, etc.
The module must support creating invoices with and without line details. When there are no line details, it creates just one line with the total.
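The "never fail" rule and the two line modes can be sketched as a function that builds draft supplier invoice values from the extracted data, falling back to defaults for required fields and to a single total line when no details were extracted. The input keys (`lines`, `total_amount`, `party`, ...) and the `FALLBACK` values are hypothetical; only the invoice type `"in"` (supplier invoice) follows Tryton's convention. Account selection is deliberately left to the fallback here.

```python
from decimal import Decimal
from typing import Any, Dict, List

# Hypothetical fallback values, e.g. taken from a per-company configuration
FALLBACK = {"party": "Unknown supplier", "account": "Default expense"}


def invoice_values(extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Build draft supplier invoice values that never lack a required field."""
    lines: List[Dict[str, Any]] = []
    for line in extracted.get("lines") or []:
        lines.append({
            "description": line.get("description", ""),
            "quantity": Decimal(str(line.get("quantity", 1))),
            "unit_price": Decimal(str(line.get("unit_price", 0))),
            "account": FALLBACK["account"],
        })
    if not lines:
        # No line details: create a single line carrying the total
        lines.append({
            "description": extracted.get("description", ""),
            "quantity": Decimal(1),
            "unit_price": Decimal(str(extracted.get("total_amount", 0))),
            "account": FALLBACK["account"],
        })
    return {
        "type": "in",
        "state": "draft",
        "party": extracted.get("party") or FALLBACK["party"],
        "lines": lines,
    }
```

Converting through `str` before `Decimal` avoids carrying binary floating-point noise from OCR output into monetary amounts.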
For lines, we try to find a line with the same description to select the same account, but if we have a purchase order number, we search in the purchase lines instead. If nothing is found, we use the default account.
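The account lookup order could look like the sketch below. It assumes that purchase lines are matched by description within the given order and carry an account, and that without an order number the search is done on previous invoice lines; the function name, dictionary keys and account codes are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def choose_account(
        description: str,
        purchase_number: Optional[str],
        purchase_lines: Iterable[Mapping],
        invoice_lines: Iterable[Mapping],
        default_account: str) -> str:
    """Pick the account for an extracted invoice line.

    With a purchase order number, look among that order's purchase lines;
    otherwise reuse the account of a previous invoice line with the same
    description. Fall back to the default account when nothing matches.
    """
    if purchase_number:
        candidates = (
            line for line in purchase_lines
            if line["purchase"] == purchase_number)
    else:
        candidates = iter(invoice_lines)
    for line in candidates:
        if line["description"] == description:
            return line["account"]
    return default_account
```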
If the totals do not match, we add a dummy line (or tax lines) to get the same totals.
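For the untaxed amount, the dummy line is just the difference between the extracted total and the sum of the created lines. This hypothetical sketch ignores taxes (the "tax lines" variant is not shown) and uses `Decimal` so that the difference is exact.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, Optional


def balancing_line(
        lines: Iterable[Dict[str, Any]],
        expected_total: Decimal,
        account: str) -> Optional[Dict[str, Any]]:
    """Return a dummy line making the lines sum to the extracted total, or None."""
    computed = sum(
        (line["quantity"] * line["unit_price"] for line in lines), Decimal(0))
    difference = expected_total - computed
    if difference == 0:
        return None
    return {
        "description": "Difference with extracted total",
        "quantity": Decimal(1),
        "unit_price": difference,  # may be negative if the lines exceed the total
        "account": account,
    }
```

With one line of 2 × 10.00 and an extracted total of 25.50, the dummy line has a unit price of 5.50.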
In a module named document_incoming_ocr:
We extend document.incoming to add a Dict field to store metadata extracted by OCR from the document.
The module can be extended by other modules to plug in any kind of OCR service.
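The plug-in idea can be illustrated with an abstract service interface and a registry of implementations. This is a standalone analogy under assumed names (`OCRService`, `SERVICES`, `register`, `run_ocr`); Tryton modules would normally extend models by registering classes in the Pool instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class OCRService(ABC):
    """Hypothetical base class an OCR integration module would subclass."""

    name: str = ""

    @abstractmethod
    def extract(self, data: bytes, document_type: str) -> Dict[str, Any]:
        """Return the metadata to store in the document's Dict field."""


SERVICES: Dict[str, Type[OCRService]] = {}


def register(cls: Type[OCRService]) -> Type[OCRService]:
    """Make a service available under its name."""
    SERVICES[cls.name] = cls
    return cls


@register
class DummyOCR(OCRService):
    """Example service that only reports the document size."""

    name = "dummy"

    def extract(self, data: bytes, document_type: str) -> Dict[str, Any]:
        return {"document_type": document_type, "size": len(data)}


def run_ocr(service_name: str, data: bytes, document_type: str) -> Dict[str, Any]:
    """Instantiate the named service and extract metadata from the data."""
    return SERVICES[service_name]().extract(data, document_type)
```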
In a module named document_incoming_ocr_typless:
It implements what is required to access the service's API.
Once the created documents are no longer in draft, it uploads the metadata, updated with the real data, to the feedback loop.
Implementation
Future
Other OCR services could be implemented.
Support for metadata embedded in PDFs, like Factur-X.