Taking advantage of LLMs

It seems that Tryton could benefit from the recent advancements in Large Language Models like GPT-3/4 or LLaMA.

My proposal is not to bind Tryton to one specific implementation, but to provide the necessary abstractions to connect to these or other language models, or other ML tools.

One idea on how to take advantage of these tools would be the following:

Add a button in the top menu bar of Tryton that allows users to add content. The user may write some text, drag & drop a file, or push a button to record some audio.

In the case of audio, a speech recognition tool would be used to generate the text.

Files would be treated as URLs or converted into text (we should decide which is the best option).

That content would be sent to an LLM, asking it to return JSON with the necessary actions to be executed by trytond and/or sao.

The inbox shown in sao should be able to "explain" to the user what actions will be executed and, if the user agrees, then they would be executed.

For example, the following prompt in ChatGPT-4:

Given the prompt, return JSON with a list of actions with the following structure:

[{
    'action': 'create|search|update',
    'model': 'party.party|account.invoice|sale.sale|purchase.purchase',
    'values': {  # Only in create|update actions
        key: value,
    },
}]

Available fields for party.party are name, code.
Available fields for account.invoice are number, party, invoice_date, lines.
Available fields for sale.sale are reference, party, sale_date, lines.
Available fields for purchase.purchase are reference, party, purchase_date, lines.

Here's the prompt:

Hi!
the purchase manager of Zara made an order (their reference is 2314) of 25 light bulbs model 42321 that should be delivered ASAP. Remember that they have never purchased yet.

returns:

Here's the JSON with a list of actions to create a new party and a new purchase:

[{
    'action': 'create',
    'model': 'party.party',
    'values': {
        'name': 'Zara',
        'code': 'ZARA'
    }
},
{
    'action': 'create',
    'model': 'purchase.purchase',
    'values': {
        'reference': 'Compra_Zara_2314',
        'party': 'Zara',
        'purchase_date': '2023-03-21',
        'lines': [
            {'product': 'light bulbs model 42321', 'quantity': 25}
        ]
    }
}]
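Of course, whatever the LLM returns should never be trusted blindly. A minimal validation sketch in Python, assuming the field whitelist from the prompt above (the function and schema names are illustrative, not an existing Tryton API):

```python
import json

# Whitelist of actions and fields the integration accepts; anything
# outside it is rejected before reaching trytond. (Hypothetical schema.)
ALLOWED_ACTIONS = {'create', 'search', 'update'}
ALLOWED_FIELDS = {
    'party.party': {'name', 'code'},
    'account.invoice': {'number', 'party', 'invoice_date', 'lines'},
    'sale.sale': {'reference', 'party', 'sale_date', 'lines'},
    'purchase.purchase': {'reference', 'party', 'purchase_date', 'lines'},
}

def validate_actions(raw):
    """Parse the LLM output and keep only well-formed actions."""
    actions = json.loads(raw)
    valid = []
    for action in actions:
        if action.get('action') not in ALLOWED_ACTIONS:
            continue
        model = action.get('model')
        if model not in ALLOWED_FIELDS:
            continue
        values = action.get('values', {})
        # Drop any action that uses a field the prompt did not declare
        if set(values) - ALLOWED_FIELDS[model]:
            continue
        valid.append(action)
    return valid
```

Anything filtered out here would simply never be shown to the user for confirmation.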

So, given that the LLM does most of the work by itself, I think Tryton should provide an integrated interface, the JSON description that must be passed to the LLM so it can learn the correct output format and options, as well as a mechanism to process that JSON so new modules can "plug in" new possibilities.
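The "plug-in" mechanism could be as simple as a registry that modules extend at import time. A hypothetical sketch (all names here are mine, not an existing Tryton API):

```python
# Registry mapping action names to handler callables; a module would
# register its own handlers when it is loaded. (Illustrative only.)
ACTION_HANDLERS = {}

def register_action(name):
    """Decorator for modules to plug in new action types."""
    def decorator(func):
        ACTION_HANDLERS[name] = func
        return func
    return decorator

@register_action('create')
def handle_create(model, values):
    # A real handler would call the model's create(); here we just echo.
    return ('create', model, values)

def dispatch(action):
    """Route one validated action dict to its registered handler."""
    handler = ACTION_HANDLERS.get(action['action'])
    if handler is None:
        raise ValueError('Unsupported action: %s' % action['action'])
    return handler(action['model'], action.get('values', {}))
```

A new module adding, say, a 'schedule' action would only need to define one decorated function.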

My opinion is that some actions could be executed by the server (one may be interested in supplying that kind of information using web services), but other actions should be executed by sao itself. For example, consider the case where the text provided by the user contains all the information needed to create an invoice except the name of the party, or where the user has to choose between a couple of parties before the rest of the invoice can be saved.

What's your opinion on the kind of integration Tryton should provide, if any?


I don't find your example to be very user friendly, in fact :slight_smile:. Which users able to define the JSON structure of the API calls would need an LLM to help them generate those?

But I get that while writing a proteus script (or something like that) this kind of knowledge is useful and can speed up development; in that case, though, the LLM should be integrated into your editor (there are already some plugins for neovim).

Of course, it's your duty as a programmer to ask yourself questions about the usage of code that comes from this source; it's not without issues regarding the copyright of the code produced, and of your own code that has been fed to the LLM.

Anyway, for users of Tryton an LLM could be useful in some cases, I suppose. But it would need to have access to the data generated by the users and processes that are stored (and used) by Tryton. From my point of view, this is where we can make something: by providing an API to which the LLM can connect and be fed a lot of information. That way, after the "algorithm" has worked its magic, the user can ask it questions about their own data (answers they could not discover themselves because of the volume or some hidden correlation). And they will be ready to welcome our :robot: overlord :smiley:.

But I don't think many companies are ready yet to send their whole database to another company in order to solve unidentified issues.

I can foresee a lot of security threats in such usage, because it's kind of giving an external "AI" user-level access to your company system. And writing access rules against such an "AI" will be as complicated as creating the "AI" itself.

I see I explained it very badly :frowning:

The idea is that the user just enters this part of the text:

Hi!
the purchase manager of Zara made an order (their reference is 2314) of 25 light bulbs model 42321 that should be delivered ASAP. Remember that they have never purchased yet.

Tryton sends that prompt inside a larger prompt message that gives the LLM the instructions to follow, just like I did in the first post. The LLM returns the JSON, and it is Tryton that interprets the output and carries out the necessary actions or asks the user for confirmation. For example, with the resulting JSON I put above, Tryton could show a list of actions with something like this:

A "Party" will be created with the following values:

Name: Zara
Code: ZARA

----

A "Purchase" will be created with the following values:

Party: Zara
Reference: Compra_Zara_2314
Purchase Date: 2023-03-21
Lines:
    Product: light bulbs model 42321
    Quantity: 25

Next to each action there would be an OK button that, when the user presses it, would create the records.
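Turning the JSON into the confirmation text above is mechanical. A rough sketch (in a real implementation the labels would come from the model and field definitions, not from string munging):

```python
def describe_action(action, model_labels=None):
    """Render one 'create' action as a human-readable summary."""
    labels = model_labels or {'party.party': 'Party', 'sale.sale': 'Sale',
                              'purchase.purchase': 'Purchase'}
    model = labels.get(action['model'], action['model'])
    lines = ['A "%s" will be created with the following values:' % model, '']
    for key, value in action.get('values', {}).items():
        label = key.replace('_', ' ').title()
        if isinstance(value, list):
            # One2many-style values: indent each sub-record's fields
            lines.append('%s:' % label)
            for item in value:
                for k, v in item.items():
                    lines.append('    %s: %s' % (k.replace('_', ' ').title(), v))
        else:
            lines.append('%s: %s' % (label, value))
    return '\n'.join(lines)
```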

There's no need to use an external AI. The recent LLaMA model, which is said to be similar to GPT-3 (I have not tested it myself), has been trained in 5 hours on a stock GPU. So it's something that can run on the same Tryton server, provided it has the right hardware.

That's why I suggest that Tryton only provides the tools for such an integration; then it's up to each company to decide whether they want to use a language model, and which one.

External or not, the threat is still there, as you cannot review the algorithm used. So you cannot know exactly what it can do.

OK, I get it now :slight_smile:

It's indeed something nice. Maybe it could be the base for a plugin of some sort?

Indeed, as Albert said, there are "portable" LLMs, and this threat could be mitigated by putting the LLM in a DMZ (if it can get out of this jail, it has earned its freedom :smiley:).

But I don't think the portable LLMs include the learning algorithms, just the result of those.

Of course, there is always the threat of an attack on any piece of the software stack.
But the threat I'm talking about is a new one. You cannot analyze an algorithm built from machine learning to ensure that it will always behave as you expect. You cannot be sure that, for a specific input, it will not generate as an answer "DROP DATABASE", for example, or "create a payment to Asimov and validate it".

So for me, such a tool should only be used to get input that can be cancelled without consequences, or that is validated by a human prior to usage.

I think it should be only on the client side, mainly to limit the risk, but also because the user should not be allowed to trigger actions that they could not do from the client.

So globally I think the main difficulty is to create a descriptive language to manipulate (script) the client. For me this sounds a lot like the Scripting Tryton Tutorial Video.
The remaining part is just the input/output API.

That's an approach I share. I was thinking of something like an 'ai_draft' state for each model. Also, I'm wondering about required fields, since sometimes not all the required information is supplied by the text or by the AI.

I think this can rely on the confirm and warning dialogs. So the scripting language should not support validating them.

That's one of the main difficulties, I think: how to make such a scripting language pause on an error to let the user fix it. The simple solution is to stop the execution on the first error (and never resume).
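One way to get "pause on error, let the user fix it, resume" rather than "stop and never resume" is to keep an explicit cursor into the script, so execution can restart at the failing step. A toy sketch (not an existing Tryton mechanism, just an illustration of the idea):

```python
class ScriptRunner:
    """Execute a list of steps, pausing on the first failure so the
    user can repair the failing step and resume from where it stopped."""

    def __init__(self, steps):
        self.steps = list(steps)   # each step is a callable
        self.index = 0             # cursor: next step to execute
        self.results = []

    def run(self):
        """Run until done or until a step raises.

        Returns the exception that paused execution, or None on success;
        on failure, self.index still points at the failing step."""
        while self.index < len(self.steps):
            step = self.steps[self.index]
            try:
                self.results.append(step())
            except Exception as error:
                return error
            self.index += 1
        return None
```

After the user fixes the problem (e.g. supplies the missing party), calling `run()` again continues from the paused step instead of replaying the whole script.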

For me it's not a good solution, because I think the interesting part is doing this integration in background mode, like reading external communications from some inbox or customer chat, and using the result as a pre-draft.
With the ai_draft approach I was thinking of not making fields required while the record is in this state. But that does not convince me either; for example, on a sale we need the party to compute the right taxes.

Another approach I just had is creating a table ai_json_loads like:

model (party.party, sale.sale ...)
json (ai_generated - something like the AI response to @albert's example)
name (ai_generated) -> rec_name
used (bool)

So when you enter, for example, a sale, with something like the action/attachment button you could select one of the unused JSON loads to try to fill as many gaps as possible.
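In plain Python terms the idea could look like this (a real implementation would of course be a ModelSQL with these columns; the class and function names here are just mine):

```python
from dataclasses import dataclass

@dataclass
class AIJsonLoad:
    """One parsed AI suggestion waiting to be applied (hypothetical table)."""
    model: str         # e.g. 'party.party', 'sale.sale'
    json: str          # raw AI-generated values
    name: str          # AI-generated rec_name shown to the user
    used: bool = False # set once the suggestion has been applied

def unused_loads(loads, model):
    """Suggestions the user can still pick from for a given model."""
    return [l for l in loads if l.model == model and not l.used]
```

The action/attachment button would then list `unused_loads(...)` for the current model and mark the chosen one as `used`.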

Another option, for another use case, is saving an ir.note. For example, when importing @albert's sale as I just proposed, there should be a note with:

delivered ASAP

But also for those AI-parsed messages that are not about creating anything new. So if the AI parses from the customer chat something like "I will come to pick up the shipment A123 on Monday 12am", it should generate an ir.note with this info on the shipment.

Sure. That's why I proposed that the user could validate, and that we only allow a set of actions that we define.

We could allow full automation on the server side as long as we limit the set of actions or values allowed.

For example, I could allow creating and confirming sales from e-mails as long as the party is X.
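Such a rule could be expressed declaratively as an allow-list checked before any server-side automation runs. A hypothetical sketch (the rule format is made up for illustration):

```python
# Each rule says which action/model pairs may run unattended, optionally
# restricted by a predicate on the values. (Hypothetical config format.)
AUTOMATION_RULES = [
    {'action': 'create', 'model': 'sale.sale',
     'condition': lambda values: values.get('party') == 'X'},
]

def allowed_unattended(action):
    """True only if some rule explicitly permits running this action
    without human confirmation; everything else needs validation."""
    for rule in AUTOMATION_RULES:
        if (rule['action'] == action['action']
                and rule['model'] == action['model']
                and rule.get('condition', lambda v: True)(action.get('values', {}))):
            return True
    return False
```

Default-deny keeps the "DROP DATABASE"-style worry out of scope: an action the rules do not mention simply falls back to the confirmation inbox.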

A scripting API for sao makes sense to me.

What about using it as a source?

FTR, the suggested scripting API for sao could be used for more than LLM extraction. OCR could also parse image-to-text and send that as suggested input for invoices, which the user can validate before it is included.