Speech Recognition is quickly becoming a robust technology, and I think it would be a great addition to Tryton because it can be useful in several use cases, such as:
- Commanding Sao (say “Next Tab”, “New Record”, etc. without any mouse clicks)
- Filling text fields (such as notes) to enter large amounts of text. Very useful not only from a PC but also from a tablet or a smartphone
- Giving commands such as “Show me the last 20 invoices” or “Show me a line chart of sales of Product X in the last year”
As an example of how far the technology has come, take a look at https://cloud.google.com/speech-to-text/ where you’ll see that it can already recognize more than 100 languages, produce punctuated transcriptions, and even perform speaker diarization.
In order to add Speech Recognition we need two things:
- A Speech Recognition engine to convert speech to text
- A means to “understand” the text and turn it into effective commands
Speech Recognition
The Speech Recognition part is the easiest thanks to existing engines. For example, browsers already expose the HTML5 Web Speech API (which includes Speech Synthesis too).
You can run a quick test in Chrome/Chromium by saving the following JavaScript code as custom.js in sao:
// Load annyang from the CDN and register a command once the script is ready.
var script = document.createElement('script');
script.onload = function () {
    if (window.annyang) {
        var commands = {
            // "*action" captures everything said after the word "tryton".
            'tryton *action': function (action) {
                jQuery('#global-search-entry').focus();
                jQuery('#global-search-entry').val(action);
            }
        };
        annyang.addCommands(commands);
        annyang.setLanguage('en-US');
        annyang.start();
    }
};
script.src = '//cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js';
document.head.appendChild(script);
You just need to say “Tryton Customer Invoices” and it will fill in the global search widget with the text “Customer Invoices”. The popup is not opened, but it gives you an idea of what is easily achievable.
Interpreting the transcript
As stated above, there are several use cases which would need to be addressed in different ways:
- Commanding Sao should be relatively simple using, for example, Annyang (the library used in the example above). We would simply define the commands required for moving around Sao, make them translatable, and implement their actions. I already gave some examples: “New Record” could trigger the “New Record” action of the currently opened form, “Next Tab” could switch to the next tab, etc.
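As a rough sketch of the translatable-command idea (the phrases, translations, and actions below are invented for illustration, not actual Sao identifiers), the client could build the command map from a translation catalogue before handing it to annyang.addCommands():

```javascript
// Sketch: build an annyang-style command object from translated phrases.
// Keys of `actions` are canonical command identifiers; `translations`
// maps them to the phrases the user actually says in their language.
function buildCommands(translations, actions) {
    var commands = {};
    for (var key in actions) {
        if (translations[key]) {
            commands[translations[key]] = actions[key];
        }
    }
    return commands;
}

// Example: a hypothetical Catalan catalogue.
var translations_ca = {
    'new record': 'nou registre',
    'next tab': 'pestanya següent'
};
var actions = {
    'new record': function () { /* would trigger the form's New action */ },
    'next tab': function () { /* would switch to the next tab */ }
};
var commands = buildCommands(translations_ca, actions);
// `commands` could then be passed to annyang.addCommands(commands).
```

This keeps the spoken phrases in the normal translation workflow while the actions themselves stay language-independent.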
- Filling text fields could be an extension of commanding Sao. For example, saying "Input <field> <text>" could fill the given text into the given field. So “Input Description The customer does not want us to deliver the goods yet” would save the text “The customer does not want us to deliver the goods yet” to the “Description” field.
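The "Input …" idea could be sketched as a small parser that splits an utterance into a field label and the text to store (the field labels here are hypothetical; a real implementation would take them from the open form's view):

```javascript
// Sketch: split an "Input <field> <text>" utterance into a field label and
// the text to store. Labels are tried longest-first so multi-word labels
// such as "Internal Note" are matched before shorter ones.
function parseInputCommand(utterance, fieldLabels) {
    var prefix = 'input ';
    if (utterance.toLowerCase().indexOf(prefix) !== 0) {
        return null;
    }
    var rest = utterance.slice(prefix.length);
    var sorted = fieldLabels.slice().sort(function (a, b) {
        return b.length - a.length;
    });
    for (var i = 0; i < sorted.length; i++) {
        var label = sorted[i];
        if (rest.toLowerCase().indexOf(label.toLowerCase() + ' ') === 0) {
            return {field: label, text: rest.slice(label.length + 1)};
        }
    }
    return null;
}
```

The returned field label would then be mapped to the actual widget to set its value.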
- Other interesting interpretations, such as “Show me the last 20 invoices”, could be handled on the server. The existing global search widget could be a good place to handle that, and we could provide a means for modules to extend it.
For me, that feature should simply allow us to execute existing Tryton actions (window actions would be the most usual ones), but we could consider adding a new “Text” action (ir.action.text) which simply returns a text to give the user an answer (Speech Synthesis could be used to read it aloud). For example, the user could say “Tryton, tell me the current balance of account 572” and it would simply answer “20.000€”. No need to open a new tab to get that information.
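On the client side, such a text action could be dispatched with very little code. This is only a sketch: the action structure (type and text fields) is an assumption, since ir.action.text does not exist yet, and the speak callback is injected so the browser's speechSynthesis can be plugged in when available:

```javascript
// Sketch: client-side dispatch for a hypothetical ir.action.text result.
// The action payload shape below is an assumption, not an existing API.
function dispatchAction(action, speak) {
    if (action.type === 'ir.action.text') {
        if (speak) {
            speak(action.text);  // e.g. backed by window.speechSynthesis
        }
        return action.text;
    }
    // Other action types (window actions, reports, ...) would be handled
    // by the existing sao machinery.
    return null;
}

// In a browser, `speak` could be:
// function speak(text) {
//     window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
// }
```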
For handling user-supplied sentences, something similar to ALICE could be implemented in trytond. You can find more information on how it works at https://www.pandorabots.com/docs/aiml-fundamentals/.
The idea is that the ir module could handle sentences such as "Show me the last <number> <records>", but each module could add its own Categories (in AIML terminology). So the sale module could add the category to interpret "Show me a line chart of sales of <product> in the last <period>".
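The core of such category matching can be sketched with AIML-style "*" wildcards compiled to a regular expression, where each wildcard becomes a capture group (the patterns below are illustrative, not a proposed registry format):

```javascript
// Sketch of AIML-style category matching: a pattern with "*" wildcards is
// matched against a sentence, yielding the fragments each wildcard captured.
function matchCategory(pattern, sentence) {
    // Escape regex metacharacters, then turn each escaped "*" into a
    // lazy capture group, anchored to the whole sentence.
    var escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var source = '^' + escaped.replace(/\\\*/g, '(.+?)') + '$';
    var match = sentence.match(new RegExp(source, 'i'));
    return match ? match.slice(1) : null;
}
```

Each module would register its patterns together with a handler; the captured fragments (record count, model, product, period, ...) would then be resolved against the database to build the corresponding action.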