A recurring subject is Tryton tutorials, and specifically videos. Today, people want to learn by watching videos.
The big difficulty is to create those videos in a way that lets us maintain them when new versions are released.
We can solve this issue by acting fast, as fast as the development of Tryton.
For each version release, there should be a video tutorial explaining only the new features.
The problem is that on each release, something may potentially have changed in any tutorial. So the goal is to be able to redo a tutorial with minimal work.
Would it be possible to use test scenarios and automatically build videos from them? How-to videos would then be linked to the test scenarios, which are themselves updated by the developers. Once the existing scenarios are all turned into videos, maybe it will be time to think about adding more.
That's a good idea, but I think the videos require a more verbose explanation than the test scenarios.
Also, the scenarios may test some specific features that are not relevant to all users.
For me, the test scenarios can be used as a base to create the first video tutorials, but after that they will need to be updated separately.
Although I agree that a video tutorial requires a more verbose explanation than a test scenario, they would still be a good starting point. Anyway, please don't give up.
dogtail
allows scripting the GTK client. It works on a KDE desktop as well. Installation via pip (requires pyatspi, i.e. the distro package python3-pyatspi, which is not listed as a dependency).
Not sure whether it is X11 only since it requires some xinit interface.
One can "click" or "focus" elements based on their "name", "description", "role" (e.g. "page tab"), and "id", although the id is never set for the Tryton client.
There are two APIs: a "focus"-oriented one and a "node"-oriented one. The former seems more elegant, but misses some actions like double-click (see the sketch after this list).
Development seems stalled; anyhow, there seem to be quite a few users.
An issue about publishing v0.9.10 on PyPI is still open, although this version has already been on PyPI for four years.
Version 0.9.11 was tagged three years ago, but this version is still not on PyPI (a corresponding ticket exists).
Releases do not seem to support Python 3 yet, which is a deal-breaker.
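For illustration, here is a minimal sketch of both API styles, assuming the library is dogtail and that the Tryton client exposes the accessible names used below; the application and widget names are assumptions, not verified against the client.

```python
# Minimal sketch, assuming dogtail and a running Tryton GTK client;
# the accessible names ('tryton', 'New', 'Parties') are assumptions.

# "focus"-oriented (procedural) API:
from dogtail.procedural import focus, click, type as type_text

focus.application('tryton')              # attach to the running client
click('New', roleName='push button')     # click by accessible name and role
type_text('Michael Scott')               # typed into the focused widget

# "node"-oriented (tree) API; this one also offers double-click:
from dogtail.tree import root

app = root.application('tryton')
app.child(name='Parties', roleName='page tab').doubleClick()
app.child(name='New', roleName='push button').click()
```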
xdotool
written in C
could be used for the Tryton desktop client and the Web client
X11 only; will most probably break on Wayland.
emulates key presses and mouse moves, and supports window manipulation
To execute an action (e.g. clicking a button or into a text box), one needs to know the screen position of the element in pixels. This is very low-level. We could work around this by adding some abstraction layer which knows about the positions and sizes of the elements (see the sketch after this list). Anyhow, this seems cumbersome.
I did not test it (due to the last point)
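For illustration, a rough sketch of such an abstraction layer, wrapping the xdotool command line from Python; the element names and pixel coordinates are made-up placeholders, not real Tryton positions.

```python
# Rough sketch of an abstraction layer over xdotool; element names and
# coordinates are made-up placeholders, not real Tryton positions.
import subprocess

# Logical element name -> screen position in pixels. Keeping this table
# up to date is exactly the cumbersome part mentioned above.
ELEMENTS = {
    "toolbar_new": (105, 80),
    "party_name_field": (400, 220),
}

def click(element: str) -> None:
    x, y = ELEMENTS[element]
    subprocess.run(["xdotool", "mousemove", str(x), str(y)], check=True)
    subprocess.run(["xdotool", "click", "1"], check=True)   # button 1 = left

def type_text(text: str) -> None:
    # --delay adds a pause in milliseconds between keystrokes,
    # so the typing stays visible in the recording
    subprocess.run(["xdotool", "type", "--delay", "80", text], check=True)

click("toolbar_new")
click("party_name_field")
type_text("Michael Scott")
```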
testcafe
written in JavaScript
could be used for the web client
Nodes are selected using CSS selectors and selector functions.
actions are performed on the selected nodes.
Due to the structure of the HTML in the Tryton Web UI (see image), we would need to implement selector functions to select e.g. "the 'New' button in the toolbar of the tab panel 'Parties'".
I did not test it since I dislike JavaScript, though this might be the way to create videos for the Web UI.
text-to-speech options
say.js mentioned in the original post
written in JavaScript, to be used with testcafe (at least this is the idea)
For Linux: uses the "Festival" speech engine and does not support exporting the files (thus no caching). Festival seems to support English only, which is a deal-breaker IMHO.
For Windows: uses the Windows SAPI.SpVoice API
For Mac: Uses the Mac tool say
For Python, there is no package like say.js AFAIK. Anyhow, this can be implemented using a TTS package and some playback module, possibly caching the synthesized speech (see the sketch further below). (And indeed I already implemented this.)
TTS — offline
Uses language models by Coqui. Examples for English sound very good, same for the few tests I made for German. Huge, since everything runs offline.
pyttsx/pyttsx3 — offline
Uses the operating-system’s mechanisms to generate speech (Windows: sapi5, Mac: nsss - NSSpeechSynthesizer, other: espeak — espeak is lousy AFAIK). Allows setting rate, voice, volume.
google-tts — online
Another package for Google TTS. Seems to allow setting rate, voice, output-encoding, etc.
tts-wrapper — online or offline, depending on the engine/service used
Wrapper for several TTS engines: AWS Polly, Google, Microsoft, IBM Watson, PicoTTS, SAPI
tts-watson — online
API for accessing the IBM Watson TTS. Requires an API key, so I did not try it. (Registration and the service seem to be free of charge.)
nemo-tts — NVIDIA Neural Modules, package seems a bit outdated
Unsorted findings:
RHVoice project — seems to focus on Eastern European languages
sanskrit-tts — python module
baidu-tts — python module
liepa-tts — Python module, Lithuanian language synthesizer from the LIEPA project
PyPI lists a lot more modules that I did not investigate.
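Picking up the note above that a say.js equivalent for Python can be built from a TTS package plus playback: here is a minimal sketch of a say() helper with on-disk caching, assuming Coqui TTS (the "TTS" package) and aplay for playback; the model name and the player command are assumptions.

```python
# Minimal sketch of a say() helper with caching, assuming Coqui TTS and
# a system audio player; model name and player command are assumptions.
import hashlib
import pathlib
import subprocess

from TTS.api import TTS

CACHE_DIR = pathlib.Path("speech-cache")
CACHE_DIR.mkdir(exist_ok=True)
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

def say(text: str) -> None:
    # Cache key: hash of the (translated) text, so unchanged sentences
    # are synthesized only once across playbook runs.
    wav = CACHE_DIR / (hashlib.sha256(text.encode("utf-8")).hexdigest() + ".wav")
    if not wav.exists():
        tts.tts_to_file(text=text, file_path=str(wav))
    subprocess.run(["aplay", str(wav)], check=True)   # blocking playback

say("Welcome to the Tryton tutorial.")
```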
I have created a tooling with which you can create tutorial videos script-driven, automatically, and fast, in several languages. You can admire the first video in English, German and French at https://dateicloud.de/index.php/s/6fQ9rgcDiSMXoJA . The playbook with which it was generated is attached as well.
The tooling is far beyond a proof of concept and already in a prototype state. Anyhow, there is still room for improvement.
This is how it works:
The playbook contains the texts to be spoken as well as the actions to be performed (clicking buttons, entering text, etc.); see the sketch after this list.
babel is used to extract the spoken strings and for translation.
Text-to-speech is used to generate speech for each text, based on the translated strings.
Currently Coqui TTS is used, since according to my tests it yields much better results than Google TTS. Anyhow, Coqui's model for French sounds terrible, so we might need to improve here.
A script "record-video.sh" creates a fresh database for each video (a subject for optimization), starts trytond, starts the Tryton GTK client, and runs the playbook.
The playbook starts (and pauses) the screen recorder (simplescreenrecorder for now). The playbook decides whether recording starts before or after log-in.
The playbook "says" the texts (synchronously or asynchronously), moves the mouse, clicks, and enters text.
The Accessibility Interface is used for interacting with the application. Thus it is quite high-level, although due to the technical way this GUI is built, some hurdles had to be overcome. And there are still quite a few issues to be solved.
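To make the playbook idea more concrete, here is a purely illustrative sketch of what a playbook might boil down to; the real playbook format is not shown here, and the say() helper (as in the TTS sketch above), the gettext-style marking for babel, and the widget names are all assumptions.

```python
# Purely illustrative: not the tool's real playbook format. Sketches a
# playbook as data (texts marked with _() so babel can extract them,
# interleaved with GUI actions) plus a tiny interpreter that executes it
# via dogtail and a say() helper like the one in the TTS sketch above.
from gettext import gettext as _
from dogtail.procedural import focus, click, keyCombo, type as type_text

PLAYBOOK = [
    ("say",   _("Let us create a new party.")),
    ("click", "New"),
    ("type",  "Michael Scott"),
    ("say",   _("We save the record with Control and S.")),
    ("key",   "<Control>s"),
]

def run_playbook(playbook, say):
    focus.application('tryton')          # attach to the running GTK client
    for action, argument in playbook:
        if action == "say":
            say(argument)                # synthesized speech, possibly cached
        elif action == "click":
            click(argument, roleName='push button')
        elif action == "type":
            type_text(argument)          # typed into the focused widget
        elif action == "key":
            keyCombo(argument)
```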
As a test I created a French version of the video, which took about 10 minutes to translate the strings using deepl.com and a minute to synthesize the speech. Thus, within about 15 minutes there was a translated version, with an all-French GUI.