I have created a tool with which you can create tutorial videos script-driven, in several languages, fast and automatically. You can admire the first video in English, German and French at https://dateicloud.de/index.php/s/6fQ9rgcDiSMXoJA . The playbook with which it was generated is attached as well.
The tooling is well beyond a proof of concept and already in a prototype state. Still, there is room for improvement.
This is how it works:
- The playbook contains the texts to be spoken as well as the actions to be performed (clicking buttons, entering text, etc.); a rough sketch of what a playbook could look like follows after this list.
- babel is used to extract the spoken strings and to handle their translation (see the Babel sketch below).
- text-to-speech is used for generating speech for each text; the translated texts are used for this. Currently Coqui TTS is used, since it yields much better results than Google TTS according to my tests. However, Coqui's model for French sounds terrible, so we might need to improve here (see the TTS sketch after this list).
- A script "record-video.sh" creates a fresh database for each video (subject to optimization), starts trytond, starts the Tryton GTK client and runs the playbook (a Python sketch of these steps follows below).
- The playbook starts (and pauses) the screen recorder (simplescreenrecorder for now). The playbook decides whether recording starts before or after login.
- The playbook "says" the text (synchronously or asynchronously), moves the mouse, clicks and enters text.
- The accessibility interface (AT-SPI) is used for interacting with the application. This keeps the interaction quite high-level, although, due to the technical way this GUI is built, some hurdles had to be overcome. There are still quite a few issues to be solved (see the pyatspi sketch below).
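Since the real playbook is attached to the post, here is only a rough sketch of the idea: spoken, translatable strings are interleaved with GUI actions. The helper names (`say`, `click`, `type_text`, the recorder object) are placeholders of mine, not the tool's actual API.

```python
# Rough sketch of the playbook idea, NOT the actual API of the tooling
# (the real playbook is attached to the post). say(), click(), type_text()
# and the recorder object are placeholders.
from gettext import gettext as _

def playbook(ui, speech, recorder):
    """Interleave spoken, translatable text with GUI actions."""
    recorder.start()                                    # begin screen capture
    speech.say(_("Welcome to this Tryton tutorial."))   # blocking: wait until spoken
    ui.click(_("Create a new record"))                  # press a button by its label
    ui.type_text(_("Name"), "ACME Ltd.")                # fill a field
    speech.say(_("We have just created our first party."),
               wait=False)                              # asynchronous speech
    recorder.stop()
```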
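The extraction and translation round-trip with Babel could look roughly like the following. The paths and locale layout are assumptions of mine; only the extract/init/compile steps themselves are standard pybabel usage.

```python
# Sketch of the Babel round-trip for the spoken strings
# (paths and directory layout are assumptions).
from babel.messages.frontend import CommandLineInterface

cli = CommandLineInterface()

# Extract all _() strings from the playbooks into a POT template.
cli.run(['pybabel', 'extract', '-o', 'locale/messages.pot', 'playbooks/'])

# Create the French catalogue that is handed to the translator
# (e.g. deepl.com, as used for the test mentioned below).
cli.run(['pybabel', 'init', '-i', 'locale/messages.pot',
         '-d', 'locale', '-l', 'fr'])

# After translating locale/fr/LC_MESSAGES/messages.po, compile it so
# gettext can pick the translations up when the playbook runs.
cli.run(['pybabel', 'compile', '-d', 'locale', '-l', 'fr'])
```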
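Speech synthesis then boils down to picking a Coqui model per language and rendering each translated string to an audio file. The model names below are examples from Coqui's public model list, not necessarily the ones the tooling uses.

```python
# Per-language synthesis with Coqui TTS. The model names are examples
# from Coqui's model zoo and may differ from what the tooling uses.
from TTS.api import TTS

MODELS = {
    'en': 'tts_models/en/ljspeech/tacotron2-DDC',
    'de': 'tts_models/de/thorsten/vits',
    'fr': 'tts_models/fr/mai/tacotron2-DDC',  # example French model
}

def synthesize(text, lang, out_path):
    """Render one translated string to a WAV file for the given language."""
    tts = TTS(model_name=MODELS[lang])
    tts.tts_to_file(text=text, file_path=out_path)

synthesize("Bienvenue dans ce tutoriel Tryton.", 'fr', 'audio/fr/welcome.wav')
```

In practice the model would of course be loaded once per language rather than once per string.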
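record-video.sh itself is not shown here; purely as an illustration of the steps it performs (fresh database, trytond, GTK client, playbook), a Python equivalent might look like this. The database name, config paths and the playbook runner invocation are assumptions; on a fresh database trytond-admin will also ask for an admin password unless TRYTONPASSFILE is set.

```python
# Illustration of the record-video.sh steps in Python; names, paths and
# the playbook runner are assumptions, not the actual script.
import subprocess
import time

DB = 'tutorial'

# Fresh PostgreSQL database for a reproducible recording.
subprocess.run(['dropdb', '--if-exists', DB], check=True)
subprocess.run(['createdb', DB], check=True)
# Initialize all modules; on a new database this asks for the admin
# password unless TRYTONPASSFILE points to a password file.
subprocess.run(['trytond-admin', '-c', 'trytond.conf', '-d', DB, '--all'],
               check=True)

# Start server and GTK client, then run the playbook against them.
server = subprocess.Popen(['trytond', '-c', 'trytond.conf', '-d', DB])
client = subprocess.Popen(['tryton'])
time.sleep(10)                       # crude wait until both are up
subprocess.run(['python3', 'run_playbook.py', 'playbooks/first_steps.py',
                '--lang', 'fr'])     # hypothetical playbook runner
client.terminate()
server.terminate()
```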
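The interaction via the accessibility interface can be sketched with pyatspi: find the client on the accessibility bus, locate a widget by role and name, and trigger its action. The application and button names below are illustrative; the hurdles mentioned above mostly come from widgets that are not as easy to look up as this.

```python
# Driving the Tryton GTK client through AT-SPI with pyatspi.
# The application and button names are illustrative.
import pyatspi

def find_app(name='tryton'):
    """Locate the running client among the accessible applications."""
    desktop = pyatspi.Registry.getDesktop(0)
    for app in desktop:
        if app is not None and app.name.lower() == name:
            return app
    raise LookupError('client not found on the accessibility bus')

def click_button(app, label):
    """Find a push button by its accessible name and trigger its action."""
    button = pyatspi.findDescendant(
        app,
        lambda a: a.getRole() == pyatspi.ROLE_PUSH_BUTTON and a.name == label)
    if button is None:
        raise LookupError('no button labelled %r' % label)
    button.queryAction().doAction(0)   # action 0 is usually 'click'

client = find_app()
click_button(client, 'Create a new record')
```

Text entry works similarly through the EditableText interface, and pyatspi.Registry.generateMouseEvent can move and click the mouse at absolute coordinates, which is presumably how the visible mouse movement in the videos is done.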
As a test I created a French version of the video: it took about 10 minutes to translate the strings using deepl.com and a minute to synthesize the speech. So within about 15 minutes there was a translated version, with an all-French GUI.