Scripting Tryton Tutorial Video

Rational

A recurring subject is Tryton tutorial and specifically video. Today, people want to learn by watching video.
The big difficulty is to create those video in a way that we can maintain them when new versions are releases.

Proposal

The idea of Making ERPNext Help Videos with Scripting is very attractive.
So we could reuse the libraries: testcafe, say.js and dogtail.

Implementation

9 Likes

we can solve this issue be acting fast, as fast as the development of tryton.
each version release, there should be video tutorial explaining the new features only.

1 Like

Experience shows that it is not the case.

The problem is that potentially on each release something may have change in any tutorial. So the goal is to be able to redo a tutorial with the minimum work.

2 Likes

The only way out of this problem is designing the tutorial sequence.

Let us make an example, trial is the best way of approval.

1 Like

Another tool:

Would it be possible to use test scenarios and build automatically videos from them? So how-to videos would be linked to the test scenarios that theirselves are updated by the developpers. Once existing scenarios are all turned into video, maybe it will be time to think to add more.

2 Likes

Thats a good idea but I think the videos require a more verbose explanation that the tests scenario.
Also the scenario may test some specific features that may not be relevant to all users.

For me the tests scenarios can be used as a base to create the first videos tutorials but after that they will need to be updated separatly.

1 Like

For example here is the erpnext videotutorial for the equivalent functionality of the sale_advancement’s module scenario.

1 Like

Although I agree a video tutorial require a more verbose explanation than a test scenario, they would still be a good starting point. Anyway, please don’t give up :wink:

Two more possible projects:

I did some recherche and testing on this topic. I also discovered a new project „ldtp“.

  • dogtail

    • written in Python
    • could be used for the Tryton desktop client
    • uses the accessibility interface (atspi, Assistive Technology Service Provider Interface), thus it should work for Wayland, too
    • allows scripting the GTK client. It works on a KDE desktop as well. Installation via pip (requires pyatspi distro-package python3-pyatspi, which is not listed as dependency)
    • Not sure whether it is X11 only since it requires some xinit interface.
    • One can „click” or „focus’‘ elements based on its „name“, „description“, „role“ (eg. ‚page tab‘) and „id” — although the id is never set for the Tryton client.
    • There are two APIs: a „focus“-oriented and a „node“ oriented. The former seems more elegant, but misses some actions like double-click.
    • development seems stalled, anyhow there seem to be quite some users
      • An issue about publishing v0.9.10 on PyPI is still open while this version is on PyPI since four years already.
      • Version 0.9.11 was tagged three years ago, but this version is still not in PyPI. (resp.ticket exists)
    • Some sibling projects use dogtail for behavior driven testing, e.g. dogtail / zenity · GitLab
  • ldtp2

    • written in Python
    • could be used for the Tryton desktop client
    • Claims to be platform independent: Linux, Windows, MacOS
    • I did not test it, since it seems to use a client-server architecture and thus be more complicated to setup than dogtail.
    • API looks much like he one for dogtail.
    • Project status is not clear, see Status of project and repository unclear · Issue #59 · ldtp/ldtp2 · GitHub for details.
      • Releases seem to not yet support Python 3 — which is a deal-breaker
  • xdotool

    • written in C
    • could be used for the Tryton desktop client and the Web client
    • X11 only. will most probably break on wayland.
    • emulates key-presses and mouse-moves, window manipulation
    • For executing an action (e.g. click on a button or into a text-box) one needs to know the screen position of the element in pixels. This is very low-level. We could leverage this by adding some abstraction layer, which knows about the position and sizes of elements. Anyhow, this seems cumbersome.
    • I did not test it (due to the last point)
  • testcafe

    • written in Javascript
    • could be used for the web client
    • Nodes are selected using CSS selectors and selcetor functions.
    • actions are performed on the selected nodes.
    • Due to the structure of the HTML in the Tryton Web UI (see image), we would need to implement selector functions to select e.g. „the ‚New‘ button in the toolbar of the tab panel ‚Parties‘“
      grafik
    • I did not test it since I dislike Javascript, though this might be the way to create videos for the Web UI

text-to-speech options

  • say.js mentioned in the original post
    • written in Javascript, to be used with testcafe — at least this is the idea
    • For Linux: uses the „Festival“ speech engine, does not support exporting the files (thus no caching)
      Festival seems to support English only — which is deal-breaker IMHO
    • For Windows: uses the Windows SAPI.SpVoice API
    • For Mac: Uses the Mac tool say

For Python, there is no package like says.js AFAIK. Anyhow, this can be implemented using a TTS package and some play-back module, eventually caching the synthesized speech. (And indeed I already implemented this)

  • TTS— offline
    Uses language models by Coqui. Examples for English sound very good, same for the few tests I made for German. HUGE, since all offline.
  • pyttx/pyttx3 — offline
    Uses the operating-system’s mechanisms to generate speech (Windows: sapi5, Mac: nsss - NSSpeechSynthesizer, other: espeak — espeak is lousy AFAIK). Allows setting rate, voice, volume.
  • gTTS — online
    Used the Google text-to-speech API. Seems to have only one voice per language. (German voice is a bit slow. Both German and English sound a bit tinny.) I also tested other German voices Unterstützte Stimmen und Sprachen  |  Cloud Text-to-Speech-Dokumentation  |  Google Cloud, all somewhat tinny.
  • google-tts — online
    Another package for Google TTS. Seems to allow setting rate, voice, output-encoding, etc.
  • ttw-wrapper — online or offline depending on used engine/service
    Wrapper for several TTS engines: AWS Polly, Google, Microsoft, IBM Watson, PicoTTS, SAPI
  • tts-watson — online
    API for accessing the IBM Watson TTS. Requires an API key, so I did not try it. (Registration and service seems to be free of charge)
  • nemo-tts — NVIDIA Neural Modules, package seems a bit outdated

Unsorted findings:

  • RHvoice project — seems to focus on eastern European languages
  • sanskrit-tts — python module
  • baidu-tts — python module
  • liepa-ttts — python module, Lithuanian language synthesizer from LIEPA project
  • PyPI list a lot more modules I did not investigate

Links:

And for play-back:

  • playsound — Python
    could be used to play the exported sound files.
2 Likes

I have created a tooling with which you can create tutorial videos script-driven - in several languages Jfast- automatically. You can admire the first video in English, German and French at https://dateicloud.de/index.php/s/6fQ9rgcDiSMXoJA . Also the playbook with which it was generated is attached.

The tooling is far beyond a proof-of-concept and in prototype state already. Anyhow, there is still room for improvements.

This is how it works:

  • The playbook contains the texts to be spoken as well as the actions to be performed (clicking buttons, entering text, etc…)
  • babel is used to extract the spoken strings and for translation.
  • text-to-speech is used for generating speech for each text. Translated texts are used for this.
    Currently coqui TTS is used since it yields much better then Google TTS according to my tests. Anyhow Conqui’s model for French sound terrible, so we might need to improve here.
  • A script „record-video.sh” creates a fresh database for each video (subject for optimization), starts trytond, starts the tryton GTK client, runs the playbook.
  • The playbook starts (and pauses) the screen-recorder (simplescreenrecorder for now). The playbook decides about whether it starts before or after log-in.
  • The playbook „says” the text (synchronous or asynchronous), moves the mouse, clicks and enters texts.
  • The Accessibility Interface is used for interacting with the application. Thus it is quite high-level, although due to the technical way this GUI is made, some hurdles had to be taken. And there are still quite some issues to be solved.

As a test I created a French version of the video, which took about 10 minutes to translate the strngs using deepl.com and a minute to synthesize the speech. Thus within about 15 Minutes there was a translated version — with an all-French GUI.

5 Likes