PostgreSQL has a GUC named log_min_duration_statement which is very useful for finding which queries take too long to execute in production environments.
I’d like to introduce something similar in Tryton, where any request that takes more than the specified number of milliseconds is logged at INFO level.
It could look like this:
min_duration_request = 100
which would log all requests taking 100 ms or more.
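As a minimal sketch of the idea, the dispatcher could be wrapped so the request is timed and only logged when it exceeds the threshold. The names `MIN_DURATION_REQUEST` and `timed_dispatch` are hypothetical, not existing Tryton code:

```python
import logging
import time

logger = logging.getLogger(__name__)

# Hypothetical threshold, as would be read from the proposed
# min_duration_request configuration option (in milliseconds).
MIN_DURATION_REQUEST = 100

def timed_dispatch(dispatch, request):
    """Call the real dispatcher, logging requests slower than the threshold."""
    start = time.monotonic()
    try:
        return dispatch(request)
    finally:
        duration_ms = (time.monotonic() - start) * 1000
        if duration_ms >= MIN_DURATION_REQUEST:
            logger.info('slow request %s took %.0f ms', request, duration_ms)
```

Using `time.monotonic()` avoids problems with system clock adjustments, and the `finally` block ensures failing requests are timed too.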
I think that apart from trytond, both trytond-cron and trytond-worker should also honor that parameter, although users would probably want a higher value for trytond-worker.
I do not really see the point.
In PostgreSQL it is useful because it can be a hint to add indexes, but for Python code there is no such optimization to apply.
Moreover, most HTTP servers already log the duration of requests. The only drawback is that the content of the log is not very helpful, as it contains only the database. For some time now I have been wondering whether we should make the client put the called method in the path.
For cron and worker, I see no problem with having long-running tasks.
Creating indexes is not the only type of decision one can make out of this kind of information.
The point of this information is having it at hand, with the necessary detail, to make decisions, which can include:
- Tweak configuration settings (e.g. record_cache_size)
- Upgrade hardware
- Change the type of connectivity users have (network issues are not rare)
- Tune PostgreSQL settings
- Delegate tasks to cron or workers
- Improve code performance
The right performance logs can help make diagnostics and decisions faster when problems arise, and even enable proactive action (e.g. notifications can be used to act before users start complaining).
I already considered using the proxy’s logs, but as you said, HTTP servers don’t provide the necessary information, and adding the called method is not enough to properly understand what’s going on. Information such as on which record (or how many records) the user pressed the button that made the process take so long is important. Having the context is also pretty useful.
I think this proposal is just a starting point that could be improved in the future to provide further information, such as how much time the request spent executing SQL queries or how many cache misses there were. That is information only Tryton can gather.
Having very long tasks can always cause problems, especially in cron. Also, the fact that processes can take longer does not mean that they should. Knowing how long they take is the starting point.
I’ve seen systems with very high load where long-running transactions in cron became a big issue, and you always have to figure out how long they took by comparing the timestamp of the starting log entry with the ending one. Providing a minimal infrastructure to ease diagnostics is invaluable in production environments.
I guess we could convert the debug logging in the dispatcher to INFO and log at the end of the process with a timing (I guess timing it will not be too expensive).
But it is still useful to also have a logging entry at the start, because it gives information about the running request when the system is blocked.
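The start-and-end pattern described above could look like this; `dispatch_with_logs` is a hypothetical name, not the actual dispatcher code:

```python
import logging
import time

logger = logging.getLogger(__name__)

def dispatch_with_logs(method, handler, *args):
    # Log before handling, so that a blocked system still shows which
    # request is running, and again at the end with the measured duration.
    logger.info('request %s started', method)
    start = time.monotonic()
    result = handler(*args)
    logger.info('request %s done in %.1f ms',
                method, (time.monotonic() - start) * 1000)
    return result
```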
I do not think it should be done with logging, but instead with Issue 11031: Make use of Server-Timing headers to monitor performances (Tryton issue tracker).
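For reference, the Server-Timing approach reports the duration in a standard HTTP response header instead of a log line. A minimal sketch with a hypothetical WSGI middleware (not Tryton's actual implementation of Issue 11031):

```python
import time

def server_timing_middleware(app):
    """Wrap a WSGI app and add a Server-Timing header to each response."""
    def middleware(environ, start_response):
        start = time.monotonic()

        def timed_start_response(status, headers, exc_info=None):
            duration_ms = (time.monotonic() - start) * 1000
            # Server-Timing metrics use the standard "name;dur=value" syntax.
            headers.append(('Server-Timing', 'total;dur=%.1f' % duration_ms))
            return start_response(status, headers, exc_info)

        return app(environ, timed_start_response)
    return middleware
```

The advantage over logging is that the client that issued the request receives the measurement directly, e.g. visible in the browser's developer tools.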
We already have INFO logging for the start and end of cron jobs, so we could add the timing at the end.
Idem for worker.
But I do not think we can define a maximum duration.
For me it would be good enough if the duration were presented in a standard format in the log, so it can be parsed easily.
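For example, if the duration were always emitted as a fixed `duration=<ms>ms` field (a hypothetical format, since Tryton does not currently define one), extracting it from log lines becomes trivial:

```python
import re

# Matches a duration field like "duration=123.4ms" anywhere in a log line.
DURATION_RE = re.compile(r'duration=(?P<ms>\d+(?:\.\d+)?)ms')

def parse_duration(line):
    """Return the duration in milliseconds, or None if the line has none."""
    m = DURATION_RE.search(line)
    return float(m.group('ms')) if m else None
```

Any log aggregation tool could then filter or graph slow requests without guessing at the message layout.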
I agree with that. In fact, the current situation, where one must use -vvv to print the response, is quite annoying because it also prints all SQL queries, which is clearly too much in many cases.
How can that information be analyzed afterwards? I have the impression that the proposal allows analyzing some interesting information for the developer, but it is the developer themselves who executes those requests.
The benefit of logs is that they record information when the developer is not looking at it.
That would be indeed a very nice improvement.
Maybe my proposal for min_duration_request should indeed raise a warning, so that one can automatically register issues with Sentry.
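Escalating to WARNING above the threshold could look like this; the threshold name and `report_duration` helper are hypothetical, and tools that capture warnings (e.g. Sentry's logging integration) would then pick up only the slow requests:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical threshold from the proposed min_duration_request option.
MIN_DURATION_REQUEST = 100  # milliseconds

def report_duration(method, duration_ms):
    # Slow requests become WARNING so warning-capturing tools report them;
    # everything else stays at INFO for normal log analysis.
    if duration_ms >= MIN_DURATION_REQUEST:
        logger.warning('%s took %.0f ms', method, duration_ms)
    else:
        logger.info('%s took %.0f ms', method, duration_ms)
```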