Internal server error on uwsgi with docker image

pokoli · November 14, 2018, 4:08pm

We are using the docker image in order to deploy our tryton servers (5.0 version) but we have some strange issues after some time running. The issue is that a RecursionError: maximum recursion depth exceeded while calling a Python object is raised on the uwsgi server on some random intervals.

The recursion error is a repetition of the following traceback:

 Traceback (most recent call last):                                             
  File "/usr/local/lib/python3.5/dist-packages/trytond/model/model.py", line 393, in __getattr__
    return super(ModelStorage, self).__getattr__(name)                         
  File "/usr/local/lib/python3.5/dist-packages/trytond/model/model.py", line 396, in __getattr__
    return self._values[name]                                                  
TypeError: 'NoneType' object is not subscriptable                              
    % (self.__name__, name, self._values))                                     
AttributeError: 'ir.session' Model has no attribute 'None': None               
                                                                               
During handling of the above exception, another exception occurred:            
                                                                               
Traceback (most recent call last):                                             
                                                                               
During handling of the above exception, another exception occurred:            
                                                                               
  File "/usr/local/lib/python3.5/dist-packages/trytond/model/model.py", line 393, in __getattr__
Traceback (most recent call last):                                             
  File "/usr/local/lib/python3.5/dist-packages/trytond/model/modelstorage.py", line 1227, in __getattr__
    return self._values[name]                                                  
TypeError: 'NoneType' object is not subscriptable

The strange think is that after this exceptions the uwsgi workers are killed with the following message:

Tue Nov 13 16:13:19 2018 - worker 1 (pid: 25) is taking too much time to die...NO MERCY !!!
worker 1 killed successfully (pid: 25)                                         
uWSGI worker 1 cheaped.                                                        
Tue Nov 13 16:13:36 2018 - worker 2 (pid: 33) is taking too much time to die...NO MERCY !!!
worker 2 killed successfully (pid: 33)                                         
uWSGI worker 2 cheaped.                                                        
Tue Nov 13 16:13:39 2018 - worker 3 (pid: 41) is taking too much time to die...NO MERCY !!!
worker 3 killed successfully (pid: 41)                                         
uWSGI worker 3 cheaped.

And then everything works as expected as the workers are restarted. On customer side, this results as a 502 Bad Gateway Error (as we use nginx in front of the uwsgi server) for some time and when the workers are restarted everything is working as expected.

Here is the uwsgi log from the start to the server to when the issue is first shown.

I haven’t been able to found some procedure to reproduce the issue.

Have anyone found some similar behavior? Any idea on how to solve it or how to produce a scenario to reproduce it will be much apreciated.

ced · November 14, 2018, 10:32pm

This is just a guess from the traceback (which seems to be interleaved between multiple request).
It looks like the sessions, returned by the search in Session.check, contain an instance with a negative (or None) id. So when it tries to access create_date still in Session.check, Model.__getattr__ raise AttributeError which is re-raised in ModelStorage.__getattr__.
Normally, there is a not null constraint on id so I guess it is the negative one.
It will be good to add a constraint against non negative id on the table (maybe even in base).

pokoli · November 15, 2018, 8:22am

Thanks for your reply, I’ve checked the ir.session table and all the records have positive ids.

Indeed, I have also logs of this traceback for other models. Some examples are: product.uom, sale.subscription.service, ir.action.keyword, ir.ui.view

ced · November 18, 2018, 6:17pm

I created Check stored ids are positvie (#7867) · Issues · Tryton / Tryton · GitLab

pokoli · November 20, 2018, 11:25am

Altought I agree that this contraing will be a good adition, this is not the cause of our problems.

We’ve got the same error for several models and none of them have any negative id stored on the database.

We disabled threads on the servers 4 days ago and we haven’t experimented the error anymore. So I start to think that the error is related to some issue related to threading issues.

albert · November 20, 2018, 11:52am

Although this problem should probably be investigated further, using threads is a bad idea with WSGI servers for performance anyway. We did many experiments when 4.0 was released and threading kills performance and it does not seem to be related to tryton but probably to GIL.

In this article they’re surprised by bjoern performance compared to all other actors:

https://dzone.com/articles/a-performance-analysis-of-python-wsgi-servers-part

But the key is not that bjoern overperfoms them but that all servers are configured to use whereas bjoern is single-threaded by design. For example, we configured mod_wsgi with a single-thread per process and we got similar results to bjoern.