Use of the cheaper option of uWSGI

We do most of our deployments using the Docker images, which are based on uWSGI with the cheaper option.
On some busy systems, for which we had already increased the minimum number of workers, users complain about random 502 Bad Gateway errors. We disabled the cheaper option and the errors seem to have gone.
I think that uWSGI is killing workers while they are still processing requests. Does anyone have the same issue? Should we remove the cheaper option by default? If yes, what would be good default options instead?
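
To make the question concrete, here is a minimal sketch of a cheaper-enabled uWSGI configuration. The values, the port and the entry point are illustrative assumptions, not the actual contents of the Docker image's configuration; only the option names are standard uWSGI options.

```ini
[uwsgi]
; Assumed port and WSGI entry point, for illustration only.
http-socket = :8000
module = trytond.application:app
; The master process manages spawning and stopping workers.
master = true
; Upper bound on the number of worker processes (illustrative value).
processes = 8
; Lower bound: workers above this number may be stopped when idle.
cheaper = 2
; Threads per worker (illustrative value).
threads = 4
```

Removing the `cheaper` line keeps the number of workers fixed at the `processes` value.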

Yes, I have encountered what I think may be the same issue, and removing the cheaper option seemed to fix the issue for me too.

When I was searching for a solution I found this: https://github.com/unbit/uwsgi/issues/1702 which also seems to be the same thing, but doesn’t provide a solution.

I think someone has also been hit by the same thing on the demo server, which I commented on: Issue 9147: bad gateway on demo - Tryton issue tracker. I’m guessing it was probably during a time when it was busy.

Yes, I think so.

I think it is difficult to come up with a good default option, just as the uwsgi docs say:

There is no magic rule for setting the number of processes or threads to use. It is very much application and system dependent.

Perhaps a relatively small number of processes and threads would be a good default (4 and 2 respectively?), in the hope that the people who have a need for more processes and threads are the ones who are best placed to be able to tune this for their setup.

Indeed, we had similar problems, but we solved them by disabling threads.
So the issue is probably related to the combination of cheaper and threads.

For some systems the cheaper option is useful, as it removes unused workers, which is good for memory usage. On the other hand, we’ve found that Tryton normally does not use a lot of memory (except when you have some massive tasks), so that is probably not a big issue.

But the time to start a new worker for Tryton is noticeable (1-2 seconds) due to the pool init, so when a new worker starts you notice a slowdown on the clients; once all the workers are started, the system returns to normal behaviour.

For now we are still using the cheaper option, but with a default value of 2 or 4 (depending on the memory available) and threads disabled. The main reason to have a higher cheaper value is to avoid starting and stopping workers so often.
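
A minimal sketch of the setup described above. The post only gives the cheaper value, so the `processes` upper bound and the `master` line here are placeholders, not the poster's actual settings.

```ini
[uwsgi]
master = true
; Placeholder upper bound; tune it to the memory available.
processes = 8
; Keep at least 2 workers running (or 4 on hosts with more memory).
cheaper = 2
; No "threads" option: each worker handles one request at a time.
```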

Just my 2 cents.

We use a similar configuration to @pokoli. By default we disable threads in all deployments. Additionally, for busy systems we increase cheaper to 4 or more.

If you disable threads, each process handles fewer requests at a time, so on average fewer requests are affected when it is killed. It also keeps all your workers busier, so they are less likely to be targeted by the cheaper killer.

This seems to be the problem. So I think we should disable the cheaper option by default. Users can still activate it by passing the proper arguments.

I agree but I would use:

processes=1
threads=4

because threads are better for the cache and for memory in general.
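
Written as a configuration file, the proposal above might look like the following sketch. Leaving out the cheaper option keeps the number of workers fixed; the `master` line is an assumption and is not part of the proposal.

```ini
[uwsgi]
master = true
; One worker process, never scaled up or down (no "cheaper" option).
processes = 1
; Four threads serve requests inside that single process.
threads = 4
```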

I filed https://bugs.tryton.org/issue9210

Do you have an ETA for pushing it? I would like to try these settings on some servers before they become the default values.

No, for me there is no hurry on this issue. I would even like to have some feedback first.

OK, I did a quick test on one of our deployments and the system seems to be stable (no 502 errors so far).
I will be updating some of our production servers so that users can test this configuration. I will let them test it for the next week, to see if any of them have an issue.

IIUC the cache of a process is reused between all threads. Am I right?

I’m changing one of our deployments, which has 4 processes and no threads, to the proposed configuration (1 process, 4 threads). Let’s see if it improves.

Yes, the cache is mainly a global dictionary which is shared between all threads.

We’ve been using this setup for a week on several servers and everything is working well. We’ve seen no issue on the servers and using one worker is stable.

So for me it makes sense to include this patch as the new default values.

I’m still experiencing random 502 Bad Gateway errors, although my setup is a bit different:

--processes number_of_cpus --cheaper 0 --threads 4