Counters & performance


(albert) #1

Rational

Tryton clients make an extensive use of the search_count RPC call which in turn executes a “SELECT count(*) FROM table” on the database. Although newer PostgreSQL versions can, in some cases, return that information just by using an index only scan, that is till an expensive operation because it may need to process all the index. In other cases, reading the whole table is still necessary for just returning the number of tuples in it.

When a table grows (+2M sales, +14M stock moves, +20M account move lines) the count operation can easily take between 0.5s and 1.5s on modern hardware.

The operation is used in the total count (up right corner on the Tryton client) but also in the domain counts, although in domains, it only shows +99 when the number of rows is above large.

Proposal

I propose we accept that by default the total count will not be as precise as it currently is.

The same way that in domains +99 is shown when the number is, well, >99, the same could be applied to the total count. For example, it brings not relevant information that there are 78.923 records. At a first glance, simply knowing that the number is >9999 is more than enough. For example, we could show a +9999 but instead of having 9999 hard coded it could be (10*limit)-1 where limit is the total number of records loaded by default (which is 1000 by default in Tryton client).

We could let the user click on the number, if she really needs the exact number.

That in itself is not enough to improve performance, so the idea is to add a “limit” parameter so the client can tell the server to stop counting if the number of records is greater than a given number. The user

Implementation

We could change the “search_count” RPC call to accept a “limit” parameter which would execute the following query:

SELECT count(*) FROM (SELECT 1 FROM table LIMIT 10000)

instead of:

SELECT count(*) FROM table

In case the user is in the second (or third, or whatever) page we should pass 10*limit + offset as limit parameter of the search_count() call.

The same parameter would be used by domain counts, but in that case, the value given to the limit parameter would simply be “100”.

I did some tests at SQL level and using the proposed limit parameter can take <10ms vs >800ms for large tables. Times depending on the size of the table and the filters applied, but it’s easily 2 orders of magnitude faster.

I think it’s worth noting that the proposed query can be somewhat slower for small tables. That seems quite obvious as we’re adding a subquery. For example, for a very small table, such as ir_module, the take goes from 0.6ms to 0.7ms.

So we’d propose applying the subquery behaviour to only the tables configured in a trytond.conf parameter. Something like:

[database]
count_subquery = stock_move, sale_sale, sale_line

Opinions?


(Cédric Krier) #2

I found this post which summary well the counting on PostgreSQL.

I think it will be good to support the limit on counting. Currently the limit is set on the counting which of course has no effect as it will limit the result which is always one row.
I agree that having always the form SELECT COUNT(*) FROM (SELECT 1 FROM … LIMIT …) … is not a good option as it avoid most of the time the usage of index. So it should be done only if we suspect that the result will be above the limit.
I’m not fan of the configuration proposal. For me, the system should guess itself when to perform the optimization or not. I think the server could keep in memory when a search count exceeded the limit so on the next similar query it runs the optimized version. Of course on the first run, it may be slow but it is a kind of warmup. We could store such statistics to restore it on startup but I’m not sure it deserves this complexity.


(albert) #3

To be precise, in the testing I did, the subquery does not prevent the usage of the index. Postgres is smart enough to know that it can use the primary key’s index to obtain “SELECT 1”.

Indeed, it’s not that we suspect that it is above the limit, but simply that it will be slower to use the subquery than the plain count(*).

I did some more testing to ensure such a complexity is worth it and my conclusion is that it really is not.

For example, taking a large table (stock_move with 14M rows), takes the same time using both queries, even when the subquery does not have a limit.

The test I did was executing the standard count() 5 times to warm up. Then executed count() 5 times and computed the average elapsed time. Then I executed using subquery (without limit!) 5 times and computed its average. Here’s the result:

0.4548s for standard count
0.4479s for subquery count (without limit)

That is, subquery took 1.5% less time. For me, it’s not that the subquery is faster but that it takes more or less the same time but there’s always some noise in benchmarks.

I even created a script to check performance with all tables in a database using the same method: warmup + count + subquery (5 times each) and the pattern is always the same: no noticeable difference except from noise.

Here’s the script for anyone wanting to repeat the tests: https://bitbucket.org/snippets/nantic/6e5jjy

So for me, we can always use the subquery, and no configuration nor smart statistics system is required.


(Cédric Krier) #4

For me it is still needed because the limit clause prevent the usage of index. So there are cases where applying the limit will be slower than without and vice-versa.


(albert) #5

The limit does not prevent the usage of the index:

# explain select count(*) from (select 1 from stock_move limit 100) a;
                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=4.57..4.58 rows=1 width=8)
   ->  Limit  (cost=0.43..3.32 rows=100 width=4)
         ->  Index Only Scan using stock_move_product_index on stock_move  
              (cost=0.43..380552.04 rows=13168338 width=4)
(3 rows)

(Cédric Krier) #6

Indeed it is not the usage of index but the usage of parallel sequence scan.


(albert) #7

But parallelism only helps when the number of tuples is very large. And we won’t be using LIMIT with very large numbers. In this case we’d need a LIMIT above 3 millions to make subquery slower than plain count(*):

# select count(*) from stock_move;
  count   
----------
 14227299
(1 row)

Time: 665,105 ms
# explain analyze select count(*) from (select 1 from stock_move limit 1000000) a;
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=41399.42..41399.43 rows=1 width=8) (actual time=204.597..204.597 rows=1 loops=1)
   ->  Limit  (cost=0.43..28899.42 rows=1000000 width=4) (actual time=0.028..166.275 rows=1000000 loops=1)
         ->  Index Only Scan using stock_move_product_index on stock_move  (cost=0.43..380552.04 rows=13168338 width=4) (actual time=0.027..117.547 rows=1000000 loops=1)
               Heap Fetches: 23875
 Planning Time: 0.119 ms
 Execution Time: 204.626 ms
(6 rows)

Time: 205,042 ms
# explain analyze select count(*) from (select 1 from stock_move limit 3000000) a;
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=124197.39..124197.40 rows=1 width=8) (actual time=594.129..594.129 rows=1 loops=1)
   ->  Limit  (cost=0.43..86697.39 rows=3000000 width=4) (actual time=1.829..487.659 rows=3000000 loops=1)
         ->  Index Only Scan using stock_move_product_index on stock_move  (cost=0.43..380552.04 rows=13168338 width=4) (actual time=0.022..340.986 rows=3000000 loops=1)
               Heap Fetches: 74508
 Planning Time: 0.106 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.360 ms, Inlining 0.000 ms, Optimization 0.119 ms, Emission 1.598 ms, Total 2.077 ms
 Execution Time: 594.563 ms
(10 rows)

Time: 594,995 ms