Speed up supply stock wizard

Hi,

I’m trying to improve the performance of the supply stock wizard. Currently our database has the following figures:

  • Product Template: 4,500 records
  • Product Product: 60,000 records (some products have more than 100 variants, others just one)
  • Stock move: 93,000 records.

On our current database the process takes more than 12 minutes, as the following steps are executed:

  1. Internal shipments are created
  2. Purchase requests are created
  3. New internal shipments are generated, because we have order points that create internal shipments to supply the demand of the first internal shipments
  4. Purchase requests are recomputed (but nothing new is created).

After some digging I found that most of the time is spent reading the 60,000 products, which is done in every step (internal shipments and purchase requests).
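For context, this kind of digging can be done with a standard cProfile session around the slow step. Here is a minimal, self-contained sketch (the `slow_step` placeholder stands in for the actual wizard transition, which is not reproduced here):

```python
import cProfile
import io
import pstats

def slow_step():
    # Placeholder for the real work, e.g. generating internal shipments
    return sum(i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_step()
profiler.disable()

# Print the 5 most expensive calls sorted by cumulative time;
# in the real case this is where the product reads show up
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```

Sorting by cumulative time is what surfaced the repeated product reads in my case.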

For example, I just found that changing this code from generate_shipment_internal:

        products = Product.search([
                ('type', 'in', ['goods', 'assets']),
                ], order=[('id', 'ASC')])
        product_ids = [p.id for p in products]

Into:

        products = Product.search([
                ('type', 'in', ['goods', 'assets']),
                ], order=[('id', 'ASC')], query=True)
        cursor = Transaction().connection.cursor()
        cursor.execute(*products)
        product_ids = [p[0] for p in cursor.fetchall()]

Avoiding the read of all the product fields provided a performance improvement of 2 minutes and 24 seconds.
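For measuring such changes, a small timing helper is enough (plain Python, nothing Tryton-specific; the label and workload below are just illustrative):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Measure and print the wall-clock time of the enclosed block
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f'{label}: {time.perf_counter() - start:.2f}s')

# Example usage with a trivial workload standing in for the search
with timed('search products'):
    total = sum(range(1_000_000))
```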

This has a very big impact on our database and I’m wondering what may cause the ORM to take so much time to load this data. I executed the query used to read the products directly on the database and it takes just 50 ms:

# EXPLAIN ANALYZE SELECT "a"."id" AS "id", "a"."active" AS "active", "a"."code" AS "code", "a"."color" AS "color", "a"."create_date" AS "create_date", "a"."create_uid" AS "create_uid", "a"."manufacturer_code" AS "manufacturer_code", "a"."odoo_id" AS "odoo_id", "a"."replaced_by" AS "replaced_by", "a"."size" AS "size", "a"."suffix_code" AS "suffix_code", "a"."template" AS "template", "a"."write_date" AS "write_date", "a"."write_uid" AS "write_uid", CAST(EXTRACT('EPOCH' FROM COALESCE("a"."write_date", "a"."create_date")) AS VARCHAR) AS "_timestamp" FROM "product_product" AS "a" LEFT JOIN "product_template" AS "b" ON ("b"."id" = "a"."template") WHERE ((("b"."type" IN ('goods', 'assets')) AND true) AND ((("b"."consumable" = false) OR ("b"."consumable" IS NULL)) AND true) AND (("b"."purchasable" = true) AND true) AND ("a"."active" AND "b"."active")) ORDER BY "a"."id" ASC
;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..8234.56 rows=58149 width=105) (actual time=0.022..48.284 rows=60295 loops=1)
   ->  Index Scan using product_product_pkey on product_product a  (cost=0.29..5191.45 rows=60753 width=73) (actual time=0.008..11.871 rows=60762 loops=1)
         Filter: active
         Rows Removed by Filter: 153
   ->  Memoize  (cost=0.29..0.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=60762)
         Cache Key: a.template
         Cache Mode: logical
         Hits: 56315  Misses: 4447  Evictions: 0  Overflows: 0  Memory Usage: 446kB
         ->  Index Scan using product_template_pkey on product_template b  (cost=0.28..0.31 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=4447)
               Index Cond: (id = a.template)
               Filter: (((NOT consumable) OR (consumable IS NULL)) AND purchasable AND active AND ((type)::text = ANY ('{goods,assets}'::text[])))
               Rows Removed by Filter: 0
 Planning Time: 0.254 ms
 Execution Time: 50.174 ms

Do you have some clue on how to dig and properly understand which is causing the performance degradation?


The reason for the performance degradation could be that the search function returns a list of instances loaded using the browse() function, so it executes more queries to load the data for the returned instances.

The browse function does not query any data.
Here it is just the cost of creating the instances, which is about 500 instances per second, a little less than I would expect (about 1000/s).

Maybe we could factorize ModelStorage.search to have an implementation ModelStorage.search_ids, which could benefit some code like this one but also the clients.
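A standalone illustration of the idea (plain sqlite3, not Tryton code; `search_ids` is only the proposed name): returning bare ids avoids both reading every column and creating one instance per row.

```python
import sqlite3

class Product:
    """Stand-in for an ORM record; real instances are much heavier."""
    def __init__(self, row):
        self.id, self.code, self.template = row

def setup(n=1000):
    # Build a small in-memory table mimicking product_product
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE product '
                 '(id INTEGER PRIMARY KEY, code TEXT, template INTEGER)')
    conn.executemany('INSERT INTO product (code, template) VALUES (?, ?)',
                     ((f'P{i}', i % 50) for i in range(n)))
    return conn

def search(conn):
    # ORM-style search: read all columns, build an instance per row
    rows = conn.execute(
        'SELECT id, code, template FROM product ORDER BY id').fetchall()
    return [Product(row) for row in rows]

def search_ids(conn):
    # Proposed variant: only the ids ever leave the cursor
    return [row[0] for row in
            conn.execute('SELECT id FROM product ORDER BY id')]

conn = setup()
assert search_ids(conn) == [p.id for p in search(conn)]
```

The clients would benefit in the same way: a tree view only needs the ids up front and can read the displayed fields lazily.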


Indeed I have better results in terms of instances per second with fewer records, but worse ones with more instances. I see a performance regression when the numbers increase.

I’m testing with a simple proteus script:

import datetime
from proteus import config, Model
config.set_trytond()

Party = Model.get('party.party')
Product = Model.get('product.product')
Move = Model.get('stock.move')

for LIMIT in [1000, 2000, 10000, None]:
    print(f"Testing {LIMIT} instances")
    for model_name in ['party.party', 'product.template', 'product.product', 'stock.move']:
        start = datetime.datetime.now()
        records = Model.get(model_name).find([], limit=LIMIT)
        instance_time = datetime.datetime.now() - start
        seconds = instance_time.total_seconds()
        instances_per_second = len(records) / seconds
        print(f"Instancied {len(records)} of {model_name} in {instance_time}: {instances_per_second}")

And here are the results (last number is instances per second):

Testing 1000 instances
Instancied 1000 of party.party in 0:00:00.055094: 18150.796819980398
Instancied 1000 of product.template in 0:00:00.097518: 10254.517114789065
Instancied 1000 of product.product in 0:00:00.104385: 9579.92048665996
Instancied 1000 of stock.move in 0:00:00.057564: 17371.96859148079
Testing 2000 instances
Instancied 2000 of party.party in 0:00:00.168276: 11885.23615964249
Instancied 2000 of product.template in 0:00:00.183870: 10877.250231141566
Instancied 2000 of product.product in 0:00:00.216414: 9241.546295526168
Instancied 2000 of stock.move in 0:00:00.175278: 11410.445121464189
Testing 10000 instances
Instancied 5949 of party.party in 0:00:01.285264: 4628.621045948536
Instancied 4456 of product.template in 0:00:00.758436: 5875.24853778038
Instancied 10000 of product.product in 0:00:03.858734: 2591.523541140695
Instancied 10000 of stock.move in 0:00:03.748329: 2667.8554630610065
Testing None instances
Instancied 5949 of party.party in 0:00:01.407437: 4226.832177923417
Instancied 4456 of product.template in 0:00:00.796783: 5592.488795569182
Instancied 60762 of product.product in 0:02:29.632299: 406.0754289419827
Instancied 95293 of stock.move in 0:07:17.993628: 217.56709209477359

I clearly notice that the number of instances browsed per second for the same model decreases when the number of records read increases. Also, this does not depend on the model.

I also ran the same test with a bigger record cache (20,000 records) and I got similar results:

Testing 1000 instances
Instancied 1000 of party.party in 0:00:00.055560: 17998.560115190787
Instancied 1000 of product.template in 0:00:00.097142: 10294.208478310102
Instancied 1000 of product.product in 0:00:00.105363: 9490.997788597515
Instancied 1000 of stock.move in 0:00:00.056597: 17668.781030796687
Testing 2000 instances
Instancied 2000 of party.party in 0:00:00.162511: 12306.85922799072
Instancied 2000 of product.template in 0:00:00.174217: 11479.935941957443
Instancied 2000 of product.product in 0:00:00.209250: 9557.94504181601
Instancied 2000 of stock.move in 0:00:00.171613: 11654.12876646874
Testing 10000 instances
Instancied 5949 of party.party in 0:00:01.237627: 4806.7794254650225
Instancied 4456 of product.template in 0:00:00.725035: 6145.910197438744
Instancied 10000 of product.product in 0:00:03.592430: 2783.6311354709765
Instancied 10000 of stock.move in 0:00:03.539178: 2825.514851188609
Testing None instances
Instancied 5949 of party.party in 0:00:01.429681: 4161.068098407966
Instancied 4456 of product.template in 0:00:00.801841: 5557.2114671113095
Instancied 60762 of product.product in 0:02:46.212596: 365.56796212965713
Instancied 95293 of stock.move in 0:07:27.790344: 212.80717924547295

Which makes me think this is not a cache issue.

That was my first thought, but the instances are needed later in the purchase_request generation, so we need to read them anyway; this alone will not benefit the stock supply process. On the other hand, this makes me think we could probably improve the performance of the process by having a cache size big enough to store all the records in memory. As they are re-read several times in the same transaction, that would reduce the cost to a single read, but I’m not sure if I will hit some memory limit by setting a big cache limit for all models.
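If I remember correctly, the record cache size is set server-wide in trytond’s configuration file, in the [cache] section (the default is 2000 records; it applies to all models, which is exactly why raising it has a global memory cost):

```ini
# trytond.conf — record cache size (assumed default: 2000)
[cache]
record = 20000
```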

I don’t know if this will help or hinder the discussion.

Some time ago, on 6.0, I encountered the problem that the time of Model.write() for a single record increased dramatically with the number of records instantiated.

I never reported it because I couldn’t figure it out, but I’m mentioning it in case it’s useful or related to this topic.

Here are some numbers I collected during testing at the time:

--- script1 ---
templates = ProductTemplate.search(...)
len(templates)  # -> 70792
template = templates[0]

ProductTemplate.write([template], {'foo': 'bar'})  # time = 2.80s


--- script2 ---
templates = ProductTemplate.search(..., limit=1)
template = templates[0]
ProductTemplate.write([template], {'foo': 'bar'})  # time = 0.02s

Yes, there’s some issue with long transactions or many writes.

I’ve stumbled on it with large import processes, but never found the time to dig into it.

Indeed, when importing the data of this database we have the same performance issue. For example, importing the 60,000 product records takes nearly 45 minutes.

But as we only import data at the start of the project, it does not hurt so much.

P.S.: We use a proteus script that connects directly to the database to import the data.

We use trytond and the problem is there too. We’ll try to provide a reproducible scenario in the next few weeks.