Allow function fields to avoid __getattr__ in searching

Hi,
My idea is to allow the values of function fields to optionally (and manually) be loaded without going through `getattr` when reading many records.
The approach is a class method that, using the cursor, python-sql, or another strategy, loads the entire list of values at once for the read method.
Here is an example:

    party_name = fields.Function(
        fields.Char("Supplier"), getter="get_party_name")

    @fields.depends('party')
    def get_party_name(self, name=None):
        if self.party:
            return (self.party.name[:10] + self.party.code
                + str(self.party.create_date))
        return ""

    @classmethod
    def party_name_read_ids(cls, ids):
        data = {}
        table = cls.__table__()
        Party = Pool().get('party.party')
        party = Party.__table__()
        red_sql = reduce_ids(table.id, ids)
        cursor = Transaction().connection.cursor()
        cursor.execute(*party.join(table, condition=table.party == party.id
                ).select(table.id, party.name, party.code, party.create_date,
                where=red_sql))

        result = cursor.fetchall()
        for res in result:
            data[res[0]] = (res[1][:10] + res[2] + str(res[3])
                if res[1] is not None else None)
        return data

I think it would be better if the function field had a parameter, e.g.:

    party_name = fields.Function(
        fields.Char("Supplier"), getter="get_party_name",
        read_ids="party_name_read_ids")

Applying this strategy for reading dramatically improves performance when there are many records.
I just had to prevent read from loading that field, so that it does not call `getattr` on the `get_party_name` function, and then merge in the list of values from the `read_ids` function.
Using cProfile, in a test with 301 records there is a difference of 0.004 seconds vs 0.057 seconds; with 2800 records it is 0.015 seconds vs 0.77 seconds.
It will depend on how that list-of-values function is built.

I think this can be very useful for exporting a lot of data when there are function fields, and maybe for reports with many instances;
it would only be necessary to add a parameter to browse or search that loads that list into the cache.

This is already supported: function fields can use a classmethod that accepts a list of records and computes the values for all of them in a single call. In such a case you can use an SQL query to compute them.

You can even pass a list of names and compute multiple fields in a single call to the function.

Here is an example of such a feature.
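The multi-name calling convention can be sketched with a stub class in place of a real Tryton model (the `_parties` dict, `get_party_fields` name, and `party_code` field are invented for illustration; a real getter would run an SQL query instead):

```python
class Invoice:
    # Stub data standing in for rows a real getter would fetch via SQL.
    _parties = {1: ("ACME", "P1"), 2: ("Widgets Inc", "P2")}

    @classmethod
    def get_party_fields(cls, ids, names):
        # One call computes every requested field; the result maps
        # field name -> {record id -> value}.
        data = {name: {} for name in names}
        for record_id in ids:
            name_value, code = cls._parties.get(record_id, ("", ""))
            if "party_name" in data:
                data["party_name"][record_id] = name_value
            if "party_code" in data:
                data["party_code"][record_id] = code
        return data

result = Invoice.get_party_fields([1, 2], ["party_name", "party_code"])
```

Both fields come back from a single call, so the SQL (or other work) only has to run once per batch of records.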

The main problem with the proposal is that you now have two ways to compute the same thing, so you no longer have the guarantee of getting the same result.
As @pokoli said, you can achieve the same result by using a classmethod as getter.

Now I think we should try our best to make the instance method getter as performant as possible. Here I think the gain comes mainly from the join in the SQL query. The instance method getter has to make a second query to read the parties, but normally only once per thousand records (and only if there are as many different parties). This explains the difference in your measurements: the second measurement needs 3 more queries.
Now I'm a little astonished by the difference the extra query adds. I wonder if you have tested this on the latest version (with the record instead of a dict, the local cache prefilled, the readonly optimization, etc.).

A last point: your classmethod implementation does not scale well; it will fail on a large list of ids, so it should use a grouped_slice. You can also avoid the fetchall call and iterate on the cursor, so the result is read progressively, reducing memory consumption.
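The pattern can be sketched in plain Python. `grouped_slice` below is a simplified stand-in for `trytond.tools.grouped_slice`, and `run_query` is a hypothetical callback standing in for executing the SQL for one chunk:

```python
def grouped_slice(records, count=1000):
    # Simplified stand-in for trytond.tools.grouped_slice: yield the
    # ids in chunks so each SQL IN clause stays a bounded size.
    for i in range(0, len(records), count):
        yield records[i:i + count]

def read_values(ids, run_query):
    # run_query(chunk) stands in for executing the SQL for one chunk
    # and returning an iterable of (id, value) rows; iterating the
    # rows (rather than fetchall) keeps memory usage bounded.
    data = {}
    for chunk in grouped_slice(ids):
        for record_id, value in run_query(chunk):
            data[record_id] = value
    return data

# Example with a fake query that computes a value per id:
data = read_values(list(range(2500)), lambda chunk: ((i, i * 2) for i in chunk))
```

Each chunk keeps the `IN` clause bounded, and iterating the cursor instead of calling `fetchall` means rows are consumed progressively rather than materialized all at once.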

Thanks @pokoli!
Now I understand the getter method better!
My resulting code is (without the grouped_slice):

    @classmethod
    def get_party_name(cls, invoices, names):
        ids = [inv.id for inv in invoices]
        data = {}
        table = cls.__table__()
        Party = Pool().get('party.party')
        party = Party.__table__()
        red_sql = reduce_ids(table.id, ids)
        for name in names:
            _data = {}
            cursor = Transaction().connection.cursor()
            cursor.execute(*party.join(table, condition=table.party == party.id
                    ).select(table.id, party.name, party.code, party.create_date,
                    where=red_sql))
            result = cursor.fetchall()
            for res in result:
                _data[res[0]] = (res[1][:10] + res[2] + str(res[3])
                    if res[1] is not None else None)
            data[name] = _data
        return data

The version I am using is 6.4, but from many months back.
Applying the classmethod getter I now have:
301 records: 0.005 seconds vs 0.009 seconds
2800 records: 0.015 seconds vs 0.036 seconds
698 records (on an old PC with Tryton 6.0): 0.080 seconds vs 0.135 seconds

The search method used is search_read with ["id", "party_name"].

The cProfile output for the 301 records:

4927 function calls (4853 primitive calls) in 0.005 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.003    0.003 modelsql.py:731(read)
        1    0.000    0.000    0.002    0.002 invoice.py:563(party_name_read_ids)
        2    0.000    0.000    0.002    0.001 database.py:65(execute)
        2    0.002    0.001    0.002    0.001 {function LoggingCursor.execute at 0x7f3496449f70}
      302    0.000    0.000    0.001    0.000 __init__.py:22(cursor_dict)
        6    0.000    0.000    0.001    0.000 __init__.py:181(__iter__)
        1    0.000    0.000    0.000    0.000 modelstorage.py:207(read)
        2    0.000    0.000    0.000    0.000 __init__.py:587(__str__)
        2    0.000    0.000    0.000    0.000 misc.py:106(reduce_ids)
        2    0.000    0.000    0.000    0.000 user.py:1029(user_group_all_table)
     23/8    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
      302    0.000    0.000    0.000    0.000 {method 'fetchmany' of 'psycopg2.extensions.cursor' objects}
      301    0.000    0.000    0.000    0.000 __init__.py:29(<dictcomp>)
        1    0.000    0.000    0.000    0.000 model.py:610(check)
        1    0.000    0.000    0.000    0.000 model.py:522(get_access)
        2    0.000    0.000    0.000    0.000 group.py:92(group_parent_all_cte)
        1    0.000    0.000    0.000    0.000 model.py:799(check)
        1    0.000    0.000    0.000    0.000 model.py:731(get_access)
        1    0.000    0.000    0.000    0.000 {method 'fetchall' of 'psycopg2.extensions.cursor' objects}

classmethod getter:

15302 function calls (15226 primitive calls) in 0.009 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.009    0.009 modelsql.py:731(read)
        1    0.000    0.000    0.005    0.005 function.py:15(wrapper)
        1    0.001    0.001    0.005    0.005 function.py:106(get)
        1    0.000    0.000    0.002    0.002 function.py:139(call)
        1    0.000    0.000    0.002    0.002 invoice.py:556(get_party_name)
        2    0.000    0.000    0.002    0.001 database.py:65(execute)
        2    0.002    0.001    0.002    0.001 {function LoggingCursor.execute at 0x7f587d8a3f70}
        1    0.000    0.000    0.001    0.001 modelstorage.py:666(browse)
        1    0.000    0.000    0.001    0.001 modelstorage.py:675(<listcomp>)
      301    0.001    0.000    0.001    0.000 modelstorage.py:1532(__init__)
      302    0.000    0.000    0.001    0.000 __init__.py:22(cursor_dict)
      402    0.000    0.000    0.001    0.000 cache.py:421(__missing__)
        6    0.000    0.000    0.000    0.000 __init__.py:181(__iter__)
        1    0.000    0.000    0.000    0.000 modelstorage.py:207(read)
      405    0.000    0.000    0.000    0.000 cache.py:417(__setitem__)
      301    0.000    0.000    0.000    0.000 model.py:249(__init__)
        2    0.000    0.000    0.000    0.000 __init__.py:587(__str__)
      302    0.000    0.000    0.000    0.000 {method 'fetchmany' of 'psycopg2.extensions.cursor' objects}
        2    0.000    0.000    0.000    0.000 misc.py:106(reduce_ids)
      302    0.000    0.000    0.000    0.000 {built-in method builtins.any}
      301    0.000    0.000    0.000    0.000 __init__.py:29(<dictcomp>)
        2    0.000    0.000    0.000    0.000 user.py:1029(user_group_all_table)

I don't know why; I just avoid passing the function field to read and merge the values afterwards.
Maybe I'm missing something, or some additional check is needed.

    party_name = fields.Function(
        fields.Char("Supplier"), getter="get_party_name")

    @classmethod
    def get_party_name(cls, invoices, names):
        ids = [inv.id for inv in invoices]
        data = {}
        table = cls.__table__()
        Party = Pool().get('party.party')
        party = Party.__table__()
        red_sql = reduce_ids(table.id, ids)
        for name in names:
            _data = {}
            cursor = Transaction().connection.cursor()
            cursor.execute(*party.join(table, condition=table.party == party.id
                    ).select(table.id, party.name, party.code, party.create_date,
                    where=red_sql))
            result = cursor.fetchall()
            for res in result:
                _data[res[0]] = (res[1][:10] + res[2] + str(res[3])
                    if res[1] is not None else None)
            data[name] = _data
        return data
        
        
    or
    
    
    @classmethod
    def party_name_read_ids(cls, ids):
        data = {}
        table = cls.__table__()
        Party = Pool().get('party.party')
        party = Party.__table__()
        red_sql = reduce_ids(table.id, ids)
        cursor = Transaction().connection.cursor()
        cursor.execute(*party.join(table, condition=table.party == party.id
                ).select(table.id, party.name, party.code, party.create_date,
                where=red_sql))

        result = cursor.fetchall()
        for res in result:
            data[res[0]] = (res[1][:10] + res[2] + str(res[3])
                if res[1] is not None else None)
        return data

    @classmethod
    def read(cls, ids, fields_names):
        new_fields_names = []
        data_party_name = {}
        for name in fields_names:
            if "party_name" not in name:
                new_fields_names.append(name)
            else:
                data_party_name = cls.party_name_read_ids(ids)
        result = super().read(ids, new_fields_names)
        if data_party_name:
            for row in result:
                row["party_name"] = data_party_name.get(row["id"])
        return result

Thanks for the help, now I understand more about the getter and grouped_slice!

Just for reference: if you are computing a single field, you can just use the `name` argument instead of `names`. Tryton will detect this and pass a single string instead of a list.

In such a case you just need to return a dict with record ids as keys and the computed values as values.

This is explained in our documentation about function field getters.
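The single-name variant can be sketched with a stub class in place of a real Tryton model (the `party_names` dict is invented, and plain ids are used where a real getter would receive record instances, to keep the stub self-contained):

```python
class Invoice:
    # Stub data simulating what a real getter would fetch via SQL.
    party_names = {1: "ACME", 2: "Widgets Inc"}

    @classmethod
    def get_party_name(cls, invoices, name):
        # `name` is a single string ("party_name"), not a list, so the
        # getter returns {record id: value} directly, with no outer
        # per-field-name level.
        return {inv_id: cls.party_names.get(inv_id, "")
            for inv_id in invoices}

values = Invoice.get_party_name([1, 2, 3], "party_name")
```

Compare this with the `names` version above, which wraps the same mapping one level deeper under each field name.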
