Better References fields

Rationale

Reference fields are currently hard to deal with when writing SQL queries, since joins require either Concatenations or Substrings, which is neither easy to read nor efficient.

Solution

Add a way to store reference fields in two split columns, one for the model, and one for the id.

Advantages :

  • Easier to join on

  • Reduce index size => Usually, an index on the “id” part will be enough

Implementation

I see two solutions for this :

  • Add a new field type. I do not know if it is possible to this properly since fields are bound to either one or no columns, and this one should be bound to two.

  • Add a way to declare a function field and the two column fields in one go : Example implementation

The only valid rationale for me is the efficiency of join. It is valid because there is no index available on the ID part. But indeed, I’m wondering if adding an index on the substring expression will not be enough.
I also have concern about loosing the one to one relation between a field and a column because there will be risk to have column name collision (via modularity).

Just to clarify, “Easier to join on” was meant mainly in terms of readability.

For efficiency, I usually go with concatenating the target substring and the joined id to then match the full string, which allows to use both the target id index and the reference field index.

Also, the index size actually matters when updating large tables, since indexes on reference fields (20+ char) are significantely bigger than they are on IDs.

For readability, we could create a helper.

This does not scale.

As you will have to index also the model string, it does not matter. More over my suggested index is a index on integers.

What kind of helper are you thinking of ? A custom sql operator ?

I feared it would not, that’s one of the reason for which I would like to improve it !

I am not sure that indexing on the model string will always be necessary. I mean, usually there are only a few effective models, and I think there will almost never be searches only on the model part. So a simple index on the id part should be enough for most cases.

Also, I do not think that the index you proposed would be used for this particular kind of searches (('origin', 'like', 'account.move,%'))

I do not know but we do often the same thing so we can create an helper.

I’m not sure you know how indexes are used by database. Of course it is possible to create an composite index on the reference field that will be used for expression used on Reference.convert_domain.

I played around with the prototype I posted in the opening post, and I noticed that the model part of the reference fields is very expensive (in terms of DB volume). This is logical, but I did not thought it would be that much.

Anyway, is there a chance to support postgresql’s ENUM type someday ? I think it might fit the Selection / Reference model use case at least in some cases.

With Issue 11788: Index ID part of Reference field - Tryton issue tracker this proposal should no more be needed.