Support Full Text Search

ced · January 21, 2021, 6:37pm

Rational

The search for plain text is currently using ilike with % at the beginning and end. This works pretty well when searching and we already know what may be the result (so for back-office users).
But this is not enough natural for casual users or even for external user (like web user). Such audience expect to search like in web search site.
For example it would be useful to implement a search on product for a web-shop.

Proposal

I propose to add a parameter search_full_text (similar to search_unaccented) to Char and Dict field, set to False by default and set to True for Text.
When the search_full_text is True, the ilike operator with a starting and ending % in value (or no wildcard) are converted into a full text search clause, if the back-end supports it like this:

[('description', 'ilike', '%foo bar%')]

(description @@ websearch_to_tsquery('foo bar')).

If the field is translatable then,

(to_tsvector(<language>, description) @@ websearch_to_tsquery(<language>, 'foo bar'))

where the correspondence between the database language and the Tryton code comes from the configuration with a fallback to simple.

A contextual key <model>.<field>.search_full_text permits to disable or force (without transformation) the conversion of the clause if the field has search_full_text set.

Ordering on the Char field with a contextual key <model>.<field>.rank_full_text will use the rank of the full text matching if the back-end supports it otherwise it is NULL.

Reference

https://www.postgresql.org/docs/current/textsearch.html

Implementation

Issue 10023: Add full text search - Tryton issue tracker

jgras · January 21, 2021, 10:20pm

Good feature. But your example to search a product in a webshop I avoided in my implementation. For such a case I think is better to update an external index, since there is no need to stress the ERP-backend with this requests.

pokoli · January 22, 2021, 8:16am

I think it will be good to have the full text implementation. The Global search could also benefit from its presence.

Product names and descriptions are a translatable field but I do not see any mention to translation on your proposal? Does your proposal also want to support searching on translations by default or it will require an additional effort?

albert · January 22, 2021, 8:35am

I’d like to see Tryton supporting Full Text Search, and I also think that using PostgreSQL’s implementation is the way to go.

However, I don’t quite like the proposal of changing:

That will prevent developers and external webservices to use pure “ilike” domains on fields with the “search_full_text” boolean set and to use full text search features on fields that have “search_full_text” set to false.

I’d rather add a new operator:

[(‘description’, ‘@@’, ‘foo bar’)]

That can be used in any VARCHAR field. I would also keep the “search_full_text” attribute on Char/Text fields but that would be sent to the client and sao/tryton would use the ‘@@’ operator for those fields.

I don’t know if with your proposal you tried it to make it compatible with existing search_rec_name implementations, for example, which my proposal may not cover completely. Is that the reason you didn’t propose a specific operator?

ced · January 22, 2021, 9:10am

Translated fields will also apply the operator in the same condition.

I did not propose to add a new operator for multiple reasons:

it is complex:
- client domain parser must be changed
- mostly all code need to be reviewed
it does not make sense when used for Char like code, number, party’s name etc.
it should not be used on field like Selection, Reference etc.
it has no meaning for non Char/Text field

As it will need to have a fallback operator for back-end not supporting it. I find less intrusive to use the fallback operator as syntax to apply it.

nicoe · January 22, 2021, 9:59am

It will need some changes anyway, won’t it?

I mean for supporting the case where a search of ‘milk honey’ is different then ‘“milk honey”’ because if I am not mistaken the first one will search for the best matching text where the order is not relevant while the other will search for exact phrase between the quotes.

ced · January 22, 2021, 10:04am

No.

We do not care. The behavior will change for field with full text search activated.
The client domain parser does not permit already to write all the possible domain. And that’s not a problem.
Moreover what the client does in domain parser for a string without operator, is to create a domain that tries to simulate a poor full text search with the ilike operator. So we will just implement what was the initial intention.

nicoe · January 22, 2021, 10:17am

Once we open the can of the full text search, there will be request for exact-yet-full-text search.

There’s a difference between not being able to write a reference field domain and not being able to write a full text search that is quite common (as in Google).

ced · January 22, 2021, 10:19am

I do not understand.

Yes and that is what this proposal does. It adds support for common web search.

nicoe · January 22, 2021, 10:22am

I mean that once the possibility of full text search is there people will want the same features that they find in popular full text search engine. Because they are used to it.

That’s not what I would define as common. People usually know that quoting will make the search exact for this part of the query.

ced · January 22, 2021, 10:35am

And it will be possible to quote the search. For example: "\"foo bar\"" will result into 'foo' <-> 'bar'.

But the goal is not to change the search entry of the client into a simple “web search”. But to add “web search” syntax to part of it because such search has mainly usefulness on Text. Our search entry has a more powerful syntax.
Indeed the real benefit of the full text search is not direct user of Tryton but external one.

ced · January 22, 2021, 11:57am

This will permit to always have the current behavior for those who wants it.

nicoe · January 22, 2021, 2:57pm

OK with the quotes I agree that it will work, it’s a bit difficult to find but it’s better than straight impossible.

This is nice because I think that it will sometimes be needed.

jgras · January 22, 2021, 3:11pm

If the fulltext search has benefits for the search in tryton I welcome it very much - but I wouldn’t use it for my webshop. I also feel more comfortable if this load is done by a dedicated engine optimized for such operations without stressing my bussiness software.

That is the reason why I use external search indexes. You mentioned product search:
In my case the index has more than one field which are indexed and searched: name, description, category, manufacturer name, use case etc. - and I put a weight of all of them. I am optimzing this right now and still far from a good result

Example: “hammer for beginners”
I don’t want a book for beginners as first result because for the book ‘beginner’ is witten 10 times in the description and for the hammer only twice. So for this case I need to search in more than one field and weight them.

Without wishing to say anything against this initiative, I think that one should not raise false expectations.

albert · January 22, 2021, 3:26pm

So that’s basically two reasons:

complexity
there’s some fields for which it does not apply

Regarding the latter doesn’t look like a very strong argument as this already happens with several existing operators: “like/not like/ilike/not ilike” when applied to non-text fields, “parent_of/child_of” for most fields, “where/not where”.

Regarding the complexity:

When you say that it requires changes to the domain parser, do you mean that it requires changing the parser of the search box in sao and tryton clients? Because what I had in mind it seems it would require minimal changes to the parser (I’m not an expert in that code so I may be completely wrong).
When you say that it requires to review mostly all the code, what kind problems do you envision? Are you thinking of searcher methods? Of places that parse domains? Can’t think of many problems myself.

ced · January 22, 2021, 3:29pm

It is not because we made mistakes in the past that we must repeat them.

Yes and it will be very complicated.

Code that does not expect a new operator.

ced · January 22, 2021, 3:50pm

I do not see what can not be done with the PostgreSQL Full Text Search feature that other indexed does.
About the concern for the load, you could still have a Tryton running on a replicated database on which you perform the full text searches. But I must say that website that will be overloaded by full text search queries on the database, are probably not in the scale of Tryton scope.
For the weighting there are also plenty of options that could be used. But this proposal is to implement the basics ready to use feature. For fine tuning it will be necessary to write some SQL queries but at least functions and tools will be there.

Also this feature has the benefit to add the full text search even on non-indexed data.

jgras · January 22, 2021, 5:52pm

After thinking and reading again about that (and I would like to replace one of the components) I found some articles about that. This one is a good conclusion - Postgresql can be fast and I know that all the stuff like stemming etc is also available: https://rocky.dev/full-text-search

So if I could wish for something without being impertinent:
field.Text('Description', search_full_text=True, column="fulltext", weight=2)

And I remember that I replaced one engine by the other because of speed or maintainability. I experimented with ngram length, weights, stemming etc - user-friendliness in searching is a topic of its own. And also I choose the index because I can use ready-made HTML snippets that I display in real time (what is not a question of the search itself but with redis and redisearch very integrated).

As I said, I would welcome the opportunity to dispense with a search engine as additional component.

ced · January 22, 2021, 10:33pm

Here is the implementation Issue 10023: Add full text search - Tryton issue tracker
Instead of adding a method for ordering, I use a context key to transform the ordering of the field by the full text ranking.

ced · January 23, 2021, 12:06pm

I think I will propose a new type of field to store an parsed document that can be filled using weighting (with fallback to TEXT). This will be an internal only field and that can be searched only using full text.