The search for plain text is currently using ilike with % at the beginning and end. This works pretty well when searching and we already know what may be the result (so for back-office users).
But this is not enough natural for casual users or even for external user (like web user). Such audience expect to search like in web search site.
For example it would be useful to implement a search on product for a web-shop.
Proposal
I propose to add a parameter search_full_text (similar to search_unaccented) to Char and Dict field, set to False by default and set to True for Text.
When the search_full_text is True, the ilike operator with a starting and ending % in value (or no wildcard) are converted into a full text search clause, if the back-end supports it like this:
where the correspondence between the database language and the Tryton code comes from the configuration with a fallback to simple.
A contextual key <model>.<field>.search_full_text permits to disable or force (without transformation) the conversion of the clause if the field has search_full_text set.
Ordering on the Char field with a contextual key <model>.<field>.rank_full_text will use the rank of the full text matching if the back-end supports it otherwise it is NULL.
Good feature. But your example to search a product in a webshop I avoided in my implementation. For such a case I think is better to update an external index, since there is no need to stress the ERP-backend with this requests.
I think it will be good to have the full text implementation. The Global search could also benefit from its presence.
Product names and descriptions are a translatable field but I do not see any mention to translation on your proposal? Does your proposal also want to support searching on translations by default or it will require an additional effort?
Iâd like to see Tryton supporting Full Text Search, and I also think that using PostgreSQLâs implementation is the way to go.
However, I donât quite like the proposal of changing:
That will prevent developers and external webservices to use pure âilikeâ domains on fields with the âsearch_full_textâ boolean set and to use full text search features on fields that have âsearch_full_textâ set to false.
Iâd rather add a new operator:
[(âdescriptionâ, â@@â, âfoo barâ)]
That can be used in any VARCHAR field. I would also keep the âsearch_full_textâ attribute on Char/Text fields but that would be sent to the client and sao/tryton would use the â@@â operator for those fields.
I donât know if with your proposal you tried it to make it compatible with existing search_rec_name implementations, for example, which my proposal may not cover completely. Is that the reason you didnât propose a specific operator?
I mean for supporting the case where a search of âmilk honeyâ is different then ââmilk honeyââ because if I am not mistaken the first one will search for the best matching text where the order is not relevant while the other will search for exact phrase between the quotes.
We do not care. The behavior will change for field with full text search activated.
The client domain parser does not permit already to write all the possible domain. And thatâs not a problem.
Moreover what the client does in domain parser for a string without operator, is to create a domain that tries to simulate a poor full text search with the ilike operator. So we will just implement what was the initial intention.
Once we open the can of the full text search, there will be request for exact-yet-full-text search.
Thereâs a difference between not being able to write a reference field domain and not being able to write a full text search that is quite common (as in Google).
I mean that once the possibility of full text search is there people will want the same features that they find in popular full text search engine. Because they are used to it.
Thatâs not what I would define as common. People usually know that quoting will make the search exact for this part of the query.
And it will be possible to quote the search. For example: "\"foo bar\"" will result into 'foo' <-> 'bar'.
But the goal is not to change the search entry of the client into a simple âweb searchâ. But to add âweb searchâ syntax to part of it because such search has mainly usefulness on Text. Our search entry has a more powerful syntax.
Indeed the real benefit of the full text search is not direct user of Tryton but external one.
If the fulltext search has benefits for the search in tryton I welcome it very much - but I wouldnât use it for my webshop. I also feel more comfortable if this load is done by a dedicated engine optimized for such operations without stressing my bussiness software.
That is the reason why I use external search indexes. You mentioned product search:
In my case the index has more than one field which are indexed and searched: name, description, category, manufacturer name, use case etc. - and I put a weight of all of them. I am optimzing this right now and still far from a good result
Example: âhammer for beginnersâ
I donât want a book for beginners as first result because for the book âbeginnerâ is witten 10 times in the description and for the hammer only twice. So for this case I need to search in more than one field and weight them.
Without wishing to say anything against this initiative, I think that one should not raise false expectations.
Regarding the latter doesnât look like a very strong argument as this already happens with several existing operators: âlike/not like/ilike/not ilikeâ when applied to non-text fields, âparent_of/child_ofâ for most fields, âwhere/not whereâ.
Regarding the complexity:
When you say that it requires changes to the domain parser, do you mean that it requires changing the parser of the search box in sao and tryton clients? Because what I had in mind it seems it would require minimal changes to the parser (Iâm not an expert in that code so I may be completely wrong).
When you say that it requires to review mostly all the code, what kind problems do you envision? Are you thinking of searcher methods? Of places that parse domains? Canât think of many problems myself.
I do not see what can not be done with the PostgreSQL Full Text Search feature that other indexed does.
About the concern for the load, you could still have a Tryton running on a replicated database on which you perform the full text searches. But I must say that website that will be overloaded by full text search queries on the database, are probably not in the scale of Tryton scope.
For the weighting there are also plenty of options that could be used. But this proposal is to implement the basics ready to use feature. For fine tuning it will be necessary to write some SQL queries but at least functions and tools will be there.
Also this feature has the benefit to add the full text search even on non-indexed data.
After thinking and reading again about that (and I would like to replace one of the components) I found some articles about that. This one is a good conclusion - Postgresql can be fast and I know that all the stuff like stemming etc is also available: https://rocky.dev/full-text-search
So if I could wish for something without being impertinent: field.Text('Description', search_full_text=True, column="fulltext", weight=2)
And I remember that I replaced one engine by the other because of speed or maintainability. I experimented with ngram length, weights, stemming etc - user-friendliness in searching is a topic of its own. And also I choose the index because I can use ready-made HTML snippets that I display in real time (what is not a question of the search itself but with redis and redisearch very integrated).
As I said, I would welcome the opportunity to dispense with a search engine as additional component.
I think I will propose a new type of field to store an parsed document that can be filled using weighting (with fallback to TEXT). This will be an internal only field and that can be searched only using full text.