Hello,
We encountered a few problems caused by the limitations of Hash
indexes in PostgreSQL.
As per the docs:
Hash indexes have been designed to cope with uneven distributions of hash values. Direct access to the bucket pages works well if the hash values are evenly distributed. When inserts mean that the bucket page becomes full, additional overflow pages are chained to that specific bucket page, locally expanding the storage for index tuples that match that hash value. When scanning a hash bucket during queries, we need to scan through all of the overflow pages. Thus an unbalanced hash index might actually be worse than a B-tree in terms of number of block accesses required, for some data.
As a result of the overflow cases, we can say that hash indexes are most suitable for unique, nearly unique data or data with a low number of rows per hash bucket. One possible way to avoid problems is to exclude highly non-unique values from the index using a partial index condition, but this may not be suitable in many cases.
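To make the overflow behaviour concrete, here is a toy model of bucket fill, not PostgreSQL's actual hash access method: each indexed value hashes into a fixed number of buckets, and all tuples sharing a hash value land in the same bucket. A low-cardinality column therefore concentrates every row into a handful of buckets, which is exactly the overflow-chain case the docs warn about.

```python
# Toy model of hash-bucket fill; the bucket count and hashing are
# illustrative only, not how PostgreSQL's hash AM actually works.
from collections import Counter

NUM_BUCKETS = 64
ROWS = 10_000

def bucket_loads(values):
    """Count how many index tuples land in each bucket."""
    return Counter(hash(v) % NUM_BUCKETS for v in values)

# High-cardinality column: near-unique values spread evenly.
unique_loads = bucket_loads(range(ROWS))

# Low-cardinality column (like a boolean): at most 2 buckets are
# ever used, each holding half the table.
boolean_loads = bucket_loads(i % 2 == 0 for i in range(ROWS))

print(max(unique_loads.values()))                    # roughly ROWS / NUM_BUCKETS
print(len(boolean_loads), max(boolean_loads.values()))
```

An equality lookup on the boolean-like column has to walk a chain holding half the table, which is why it can end up worse than a B-tree in block accesses.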
What I get from this is that Hash indexes are only a good fit for
columns with diverse (high-cardinality) data, and a poor fit for
low-cardinality columns.
The symptoms we observed are extremely slow index builds, and sub-par performance when the index is used.
I’m wondering if a warning could be emitted when one uses an Index.Equality
on such columns (for fields.Selection
/ fields.Boolean
it should maybe even be forbidden), since developers may assume that a valid optimisation for a ('state', '=', ...)
query is to add an Index.Equality
.
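As a rough sketch of what such a warning could look like: the function and field-type names below are hypothetical and do not reflect Odoo's actual internals; the idea is simply to intercept an equality (hash) index declaration on a known low-cardinality field type.

```python
# Hypothetical sketch of the proposed check; "selection", "boolean"
# and check_equality_index are illustrative names, not Odoo's API.
import warnings

LOW_CARDINALITY_FIELD_TYPES = {"selection", "boolean"}

def check_equality_index(field_name, field_type, index_type):
    """Warn when an equality (hash) index is requested on a
    low-cardinality field, where a hash index is a poor fit."""
    if index_type == "equality" and field_type in LOW_CARDINALITY_FIELD_TYPES:
        warnings.warn(
            f"Field {field_name!r} ({field_type}) has low cardinality; "
            "an equality (hash) index is likely counter-productive.",
            UserWarning,
        )

check_equality_index("state", "selection", "equality")  # warns
check_equality_index("id", "integer", "equality")       # silent
```

The same hook could be escalated to an error for fields.Boolean / fields.Selection if forbidding it outright is preferred.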