Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid casting a tsvector column to text and back again. #262

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

savef
Copy link

@savef savef commented Aug 24, 2015

When you have a tsvector column but want to search against both it and other text columns (without having to deal with triggers) the :tsvector_column option is no use as it causes everything in :against to be ignored. Also since a column's type is available I don't see why the need to specify something as a tsvector column. This change allows you to simply pass column names to :against, and if any of those are tsvectors then they will be used directly, otherwise they will be converted to tsvectors as before.

My data set is over 900 records and as well as the document I search 4 additional small text fields, also there is an postgres gin index being used. On a query that matched just 4 results the time improved from 40ms to 3ms. And on a query that matched 397 results the time improved from over 3000ms to 60ms.

I think this essentially solves the problem that the :tsvector_column option tries to solve, but automatically. It takes the columns in :against and if they are a tsvector it does not cast them (to text, and back to tsvector). I think the only slight difference is that with this commit the column will still be coalesced with "".

Here's what the relevant part of an example query looks like:
setweight(to_tsvector('english', coalesce("documents"."title"::text, '')), 'A') || setweight(to_tsvector('english', coalesce("documents"."author"::text, '')), 'B') || setweight(coalesce("documents"."content_tsv", ''), 'C')


Thanks for reading, if this is something you'll consider merging here are a couple of followup questions.

If you agree that this renders the :tsvector_column option obsolete is it worth removing that feature as part of this PR?

Do the changes here apply to either trigram or dmetaphone searching? I don't use those but would look at improving them too if the changes are relevant.


This was #247, but I needed it to point to a new branch on my fork.

savef added 2 commits May 27, 2015 10:53
When you have a tsvector column but want to search against both it and other text columns (without having to deal with triggers) the :tsvector_column option is no use as it causes everything in :against to be ignored. Also since a column's type is available I don't see why the need to specify something as a tsvector column. This change allows you to simply pass column names to :against, and if any of those are tsvectors then they will be used directly, otherwise they will be converted to tsvectors as before.

My data set is over 900 records and as well as the document I search 4 additional small text fields, also there is an postgres gin index being used. On a query that matched just 4 results the speed improved from 40ms to 3ms. And on a query that matched 397 results the speed improved from over 3000ms to 60ms.
@faucct
Copy link

faucct commented Nov 5, 2015

+1'ing this pull request, also it would be nice to make it work in associated_against columns too.
Column#to_sql method should be renamed to to_text probably and the coercing logic can be hidden inside Column#to_tsvector(normalizer) method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants