-
-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Exact search on a post title #428
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a solid start! I'm not sure it's doing exactly what you want it to right now - some comments below that will hopefully help.
@@ -26,13 +30,16 @@ def attributes_print | |||
def self.sanitize_for_search(term, **cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original behaviour of this method was to take a search parameter and list of columns, and turn them into a search against a single fulltext index, i.e.
sanitize_for_search 'query', posts: :body
# => "MATCH (`posts`.`body`) AGAINST ('query' IN BOOLEAN MODE)"
Do I understand right that what you're trying to do is make this return multiple MATCH ... AGAINST
clauses for multiple fulltext indexes? At the moment, I don't think this does what you want it to. If you pass in an array of columns, you get the array of clauses:
sanitize_for_search 'query', posts: [:body, :title]
# => [" MATCH `posts`.`body` AGAINST ('query' IN BOOLEAN MODE)", " MATCH `posts`.`title` AGAINST ('query' IN BOOLEAN MODE)"]
If you only pass a single column, you just get the column's name, no MATCH clause at all:
sanitize_for_search 'query', posts: :body
# => "`posts`.`body`"
What's probably an easier way to do it (if I understand what you want correctly) is:
- keep the purpose of the
cols = cols.map
block the same, i.e. map unsanitized column names to sanitized column names; then - return a map of that, which turns it into the MATCH clauses.
Something like this:
cols = cols.map do |table_name, column_or_array|
if column_or_array.is_a?(Array)
column_or_array.map { |column_name| "#{sanitize_name table_name}.#{sanitize_name column_name}" }
else
"#{sanitize_name table_name}.#{sanitize_name column_or_array}"
end
end.flatten
cols.map do |sanitized_name|
ActiveRecord::Base.sanitize_sql(["MATCH (#{sanitized_name}) AGAINST (? IN BOOLEAN MODE)", term])
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand right that what you're trying to do is make this return multiple MATCH ... AGAINST clauses for multiple fulltext indexes?
Yup exactly what I was trying to do. Ran your suggestion and it worked like a charm!
Would there be a case where we pass in a single column? The methods are first being called here https://github.com/codidact/qpixel/blob/develop/app/models/post.rb#L59 and it looks like it will only ever look at the columns that we tell it. If there is, then the way I am trying to do the search weight would break (not that it was a good way to do it anyways 😛) since it will try to look for a column that doesn't exist.
mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ') | ||
whereClause = sanitized.map { |val| val.to_s }.join(' OR') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Ruby variables are named in snake_case
, i.e. mapped_cols
and where_clause
@@ -8,7 +8,11 @@ def self.fuzzy_search(term, **cols) | |||
|
|||
def self.match_search(term, **cols) | |||
sanitized = sanitize_for_search term, **cols | |||
select(Arel.sql("`#{table_name}`.*, #{sanitized} AS search_score")).where(sanitized) | |||
|
|||
mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ActiveSupport has a neat extension you can use to simplify this:
mapped_cols = sanitized.map.with_index { |val, idx| "#{val} AS search_score_#{idx}" }.join(', ')
mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ') | ||
whereClause = sanitized.map { |val| val.to_s }.join(' OR') | ||
|
||
select(Arel.sql("`#{table_name}`.*, #{mappedCols}")).where(whereClause).order('search_score_0 * 2 + search_score_1 * 1 DESC') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap raw SQL statements in Arel.sql
when you pass them to ActiveRecord:
.order(Arel.sql('search_score_0 * 2 + search_score_1 * 1 DESC'))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "* 1" do? Seems unnecessary - i.e., is that any different from just "search_score_0 * 2 + search_score_1 DESC"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not - I just copied it out without really thinking about it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, does nothing really. Was just experimenting on how we would determine search weights
@@ -8,7 +8,7 @@ def search | |||
.paginate(page: params[:page], per_page: 25) | |||
|
|||
if search_data[:search].present? | |||
posts.search(search_data[:search]).user_sort({ term: params[:sort], default: :search_score }, | |||
posts.search(search_data[:search]).user_sort({ term: params[:sort] }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do with updating to re-add a default for the new weighted search, and to change the relevance
sort to the same weighted search. Not sure whether our implementation of user_sort
supports that via Arel.sql
, though, so that's probably not one for this PR - needs investigation.
It looks like we lost track of this. Does anybody have any updates? There is a more recent effort looking into moving to elasticsearch, which interacts with searching titles, but it's also marked as a draft. |
Working on addressing #387.
Previously the search function was only using body_markdown. I changed it so that it also uses the title for the search. Because of this, the sql statement got a little more complex. I am trying to have the title's search score weigh more than body_markdown and was having a bit of trouble figuring out how to do that without hard coding it. Wondering if there is any way I could improve this before opening up the PR for review.