This repository contains a set of Ruby classes designed to make ElasticSearch query creation simpler and easier to debug. Type checking and attribute coercion are handled by Virtus, which makes invalid ElasticSearch queries harder to construct and catches bad parameters before the query is sent to the server.
The ElasticSearch Query DSL has a tremendous amount of flexibility, allowing users to finely tune their search results. However, elaborate queries often take the form of a complex, deeply nested hash, which can become difficult to create or traverse. By wrapping the core components of the query DSL into Ruby objects, Daedal addresses the following issues:
- Constructing a large nested hash can be a headache, and using some of the popular hash extensions may result in performance issues
- It can be challenging to remember every optional parameter a query accepts, where each one sits within the query structure, and which values it allows
- Improperly structured queries, or queries with bad parameters, are hard to catch before sending to the server (and receiving an error)
- Debugging invalid queries can be a grueling task
Daedal also makes it easy to define custom queries tailored to your specific use case - you can see a simple example at the end of the documentation.
From the terminal:
$ gem install daedal
or in your Gemfile:
gem 'daedal'
Then, it's as simple as including the line:
require 'daedal'
See the Daedal Wiki for some examples.
Other Ruby packages for ElasticSearch allow you to create queries either as hashes or by constructing raw JSON:
author_query = {'match' => {'author' => {'query' => 'Beckett'}}}
For simple queries like the example above, this works just fine. However, as queries become more complicated, the hash can quickly take on a life of its own. Inspired by ElasticSearch's Java API, Daedal contains Ruby classes designed to make query construction more manageable.
Queries are contained within the Daedal::Queries module. You can construct query components like this:
author_query = Daedal::Queries::MatchQuery.new(field: 'author', query: 'Beckett')
Each query defines #to_json, so it can be converted easily for use with any of the Ruby ElasticSearch clients out there:
author_query.to_json # => "{\"match\":{\"author\":{\"query\":\"Beckett\"}}}"
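The custom query example later in this README defines only #to_hash and still responds to #to_json, which suggests the JSON conversion is derived from the hash. The stdlib-only sketch below illustrates that pattern. SketchQuery and SketchMatchQuery are hypothetical names, not Daedal classes, and Daedal's actual implementation may differ.

```ruby
require 'json'

# Hypothetical base class (not part of Daedal): subclasses supply #to_hash,
# and JSON serialization is derived from it.
class SketchQuery
  def to_hash
    raise NotImplementedError, "#{self.class} must define #to_hash"
  end

  # Delegate to the hash so every subclass serializes consistently.
  def to_json(*args)
    to_hash.to_json(*args)
  end
end

# Hypothetical match query producing the same shape as the MatchQuery output above.
class SketchMatchQuery < SketchQuery
  def initialize(field:, query:)
    @field = field
    @query = query
  end

  def to_hash
    { 'match' => { @field => { 'query' => @query } } }
  end
end

author_query = SketchMatchQuery.new(field: 'author', query: 'Beckett')
puts author_query.to_json
```

With this design a subclass only has to describe its hash structure; serialization comes for free.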
The benefits of using Daedal become more obvious for aggregate queries such as the bool query:
bool_query = Daedal::Queries::BoolQuery.new(must: [author_query])
bool_query.to_json # => "{\"bool\":{\"should\":[],\"must\":[{\"match\":{\"author\":{\"query\":\"Beckett\"}}}],\"must_not\":[]}}"
lines_query = Daedal::Queries::MatchQuery.new(field: 'lines', query: "We're waiting for Godot")
bool_query.should << lines_query
bool_query.to_json # => "{\"bool\":{\"should\":[{\"match\":{\"lines\":{\"query\":\"We're waiting for Godot\"}}}],\"must\":[{\"match\":{\"author\":{\"query\":\"Beckett\"}}}],\"must_not\":[]}}"
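One way to picture an aggregate query is as a container whose clause arrays hold other query objects and convert them recursively. The following is a minimal stdlib-only sketch under that assumption; SketchMatch and SketchBoolQuery are hypothetical stand-ins, not Daedal's classes, chosen so the output mirrors the shape shown above.

```ruby
require 'json'

# Hypothetical leaf query: produces the same hash shape as a match query.
SketchMatch = Struct.new(:field, :query) do
  def to_hash
    { 'match' => { field => { 'query' => query } } }
  end
end

# Hypothetical aggregate: each clause is an array of child queries, and
# #to_hash converts every child recursively.
class SketchBoolQuery
  attr_reader :should, :must, :must_not

  def initialize(should: [], must: [], must_not: [])
    # Copy the arrays so later appends do not mutate the caller's arrays.
    @should = should.dup
    @must = must.dup
    @must_not = must_not.dup
  end

  def to_hash
    {
      'bool' => {
        'should' => should.map(&:to_hash),
        'must' => must.map(&:to_hash),
        'must_not' => must_not.map(&:to_hash)
      }
    }
  end

  def to_json(*args)
    to_hash.to_json(*args)
  end
end

bool_query = SketchBoolQuery.new(must: [SketchMatch.new('author', 'Beckett')])
# Appending to a clause after construction is reflected in the output.
bool_query.should << SketchMatch.new('lines', "We're waiting for Godot")
puts bool_query.to_json
```

Because children are converted only when #to_hash is called, clauses can be appended at any point before serialization.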
Currently, the following queries have been implemented:
- bool query
- constant score query
- dis max query
- filtered query
- fuzzy query
- match all query
- match query
- multi match query
- nested query
- prefix query
- query string query
On deck:
Queries I'm not planning on implementing at all, since they're deprecated:
Filters are contained within the Daedal::Filters module. You can construct filter components in the same way as queries:
term_filter = Daedal::Filters::TermFilter.new(field: 'characters', term: 'Pozzo')
term_filter.to_json # => "{\"term\":{\"characters\":\"Pozzo\"}}"
Currently, the following filters have been implemented:
- and filter
- bool filter
- geo distance filter
- or filter
- range filter
- term filter
- terms filter
- nested filter
- exists filter
When creating ElasticSearch queries via nested hashes, it is all too easy to assign an invalid value to a specific field, which would then result in an error response when sending the query to the server. For instance, the query:
constant_score_query = {'constant_score' => {'boost' => 'foo', 'query' => {'match_all' => {}}}}
would yield a server error, since the boost parameter must be a number.
Daedal uses Virtus to perform data-type coercions, so invalid query parameters raise an error at runtime, as soon as the query object is constructed, rather than when the server responds. This makes debugging much easier. The previous example in Daedal would raise an error:
match_all_query = Daedal::Queries::MatchAllQuery.new
constant_score_query = Daedal::Queries::ConstantScoreQuery.new(boost: 'foo', query: match_all_query)
# Virtus::CoercionError: Failed to coerce "foo" into Float
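The same fail-fast idea can be sketched without Virtus: coerce each parameter when the object is built, so a bad value raises immediately instead of producing a server error later. This stdlib-only sketch swaps in Kernel#Float for Virtus's coercer. SketchConstantScoreQuery is hypothetical, and it raises ArgumentError rather than the Virtus::CoercionError that Daedal raises.

```ruby
require 'json'

# Hypothetical constant score query that coerces :boost at construction time.
class SketchConstantScoreQuery
  def initialize(query:, boost: nil)
    # Kernel#Float raises ArgumentError for strings like "foo" (and TypeError
    # for nil), so only coerce when a boost was actually given.
    @boost = boost.nil? ? nil : Float(boost)
    @query = query
  end

  def to_hash
    body = { 'query' => @query }
    body['boost'] = @boost unless @boost.nil?
    { 'constant_score' => body }
  end
end

# A plain hash happily accepts the bad value; the error only appears server-side.
raw = { 'constant_score' => { 'boost' => 'foo', 'query' => { 'match_all' => {} } } }
puts raw.to_json

# The coercing object rejects it as soon as it is built.
begin
  SketchConstantScoreQuery.new(boost: 'foo', query: { 'match_all' => {} })
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

# Numeric strings are coerced to Float.
ok = SketchConstantScoreQuery.new(boost: '1.5', query: { 'match_all' => {} })
puts ok.to_hash.to_json
```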
Similarly, trying to add non-queries to an aggregate query like the bool or dis max queries will result in an error:
dis_max_query = Daedal::Queries::DisMaxQuery.new
dis_max_query.queries << :foo
# Virtus::CoercionError: Failed to coerce :foo into "Daedal::Queries::Query"
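Rejecting non-queries on << implies that the clause arrays check each element's type as it is added. A stdlib-only sketch of that idea follows. TypedClauseList, SketchBaseQuery and SketchMatchAll are hypothetical, and raising TypeError is an assumption standing in for Daedal's Virtus::CoercionError.

```ruby
# Hypothetical base class marking objects that count as queries.
class SketchBaseQuery
  def to_hash
    raise NotImplementedError, "#{self.class} must define #to_hash"
  end
end

class SketchMatchAll < SketchBaseQuery
  def to_hash
    { 'match_all' => {} }
  end
end

# Hypothetical clause list that only accepts instances of one class.
class TypedClauseList
  include Enumerable

  def initialize(element_class)
    @element_class = element_class
    @items = []
  end

  # Validate on every append so bad elements fail where they are added,
  # not later when the query is serialized.
  def <<(item)
    unless item.is_a?(@element_class)
      raise TypeError, "expected #{@element_class}, got #{item.inspect}"
    end
    @items << item
    self
  end

  def each(&block)
    @items.each(&block)
  end
end

queries = TypedClauseList.new(SketchBaseQuery)
queries << SketchMatchAll.new
begin
  queries << :foo
rescue TypeError => e
  puts e.message
end
puts queries.map(&:to_hash).inspect
```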
Currently, I've only made it through a fraction of the entire Query DSL, but will be adding more as time goes by. If there's a component of the query DSL that you need that isn't implemented yet, it's easy to define it yourself within your project (or, feel free to contribute to Daedal!).
Creating custom filters or queries tailored specifically to your use case is also pretty straightforward. Here are some guidelines:
- Make your class inherit from Daedal::Queries::Query or Daedal::Filters::Filter
- Define the parameters for your query using Virtus attributes (note: strict coercion is being used)
- Define the #to_hash method
Example of a custom query:
class PlayQuery < Daedal::Queries::Query
  # define the parameters that you want in your query
  # if the field is optional, make sure to set required to false
  attribute :author, String
  attribute :title, String
  attribute :characters, Array[String], required: false
  def construct_query
    author_query = Daedal::Queries::MatchQuery.new(field: 'author', query: author)
    title_query = Daedal::Queries::MatchQuery.new(field: 'title', query: title)
    full_query = Daedal::Queries::BoolQuery.new(must: [author_query], should: [title_query])
    # Array() treats a missing (nil) characters list as empty
    Array(characters).each do |character|
      full_query.should << Daedal::Queries::MatchQuery.new(field: 'characters', query: character)
    end
    full_query
  end
  # define the to_hash method to convert for use in ElasticSearch
  def to_hash
    construct_query.to_hash
  end
end
play_query = PlayQuery.new(author: 'Beckett', title: 'Waiting for Godot', characters: ['Estragon', 'Vladimir'])
puts play_query.to_json # => {"bool":{"should":[{"match":{"title":{"query":"Waiting for Godot"}}},{"match":{"characters":{"query":"Estragon"}}},{"match":{"characters":{"query":"Vladimir"}}}],"must":[{"match":{"author":{"query":"Beckett"}}}],"must_not":[]}}
The ElasticSearch Query DSL is pretty big and includes a ton of nuance. I'm starting with the most basic parts of the DSL (and the parts I use for work), so if you'd like to extend the project to meet your needs, please feel free to contribute! I just ask that you:
- Fork the project
- Make your changes or additions
- Add tests! My goal is to keep Daedal a thoroughly tested project
- Send me a pull request
Feedback or suggestions are also always welcome!
The MIT License (MIT)
Copyright (c) 2013 Christopher Schuch
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.