toptal/chewy
Chewy is an ODM (Object Document Mapper), built on top of the official Elasticsearch client.
Why Chewy?
In this section we’ll cover why you might want to use Chewy instead of the official elasticsearch-ruby client gem.
- Every index is observable by all the related models.
  Most indexed models are related to each other, and sometimes it is necessary to denormalize this related data and put it in the same object. For example, you need to index an array of tags together with an article. Chewy allows you to specify an updateable index for every model separately, so corresponding articles will be reindexed on any tag update.
- Bulk import everywhere.
  Chewy utilizes the bulk ES API for full reindexing or index updates. It also uses atomic updates. All the changed objects are collected inside the atomic block and the index is updated once at the end with all the collected objects. See Chewy.strategy(:atomic) for more details.
- Powerful querying DSL.
  Chewy has an ActiveRecord-style query DSL. It is chainable, mergeable and lazy, so you can produce queries in the most efficient way. It also has object-oriented query and filter builders.
- Support for ActiveRecord.
Installation
Add this line to your application’s Gemfile:

    gem 'chewy'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install chewy

Compatibility
Chewy is compatible with MRI 3.0-3.3¹.
¹ Ruby 3 is supported with Rails 6.1, 7.0, 7.1 and 7.2
Elasticsearch compatibility matrix
| Chewy version | Elasticsearch version |
|---|---|
| 8.0.0 | 8.x |
| 7.2.x | 7.x |
| 7.1.x | 7.x |
| 7.0.x | 6.8, 7.x |
| 6.0.0 | 5.x, 6.x |
| 5.x | 5.x, limited support for 1.x & 2.x |
Important: Chewy doesn’t follow SemVer, so you should always check the release notes before upgrading. The major version is linked to the newest supported Elasticsearch and the minor version bumps may include breaking changes.
See our [migration guide](migration_guide.md) for detailed upgrade instructions between various Chewy versions.
Active Record
The following Active Record versions are supported by Chewy:
- 6.1
- 7.0
- 7.1
- 7.2
Getting Started
Chewy provides functionality for Elasticsearch index handling, documents import mappings, index update strategies and chainable query DSL.
Minimal client setting
Create config/initializers/chewy.rb with this line:

    Chewy.settings = {host: 'localhost:9250'}

And run rails g chewy:install to generate chewy.yml:

    # separate environment configs
    test:
      host: 'localhost:9250'
      prefix: 'test'
    development:
      host: 'localhost:9200'

Elasticsearch
Make sure you have Elasticsearch up and running. You can install it locally, but the easiest way is to use Docker:

    $ docker run --rm --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.15.0

Security
Please note that starting from version 8, Elasticsearch has security features enabled by default. The Docker command above disables them for local testing convenience. If you want to enable them, omit the "xpack.security.enabled=false" part from the Docker command, and run these commands after starting the container. The commands assume the container is named es8; the command above names it elasticsearch, so adjust the name accordingly.

Reset password for elastic user:

    docker container exec es8 '/usr/share/elasticsearch/bin/elasticsearch-reset-password' -u elastic

Extract CA certificate generated by Elasticsearch on first run:

    docker container cp es8:/usr/share/elasticsearch/config/certs/http_ca.crt tmp/

And then add them to settings:

    development:
      host: 'localhost:9200'
      user: 'elastic'
      password: 'SomeLongPassword'
      transport_options:
        ssl:
          ca_file: './tmp/http_ca.crt'

Create app/chewy/users_index.rb with User Index:

    class UsersIndex < Chewy::Index
      settings analysis: {
        analyzer: {
          email: {
            tokenizer: 'keyword',
            filter: ['lowercase']
          }
        }
      }

      index_scope User
      field :first_name
      field :last_name
      field :email, analyzer: 'email'
    end

Add User model, table and migrate it:

    $ bundle exec rails g model User first_name last_name email
    $ bundle exec rails db:migrate

Add update_index to app/models/user.rb:

    class User < ApplicationRecord
      update_index('users') { self }
    end

Example of data request
- Once a record is created (could be done via the Rails console), the users index is updated too:

    User.create(
      first_name: "test1",
      last_name: "test1",
      email: 'test1@example.com',
      # other fields
    )
    # UsersIndex Import (355.3ms) {:index=>1}
    # => #<User id: 1, first_name: "test1", last_name: "test1", email: "test1@example.com", # other fields>

- A query could be exposed at a given UsersController:

    def search
      @users = UsersIndex.query(query_string: {
        fields: [:first_name, :last_name, :email, ...],
        query: search_params[:query],
        default_operator: 'and'
      })
      render json: @users.to_json, status: :ok
    end

    private

    def search_params
      params.permit(:query, :page, :per)
    end

- A request against http://localhost:3000/users/search?query=test1@example.com then returns a response like:

    [
      {
        "attributes": {
          "id": "1",
          "first_name": "test1",
          "last_name": "test1",
          "email": "test1@example.com",
          ...
          "_score": 0.9808291,
          "_explanation": null
        },
        "_data": {
          "_index": "users",
          "_type": "_doc",
          "_id": "1",
          "_score": 0.9808291,
          "_source": {
            "first_name": "test1",
            "last_name": "test1",
            "email": "test1@example.com",
            ...
          }
        }
      }
    ]

Usage and configuration
Client settings
To configure the Chewy client you need to add a chewy.rb file with the Chewy.settings hash:

    Chewy.settings = {host: 'localhost:9250'} # do not use environments

And add a chewy.yml configuration file.
You can create chewy.yml manually or run rails g chewy:install to generate it:

    # separate environment configs
    test:
      host: 'localhost:9250'
      prefix: 'test'
    development:
      host: 'localhost:9200'

The resulting config merges both hashes. Client options are passed as is to Elasticsearch::Transport::Client except for the :prefix, which is used internally by Chewy to create prefixed index names:

    Chewy.settings = {prefix: 'test'}
    UsersIndex.index_name # => 'test_users'

The logger may be set explicitly:

    Chewy.logger = Logger.new(STDOUT)

See [config.rb](lib/chewy/config.rb) for more details.
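The prefix rule can be pictured with a few lines of plain Ruby. This is only a sketch of the naming behaviour described above, not Chewy's implementation: `prefixed_index_name` is a hypothetical helper and the settings hashes are illustrative copies of the chewy.yml environments.

```ruby
# Hypothetical helper (not part of Chewy): joins an optional prefix and a
# base index name with an underscore, mirroring the 'test_users' example.
def prefixed_index_name(settings, base_name)
  [settings[:prefix], base_name].compact.join('_')
end

# Illustrative per-environment settings, as in the chewy.yml above.
test_settings = { host: 'localhost:9250', prefix: 'test' }
development_settings = { host: 'localhost:9200' }

puts prefixed_index_name(test_settings, 'users')        # test_users
puts prefixed_index_name(development_settings, 'users') # users
```

A prefix per environment keeps test and development indices apart even when both point at the same cluster.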
AWS Elasticsearch
If you would like to use AWS’s Elasticsearch using an IAM user policy, you will need to sign your requests for the es:* action by injecting the appropriate headers, passing a proc to transport_options.
You’ll need an additional gem for Faraday middleware: add gem 'faraday_middleware-aws-sigv4' to your Gemfile.

    require 'faraday_middleware/aws_sigv4'

    Chewy.settings = {
      host: 'http://my-es-instance-on-aws.us-east-1.es.amazonaws.com:80',
      port: 80, # 443 for https host
      transport_options: {
        headers: { content_type: 'application/json' },
        proc: -> (f) do
          f.request :aws_sigv4,
                    service: 'es',
                    region: 'us-east-1',
                    access_key_id: ENV['AWS_ACCESS_KEY'],
                    secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
        end
      }
    }

Index definition
- Create /app/chewy/users_index.rb

    class UsersIndex < Chewy::Index

    end

- Define index scope (you can omit this part if you don’t need to specify a scope (i.e. use PORO objects for import) or options)

    class UsersIndex < Chewy::Index
      index_scope User.active # or just model instead of scope: index_scope User
    end

- Add some mappings

    class UsersIndex < Chewy::Index
      index_scope User.active.includes(:country, :badges, :projects)
      field :first_name, :last_name # multiple fields without additional options
      field :email, analyzer: 'email' # Elasticsearch-related options
      field :country, value: ->(user) { user.country.name } # custom value proc
      field :badges, value: ->(user) { user.badges.map(&:name) } # passing array values to index
      field :projects do # the same block syntax for multi_field, if `:type` is specified
        field :title
        field :description # default data type is `text`
        # additional top-level objects passed to value proc:
        field :categories, value: ->(project, user) { project.categories.map(&:name) if user.active? }
      end
      field :rating, type: 'integer' # custom data type
      field :created, type: 'date', include_in_all: false,
        value: ->{ created_at } # value proc for source object context
    end

See here for mapping definitions.
- Add some index-related settings. Analyzer repositories might be used as well. See Chewy::Index.settings docs for details:

    class UsersIndex < Chewy::Index
      settings analysis: {
        analyzer: {
          email: {
            tokenizer: 'keyword',
            filter: ['lowercase']
          }
        }
      }

      index_scope User.active.includes(:country, :badges, :projects)
      root date_detection: false do
        template 'about_translations.*', type: 'text', analyzer: 'standard'

        field :first_name, :last_name
        field :email, analyzer: 'email'
        field :country, value: ->(user) { user.country.name }
        field :badges, value: ->(user) { user.badges.map(&:name) }
        field :projects do
          field :title
          field :description
        end
        field :about_translations, type: 'object' # pass object type explicitly if necessary
        field :rating, type: 'integer'
        field :created, type: 'date', include_in_all: false, value: ->{ created_at }
      end
    end

See index settings here. See root object settings here.
See [mapping.rb](lib/chewy/index/mapping.rb) for more details.
- Add model-observing code

    class User < ActiveRecord::Base
      update_index('users') { self } # specifying index and back-reference
                                     # for updating after user save or destroy
    end

    class Country < ActiveRecord::Base
      has_many :users

      update_index('users') { users } # return single object or collection
    end

    class Project < ActiveRecord::Base
      update_index('users') { user if user.active? } # you can return even `nil` from the back-reference
    end

    class Book < ActiveRecord::Base
      update_index(->(book) {"books_#{book.language}"}) { self } # dynamic index name with proc.
                                                                 # For book with language == "en"
                                                                 # this code will generate `books_en`
    end

Also, you can use the second argument for method name passing:

    update_index('users', :self)
    update_index('users', :users)

In the case of a belongs_to association you may need to update both associated objects, previous and current:

    class City < ActiveRecord::Base
      belongs_to :country

      update_index('cities') { self }
      update_index 'countries' do
        previous_changes['country_id'] || country
      end
    end

Default import options
Every index has a default_import_options configuration that specifies, as the name suggests, default import options:

    class ProductsIndex < Chewy::Index
      index_scope Post.includes(:tags)
      default_import_options batch_size: 100, bulk_size: 10.megabytes, refresh: false

      field :name
      field :tags, value: -> { tags.map(&:name) }
    end

See [import.rb](lib/chewy/index/import.rb) for available options.
Multi (nested) and object field types
To define an objects field you can simply nest fields in the DSL:

    field :projects do
      field :title
      field :description
    end

This will automatically set the type of the root field to object. You may also specify type: 'object' explicitly.
To define a multi field you have to specify any type except for object or nested in the root field:

    field :full_name, type: 'text', value: ->{ full_name.strip } do
      field :ordered, analyzer: 'ordered'
      field :untouched, type: 'keyword'
    end

The value: option for internal fields will no longer be effective.
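For orientation, the two definitions above correspond to two different Elasticsearch mapping shapes: a multi field keeps its sub-fields under the `fields` key, while an object field nests its children under `properties`. The hashes below are hand-written illustrations of those shapes, assuming the default `text` type for untyped fields. They are not output captured from Chewy.

```ruby
# Hand-written illustration of a multi-field mapping (not generated by Chewy).
multi_field_mapping = {
  full_name: {
    type: 'text',
    fields: {
      ordered: { type: 'text', analyzer: 'ordered' },
      untouched: { type: 'keyword' }
    }
  }
}

# Hand-written illustration of an object-field mapping (not generated by Chewy).
object_field_mapping = {
  projects: {
    type: 'object',
    properties: {
      title: { type: 'text' },
      description: { type: 'text' }
    }
  }
}

puts multi_field_mapping[:full_name][:fields].keys.inspect     # [:ordered, :untouched]
puts object_field_mapping[:projects][:properties].keys.inspect # [:title, :description]
```

Because the sub-fields of a multi field are alternative indexings of one value, only the root field's value proc matters, which is why value: on internal fields has no effect.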
Geo Point fields
You can use Elasticsearch’s geo mapping with the geo_point field type, allowing you to query, filter and order by latitude and longitude. You can use the following hash format:

    field :coordinates, type: 'geo_point', value: ->{ {lat: latitude, lon: longitude} }

or by using nested fields (note that Elasticsearch expects the keys lat and lon):

    field :coordinates, type: 'geo_point' do
      field :lat, value: ->{ latitude }
      field :lon, value: ->{ longitude }
    end

See the section on Script fields for details on calculating distance in a search.
Join fields
You can use a join field to implement parent-child relationships between documents. It replaces the old parent_id based parent-child mapping.
To use it, you need to pass relations and join (with type and id) options:

    field :hierarchy_link, type: :join, relations: {question: %i[answer comment], answer: :vote, vote: :subvote}, join: {type: :comment_type, id: :commented_id}

assuming you have comment_type and commented_id fields in your model.
Note that when you reindex a parent, its children and grandchildren will be reindexed as well. This may require additional queries to the primary database and to Elasticsearch.
Also note that the join field doesn’t support crutches (it should be a field directly defined on the model).
Crutches™ technology
Assume you are defining your index like this (product has_many categories through product_categories):

    class ProductsIndex < Chewy::Index
      index_scope Product.includes(:categories)
      field :name
      field :category_names, value: ->(product) { product.categories.map(&:name) } # or shorter just -> { categories.map(&:name) }
    end

Then the Chewy reindexing flow will look like the following pseudo-code:

    Product.includes(:categories).find_in_batches(batch_size: 1000) do |batch|
      bulk_body = batch.map do |object|
        {name: object.name, category_names: object.categories.map(&:name)}.to_json
      end
      # here we are sending every batch of data to ES
      Chewy.client.bulk bulk_body
    end

If you meet complicated cases when associations are not applicable you can replace Rails associations with Chewy Crutches™ technology:

    class ProductsIndex < Chewy::Index
      index_scope Product
      crutch :categories do |collection| # collection here is a current batch of products
        # data is fetched with a lightweight query without objects initialization
        data = ProductCategory.joins(:category).where(product_id: collection.map(&:id)).pluck(:product_id, 'categories.name')
        # then we have to convert fetched data to appropriate format
        # this will return our data in structure like:
        # {123 => ['sweets', 'juices'], 456 => ['meat']}
        data.each.with_object({}) { |(id, name), result| (result[id] ||= []).push(name) }
      end

      field :name
      # simply use crutch-fetched data as a value:
      field :category_names, value: ->(product, crutches) { crutches[:categories][product.id] }
    end

An example flow will look like this (again pseudo-code; the association is no longer preloaded because the crutch supplies the data):

    Product.find_in_batches(batch_size: 1000) do |batch|
      crutches[:categories] = ProductCategory.joins(:category).where(product_id: batch.map(&:id)).pluck(:product_id, 'categories.name')
        .each.with_object({}) { |(id, name), result| (result[id] ||= []).push(name) }

      bulk_body = batch.map do |object|
        {name: object.name, category_names: crutches[:categories][object.id]}.to_json
      end
      Chewy.client.bulk bulk_body
    end

So Chewy Crutches™ technology is able to increase your indexing performance in some cases up to a hundredfold or even more depending on your associations complexity.
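The grouping step inside the crutch is ordinary Ruby and can be tried without a database. The sketch below uses made-up rows in place of the `pluck` result and a `BatchProduct` struct as a stand-in for the real model. Both are assumptions for illustration only.

```ruby
# Rows as they might come back from `pluck(:product_id, 'categories.name')`.
# The values are made up for illustration.
rows = [[123, 'sweets'], [123, 'juices'], [456, 'meat']]

# Group category names by product id, exactly as in the crutch block above.
categories = rows.each.with_object({}) do |(id, name), result|
  (result[id] ||= []).push(name)
end

# Stand-in for the current batch of products (hypothetical struct).
BatchProduct = Struct.new(:id, :name)
batch = [BatchProduct.new(123, 'Snack box'), BatchProduct.new(456, 'Steak')]

# One hash lookup per product replaces one association load per product.
documents = batch.map do |product|
  { name: product.name, category_names: categories[product.id] }
end

puts documents.map { |doc| doc[:category_names].join(',') }.inspect # ["sweets,juices", "meat"]
```

A product with no rows in the join table gets nil from the lookup, so `categories.fetch(product.id, [])` may be preferable when an empty array is wanted instead.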
Witchcraft™ technology
One more experimental technology to increase import performance. As you know, Chewy defines a value proc for every imported field in the mapping, so at import time each of these procs is executed on the imported object to extract the resulting document. It would be great for performance to use one huge whole-document-returning proc instead. So basically the idea of Witchcraft™ technology is to compile a single document-returning proc from the index definition.

    index_scope Product
    witchcraft!

    field :title
    field :tags, value: -> { tags.map(&:name) }
    field :categories do
      field :name, value: -> (product, category) { category.name }
      field :type, value: -> (product, category, crutch) { crutch.types[category.name] }
    end

The index definition above will be compiled to something close to:

    -> (object, crutches) do
      {
        title: object.title,
        tags: object.tags.map(&:name),
        categories: object.categories.map do |object2|
          {
            name: object2.name,
            type: crutches.types[object2.name]
          }
        end
      }
    end

And don’t even ask how it is possible, it is witchcraft. Obviously not every type of definition can be compiled. There are some restrictions:
- Use reasonable formatting so that method_source is able to extract field value proc sources.
- Value procs with splat arguments are not supported right now.
- If you are generating fields dynamically use value procs with arguments; argumentless value procs are not supported yet:

    [:first_name, :last_name].each do |name|
      field name, value: -> (o) { o.send(name) }
    end

However, it is quite possible that your index definition will be supported by Witchcraft™ technology out of the box in most of the cases.
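The difference between per-field procs and a single document-returning proc can be shown in plain Ruby. This sketch only illustrates the two shapes and says nothing about how Chewy performs the compilation. `Tag`, `Item` and both composers are hypothetical names.

```ruby
# Hypothetical stand-ins for imported objects.
Tag = Struct.new(:name)
Item = Struct.new(:title, :tags)

# Shape 1: one value proc per field, each called for every object.
field_procs = {
  title: ->(object) { object.title },
  tags: ->(object) { object.tags.map(&:name) }
}
compose_per_field = ->(object) { field_procs.transform_values { |value| value.call(object) } }

# Shape 2: a single hand-written proc returning the whole document,
# which is the kind of proc Witchcraft aims to produce automatically.
compose_whole = ->(object) { { title: object.title, tags: object.tags.map(&:name) } }

item = Item.new('Tea', [Tag.new('drinks'), Tag.new('hot')])
puts compose_per_field.call(item) == compose_whole.call(item) # true
```

Both composers return the same document. The second avoids one proc call per field per object, which is where the potential saving comes from.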
Raw Import
Another way to speed up import time is Raw Imports. This technology is only available in the ActiveRecord adapter. Very often, ActiveRecord model instantiation is what consumes most of the CPU and RAM resources. Precious time is wasted on converting, say, timestamps from strings and then serializing them back to strings. Chewy can operate on raw hashes of data directly obtained from the database. All you need is to provide a way to convert that hash to a lightweight object that mimics the behaviour of the normal ActiveRecord object.

    class LightweightProduct
      def initialize(attributes)
        @attributes = attributes
      end

      # Depending on the database, `created_at` might
      # be in different formats. In PostgreSQL, for example,
      # you might see the following format:
      #   "2016-03-22 16:23:22"
      #
      # Taking into account that Elastic expects something different,
      # one might do something like the following, just to avoid
      # unnecessary String -> DateTime -> String conversion.
      #
      #   "2016-03-22 16:23:22" -> "2016-03-22T16:23:22Z"
      def created_at
        @attributes['created_at'].tr(' ', 'T') << 'Z'
      end
    end

    index_scope Product
    default_import_options raw_import: ->(hash) {
      LightweightProduct.new(hash)
    }

    field :created_at, type: 'date'

Also, you can pass :raw_import option to the import method explicitly.
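The conversion itself can be checked in isolation. The sketch below feeds a hand-written hash, standing in for a raw database row, through a lambda of the same shape as the raw_import option. `LightweightRow` is a hypothetical class, and appending 'Z' assumes the stored timestamp is already in UTC.

```ruby
# Hypothetical lightweight wrapper around a raw database row.
class LightweightRow
  def initialize(attributes)
    @attributes = attributes
  end

  def id
    @attributes['id']
  end

  # "2016-03-22 16:23:22" -> "2016-03-22T16:23:22Z" without building a Time.
  # `tr` returns a new string, so the original hash value is left untouched.
  def created_at
    @attributes['created_at'].tr(' ', 'T') << 'Z'
  end
end

# A raw hash as it might come back from the database driver (made-up values).
raw_hash = { 'id' => 1, 'created_at' => '2016-03-22 16:23:22' }

# The same shape of lambda that is passed as `raw_import:` above.
raw_import = ->(hash) { LightweightRow.new(hash) }

row = raw_import.call(raw_hash)
puts row.created_at # 2016-03-22T16:23:22Z
```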
Index creation during import
By default, when you perform an import, Chewy checks whether an index exists and creates it if it’s absent.
You can turn off this feature to reduce the number of requests to Elasticsearch.
To do so, set the skip_index_creation_on_import parameter to true in your config/chewy.yml.
Skip record fields during import
You can use ignore_blank: true to skip fields that return true for the .blank? method:

    index_scope Country
    field :id
    field :cities, ignore_blank: true do
      field :id
      field :name
      field :surname, ignore_blank: true
      field :description
    end

Default values for different types
By default ignore_blank is false on every type except geo_point.
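The effect of ignore_blank on a composed document can be sketched in plain Ruby. Both helpers below are hypothetical: `blank_value?` is a minimal stand-in for ActiveSupport's `blank?`, and `skip_blank_fields` only models the outcome, not how Chewy composes documents.

```ruby
# Minimal stand-in for ActiveSupport's `blank?` (assumption: nil, false,
# whitespace-only strings and empty collections count as blank).
def blank_value?(value)
  return true if value.nil? || value == false
  return value.strip.empty? if value.is_a?(String)

  value.respond_to?(:empty?) ? value.empty? : false
end

# Hypothetical helper: drops the listed fields from a document when blank.
def skip_blank_fields(document, ignorable_fields)
  document.reject { |name, value| ignorable_fields.include?(name) && blank_value?(value) }
end

country = { id: 1, cities: [] }
city = { id: 7, name: 'Lyon', surname: '', description: '' }

country_document = skip_blank_fields(country, [:cities])
city_document = skip_blank_fields(city, [:surname])

puts country_document.keys.inspect # [:id]
puts city_document.keys.inspect    # [:id, :name, :description]
```

Only fields marked as ignorable are dropped: description is blank as well, but it stays in the document because it was not declared with ignore_blank: true.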
Journaling
You can record all actions in a separate journal index in Elasticsearch. When you create, update or destroy your documents, the action is saved in this special index. If you do something with a batch of documents (e.g. during index reset), it is saved as one record that includes the primary keys of each affected document. A common journal record looks like this:

    {
      "action": "index",
      "object_id": [1, 2, 3],
      "index_name": "...",
      "created_at": "<timestamp>"
    }

This feature is turned off by default.
You can turn it on by setting the journal option to true in config/chewy.yml.
Also, you can provide this option while you’re importing some index:

    CityIndex.import journal: true

Or as a default import option for an index:

    class CityIndex < Chewy::Index
      index_scope City
      default_import_options journal: true
    end

You may be wondering why you need it. The answer is simple: not to lose the data.
Imagine that you reset your index in a zero-downtime manner (to a separate index), and in the meantime somebody keeps updating the data frequently (in the old index). All these actions will be written to the journal index and you’ll be able to apply them after the index reset using the Chewy::Journal interface.
When enabled, the journal can grow to an enormous size, so consider setting up a cron job that cleans it occasionally using the chewy:journal:clean rake task.
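To make the recovery scenario concrete, here is a plain-Ruby sketch of replaying a journal after a reset: it collects the ids touched in one index since a given time, which are the records that would need to be imported again. The entries and the `ids_to_reimport` helper are made up for illustration and are not the Chewy::Journal implementation.

```ruby
require 'time'

# Illustrative journal entries in the record shape shown above.
journal = [
  { 'action' => 'index',  'object_id' => [1, 2, 3], 'index_name' => 'cities', 'created_at' => '2024-01-01T10:00:00Z' },
  { 'action' => 'index',  'object_id' => [2],       'index_name' => 'cities', 'created_at' => '2024-01-01T10:05:00Z' },
  { 'action' => 'delete', 'object_id' => [3],       'index_name' => 'cities', 'created_at' => '2024-01-01T10:06:00Z' }
]

# Hypothetical helper: ids touched in an index since the given time.
def ids_to_reimport(journal, index_name, since)
  journal
    .select { |entry| entry['index_name'] == index_name && Time.iso8601(entry['created_at']) >= since }
    .flat_map { |entry| entry['object_id'] }
    .uniq
end

reset_started_at = Time.iso8601('2024-01-01T10:03:00Z')
puts ids_to_reimport(journal, 'cities', reset_started_at).inspect # [2, 3]
```

Re-importing by id is enough to cover both actions, since import deletes documents whose records no longer exist in the database.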
Index manipulation

    UsersIndex.delete # destroy index if it exists
    UsersIndex.delete!

    UsersIndex.create
    UsersIndex.create! # use bang or non-bang methods

    UsersIndex.purge
    UsersIndex.purge! # deletes then creates index

    UsersIndex.import # import with 0 arguments process all the data specified in index_scope definition
    UsersIndex.import User.where('rating > 100') # or import specified users scope
    UsersIndex.import User.where('rating > 100').to_a # or import specified users array
    UsersIndex.import [1, 2, 42] # pass even ids for import, it will be handled in the most effective way
    UsersIndex.import User.where('rating > 100'), update_fields: [:email] # if update fields are specified - it will update their values only with the `update` bulk action
    UsersIndex.import! # raises an exception in case of any import errors

    UsersIndex.reset! # purges index and imports default data for all types

If the passed user is #destroyed?, or satisfies a delete_if index_scope option, or the specified id does not exist in the database, import will perform delete from index action for this object.

    index_scope User, delete_if: :deleted_at
    index_scope User, delete_if: -> { deleted_at }
    index_scope User, delete_if: ->(user) { user.deleted_at }

See [actions.rb](lib/chewy/index/actions.rb) for more details.
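The three delete conditions listed above can be summarised as a small decision function. This is a plain-Ruby sketch of the rule as described in the text, not Chewy's code. `UserRecord` and `bulk_action_for` are hypothetical, and a nil record stands for an id that was not found in the database.

```ruby
# Hypothetical stand-in for a model instance.
UserRecord = Struct.new(:id, :deleted_at, :destroyed) do
  def destroyed?
    destroyed
  end
end

# Hypothetical helper: picks the bulk action for one record.
# A nil record models an id that does not exist in the database.
def bulk_action_for(record, delete_if)
  return :delete if record.nil? || record.destroyed?

  delete_if.call(record) ? :delete : :index
end

# Same shape as `delete_if: ->(user) { user.deleted_at }` above.
delete_if = ->(user) { user.deleted_at }

active = UserRecord.new(1, nil, false)
soft_deleted = UserRecord.new(2, Time.now, false)
destroyed = UserRecord.new(3, nil, true)

puts bulk_action_for(active, delete_if)       # index
puts bulk_action_for(soft_deleted, delete_if) # delete
```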
Index update strategies
Assume you’ve got the following code:

    class City < ActiveRecord::Base
      update_index 'cities', :self
    end

    class CitiesIndex < Chewy::Index
      index_scope City
      field :name
    end

If you do something like City.first.save! you’ll get an UndefinedUpdateStrategy exception instead of the object saving and index updating. This exception forces you to choose an appropriate update strategy for the current context.
If you want to return to the pre-0.7.0 behavior - just set Chewy.root_strategy = :bypass.
:atomic
The main strategy here is :atomic. Assume you have to update a lot of records in the db.

    Chewy.strategy(:atomic) do
      City.popular.map(&:do_some_update_action!)
    end

Using this strategy delays the index update request until the end of the block. Updated records are aggregated and the index update happens with the bulk API. So this strategy is highly optimized.
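The buffering idea behind :atomic can be modelled in a few lines. The class below is a toy stand-in written for this explanation, not Chewy's strategy class: it records ids during the block and emits a single deduplicated "bulk request" when the block ends.

```ruby
# Toy stand-in for an atomic strategy (hypothetical, not Chewy's class):
# collects ids during the block and sends them in one bulk call at the end.
class AtomicBuffer
  attr_reader :bulk_requests

  def initialize
    @pending = []
    @bulk_requests = []
  end

  # Called for every changed record inside the block.
  def update(id)
    @pending << id
  end

  # Runs the block, then flushes everything that was collected.
  def atomic
    yield self
  ensure
    flush
  end

  private

  def flush
    ids = @pending.uniq
    @bulk_requests << ids unless ids.empty?
    @pending.clear
  end
end

buffer = AtomicBuffer.new
buffer.atomic do |strategy|
  [1, 2, 2, 3].each { |id| strategy.update(id) }
end

puts buffer.bulk_requests.inspect # [[1, 2, 3]]
```

Four updates result in one request covering three distinct records, which is the saving the paragraph above describes.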
:sidekiq
This does the same thing as :atomic, but asynchronously using Sidekiq. Patch Chewy::Strategy::Sidekiq::Worker if you need to improve how index updates are performed.

    Chewy.strategy(:sidekiq) do
      City.popular.map(&:do_some_update_action!)
    end

The default queue name is chewy. You can customize it in settings under the sidekiq key, using the queue option as in the example:

    Chewy.settings[:sidekiq] = {queue: :low}

:lazy_sidekiq
This does the same thing as :sidekiq, but with lazy evaluation. Beware that it does not allow you to use any non-persistent record state for indices and conditions, because the record will be re-fetched from the database asynchronously using Sidekiq. For destroyed records, however, the strategy falls back to :sidekiq because it’s not possible to re-fetch deleted records from the database.
The purpose of this strategy is to improve the response time of the code that should update indexes, as it defers not only the actual ES calls to a background job but also the evaluation of update_index callbacks (for created and updated objects). Similar to :sidekiq, the index update is asynchronous, so this strategy cannot be used when data and index synchronization is required.

    Chewy.strategy(:lazy_sidekiq) do
      City.popular.map(&:do_some_update_action!)
    end

The default queue name is chewy. You can customize it in settings under the sidekiq key, using the queue option as in the example:

    Chewy.settings[:sidekiq] = {queue: :low}

:delayed_sidekiq
It accumulates IDs of records to be reindexed during the latency window in Redis and then performs the reindexing of all accumulated records at once.
This strategy is very useful in the case of frequently mutated records.
It supports the update_fields option, so it will attempt to select just enough data from the database.
Keep in mind, this strategy does not guarantee reindexing in the event of Sidekiq worker termination or an error during the reindexing phase. This behavior is intentional to prevent continuous growth of Redis db.
There are four options that can be defined in the index:

    class CitiesIndex...
      strategy_config delayed_sidekiq: {
        latency: 3,
        margin: 2,
        ttl: 60 * 60 * 24,
        reindex_wrapper: ->(&reindex) {
          ActiveRecord::Base.connected_to(role: :reading) { reindex.call }
        }
        # latency - will prevent scheduling identical jobs
        # margin - main purpose is to cover db replication lag by the margin
        # ttl - a chunk expiration time (in seconds)
        # reindex_wrapper - lambda that accepts block to wrap that reindex process AR connection block.
      }

      ...
    end

Also you can define defaults in the initializers/chewy.rb

    Chewy.settings = {
      strategy_config: {
        delayed_sidekiq: {
          latency: 3,
          margin: 2,
          ttl: 60 * 60 * 24,
          reindex_wrapper: ->(&reindex) {
            ActiveRecord::Base.connected_to(role: :reading) { reindex.call }
          }
        }
      }
    }

or in config/chewy.yml

    strategy_config:
      delayed_sidekiq:
        latency: 3
        margin: 2
        ttl: <%= 60 * 60 * 24 %>
        # reindex_wrapper setting is not possible here!!! use the initializer instead

You can use the strategy identically to other strategies

    Chewy.strategy(:delayed_sidekiq) do
      City.popular.map(&:do_some_update_action!)
    end

The default queue name is chewy. You can customize it in settings under the sidekiq key, using the queue option as in the example:

    Chewy.settings[:sidekiq] = {queue: :low}

Explicit call of the reindex using :delayed_sidekiq strategy

    CitiesIndex.import([1, 2, 3], strategy: :delayed_sidekiq)

Explicit call of the reindex using :delayed_sidekiq strategy with :update_fields support

    CitiesIndex.import([1, 2, 3], update_fields: [:name], strategy: :delayed_sidekiq)

When running tests with the delayed_sidekiq strategy while Sidekiq is using a real Redis instance that is NOT cleaned up between tests (via e.g. Sidekiq.redis(&:flushdb)), you’ll want to clean up some Redis keys between tests to avoid state leaking and flaky tests. Chewy provides a convenience method for that:

    # it might be a good idea to also add to your testing setup, e.g.: a rspec `before` hook
    Chewy::Strategy::DelayedSidekiq.clear_timechunks!

:active_job
This does the same thing as :atomic, but using ActiveJob. This will inherit the ActiveJob configuration settings including the active_job.queue_adapter setting for the environment. Patch Chewy::Strategy::ActiveJob::Worker if you need to improve how index updates are performed.

    Chewy.strategy(:active_job) do
      City.popular.map(&:do_some_update_action!)
    end

The default queue name is chewy. You can customize it in settings under the active_job key, using the queue option as in the example:

    Chewy.settings[:active_job] = {queue: :low}

:urgent
The following strategy is convenient if you are going to update documents in your index one by one.

    Chewy.strategy(:urgent) do
      City.popular.map(&:do_some_update_action!)
    end

This code will perform City.popular.count requests for ES documents update.
It is convenient for use in e.g. the Rails console with non-block notation:

    > Chewy.strategy(:urgent)
    > City.popular.map(&:do_some_update_action!)

:bypass
When the bypass strategy is active the index will not be automatically updated on object save.
For example, on City.first.save! the cities index would not be updated.
Nesting
Strategies are designed to allow nesting, so it is possible to redefine the strategy for nested contexts.

    Chewy.strategy(:atomic) do
      city1.do_update!
      Chewy.strategy(:urgent) do
        city2.do_update!
        city3.do_update!
        # there will be 2 update index requests for city2 and city3
      end
      city4.do_update!
      # city1 and city4 will be grouped in one index update request
    end

Non-block notation
It is possible to nest strategies without blocks:

    Chewy.strategy(:urgent)
    city1.do_update! # index updated
    Chewy.strategy(:bypass)
    city2.do_update! # update bypassed
    Chewy.strategy.pop
    city3.do_update! # index updated again

Designing your own strategies
See [strategy/base.rb](lib/chewy/strategy/base.rb) for more details. See [strategy/atomic.rb](lib/chewy/strategy/atomic.rb) for an example.
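Before reading those files it may help to see the moving parts in miniature. The toy model below keeps a stack of strategies, where :urgent sends one request per object, :atomic buffers until it is left, and :bypass does nothing. Everything here is hypothetical: `ToyStrategies` and its methods are written for this explanation and are not Chewy's classes or API.

```ruby
# Toy model of a strategy stack; all names are illustrative, not Chewy's API.
class ToyStrategies
  attr_reader :requests

  def initialize
    @stack = []
    @requests = []
  end

  # Block form pushes a strategy for the duration of the block;
  # non-block form just pushes it.
  def strategy(name)
    @stack.push(name: name, pending: [])
    return unless block_given?

    begin
      yield
    ensure
      pop
    end
  end

  # Leaving a strategy flushes whatever it has buffered.
  def pop
    frame = @stack.pop
    @requests << frame[:pending] unless frame[:pending].empty?
  end

  def update(id)
    frame = @stack.last
    raise 'no strategy selected' if frame.nil?

    case frame[:name]
    when :urgent then @requests << [id]      # one request per object
    when :atomic then frame[:pending] << id  # buffered until the strategy is left
    when :bypass then nil                    # index is not touched
    end
  end
end

toy = ToyStrategies.new
toy.strategy(:atomic) do
  toy.update(:city1)
  toy.strategy(:urgent) do
    toy.update(:city2)
    toy.update(:city3)
  end
  toy.update(:city4)
end

puts toy.requests.length # 3
```

The run reproduces the nesting example: city2 and city3 each produce their own request, while city1 and city4 are grouped into one request when the outer block ends.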
Rails application strategies integration
There are a couple of predefined strategies for your Rails application. Initially, the Rails console uses the :urgent strategy by default, except in the sandbox case. When you are running sandbox it switches to the :bypass strategy to avoid polluting the index.
Migrations are wrapped with the :bypass strategy. Because the main behavior implies that indices are reset after migration, there is no need for extra index updates. Also indexing might be broken during migrations because of the outdated schema.
Controller actions are wrapped with the strategy configured in Chewy.request_strategy, which defaults to :atomic. This is done at the middleware level to reduce the number of index update requests inside actions.
It is also a good idea to set up the :bypass strategy inside your test suite, import objects manually only when needed, and use Chewy.massacre to flush test ES indices before every example. This will allow you to minimize unnecessary ES requests and reduce overhead.
Deprecation note: since version 8 wildcard removing of indices is disabled by default. You can enable it for a cluster by setting action.destructive_requires_name to false.

    RSpec.configure do |config|
      config.before(:suite) do
        Chewy.strategy(:bypass)
      end
    end

Elasticsearch client options
All connection options, except the :prefix, are passed to Elasticsearch::Client.new (chewy/lib/chewy.rb).
Here’s the relevant Elasticsearch documentation on the subject: https://rubydoc.info/gems/elasticsearch-transport#setting-hosts
ActiveSupport::Notifications support
Chewy publishes the following events:
search_query.chewy payload
- payload[:index]: requested index class
- payload[:request]: request hash
import_objects.chewy payload
- payload[:index]: currently imported index name
- payload[:import]: import stats, total imported and deleted objects count: {index: 30, delete: 5}
- payload[:errors]: might not exist. Contains grouped errors with objects ids list: {index: {'error 1 text' => ['1', '2', '3'], 'error 2 text' => ['4']}, delete: {'delete error text' => ['10', '12']}}
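A subscriber typically wants a compact summary of such a payload. The sketch below works on a hand-written hash in the shape documented above and counts failed ids per bulk action. `failed_counts` is a hypothetical helper, and no ActiveSupport subscription is set up here.

```ruby
# Payload in the shape documented above (illustrative values only).
payload = {
  import: { index: 30, delete: 5 },
  errors: {
    index: { 'error 1 text' => %w[1 2 3], 'error 2 text' => ['4'] },
    delete: { 'delete error text' => %w[10 12] }
  }
}

# Hypothetical helper: number of failed ids per bulk action.
# `fetch` with a default covers payloads that carry no :errors key.
def failed_counts(payload)
  payload.fetch(:errors, {}).transform_values { |messages| messages.values.sum(&:size) }
end

counts = failed_counts(payload)
puts counts[:index]  # 4
puts counts[:delete] # 2
```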
NewRelic integration
To integrate with NewRelic you may use the following example source (config/initializers/chewy.rb):

    require 'new_relic/agent/instrumentation/evented_subscriber'

    class ChewySubscriber < NewRelic::Agent::Instrumentation::EventedSubscriber
      def start(name, id, payload)
        event = ChewyEvent.new(name, Time.current, nil, id, payload)
        push_event(event)
      end

      def finish(_name, id, _payload)
        pop_event(id).finish
      end

      class ChewyEvent < NewRelic::Agent::Instrumentation::Event
        OPERATIONS = {
          'import_objects.chewy' => 'import',
          'search_query.chewy' => 'search',
          'delete_query.chewy' => 'delete'
        }.freeze

        def initialize(*args)
          super
          @segment = start_segment
        end

        def start_segment
          segment = NewRelic::Agent::Transaction::DatastoreSegment.new product, operation, collection, host, port
          if (txn = state.current_transaction)
            segment.transaction = txn
          end
          segment.notice_sql @payload[:request].to_s
          segment.start
          segment
        end

        def finish
          if (txn = state.current_transaction)
            txn.add_segment @segment
          end
          @segment.finish
        end

        private

        def state
          @state ||= NewRelic::Agent::TransactionState.tl_get
        end

        def product
          'Elasticsearch'
        end

        def operation
          OPERATIONS[name]
        end

        def collection
          payload.values_at(:type, :index)
                 .reject { |value| value.try(:empty?) }
                 .first
                 .to_s
        end

        def host
          Chewy.client.transport.hosts.first[:host]
        end

        def port
          Chewy.client.transport.hosts.first[:port]
        end
      end
    end

    ActiveSupport::Notifications.subscribe(/.chewy$/, ChewySubscriber.new)

Search requests
Section titled “Search requests”Quick introduction.
Composing requests
The request DSL has the same chainable nature as ActiveRecord. The main class is Chewy::Search::Request.

    CitiesIndex.query(match: {name: 'London'})

The main methods of the request DSL are query, filter and post_filter. You can pass plain query hashes or use elasticsearch-dsl.

    CitiesIndex
      .filter(term: {name: 'Bangkok'})
      .query(match: {name: 'London'})
      .query.not(range: {population: {gt: 1_000_000}})

You can query a set of indexes at once:

    CitiesIndex.indices(CountriesIndex).query(match: {name: 'Some'})

See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html and https://github.com/elastic/elasticsearch-dsl-ruby for more details.
An important part of request manipulation is merging. There are four methods for it: merge, and, or, not. See [Chewy::Search::QueryProxy](lib/chewy/search/query_proxy.rb) for details. The only and except methods help to remove unneeded parts of the request.
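To give a feel for what and, or and not mean, the sketch below combines plain query hashes using the standard Elasticsearch bool query. It assumes that these combinators correspond to must, should and must_not clauses; the helper names are hypothetical, and the hashes Chewy actually renders may be shaped differently.

```ruby
# Toy combinators over plain query hashes; not Chewy's implementation.

# Both queries have to match.
def and_queries(left, right)
  {bool: {must: [left, right]}}
end

# At least one of the queries has to match.
def or_queries(left, right)
  {bool: {should: [left, right]}}
end

# The base query has to match and the negated one must not.
def not_query(base, negated)
  {bool: {must: base, must_not: negated}}
end

LONDON = {match: {name: 'London'}}.freeze
BIG = {range: {population: {gt: 1_000_000}}}.freeze

puts not_query(LONDON, BIG).inspect
```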
Every other request part is covered by additional methods, see [Chewy::Search::Request](lib/chewy/search/request.rb) for details:

    CitiesIndex.limit(10).offset(30).order(:name, {population: {order: :desc}})

The request DSL also provides additional scope actions, such as delete_all, exists?, count and pluck.
Pagination
The request DSL supports pagination with Kaminari. The extension is enabled on initialization if Kaminari is available. See [Chewy::Search](lib/chewy/search.rb) and [Chewy::Search::Pagination::Kaminari](lib/chewy/search/pagination/kaminari.rb) for details.
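Page-based pagination comes down to a limit and an offset. The sketch below shows that arithmetic in plain Ruby; page_window is a hypothetical helper written for this example and is not part of Chewy or Kaminari.

```ruby
# Hypothetical helper: converts a 1-based page number into a limit/offset pair.
def page_window(page:, per_page:)
  page = [page.to_i, 1].max # treat missing or invalid pages as the first page
  {limit: per_page, offset: (page - 1) * per_page}
end

# The fourth page of ten results covers the same window as limit(10).offset(30).
WINDOW = page_window(page: 4, per_page: 10)
puts WINDOW.inspect
```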
Named scopes
Chewy supports named scopes. There is no specialized DSL for defining them: a named scope is simply a class method on the index.
See [Chewy::Search::Scoping](lib/chewy/search/scoping.rb) for details.
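A minimal sketch of the idea follows. To keep it runnable without Chewy or Elasticsearch, it uses SketchIndex, a hypothetical stand-in that only records the hashes it receives; in an application the index would inherit from Chewy::Index and the methods would return chainable request objects.

```ruby
# Hypothetical stand-in for Chewy::Index so that the sketch runs on its own.
class SketchIndex
  def self.query(hash)
    {query: hash}
  end

  def self.filter(hash)
    {filter: hash}
  end
end

class UsersIndex < SketchIndex
  # A named scope is just a class method built from the request DSL.
  def self.by_name(name)
    query(match: {name: name})
  end

  def self.by_age(age)
    filter(term: {age: age})
  end
end

puts UsersIndex.by_name('Martin').inspect
```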
Scroll API
The Elasticsearch scroll API is used by several methods: scroll_batches, scroll_hits, scroll_wrappers and scroll_objects.
See [Chewy::Search::Scrolling](lib/chewy/search/scrolling.rb) for details.
Loading objects
It is possible to load ORM/ODM source objects with the objects method. To provide additional loading options, use the load method:

    CitiesIndex.load(scope: -> { active }).to_a # to_a returns `Chewy::Index` wrappers.
    CitiesIndex.load(scope: -> { active }).objects # An array of AR source objects.

See [Chewy::Search::Loader](lib/chewy/search/loader.rb) for more details.

When it is necessary to iterate through both the wrappers and the objects simultaneously, the object_hash method helps:

    scope = CitiesIndex.load(scope: -> { active })
    scope.each do |wrapper|
      scope.object_hash[wrapper]
    end

Rake tasks
For a Rails application, some index-maintaining rake tasks are defined.
chewy:reset
Performs zero-downtime reindexing as described here. The rake task creates a new index with a unique suffix and then aliases it to the common index name. The previous index is deleted afterwards (see Chewy::Index.reset! for more details).

    rake chewy:reset # resets all the existing indices
    rake chewy:reset[users] # resets UsersIndex only
    rake chewy:reset[users,cities] # resets UsersIndex and CitiesIndex
    rake chewy:reset[-users,cities] # resets every index in the application except specified ones

chewy:upgrade
Performs a reset exactly the same way as chewy:reset does, but only when the index specification (settings or mapping) has changed.
It works only when the index specification is locked in the Chewy::Stash::Specification index. The first run will reset all indexes and lock their specifications.
See [Chewy::Stash::Specification](lib/chewy/stash.rb) and [Chewy::Index::Specification](lib/chewy/index/specification.rb) for more details.

    rake chewy:upgrade # upgrades all the existing indices
    rake chewy:upgrade[users] # upgrades UsersIndex only
    rake chewy:upgrade[users,cities] # upgrades UsersIndex and CitiesIndex
    rake chewy:upgrade[-users,cities] # upgrades every index in the application except specified ones

chewy:update
It doesn’t create indexes; it simply imports everything into the existing ones and fails if an index was not created before.

    rake chewy:update # updates all the existing indices
    rake chewy:update[users] # updates UsersIndex only
    rake chewy:update[users,cities] # updates UsersIndex and CitiesIndex
    rake chewy:update[-users,cities] # updates every index in the application except UsersIndex and CitiesIndex

chewy:sync
Provides a way to synchronize outdated indexes with the source quickly and without doing a full reset. By default the updated_at field is used to find outdated records, but this can be customized with outdated_sync_field, as described in [Chewy::Index::Syncer](lib/chewy/index/syncer.rb).
Arguments are similar to the ones taken by the chewy:update task.
See [Chewy::Index::Syncer](lib/chewy/index/syncer.rb) for more details.
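The argument convention shared by these tasks (no arguments, a list of names, or a list whose first name starts with a minus sign) can be modeled in a few lines. This is a sketch of the documented behavior only: select_indexes and the list of index names are hypothetical and do not come from Chewy's source.

```ruby
# Assumed set of indexes defined in the application.
ALL_INDEXES = %w[users cities countries].freeze

# Hypothetical helper modeling the documented argument convention:
#   []               -> every index
#   [users, cities]  -> only the listed indexes
#   [-users, cities] -> every index except the listed ones
def select_indexes(args, all: ALL_INDEXES)
  names = args.map(&:to_s)
  return all.dup if names.empty?

  if names.first.start_with?('-')
    excluded = names.map { |name| name.delete_prefix('-') }
    all - excluded
  else
    all & names
  end
end

puts select_indexes(%w[-users cities]).inspect
```

Note that rake itself splits the bracketed list on commas, which is why the helper receives an array of names.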

    rake chewy:sync # synchronizes all the existing indices
    rake chewy:sync[users] # synchronizes UsersIndex only
    rake chewy:sync[users,cities] # synchronizes UsersIndex and CitiesIndex
    rake chewy:sync[-users,cities] # synchronizes every index in the application except UsersIndex and CitiesIndex

chewy:deploy
This rake task is especially useful during a production deploy. It is a combination of chewy:upgrade and chewy:sync, and the latter is called only for the indexes that were not reset during the first stage.
It is not possible to specify any particular indexes for this task as it doesn’t make much sense.
The current approach is that if some data has been updated but the index definition has not changed (so no changes satisfying the synchronization algorithm were made), it is much faster to perform a partial index update manually, either inside data migrations or after the deploy.
There is also always the full reset alternative: rake chewy:reset.
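The two-stage behavior of chewy:deploy can be pictured as a partition of the indexes. The sketch below is a toy model: IndexState and deploy_plan are hypothetical names, and the "specification changed" flag stands in for the comparison against the locked specification.

```ruby
# Toy model of an index: its name and whether its specification changed
# since it was last locked.
IndexState = Struct.new(:name, :specification_changed)

# Hypothetical helper: indexes with a changed specification are reset
# (the upgrade stage), the remaining ones are only synchronized.
def deploy_plan(indexes)
  to_reset, to_sync = indexes.partition(&:specification_changed)
  {reset: to_reset.map(&:name), sync: to_sync.map(&:name)}
end

PLAN = deploy_plan([
  IndexState.new('users', true),
  IndexState.new('cities', false)
])
puts PLAN.inspect
```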
chewy:create_missing_indexes
This rake task creates newly defined indexes in Elasticsearch and skips existing ones. Useful for production-like environments.
Parallelizing rake tasks
Every task described above has its own parallel version. Every parallel rake task takes the number of processes to use as the first argument; the rest of the arguments are exactly the same as for the non-parallel version of the task.
The parallel gem (https://github.com/grosser/parallel) is required to use these tasks.
If the number of processes is not specified explicitly, the parallel gem tries to derive it automatically.

    rake chewy:parallel:reset
    rake chewy:parallel:upgrade[4]
    rake chewy:parallel:update[4,cities]
    rake chewy:parallel:sync[4,-users]
    rake chewy:parallel:deploy[4] # performs parallel upgrade and parallel sync afterwards

chewy:journal
This namespace contains two tasks for journal manipulation: chewy:journal:apply and chewy:journal:clean. Both take a time as the first argument (optional for clean) and a list of indexes, exactly as the tasks above. The time can be in any format parsable by ActiveSupport.

    rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)"] # apply journaled changes for the past hour
    rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)",users] # apply journaled changes for the past hour on UsersIndex only

When the journal becomes very large, the classical way of deletion is obstructive and resource-consuming. Fortunately, Chewy internally uses the delete-by-query ES function, which supports async execution with batching and throttling.
The available options, which can be set by ENV variables, are listed below:
- WAIT_FOR_COMPLETION - a boolean flag that controls async execution. It waits by default. When set to false (0, f, false or off, in any letter case, is accepted as false), Elasticsearch performs some preflight checks, launches the request, and returns a task reference you can use to cancel the task or get its status.
- REQUESTS_PER_SECOND - float. The throttle for this request in sub-requests per second. No throttling is enforced by default.
- SCROLL_SIZE - integer. The number of documents to be deleted in a single sub-request. The default batch size is 1000.
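The way these variables are interpreted can be sketched in plain Ruby. The helpers below are hypothetical and only follow the rules listed above; they are not taken from Chewy's rake task code, and unset variables are simply left out so that the defaults apply.

```ruby
# Spellings accepted as false, compared case-insensitively.
FALSE_VALUES = %w[0 f false off].freeze

# Hypothetical helper: waits by default, turns async only on a false-like value.
def wait_for_completion?(env)
  value = env['WAIT_FOR_COMPLETION']
  return true if value.nil?

  !FALSE_VALUES.include?(value.strip.downcase)
end

# Hypothetical helper: collects the three documented options from ENV
# (or any hash-like object with string keys).
def journal_clean_options(env)
  options = {wait_for_completion: wait_for_completion?(env)}
  options[:requests_per_second] = Float(env['REQUESTS_PER_SECOND']) if env['REQUESTS_PER_SECOND']
  options[:scroll_size] = Integer(env['SCROLL_SIZE']) if env['SCROLL_SIZE']
  options
end

puts journal_clean_options(
  'WAIT_FOR_COMPLETION' => 'false',
  'REQUESTS_PER_SECOND' => '10',
  'SCROLL_SIZE' => '5000'
).inspect
```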

    rake chewy:journal:clean WAIT_FOR_COMPLETION=false REQUESTS_PER_SECOND=10 SCROLL_SIZE=5000

RSpec integration
Just add require 'chewy/rspec' to your spec_helper.rb and you will get additional features:
- [update_index](lib/chewy/rspec/update_index.rb) helper
- mock_elasticsearch_response helper to mock an Elasticsearch response
- mock_elasticsearch_response_sources helper to mock Elasticsearch response sources
- build_query matcher to compare a request with the expected query (returns true/false)
To use the mock_elasticsearch_response and mock_elasticsearch_response_sources helpers, add include Chewy::Rspec::Helpers to your tests.
See [chewy/rspec/](lib/chewy/rspec/) for more details.
Minitest integration
Add require 'chewy/minitest' to your test_helper.rb, and then include Chewy::Minitest::Helpers in the tests where you would like indexing test hooks.
You can set the :bypass strategy for test suites, handle imports for the index manually, and flush test indices manually using Chewy.massacre. This helps reduce unnecessary ES requests.
If you need Chewy to index or update models regularly in your test suite, you can specify the :urgent strategy for document indexing: add Chewy.strategy(:urgent) to test_helper.rb.
Also, you can use additional helpers:
- mock_elasticsearch_response to mock an Elasticsearch response
- mock_elasticsearch_response_sources to mock Elasticsearch response sources
- assert_elasticsearch_query to compare a request with the expected query (returns true/false)
See [chewy/minitest/](lib/chewy/minitest/) for more details.
DatabaseCleaner
If you use DatabaseCleaner in your tests with the transaction strategy, you may run into the problem that ActiveRecord’s models are not indexed automatically on save, even though you set up callbacks to do this with the update_index method. The issue arises because Chewy indexes data on after_commit by default, but after_commit callbacks are not run with DatabaseCleaner’s transaction strategy. You can solve this issue by changing the Chewy.use_after_commit_callbacks option. Just add the following initializer to your Rails application:

    Chewy.use_after_commit_callbacks = !Rails.env.test?

Pre-request Filter
Should you need to inspect a query before it is dispatched to Elasticsearch, you can use the before_es_request_filter. before_es_request_filter is a callable object, as demonstrated below:

    Chewy.before_es_request_filter = -> (method_name, args, kw_args) { ... }

While using the before_es_request_filter, please consider the following:
- before_es_request_filter acts as a simple proxy before any request made via the Elasticsearch::Client. The arguments passed to this filter include:
  - method_name - The name of the method being called. Examples are search, count, bulk, etc.
  - args and kw_args - These are the positional and keyword arguments provided in the method call.
- The operation is synchronous, so avoid executing any heavy or time-consuming operations within the filter to prevent performance degradation.
- The return value of the proc is disregarded. This filter is intended for inspection or modification of the query rather than generating a response.
- Any exception raised inside the callback will propagate upward and halt the execution of the query. It is essential to handle potential errors adequately to ensure the stability of your search functionality.
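A filter that respects the points above might look like the following sketch: it only logs, does no heavy work, and rescues its own errors so that a logging failure never halts a query. The logger setup, the constant names and the sample arguments are assumptions made for this example; only the three-argument signature comes from the text above.

```ruby
require 'logger'
require 'stringio'

# In an application this would be a real log destination; a StringIO keeps
# the sketch self-contained.
LOG_OUTPUT = StringIO.new
REQUEST_LOGGER = Logger.new(LOG_OUTPUT)

# Same three-argument shape as in the example above.
LOG_ES_REQUEST = lambda do |method_name, args, kw_args|
  REQUEST_LOGGER.info("ES #{method_name} args=#{args.inspect} kw_args=#{kw_args.inspect}")
rescue StandardError => e
  # Swallow logging failures so they never abort the actual request.
  warn("request filter failed: #{e.message}")
end

# With Chewy loaded, the filter would be registered like this:
#   Chewy.before_es_request_filter = LOG_ES_REQUEST

# Calling it by hand with made-up arguments shows what would be logged.
LOG_ES_REQUEST.call(:search, [{index: ['cities'], body: {query: {match_all: {}}}}], {})
puts LOG_OUTPUT.string
```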
Import scope clean-up behavior
Whenever you set the import_scope for the index, in the case of ActiveRecord,
options for order, offset and limit will be removed. You can configure how
Chewy behaves before the clean-up itself.
The default behavior is a warning sent to the Chewy logger (:warn). Another, more
restrictive option is raising an exception (:raise). Both options have a
negative impact on performance, since verifying whether the code uses any of
these options requires building an Arel query.
To avoid the loading time impact, you can ignore the check (:ignore) before
the clean-up.

    Chewy.import_scope_cleanup_behavior = :ignore

Contributing
- Fork it (http://github.com/toptal/chewy/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Implement your changes, cover them with specs and make sure old specs are passing
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Use the following Rake tasks to control the Elasticsearch cluster while developing, if you prefer native Elasticsearch installation over the dockerized one:

    rake elasticsearch:start # start Elasticsearch cluster on 9250 port for tests
    rake elasticsearch:stop # stop Elasticsearch

Copyright
Copyright (c) 2013-2025 Toptal, LLC. See [LICENSE.txt](LICENSE.txt) for further details.