Chapter 20

Full-Text Search

Books, authors, and publishers are the foundation of Alexandria, our e-commerce website selling digital books.

Picture the home page of a similar website (Amazon, for example): one feature is mandatory to hook users. They need to be able to search for books using any word related to them. To make that possible (and easy) for our clients, we can’t make them filter everything themselves; that would waste precious seconds during which we could lose them.

Instead, the server should offer a way to retrieve a list of entities that match a string of characters sent by the client. That’s exactly why we are going to implement full-text search in Alexandria.

20.1. Setup

Before we can implement it, we need to change our database management system. The gem we are going to use, pg_search, is built on PostgreSQL’s full-text search, so SQLite won’t cut it. Besides, running a different database locally and in production is not a good idea anyway.

Therefore, we’re going to switch to PostgreSQL.

20.1.1. Installing PostgreSQL

The first thing to do before we can start playing with it is to actually install PostgreSQL.

Mac OS X

To install it on Mac OS X, run the commands below, one after the other.

brew update
brew doctor
brew install postgresql

Linux

To install it on a Linux distribution, use your favorite package manager. For Debian-based distributions, the following commands should do.

sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

20.1.2. Configuring Our Application

Is PostgreSQL installed now? Great. Let’s configure our API to use it instead of SQLite.

In your Gemfile, replace sqlite3 with pg.

# Gemfile
source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.5.0'

gem 'rails', '5.2.0'
gem 'pg'
gem 'puma', '~> 3.11'
gem 'bootsnap', '>= 1.1.0', require: false
gem 'carrierwave'
gem 'carrierwave-base64'

# Add the kaminari gem
gem 'kaminari'

group :development, :test do
  gem 'byebug', platforms: [:mri, :mingw, :x64_mingw]
  gem 'rspec-rails'
  gem 'factory_bot_rails'
end

group :development do
  gem 'listen', '>= 3.0.5', '< 3.2'
  gem 'spring'
  gem 'spring-watcher-listen', '~> 2.0.0'
end

group :test do
  gem 'shoulda-matchers'
  gem 'webmock'
  gem 'database_cleaner'
end

gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

Run bundle install to get it installed.

bundle install

Next, we need to update the database configuration stored in the config/database.yml file with the following. All we do in there is stop using SQLite and use PostgreSQL instead.

# config/database.yml
default: &default
  adapter: postgresql
  pool: 5
  timeout: 5000

development:
  <<: *default
  database: alexandria_development

test:
  <<: *default
  database: alexandria_test

production:
  <<: *default
  database: alexandria_production
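The `<<: *default` lines rely on YAML anchors and merge keys: each environment copies every key from the `default` block and then adds its own `database`. A quick way to check what the environments end up with is to load the configuration with Ruby’s standard `yaml` library. This is a minimal sketch that parses a trimmed, inlined copy of the file so it runs outside the app; it assumes Ruby 2.6 or later (where `safe_load` accepts the `aliases:` keyword), and it skips the ERB pass that Rails applies to database.yml before parsing.

```ruby
require 'yaml'

# Trimmed copy of config/database.yml, inlined so the sketch is self-contained.
CONFIG_TEXT = <<~YAML
  default: &default
    adapter: postgresql
    pool: 5

  development:
    <<: *default
    database: alexandria_development

  test:
    <<: *default
    database: alexandria_test
YAML

# aliases: true is needed to allow the &default anchor and *default alias.
CONFIG = YAML.safe_load(CONFIG_TEXT, aliases: true)

# Each environment carries the shared keys plus its own database name.
CONFIG.each do |env, settings|
  puts "#{env}: #{settings}"
end
```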

To see if we have done everything correctly, run the following command to create the database and run the migrations.

rails db:create && rails db:migrate

Then run the tests.

rspec

Success (GREEN)

...

Finished in 3.18 seconds (files took 2.36 seconds to load)
189 examples, 0 failures

Alright, everything is in place. Let’s talk about pg_search.

20.1.3. pg_search

pg_search is a gem that builds ActiveRecord named scopes on top of PostgreSQL’s full-text search. That’s what we are going to use to implement full-text search in Alexandria.

First, let’s add the gem to our Gemfile.

# Gemfile
source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.5.0'

gem 'rails', '5.2.0'
gem 'pg'
gem 'puma', '~> 3.11'
gem 'bootsnap', '>= 1.1.0', require: false
gem 'carrierwave'
gem 'carrierwave-base64'
gem 'pg_search'

# Add the kaminari gem
gem 'kaminari'

group :development, :test do
  gem 'byebug', platforms: [:mri, :mingw, :x64_mingw]
  gem 'rspec-rails'
  gem 'factory_bot_rails'
end

group :development do
  gem 'listen', '>= 3.0.5', '< 3.2'
  gem 'spring'
  gem 'spring-watcher-listen', '~> 2.0.0'
end

group :test do
  gem 'shoulda-matchers'
  gem 'webmock'
  gem 'database_cleaner'
end

gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

Followed by the usual bundle install.

bundle install

pg_search comes with a generator that creates a migration for us. That migration adds a new table, pg_search_documents, into which the gem copies the text of the models we mark as searchable, so that a single query can search across all of them.

rails g pg_search:migration:multisearch

Output

Running via Spring preloader in process 85176
      create  db/migrate/20160604162524_create_pg_search_documents.rb

The migration looks like this:

# db/migrate/TIMESTAMP_create_pg_search_documents.rb
class CreatePgSearchDocuments < ActiveRecord::Migration[5.2]
  def self.up
    say_with_time("Creating table for pg_search multisearch") do
      create_table :pg_search_documents do |t|
        t.text :content
        t.belongs_to :searchable, :polymorphic => true, :index => true
        t.timestamps null: false
      end
    end
  end

  def self.down
    say_with_time("Dropping table for pg_search multisearch") do
      drop_table :pg_search_documents
    end
  end
end

Run the migrations.

rails db:migrate && RAILS_ENV=test rails db:migrate

Output

== TIMESTAMP CreatePgSearchDocuments: migrating ==========================
-- Creating table for pg_search multisearch
-- create_table(:pg_search_documents, {})
   -> 0.0454s
   -> 0.0455s
== TIMESTAMP CreatePgSearchDocuments: migrated (0.0456s) =================
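To make the idea of that shared table concrete, here is a plain-Ruby simulation: every searchable record contributes one row holding its text plus the searchable_type and searchable_id pointing back to it, and a search scans that single table. The rows are made up, and `Document` and `naive_search` are hypothetical names. The matching is a deliberately naive case-insensitive whole-word match; PostgreSQL’s real full-text search also normalizes words (stemming, stop words) and ranks results, which this sketch does not attempt.

```ruby
# One row of a pg_search_documents-style table (hypothetical in-memory version).
Document = Struct.new(:content, :searchable_type, :searchable_id)

# Made-up rows: two books, one author and one publisher.
DOCUMENTS = [
  Document.new('Ruby Under a Microscope', 'Book', 1),
  Document.new('Agile Web Development with Rails', 'Book', 2),
  Document.new('Sam Ruby', 'Author', 7),
  Document.new('Pragmatic Bookshelf', 'Publisher', 3)
].freeze

# Naive matching: every query word must appear as a whole word in the
# content, ignoring case. Real PostgreSQL full-text search does much more.
def naive_search(documents, text)
  terms = text.downcase.split
  documents.select do |doc|
    words = doc.content.downcase.scan(/\w+/)
    terms.all? { |term| words.include?(term) }
  end
end

naive_search(DOCUMENTS, 'ruby').each do |doc|
  puts "#{doc.searchable_type} ##{doc.searchable_id}: #{doc.content}"
end
```

Note how one query over one table returns both a book and an author: that is the point of the shared documents table.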

20.2. Configuring The Models

Now that pg_search is installed and its pg_search_documents table has been created, we can configure our models. For each model, we need to tell the gem which fields can be used for full-text search, using the method below:

multisearchable against: [:field1, :field2, :field3]
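Roughly speaking, whenever a multisearchable record is saved, pg_search creates or updates its document row, using the values of the `against` attributes joined with spaces as the content. The sketch below imitates that in plain Ruby so you can see what ends up in the content column; the `Book` struct, the sample description and the `document_content` helper are illustrative stand-ins, not pg_search code.

```ruby
# Illustrative stand-in for the Book model's searchable attributes.
Book = Struct.new(:title, :subtitle, :description)

AGAINST = %i[title subtitle description].freeze

# Approximates how a multisearchable record's content is built: read each
# `against` attribute and join the values with spaces. Array#join turns a
# nil attribute into an empty string.
def document_content(record, against)
  against.map { |attribute| record.public_send(attribute) }.join(' ')
end

book = Book.new('Ruby Under a Microscope', nil, 'An illustrated guide to Ruby internals')
puts document_content(book, AGAINST)
```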

Let’s update all our models, starting with the Book class. We just have to add the multisearchable line at the top, listing the fields we want to make searchable: title, subtitle and description.

# app/models/book.rb
class Book < ApplicationRecord
  include PgSearch
  multisearchable against: [:title, :subtitle, :description]

  belongs_to :publisher, required: false
  belongs_to :author

  validates :title, presence: true
  validates :released_on, presence: true
  validates :author, presence: true

  validates :isbn_10, presence: true, length: { is: 10 }, uniqueness: true
  validates :isbn_13, presence: true, length: { is: 13 }, uniqueness: true

  mount_base64_uploader :cover, CoverUploader
end

Same thing for Author with the given_name and family_name fields.

# app/models/author.rb
class Author < ApplicationRecord
  include PgSearch
  multisearchable against: [:given_name, :family_name]

  has_many :books

  validates :given_name, presence: true
  validates :family_name, presence: true
end

And once again with Publisher and the name field.

# app/models/publisher.rb
class Publisher < ApplicationRecord
  include PgSearch
  multisearchable against: [:name]

  has_many :books

  validates :name, presence: true
end

With this configuration, publishers show up in search results too, so clients can find a publisher by name and then navigate to its books. Neat!

20.3. Seeding Some Real Data

In the resources/ folder that came with Master Ruby Web APIs, you will find a file named seeds.rb. Copy or move the file into your API under db/. You can overwrite the existing seeds.rb file.

This file contains a large set of authors, publishers and books: mostly science-fiction books related to Asimov.

Once you’ve copied the file, reset and seed your fresh PostgreSQL database with the new data by running this command.

rails db:reset

Now we have some data to play with: 997 books should have been created.

20.4. The Search Controller

The only thing missing now is a controller to handle client search requests. That’s the job of the SearchController we are about to implement.

Create the files for the search controller and its tests.

touch app/controllers/search_controller.rb \
      spec/requests/search_spec.rb

We also need a new route. This time, we are not going to use the Rails resources method. Instead, we’ll just define the /search/:text route like this:

get '/search/:text', to: 'search#index'

Add this line to the config/routes.rb file, inside the api scope.

# config/routes.rb
Rails.application.routes.draw do
  scope :api do
    resources :books
    resources :authors
    resources :publishers

    get '/search/:text', to: 'search#index'
  end

  root to: 'books#index'
end
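Because the route lives inside `scope :api`, the full path is /api/search/:text, and the search text travels as a path segment. Clients therefore have to percent-encode it: a space, for instance, must be sent as `%20`. Here is a minimal client-side sketch using Ruby’s standard `erb` library; `search_path` is a hypothetical helper name.

```ruby
require 'erb'

# Hypothetical helper a Ruby client could use to build the search path.
# ERB::Util.url_encode percent-encodes everything except letters, digits
# and -._~, so a space becomes %20 (a "+" only means space in query strings).
def search_path(text)
  "/api/search/#{ERB::Util.url_encode(text)}"
end

puts search_path('ruby')          # /api/search/ruby
puts search_path('isaac asimov')  # /api/search/isaac%20asimov
```

One caveat: by default, Rails dynamic segments stop at a dot (the dot separates the format), so a search for text containing `.` would be truncated unless the route gets a constraint such as `constraints: { text: /[^\/]+/ }`.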

Let’s write some tests! Here, we’re going to test a search query for the word “ruby”. From that search, we expect to receive 200 OK and three entities that contain “ruby” somewhere in their data: “Ruby Under A Microscope”, “Ruby on Rails Tutorial” and “Sam Ruby”, the author of the book created by the agile_web_development factory.

To ensure that we get those, we are going to test the returned representation by checking the values of the searchable_id and the searchable_type fields. Since only PgSearch::Document entities are returned from the search controller, we need to use those two fields to identify the related entity in the polymorphic relationship.

# spec/requests/search_spec.rb
require 'rails_helper'

RSpec.describe 'Search', type: :request do

  let(:ruby_microscope) { create(:ruby_microscope) }
  let(:rails_tutorial) { create(:ruby_on_rails_tutorial) }
  let(:agile_web_dev) { create(:agile_web_development) }
  let(:books) { [ruby_microscope, rails_tutorial, agile_web_dev] }

  describe 'GET /api/search/:text' do
    before do
      books
    end

    context 'with text = ruby' do
      before { get '/api/search/ruby' }

      it 'gets HTTP status 200' do
        expect(response.status).to eq 200
      end

      it 'receives a "ruby_microscope" document' do
        expect(json_body['data'][0]['searchable_id']).to eq ruby_microscope.id
        expect(json_body['data'][0]['searchable_type']).to eq 'Book'
      end

      it 'receives a "rails_tutorial" document' do
        expect(json_body['data'][1]['searchable_id']).to eq rails_tutorial.id
        expect(json_body['data'][1]['searchable_type']).to eq 'Book'
      end

      it 'receives a "sam ruby" document' do
        expect(json_body['data'][2]['searchable_id']).to eq agile_web_dev.author.id
        expect(json_body['data'][2]['searchable_type']).to eq 'Author'
      end
    end
  end
end
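On the client side, those two fields are enough to tell what each result is. Here is a small sketch using Ruby’s standard `json` library; the sample payload is made up and only mimics the data array that the specs inspect, and `searchable_refs` is a hypothetical helper.

```ruby
require 'json'

# Made-up payload shaped like the `data` array checked by the specs.
SAMPLE_BODY = <<~JSON
  {
    "data": [
      { "content": "Ruby Under a Microscope", "searchable_type": "Book", "searchable_id": 1 },
      { "content": "Sam Ruby", "searchable_type": "Author", "searchable_id": 7 }
    ]
  }
JSON

# Returns [type, id] pairs identifying the entity behind each search document.
def searchable_refs(body)
  JSON.parse(body).fetch('data').map do |doc|
    [doc['searchable_type'], doc['searchable_id']]
  end
end

p searchable_refs(SAMPLE_BODY)  # [["Book", 1], ["Author", 7]]
```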

Run the tests.

rspec spec/requests/search_spec.rb

Failure (RED)

...

Finished in 0.43223 seconds (files took 2.29 seconds to load)
4 examples, 4 failures

...

Alright, let’s fix them. First, since the search controller is going to return documents, we need a new presenter for them.

Create a new folder in the app/presenters folder, named pg_search, and add the document_presenter.rb file inside.

mkdir app/presenters/pg_search && \
  touch app/presenters/pg_search/document_presenter.rb

Since the Document model is namespaced (PgSearch::Document), we need an intermediate folder to represent the PgSearch module. We could also put the presenter at the same level as our other presenters, but that would require changing the code of our query builders and serializers, since they don’t take modules into account.
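The folder follows the Rails naming convention: each `::` in a constant name becomes a directory, and each CamelCase part becomes snake_case. The helpers below are a hypothetical plain-Ruby approximation of that rule; Rails itself relies on ActiveSupport’s `underscore`, which handles more edge cases such as acronyms.

```ruby
# Approximate CamelCase -> snake_case conversion for simple names.
def snake_case(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Maps a (possibly namespaced) presenter constant to its expected file path.
def presenter_path(constant_name)
  parts = constant_name.split('::').map { |part| snake_case(part) }
  File.join('app/presenters', *parts) + '.rb'
end

puts presenter_path('PgSearch::DocumentPresenter')
# app/presenters/pg_search/document_presenter.rb
```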

Thanks to our preparation, creating the document presenter is super easy and it’s just going to work without changing anything, even though this model is handled by an external gem!

Here is the code for this new presenter.

# app/presenters/pg_search/document_presenter.rb
module PgSearch
  class DocumentPresenter < BasePresenter
    related_to    :searchable
    build_with    :content, :searchable_id, :searchable_type
  end
end

The search controller is similar to our other controllers. We use the QueryOrchestrator and the default Serializer without changing anything. The only difference is the call to PgSearch.multisearch, which returns a scope containing all the documents that match the searched text.

To avoid running one extra query per document to load its searchable record, we chain .includes(:searchable), which eager loads those polymorphic associations.

# app/controllers/search_controller.rb
class SearchController < ApplicationController

  def index
    @text = params[:text]
    scope = PgSearch.multisearch(@text).includes(:searchable)
    documents = orchestrate_query(scope)
    render serialize(documents).merge(status: :ok)
  end

end
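Without eager loading, serializing N documents would trigger one extra query per document to fetch its searchable (on top of the query that loads the documents themselves). For a polymorphic association, eager loading instead groups the documents by searchable_type and loads each type with a single `WHERE id IN (...)` query. The plain-Ruby simulation below counts the "queries" each strategy would issue; `FakeDb`, its `find`/`find_all` methods and the sample rows are hypothetical stand-ins, not ActiveRecord code.

```ruby
# Hypothetical stand-in for the database that counts the queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Like SELECT ... WHERE id = ?: one query per call.
  def find(type, id)
    @query_count += 1
    @rows.fetch(type).fetch(id)
  end

  # Like SELECT ... WHERE id IN (...): one query per call, however many ids.
  def find_all(type, ids)
    @query_count += 1
    @rows.fetch(type).slice(*ids)
  end
end

# Made-up searchable records, keyed by type and then by id.
ROWS = {
  'Book'   => { 1 => 'Ruby Under a Microscope', 2 => 'Ruby on Rails Tutorial' },
  'Author' => { 7 => 'Sam Ruby' }
}.freeze

# [searchable_type, searchable_id] pairs, as stored on each document.
DOCS = [['Book', 1], ['Book', 2], ['Author', 7]].freeze

# N+1 style: one lookup per document.
def load_lazily(db, docs)
  docs.map { |type, id| db.find(type, id) }
end

# Preload style: one lookup per distinct searchable_type.
def load_eagerly(db, docs)
  loaded = {}
  docs.group_by(&:first).each do |type, pairs|
    loaded[type] = db.find_all(type, pairs.map(&:last))
  end
  docs.map { |type, id| loaded[type][id] }
end

lazy_db = FakeDb.new(ROWS)
eager_db = FakeDb.new(ROWS)
load_lazily(lazy_db, DOCS)
load_eagerly(eager_db, DOCS)
puts "lazy: #{lazy_db.query_count} queries, eager: #{eager_db.query_count} queries"
# lazy: 3 queries, eager: 2 queries
```

The gap grows with the page size: with 20 results per page, the lazy strategy issues 20 lookups, while the eager one issues at most three (one per searchable model).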

Our tests should now be working properly.

rspec spec/requests/search_spec.rb

Success (GREEN)

...

Search
  GET /api/search/:text
    with text = ruby
      gets HTTP status 200
      receives a "ruby_microscope" document
      receives a "rails_tutorial" document
      receives a "sam ruby" document

Finished in 0.56063 seconds (files took 2.17 seconds to load)
4 examples, 0 failures

Now, take a look at the representation in your browser by starting the server and accessing /api/search/ruby.

rails s

You should see the same three entities shown in Figure 1.

https://s3.amazonaws.com/devblast-mrwa-book/images/figures/20/01
Figure 1

I feel like this representation is not good enough. With what we return, the client needs to build each resource URL itself from searchable_type and searchable_id. That’s pretty annoying, especially since it would be easy for us to add a hypermedia link to the representation.
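To see why, here is roughly what every client would have to write to turn a search document into a resource URL. It’s a plain-Ruby sketch that assumes the API runs at http://localhost:3000 and that resources are routed under /api; the `resource_url_for` helper is hypothetical, and its naive pluralization only works for our three resource names.

```ruby
# Assumed base URL of a local development server.
BASE_URL = 'http://localhost:3000'

# Hypothetical client-side re-creation of the server's routes.
def resource_url_for(searchable_type, searchable_id)
  # Naive pluralization: fine for Book, Author and Publisher, not in general.
  collection = "#{searchable_type.downcase}s"
  "#{BASE_URL}/api/#{collection}/#{searchable_id}"
end

puts resource_url_for('Book', 1)    # http://localhost:3000/api/books/1
puts resource_url_for('Author', 7)  # http://localhost:3000/api/authors/7
```

Every client would have to duplicate this routing knowledge and keep it in sync with the server, which is why generating the link on the server side is the better design.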

So let’s do it! After all, it will make the lives of the developers using our API much easier.

First, we have to add a custom method to the DocumentPresenter class: resource_url. In this method, we just call the polymorphic_url method provided by Rails and give it the searchable record associated with the PgSearch::Document. We can call polymorphic_url here thanks to the Rails.application.routes.url_helpers module we included in BasePresenter in the previous chapter.

# app/presenters/pg_search/document_presenter.rb
module PgSearch
  class DocumentPresenter < BasePresenter
    related_to    :searchable
    build_with    :content, :searchable_id, :searchable_type, :resource_url

    def resource_url
      polymorphic_url(object.searchable)
    end
  end
end

Let’s see if our tests still pass.

rspec spec/requests/search_spec.rb

Success (GREEN)

...

Search
  GET /api/search/:text
    with text = ruby
      gets HTTP status 200
      receives a "ruby_microscope" document
      receives a "rails_tutorial" document
      receives a "sam ruby" document

Finished in 0.52187 seconds (files took 2.21 seconds to load)
4 examples, 0 failures

More importantly, let’s take a look at our representation in a browser.

The server should still be running - if not, restart it.

Check out /api/search/ruby one more time and you should now have the complete URL of each resource, as shown in Figure 2.

https://s3.amazonaws.com/devblast-mrwa-book/images/figures/20/02
Figure 2

The full-text search is ready! Go play with it and try searching for different things. All the query and representation builders work with it, so you can use pagination, field picking and so on!

Search results are paginated automatically. When searching for asimov, you can navigate the different pages using the parameters we created in the previous chapters.

Pull up the second page of results with /api/search/asimov?per=20&page=2, and you will get a list of books written by Isaac Asimov, as shown in Figure 3.

https://s3.amazonaws.com/devblast-mrwa-book/images/figures/20/03
Figure 3
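The page and per parameters translate into an offset and a limit on the underlying query: page 2 with 20 results per page skips the first 20 documents and returns the next 20. Here is that arithmetic as a plain-Ruby sketch, together with building the query string with the standard `uri` library; `offset_for` and `search_page_url` are hypothetical helpers.

```ruby
require 'uri'

# Offset a paginated query uses for a 1-based page number.
def offset_for(page:, per:)
  (page - 1) * per
end

# Hypothetical helper building a paginated search URL.
# Assumes `text` is a single plain word that needs no encoding.
def search_page_url(text, page:, per:)
  "/api/search/#{text}?#{URI.encode_www_form(per: per, page: page)}"
end

puts offset_for(page: 2, per: 20)                 # 20
puts search_page_url('asimov', page: 2, per: 20)  # /api/search/asimov?per=20&page=2
```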

20.5. Pushing Our Changes

It’s now time to push our changes to GitHub. You probably know the drill by now, so try to do it by yourself. As always, start by running the whole test suite.

rspec

Success (GREEN)

...

Finished in 7.62 seconds (files took 2.27 seconds to load)
193 examples, 0 failures

Here is the list of steps to push the code.

Check the changes.

git status

Output

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   Gemfile
	modified:   Gemfile.lock
	modified:   app/models/author.rb
	modified:   app/models/book.rb
	modified:   app/models/publisher.rb
	modified:   config/database.yml
	modified:   config/routes.rb
	modified:   db/schema.rb
	modified:   db/seeds.rb

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	app/controllers/search_controller.rb
	app/presenters/pg_search/
	db/migrate/20160606131830_create_pg_search_documents.rb
	spec/requests/search_spec.rb

no changes added to commit (use "git add" and/or "git commit -a")

Stage them.

git add .

Commit the changes.

git commit -m "Implement Full Text Search"

Output

[master 172904b] Implement Full Text Search
 13 files changed, 1673 insertions(+), 35 deletions(-)
 create mode 100644 app/controllers/search_controller.rb
 create mode 100644 app/presenters/pg_search/document_presenter.rb
 rewrite config/database.yml (77%)
 create mode 100644 db/migrate/20160606131830_create_pg_search_documents.rb
 create mode 100644 spec/requests/search_spec.rb

Push to GitHub.

git push origin master

20.6. Wrap Up

In this “short” chapter, we implemented a full-text search feature in Alexandria. Thanks to the pg_search gem, it was pretty easy! Now, the client implementations will be able to have a cool search bar to look for books!