Chapter 8

Caching

Caching is extremely important, not only for your users but also to reduce the cost of running your applications. The more users you have, the more important caching becomes. Caching can be the difference between an application crashing every 10 minutes and one running smoothly, even with 1 million users.

Caching is nothing new - you’ve probably used some caching mechanisms before. It usually means storing a value in a cache, a simple key/value store, in order to re-access it faster. In software engineering, any value that is difficult or expensive to compute or retrieve is a candidate for caching. The only things that should not be cached are values that change so frequently that storing a temporary version is useless.

There are only two hard things in Computer Science: cache invalidation and naming things.

—Phil Karlton

Once you’ve stored something in a cache, how do you know it’s not outdated? That’s when cache invalidation comes into play, which, as pointed out by Phil, is not a simple thing.

Luckily, for web development, HTTP comes with everything you need to allow clients to cache responses and avoid transferring the same data over and over again. This type of caching obviously happens on the client, but is made possible by the server and its configuration.

Data can also be cached on the server. As you know, we will be generating various representations for our resources: JSON documents, HTML pages and so on. We can cache those outputs in order to make future requests faster. Some representations are complex to generate, and caching them can shave hundreds of milliseconds off a request.

Key takeaways about caching:

  • It can save a tremendous amount of time for the client and, by extension, the end-user (usually a human who does not like to wait).
  • By reducing the time required to process requests, your application needs less power to run, which ends up being less money spent on servers.
  • Caching allows an application to scale more easily, especially if the requests are mostly about retrieving data.
  • Unfortunately, not everything can be cached. Some real-time data needs to be fetched every time. The rest can be cached for a specific amount of time, from a few seconds to a day, depending on your judgement of how often the data will change.

8.1. Client Caching

The smart people behind the HTTP protocol took a very pragmatic approach to creating a useful protocol for web developers. They included everything needed to handle client caching. By defining a few settings on the server, any client will be capable of getting data from its cache instead of requesting it from your server.

The goals of caching in HTTP are to eliminate the need to send requests as much as possible and, when requests do have to be made, to reduce the amount of data transferred in the responses. The first goal can be achieved with an expiration mechanism, known as Cache-Control, and the second with a validation mechanism like ETag or Last-Modified.

8.1.1. Preventing Requests Entirely

The fastest way to send an HTTP request is to not send it at all.

The Cache-Control header can be used to define the caching policy for a resource. More specifically, it defines who can cache it, how, and for how long. Here are a few examples:

Cache-Control: max-age=3600
Cache-Control: no-cache
Cache-Control: private, max-age=86400

Let’s explore what these directives mean.

max-age

You may have seen this before as it is widely used for assets like stylesheets, JavaScript files and images. The max-age directive defines, in seconds, how long the resource can be cached. This is not only for browsers, but also for intermediate caches along the way.

For example, Cache-Control: max-age=3600 means that the received response can be cached for 1 hour. In other words, for the next hour, a browser (for example) can simply reuse the cached response instead of making another HTTP request, effectively saving time and bandwidth.
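The freshness check a client performs can be sketched in a few lines of Ruby. This is a hypothetical helper for illustration, not part of any library: a response is fresh while its age in seconds stays below max-age.

```ruby
require 'time'

# Returns true while a cached response is still fresh under its
# max-age directive, i.e. its age in seconds is below the maximum.
def fresh?(received_at, max_age, now: Time.now)
  (now - received_at) < max_age
end

received = Time.parse('2015-06-01 12:00:00 UTC')
fresh?(received, 3600, now: received + 1800) # still within the hour
fresh?(received, 3600, now: received + 4000) # expired
```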

Box 8.1. The max-age Directive In The Wild

If you’ve used Rails before, you know that compiled assets include an MD5 fingerprint in their filenames. This fingerprint is used to invalidate the cached version of the same file. Since Rails puts high max-age values on assets, this is the only way to ensure that clients get the latest version after a deployment.

This method has limited use for web APIs, but could prove useful for very specific scenarios where representations would not change for long periods of time.

The expiration time for a response can also be specified using the Expires header. Note that if both the Expires header and the max-age directives are present in the response, the max-age directive takes precedence over Expires.

public & private

The public and private directives define who can cache the response. public means that anyone and anything can cache it, but is often unnecessary since max-age already sets the response as cacheable. private, on the other hand, defines a response as only being cacheable by a browser, not by intermediaries such as CDNs (Content Delivery Networks).

The following example means that the browser, and nothing else, can cache the response for 60 seconds:

Cache-Control: private, max-age=60

The example below means that anything, browser or CDN, can cache the response for 1 day:

Cache-Control: public, max-age=86400

no-cache & no-store

These two directives are simpler. no-store simply means that nothing can cache the response, be it a browser or an intermediary. This should be used for sensitive data that should never be cached.

no-cache actually means that the response can be cached but cannot be re-used without first checking with the server. To avoid re-transmitting the whole response, no-cache can be combined with the ETag header to check if the response has changed or not.

Other Cache-Control directives

Directives Sent in Requests By The Client
  • max-stale

This directive indicates that the client is willing to accept a response that has been expired for no more than the specified number of seconds. Cache-Control: max-stale=60 means that the client will accept a response that expired less than 60 seconds ago.

  • min-fresh

This directive tells the server that the client will only accept a response that will stay fresh (i.e. not expired) for at least the specified number of seconds. Cache-Control: min-fresh=60 means that the client expects the response to remain fresh for at least one minute.

  • no-transform

Some proxies alter messages in order to handle them more efficiently; for example, by compressing the body or changing its media type. This directive forbids an intermediate cache from transforming the payload.

  • only-if-cached

This directive can be added to get the response only if it’s currently cached. A cache receiving this kind of query should either return the cached response or 504 Gateway Timeout.

Directives sent in response by the server
  • no-transform

Same behavior as defined above.

  • must-revalidate

With the must-revalidate directive, the server can specify that the client cache should always revalidate a request if it has become stale.

  • proxy-revalidate

This directive works like must-revalidate and requires any proxy to re-validate requests while the client cache does not have to.

  • s-maxage

This directive applies only to shared caches. When present, it overrides the expiration time set by either the Expires header or the max-age directive for those caches.

8.1.2. Prevent Data Transfer

We have just seen how requests can be prevented by using the Cache-Control header. Unfortunately, in most web APIs, this is rarely possible. But if a request has to be made anyway, we can save time and bandwidth by using the ETag header. ETag, which stands for Entity Tag, is meant to hold a validation token identifying a specific version of a response.

The first time a client requests a resource (/users/thibault, for example), the server includes the current token (let’s say 123) in the ETag header.

Request

GET /users/thibault HTTP/1.1
Host: localhost:4567

Response

HTTP/1.1 200 OK
Content-Length: 2048
ETag: "123"

[DATA]

The next time the client sends a request to /users/thibault, it includes the ETag token it received in the If-None-Match header.

When the request is received, the server checks the value of the If-None-Match header and compares it with the current ETag. If the client’s ETag and the server’s ETag match, the representation hasn’t changed and the server can return 304 Not Modified with no body.

However, if the values are different, the representation has changed. The server should return 200 OK with the new representation and the new validation token.
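The server-side decision boils down to a comparison. The helper below is a minimal sketch (a real server would also handle the `*` value and weak validators):

```ruby
# Compares the client's If-None-Match value with the server's current
# ETag and picks the status code: 304 when they match (nothing to
# send), 200 when the representation has changed.
def status_for(if_none_match, current_etag)
  if_none_match == current_etag ? 304 : 200
end

status_for('"123"', '"123"') # => 304
status_for('"123"', '"124"') # => 200
```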

Request

GET /users/thibault HTTP/1.1
Host: localhost:4567
If-None-Match: "123"

Response if the representation didn’t change:

HTTP/1.1 304 Not Modified
ETag: "123"

Response if the representation changed:

HTTP/1.1 200 OK
Content-Length: 2048
ETag: "124"

[DATA]

Any kind of value will do as a validation token. It can be a hash of the representation (MD5, for example), the updated_at attribute of the entity or an internal version number that you update on the server every time the data changes. Hashing is simpler but takes more computation time and, as we will see later, works more effectively with server caching.

If you choose to use a timestamp, there is a better header for you: Last-Modified. Associated with the request header If-Modified-Since, it follows the same logic as ETag and the server will return 304 Not Modified if the requested variant has not changed.

Last-Modified: Sat, 28 Apr 1990 02:00:00 GMT
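Ruby's standard library can parse and emit this date format with Time.httpdate, which makes the If-Modified-Since check easy to sketch (the helper name here is made up for illustration):

```ruby
require 'time'

# Returns 304 when the resource has not been modified since the
# timestamp the client sent in If-Modified-Since, 200 otherwise.
def status_for_if_modified_since(header_value, last_modified)
  last_modified <= Time.httpdate(header_value) ? 304 : 200
end

last_modified = Time.httpdate('Sat, 28 Apr 1990 02:00:00 GMT')
status_for_if_modified_since('Sat, 28 Apr 1990 02:00:00 GMT', last_modified) # => 304
status_for_if_modified_since('Fri, 27 Apr 1990 02:00:00 GMT', last_modified) # => 200
```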

8.1.3. Pre-Condition

The two headers we just discovered, If-None-Match and If-Modified-Since, are used to make conditional HTTP requests.

There are 5 headers which can be used to create conditional HTTP requests. The idea behind this type of request is to allow the client to tell the server that, if the condition is not met, the request should fail and the server should return 412 Precondition Failed. We are not really going to dive into each one of them since we will only be using If-None-Match in this book, but here are quick descriptions.

Some of the descriptions come from the RFC of the HTTP protocol.

  • If-Match: Opposite of If-None-Match, this header indicates that the client wants the request to be performed only if the entity tag matches a current entity on the server. If this is not the case, the server should respond with 412 Precondition Failed.
  • If-Modified-Since: If the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 Not Modified response will be returned without any body.
  • If-None-Match: We have already seen how this header works.
  • If-Range: If a client has a partial copy of an entity in its cache, and wishes to have an up-to-date copy of the entire entity in its cache, it could use the Range request-header with a conditional GET (using either or both of If-Unmodified-Since and If-Match). However, if the condition fails because the entity has been modified, the client would have to make a second request to obtain the entire current entity-body. The If-Range header short-circuits this second request: it means "if the entity is unchanged, send me the part(s) I am missing; otherwise, send me the entire new entity."
  • If-Unmodified-Since: If the requested resource has not been modified since the time specified in this field, the server should perform the requested operation as if the If-Unmodified-Since header were not present. If it has been modified, the server must respond with 412 Precondition Failed.

Conditional requests are powerful tools for developers who understand them. Don’t hesitate to dive deeper into the subject.
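One classic use worth sketching is optimistic concurrency control with If-Match: an update is applied only when the client's ETag still matches the server's, which prevents two clients from silently overwriting each other's changes. The helper name below is hypothetical:

```ruby
# Optimistic locking with If-Match: apply the update only when the
# client's ETag matches the current one; otherwise reject it with
# 412 Precondition Failed so the client can re-fetch and retry.
def handle_conditional_update(if_match, current_etag)
  if_match == current_etag ? 200 : 412
end

handle_conditional_update('"123"', '"123"') # => 200, update applied
handle_conditional_update('"122"', '"123"') # => 412, stale ETag
```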

8.1.4. How Do Caches Store Different Representations of The Same Resource?

Let’s say we are making the following request to a server.

Request

GET /users HTTP/1.1
Host: example.com
Accept: application/json

Response

HTTP/1.1 200 OK
Cache-Control: max-age=86400
Content-Type: application/json
Content-Length: 128

Everything goes well and the response has been cached. Then we make the following request, asking for another representation.

Request

GET /users HTTP/1.1
Host: example.com
Accept: application/xml

Response

HTTP/1.1 200 OK
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Length: 1024

If caches only used a combination of the HTTP method and the URI as a key for the cached response, the second request would still give us the JSON document. Same method, same URI.

Luckily, cache systems also use a secondary cache key, and, by using the Vary header, we can change how this secondary key is built. Vary lets us specify exactly which request headers impact the response body and should therefore be part of the caching key.

Let’s re-send our two requests.

Request (application/json)

GET /users HTTP/1.1
Host: example.com
Accept: application/json

Response

HTTP/1.1 200 OK
Vary: Accept
Cache-Control: max-age=86400
Content-Type: application/json
Content-Length: 128

Request (application/xml)

GET /users HTTP/1.1
Host: example.com
Accept: application/xml

Response

HTTP/1.1 200 OK
Vary: Accept
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Length: 1024

A cache would understand that these two requests are different. They will both be stored under the same primary key GET /users (or whatever the format for primary keys is for that specific cache). The first request will have the secondary key application/json, and the second one application/xml.

There are other headers that can affect the format or content of the response body. Headers like Accept-Language or Accept-Encoding would also affect HTTP responses in a way that requires them to be stored as different responses.

Request

GET /users HTTP/1.1
Host: example.com
Accept: application/xml
Accept-Language: fr
Accept-Encoding: gzip

Response

HTTP/1.1 200 OK
Vary: Accept, Accept-Language, Accept-Encoding
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Language: fr
Content-Encoding: gzip
Content-Length: 1024

For this response, the secondary cache key would be:

application/xml:fr:gzip
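We can imitate that construction in Ruby. The colon-joined format is just this book's illustration of the idea, not a mandated wire format:

```ruby
# Builds a secondary cache key by joining, in order, the values of
# the request headers listed in the response's Vary header.
def secondary_key(vary, request_headers)
  vary.split(',').map { |h| request_headers[h.strip] }.join(':')
end

headers = {
  'Accept'          => 'application/xml',
  'Accept-Language' => 'fr',
  'Accept-Encoding' => 'gzip'
}
secondary_key('Accept, Accept-Language, Accept-Encoding', headers)
# => "application/xml:fr:gzip"
```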

To be honest, the key generation is not very smart. It gets worse when we change the value of one of the Accept-* headers. Let’s add deflate to the Accept-Encoding field.

Request

GET /users HTTP/1.1
Host: example.com
Accept: application/xml
Accept-Language: fr
Accept-Encoding: gzip,deflate

Response

HTTP/1.1 200 OK
Vary: Accept, Accept-Language, Accept-Encoding
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Language: fr
Content-Encoding: gzip
Content-Length: 1024

The response is the same so we should be able to just get it from the cache, right? Nope.

Since Accept-Encoding has changed, the secondary cache key is now application/xml:fr:gzip,deflate, which is different from the previous one. There is also no normalization, which means different cases and whitespace characters would end up creating more entries in the cache, even though the responses are pretty much the same.
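One mitigation is for the server (or a cache in front of it) to normalize header values before they reach the cache, so that case and whitespace variations collapse into one entry. A sketch, with a made-up helper name:

```ruby
# Normalizes an Accept-Encoding value: lowercase the codings, strip
# whitespace, and sort them so equivalent headers produce one key.
def normalize_encoding(value)
  value.downcase.split(',').map(&:strip).sort.join(',')
end

normalize_encoding('gzip, deflate') # => "deflate,gzip"
normalize_encoding('Deflate,GZIP')  # => "deflate,gzip"
```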

Vary is great since it gives us a way to identify responses. It’s a pity that the algorithm behind it is not smarter. There is actually a draft for a new HTTP header, Key, that would allow a server to describe the cache key of a response in more detail.

8.1.5. Invalidating Cache

Cache invalidation is hard. That’s why most people don’t do it and just create new and different entries in the cache. As I said before, this is how Rails handles assets, using an MD5 fingerprint in the URI to keep each asset file unique.

A problem arises for the representations of our resources: How should we cache them? Do we need to add a fingerprint every time we make a change?

Well, you could, but it’s not necessary. If you plan your caching strategy adequately, you can get away with simply using the ETag header. Since an ETag identifies a specific version of a representation, you know exactly when the client and server are not in sync. Unfortunately, this still requires making HTTP calls, but at least no body is transferred when nothing has changed.

Using Cache-Control and max-age is the best way to save time, but is much harder to use with web APIs. If you set a max-age of 1 day, you need to ensure that any update sent to the server will reset the expiration date and the cached data. This requires coordination between client and server and can be very error-prone.

Caching is all about fine-tuning for your needs. Some applications cannot work with some delay (e.g. banking apps), while others will be fine if data is updated every hour (e.g. currency conversion app). It’s up to you to decide how small or big the expiration time of your representations should be.

8.1.6. Implementation

Now it’s time to implement everything we’ve just learned. The good news is that Sinatra has good support for HTTP caching.

The first thing we are going to do is set a value for the ETag header using the following code:

etag Digest::SHA1.hexdigest(users.to_s)

Generated SHA1 Hash

d2cf37f13ac29d46e89e5b1771d134a8dd5399b6

We are using the Sinatra etag method, giving it a SHA1 digest of the users hash (after converting it to a string). Sinatra will now automatically check if the client has sent the ETag header. If the tokens match, it will halt the execution and return 304 Not Modified.

If we change anything in the users hash, the SHA1 hash won’t match anymore and Sinatra will let the route continue its execution instead of halting. The client will then receive 200 OK and the new updated representation.

For example, if we change John to Johnny, the SHA1 hash becomes:

70c69492e91b298235a37340ea0a2b13627b5e83
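You can verify this behavior yourself: any change to the input produces a completely different digest. The reduced users hash below is just for demonstration:

```ruby
require 'digest/sha1'

users = { john: { first_name: 'John' } }
tag_before = Digest::SHA1.hexdigest(users.to_s)

# Changing any value changes the string form of the hash, and
# therefore the digest used as the ETag.
users[:john][:first_name] = 'Johnny'
tag_after = Digest::SHA1.hexdigest(users.to_s)

tag_before == tag_after # => false
```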

We will also set the Cache-Control header with the max-age directive. We don’t want the client to have outdated values for the users list, but we consider that caching it for 1 minute on the client is fine. After all, we are not building a highly interactive API between multiple users. We can use the Sinatra method cache_control to define it.

Note that in case of a change sent by the client (POST, PUT, etc), the client should invalidate its cache of users.

It all comes together in the simple API you can see below. Create the folder module_01/chapter_08 and, inside it, the file webapi_caching.rb with the code below:

# webapi_caching.rb
require 'sinatra'
require 'json'
require 'digest/sha1'

users = {
  thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
  simon:    { first_name: 'Simon', last_name: 'Random', age: 26 },
  john:     { first_name: 'John', last_name: 'Smith', age: 28 }
}

before do
  content_type 'application/json'
  cache_control max_age: 60
end

get '/users' do
  etag Digest::SHA1.hexdigest(users.to_s)
  users.map { |name, data| data }.to_json
end

Sadly, we cannot test the cache using curl, since curl does not implement a client-side cache. Instead, we are going to test using something that almost everyone has: a web browser. All browsers have HTTP caching implemented, some better than others, and we will use that.

Start the server with ruby webapi_caching.rb and access http://localhost:4567/users for the first time. I recommend using the Incognito mode of Chrome to avoid unwanted information in the logs. See Figure 1 for the returned JSON document and Figure 2 for the request result.

https://s3.amazonaws.com/devblast-mrwa-book/images/figures/02/1_08_1
Figure 1
https://s3.amazonaws.com/devblast-mrwa-book/images/figures/02/1_08_2
Figure 2

We can see that our API returned 200 OK and set the validation token correctly in the ETag header.

If we make the same request again, the browser will now get the response from the cache, at least for the next 60 seconds, as you can see in Figure 3. Look at the status code: 200 OK (from cache).

https://s3.amazonaws.com/devblast-mrwa-book/images/figures/02/1_08_3
Figure 3

Awesome, it’s working. Now let’s see if the validation token prevents data transfer by waiting one minute and sending the request again. See Figure 4; we are getting 304 Not Modified back, great! No data was transferred (see how the Content-Length header is not present).

https://s3.amazonaws.com/devblast-mrwa-book/images/figures/02/1_08_4
Figure 4

We have set up some pretty simple HTTP caching in our API. We are not going to dive deeper into it for now.

8.2. Server Caching

We have just seen how to use HTTP caching to allow a client to cache responses and the server to send data only if something has changed. We can go one step further on the server by caching the representations to boost the requests that have not been HTTP cached yet.

Building representations can be costly for a server and you want to be able to cache as many as possible. Of course, not everything can be cached. Let’s look at an example.

Let’s say we have users and books. Each user can have a list of books, like a reading list. These entities are stored inside an SQL database.

We also have the books resource for a specific user available at /users/1/books. From there, the following representation is returned:

{
  "books": [
    { "title": "Something", "ISBN": "123" },
    { "title": "Something Else", "ISBN": "456" },
    { ... },
    { ... }
  ]
}

To create this representation, we had to make at least two SQL queries: one to get the user with id "1" and one to get all the books that belong to this user. It might not take long for one user, but what if we had one million?

The thing with our fake API is that users don’t frequently update their reading list: the writing ratio is much lower than the reading ratio.

What if we decided to generate the representation as soon as a change was made?

  1. A user sends a POST (or LINK) request to add a book
  2. The server receives the request and links the book with the user.
  3. Right after, the server re-generates the books list as a JSON document and caches it.
  4. The user requests their reading list and gets it super fast because the server can serve it straight from the cache.

This approach is nice, but it has a problem. There is often more than one representation, and you don’t want to have to cache them all before they are accessed. What we can do instead is use lazy loading. Any time the server receives a GET request, it looks in the cache to see if the representation is there. If it is, the server simply sends it back. If it’s not, the server first generates it, stores it and then sends it back. The keys for the cache entries can be a combination of identifiers, media types, last update timestamps and languages.
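That lazy-loading flow can be sketched as a read-through cache. The key format and the helper are illustrative stand-ins for whatever cache store and serializer you use:

```ruby
# Read-through caching: return the cached representation if present;
# otherwise run the block to generate it, store it, and return it.
def fetch_representation(cache, key)
  return cache[key] if cache.key?(key)
  cache[key] = yield
end

cache = {}
fetch_representation(cache, 'users:1:json') { '{"books":[]}' } # generated and stored
fetch_representation(cache, 'users:1:json') { raise 'not called' } # served from cache
```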

It’s pretty similar to simply putting an HTTP cache in front of your API. The big difference, however, is that you are pretty much in control of the whole thing. For example, you can do caching using the Russian Doll strategy, as seen in Ruby on Rails.

This approach associated with a real HTTP cache on your server and the right headers for caching on the client can make your APIs blazing fast.

Unfortunately, caching strategies work best for applications with more reading than writing. If your application just spends its time writing to the database without querying much, caching will be much less effective.

The example we took in the code sample above was trivial, but in real applications, some representations can be very costly to build. Think about the representation for a financial quote for example. There are many computations required, but once the quote is done, it doesn’t change much. This is a perfect example for caching.

Caching is awesome if your scenario allows it.

8.2.1. Implementation

Let’s create a simple implementation of server-side caching. Since we don’t store updated_at attributes for our users, we cannot use the same strategy as Rails. Instead, we are just going to use a revision number.

Our users hash now looks like this:

users = {
  revision: 1,
  list: {
    thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
    simon:    { first_name: 'Simon', last_name: 'Random', age: 26 },
    john:     { first_name: 'John', last_name: 'Smith', age: 28 }
  }
}

The revision will be updated anytime a change is made to this hash. I’ve decided to define it for the entire list of users, but for a real application, it would make more sense to keep the revision for each user instead.

Next, we need a way to cache data. We don’t have memcache or redis installed, so we are just going to use a Ruby hash.

cached_data = {}

We also need a method that will cache and/or retrieve data.

def cache_and_return(cached_data, key, &block)
  cached_data[key] ||= block.call
  cached_data[key]
end
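A quick check shows the memoization at work: thanks to ||=, the block only runs on a cache miss.

```ruby
def cache_and_return(cached_data, key, &block)
  cached_data[key] ||= block.call
  cached_data[key]
end

calls = 0
store = {}

# The block increments a counter so we can see how often it runs.
2.times { cache_and_return(store, 'users:1') { calls += 1; 'payload' } }

calls      # => 1, the second call hit the cache
store.keys # => ["users:1"]
```

One caveat of ||=: a cached value that is itself nil or false would be regenerated on every request, which is fine for JSON strings but worth remembering.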

Finally, for fun, we will generate a JSON document with 3000 users with the following code:

(1..1000).each_with_object([]) do |i, array|
  users[:list].each do |name, data|
    array << data
  end
end.to_json
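If readability matters more than mirroring the loop, the same 3000-entry document can be built with Array#*. This is an equivalent alternative, not the code we will use below:

```ruby
require 'json'

users = {
  list: {
    a: { name: 'A' }, b: { name: 'B' }, c: { name: 'C' }
  }
}

# Repeating the 3 user hashes 1000 times yields the same 3000 entries.
repeated = users[:list].values * 1000
repeated.length # => 3000
repeated.to_json
```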

Let’s see how it all comes together.

Create a new file named webapi_server_caching.rb in the chapter_08 folder and put the following code in it:

# webapi_server_caching.rb
require 'sinatra'
require 'json'
require 'digest/sha1'

users = {
  revision: 1,
  list: {
    thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
    simon:    { first_name: 'Simon', last_name: 'Random', age: 26 },
    john:     { first_name: 'John', last_name: 'Smith', age: 28 }
  }
}
cached_data = {}

helpers do
  def cache_and_return(cached_data, key, &block)
    cached_data[key] ||= block.call
    cached_data[key]
  end
end

before do
  content_type 'application/json'
end

get '/users' do
  key = "users:#{users[:revision]}"

  cache_and_return(cached_data, key) do
    (1..1000).each_with_object([]) do |i, array|
      users[:list].each do |name, data|
        array << data
      end
    end.to_json
  end
end

put '/users/:first_name' do |first_name|
  user = JSON.parse(request.body.read)
  existing = users[:list][first_name.to_sym]
  users[:list][first_name.to_sym] = user
  users[:revision] += 1
  status existing ? 204 : 201
end

See how we use users:#{users[:revision]} as the cache key for our representation. If we supported more media types, we would probably need to use users:#{users[:revision]}:#{media_type} instead. Using this type of key means we don’t need to worry about invalidating data. Once the revision is updated, we forget we even had the previous revision cached and instead generate and store the new revision with an updated key.
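The effect of bumping the revision can be seen in isolation: the old entry simply stops being looked up.

```ruby
users = { revision: 1 }
cached_data = { 'users:1' => '[old payload]' }

# A write bumps the revision, so the next read builds a new key.
users[:revision] += 1
key = "users:#{users[:revision]}"

cached_data.key?(key)       # => false: the new key misses the cache
cached_data.key?('users:1') # => true: the stale entry is still stored
```

In a long-running process you would eventually want to evict those stale entries; a real cache such as memcached handles this for you with expiration and LRU eviction, whereas our Ruby hash never forgets anything.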

The PUT route was also included to show you that we need to manually update the revision with users[:revision] += 1 to keep our cached data up to date.

Start the server so we can run some curl requests.

ruby webapi_server_caching.rb

Let’s use the following curl request to see how much time we can save by caching our huge representation:

curl -i -s -w "\n%{time_total}s \n" -o /dev/null http://localhost:4567/users

There are new options in this one, so let’s take a quick look at each one of them.

  • -s: Quiet Mode. Don’t show progress or error messages.
  • -w: Set what to display after a successful request. In this request, we display the time (in seconds) that the request took.
  • -o: Write output to the specified location. In this case, we discard the output by sending it to /dev/null.

Run it a first time.

0.039s

And a second time.

0.008s

Wow, nice improvement! Almost 5 times faster.

8.3. Wrap Up

Although caching is a powerful tool, it should be used with caution. It can easily create bugs if you implement it incorrectly. For example, users could get outdated data or data that doesn’t match what they asked for (different languages, etc).

With great power comes great responsibility.