Caching is extremely important, not only for your users but also to reduce the cost of running your applications. The more users you have, the more important caching becomes. Caching could be the difference between having an application crashing every 10 minutes and having everything running smoothly, even with 1 million users.
Caching is nothing new - you’ve probably used some caching mechanisms before. It usually means storing something in a cache, a simple key/value database, in order to re-access it faster. In software engineering, any value that is difficult or expensive to compute or retrieve is a candidate for caching. The only things that should not be cached are those that change so frequently that storing a temporary version is useless, or even harmful.
There are only two hard things in Computer Science: cache invalidation and naming things.
—Phil Karlton
Once you’ve stored something in a cache, how do you know it’s not outdated? That’s when cache invalidation comes into play, which, as pointed out by Phil, is not a simple thing.
Luckily, for web development, HTTP comes with everything you need to allow clients to cache responses and avoid transferring the same data over and over again. This type of caching obviously happens on the client, but is made possible by the server and its configuration.
Data can also be cached on the server. As you know, we will be generating various representations for our resources: JSON documents, HTML pages and so on. We can cache those static outputs in order to make future requests faster. Some representations are complex to generate, and caching them can shave hundreds of milliseconds off a request.
The smart people behind the HTTP protocol took a very pragmatic approach to create a useful protocol for web developers. They included everything needed to handle client caching. By defining a few settings on the server, any client will be capable of getting data from the cache instead of requesting it from your server.
The goals of caching in HTTP are to eliminate the need to send requests as much as possible and, when a request does have to be made, to reduce the amount of data sent back in the response. The first goal is achieved with an expiration mechanism, known as Cache-Control, and the second with a validation mechanism like ETag or Last-Modified.
The fastest way to send an HTTP request is to not send it at all.
The Cache-Control header can be used to define the caching policy for a resource. More specifically, it defines who can cache it, how, and for how long. Here are a few examples:
Cache-Control: max-age=3600
Cache-Control: no-cache
Cache-Control: private, max-age=86400
Let’s explore what these directives mean.
max-age
You may have seen this before, as it is widely used for assets like stylesheets, JavaScript files and images. The max-age directive defines, in seconds, how long the resource can be cached. This applies not only to browsers, but also to intermediate caches along the way.
For example, Cache-Control: max-age=3600 means that the received response can be cached for 1 hour. In other words, for the next hour, a browser (for example) can simply reuse the cached response instead of making another HTTP request, saving both time and bandwidth.
If you’ve used Rails before, you know that compiled assets include an MD5 fingerprint in their filenames. This fingerprint is used to invalidate the cached version of the same file. Since Rails sets high max-age values on assets, this is the only way to ensure that clients get the latest version after a deployment.
This method has limited use for web APIs, but could prove useful for very specific scenarios where representations would not change for long periods of time.
The expiration time for a response can also be specified using the Expires header. Note that if both the Expires header and the max-age directive are present in the response, the max-age directive takes precedence over Expires.
public & private
The public and private directives define who can cache the response. public means that anyone and anything can cache it, but it is often unnecessary since max-age already marks the response as cacheable. private, on the other hand, defines a response as cacheable only by the browser, not by intermediaries such as CDNs (Content Delivery Networks).
The following example means that the browser, and nothing else, can cache the response for 60 seconds:
Cache-Control: private, max-age=60
The example below means that anything, browser or CDN, can cache the response for 1 day:
Cache-Control: public, max-age=86400
no-cache & no-store
These two directives are simpler. no-store means that nothing can cache the response, be it a browser or an intermediary. It should be used for sensitive data that must never be cached.
no-cache means that the response can be cached but cannot be reused without first checking with the server. To avoid re-transmitting the whole response, no-cache can be combined with the ETag header to check whether the response has changed.
max-stale
This directive indicates that the client is willing to accept a response that has been expired for no more than the specified number of seconds. Cache-Control: max-stale=60 means that the client will accept a response that expired less than 60 seconds ago.
min-fresh
This directive tells the server that the client will only accept a response that stays fresh (i.e. not expired) for at least the specified number of seconds. Cache-Control: min-fresh=60 means that the client expects the response to remain fresh for at least one minute.
no-transform
Some proxies transform message bodies in order to cache or transmit them more efficiently; for example, by re-encoding images or changing the media type. This directive tells intermediate caches not to transform the body.
only-if-cached
This directive can be added to get the response only if it is currently cached. A cache receiving this kind of request should either return the cached response or 504 Gateway Timeout.
no-transform
Same behavior as defined above.
must-revalidate
With the must-revalidate directive, the server specifies that caches must revalidate a stale response with the server before reusing it.
proxy-revalidate
This directive works like must-revalidate, but only applies to shared caches such as proxies; private client caches do not have to revalidate.
s-maxage
This directive is used for shared caches. When present, it overrides the expiration time defined by either the Expires header or the max-age directive.
We have just seen how requests can be prevented entirely by using the Cache-Control header. Unfortunately, in most web APIs this is rarely possible. But if a request has to be made anyway, we can still save time and bandwidth by using the ETag header. ETag, which stands for Entity Tag, holds a validation token identifying a specific version of a response.
The first time a client requests a resource (/users/thibault, for example), the server includes the current token (let’s say 123) in the ETag header.
Request
GET /users/thibault HTTP/1.1
Host: localhost:4567
Response
HTTP/1.1 200 OK
Content-Length: 2048
ETag: "123"
[DATA]
The next time the client sends a request to /users/thibault, it includes the token it received in the If-None-Match header.
When the request is received, the server checks the value of the If-None-Match header and compares it with the current ETag. If the client’s ETag and the server’s ETag match, the representation hasn’t changed and the server can return 304 Not Modified with no body.
However, if the values differ, the representation has changed. The server should return 200 OK with the new representation and the new validation token.
Request
GET /users/thibault HTTP/1.1
Host: localhost:4567
If-None-Match: "123"
Response if the representation didn’t change:
HTTP/1.1 304 Not Modified
ETag: "123"
Response if the representation changed:
HTTP/1.1 200 OK
Content-Length: 2048
ETag: "124"
[DATA]
Any kind of value will do as a validation token: a hash of the representation (MD5, for example), the updated_at attribute of the entity, or an internal version number that the server updates every time the data changes. Hashing is the simplest approach but costs more computation time; as we will see later, it also works well with server caching.
If you choose to use a timestamp, there is a better-suited header: Last-Modified. Paired with the request header If-Modified-Since, it follows the same logic as ETag, and the server returns 304 Not Modified if the requested variant has not changed.
Last-Modified: Sat, 28 Apr 1990 02:00:00 GMT
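As a sketch of the comparison behind If-Modified-Since, here is how a server might decide whether to answer 304, using Ruby's Time.httpdate to parse the HTTP date format. The not_modified? helper is hypothetical, not part of any framework:

```ruby
require 'time'

# Return true when a 304 Not Modified can be sent: the resource has not
# been modified since the date the client sent in If-Modified-Since.
def not_modified?(last_modified, if_modified_since)
  return false if if_modified_since.nil?
  # Time.httpdate parses the date format used by HTTP headers
  last_modified <= Time.httpdate(if_modified_since)
rescue ArgumentError
  false # malformed date: ignore the header and send a full response
end

last_modified = Time.utc(1990, 4, 28, 2, 0, 0)
not_modified?(last_modified, 'Sat, 28 Apr 1990 02:00:00 GMT') # => true
not_modified?(last_modified, 'Fri, 27 Apr 1990 02:00:00 GMT') # => false
```

Note that a malformed date is simply ignored, which matches the general HTTP rule that an invalid conditional header should not make the request fail.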
The two headers we just discovered, If-None-Match and If-Modified-Since, are used to make conditional HTTP requests.
There are 5 headers that can be used to create conditional HTTP requests. The idea behind this type of request is to let the client tell the server that, if the condition is not met, the request should fail and the server should return 412 Precondition Failed. We are not going to dive into each one of them, since we will only be using If-None-Match in this book, but here are quick descriptions.
Some of the descriptions come from the RFC of the HTTP protocol.
If-Match: The opposite of If-None-Match, this header indicates that the client wants the request to be performed only if the entity tag matches a current entity on the server. If this is not the case, the server should respond with 412 Precondition Failed.
If-Modified-Since: If the requested variant has not been modified since the time specified in this field, the server will not return an entity; instead, it will return a 304 Not Modified response without any body.
If-None-Match: We have already seen how this header works.
If-Range: If a client has a partial copy of an entity in its cache and wishes to have an up-to-date copy of the entire entity, it could use the Range request header with a conditional GET (using either or both of If-Unmodified-Since and If-Match). However, if the condition fails because the entity has been modified, the client would have to make a second request to obtain the entire current entity body. If-Range avoids that second request: if the entity is unchanged, the server sends only the missing part; otherwise, it sends the entire entity.
If-Unmodified-Since: If the requested resource has not been modified since the time specified in this field, the server should perform the requested operation as if this header were not present. If it has been modified, the server should return 412 Precondition Failed instead.
Conditional requests are powerful tools for developers who understand them. Don’t hesitate to dive deeper into the subject.
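To make the server side of this concrete, here is a minimal sketch in plain Ruby of the decision a server makes when it receives If-None-Match. The conditional_get helper and its return format are made up for illustration; any real framework (Sinatra included) does this for you:

```ruby
# Decide how to answer a GET that may carry an If-None-Match header.
# current_etag is the token for the current representation; if_none_match
# is the raw header value sent by the client, or nil.
def conditional_get(current_etag, if_none_match)
  # The header may list several quoted tags, e.g. '"123", "456"'
  client_tags = (if_none_match || '').split(',').map(&:strip)
  if client_tags.include?(%("#{current_etag}"))
    { status: 304, body: nil }                     # nothing to transfer
  else
    { status: 200, body: '[DATA]', etag: current_etag }
  end
end

conditional_get('123', '"123"')[:status] # => 304
conditional_get('123', '"122"')[:status] # => 200
conditional_get('123', nil)[:status]     # => 200
```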
Let’s say we are making the following request to a server.
Request
GET /users HTTP/1.1
Host: example.com
Accept: application/json
Response
HTTP/1.1 200 OK
Cache-Control: max-age=86400
Content-Type: application/json
Content-Length: 128
Everything goes well and the response has been cached. Then we make the following request, asking for another representation.
Request
GET /users HTTP/1.1
Host: example.com
Accept: application/xml
Response
HTTP/1.1 200 OK
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Length: 1024
If caches only used a combination of the HTTP method and the URI as a key for the cached response, the second request would still give us the JSON document. Same method, same URI.
Luckily, cache systems also use a secondary cache key, and, by using the Vary header, we can change how this secondary key is built. Vary lets us specify exactly which request headers affect the response body and should therefore be part of the caching key.
Let’s re-send our two requests.
Request (application/json)
GET /users HTTP/1.1
Host: example.com
Accept: application/json
Response
HTTP/1.1 200 OK
Vary: Accept
Cache-Control: max-age=86400
Content-Type: application/json
Content-Length: 128
Request (application/xml)
GET /users HTTP/1.1
Host: example.com
Accept: application/xml
Response
HTTP/1.1 200 OK
Vary: Accept
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Length: 1024
A cache would understand that these two requests are different. Both responses will be stored under the same primary key, GET /users (or whatever the primary key format is for that specific cache). The first will have the secondary key application/json, and the second application/xml.
There are other headers that can affect the format or content of the response body. Headers like Accept-Language or Accept-Encoding also affect HTTP responses in a way that requires the results to be stored as separate cached responses.
Request
GET /users HTTP/1.1
Host: example.com
Accept: application/xml
Accept-Language: fr
Accept-Encoding: gzip
Response
HTTP/1.1 200 OK
Vary: Accept, Accept-Language, Accept-Encoding
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Language: fr
Content-Encoding: gzip
Content-Length: 1024
For this response, the secondary cache key would be:
application/xml:fr:gzip
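A toy version of this key construction can be written in a few lines of Ruby. The ':'-joined format mirrors the example above; real caches each use their own internal format, so treat this as an illustration only:

```ruby
# Build a secondary cache key by joining the request's values for the
# headers listed in Vary, in the order they appear.
def secondary_key(vary, request_headers)
  vary.split(',').map(&:strip).map { |h| request_headers[h].to_s }.join(':')
end

request_headers = {
  'Accept'          => 'application/xml',
  'Accept-Language' => 'fr',
  'Accept-Encoding' => 'gzip'
}

secondary_key('Accept, Accept-Language, Accept-Encoding', request_headers)
# => "application/xml:fr:gzip"
```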
To be honest, the key generation is not very smart. It gets worse when we change the value of one of the Accept-* headers. Let’s add deflate to the Accept-Encoding field.
Request
GET /users HTTP/1.1
Host: example.com
Accept: application/xml
Accept-Language: fr
Accept-Encoding: gzip,deflate
Response
HTTP/1.1 200 OK
Vary: Accept, Accept-Language, Accept-Encoding
Cache-Control: max-age=86400
Content-Type: application/xml
Content-Language: fr
Content-Encoding: gzip
Content-Length: 1024
The response is the same so we should be able to just get it from the cache, right? Nope.
Since Accept-Encoding has changed, the secondary cache key is now application/xml:fr:gzip,deflate, which is different from the previous one. There is also no normalization, which means that differences in letter case or whitespace create additional cache entries, even though the responses are essentially the same.
Vary is great since it gives us a way to distinguish responses. It’s a pity that the algorithm behind it is not smarter. There is actually a draft for a new HTTP header, Key, that would allow a server to describe the cache key of a response more precisely.
Cache invalidation is hard. That’s why most people don’t do it and just create new and different entries in the cache. As I said before, this is how Rails handles assets, using an MD5 fingerprint in the URI to keep each asset file unique.
A problem arises for the representations of our resources: How should we cache them? Do we need to add a fingerprint every time we make a change?
Well, you could, but it’s not necessary. If you plan your caching strategy adequately, you can get away with simply using the ETag header. Since an ETag identifies a specific version of a representation, you know exactly when the client and server are out of sync. Unfortunately, this still requires making HTTP calls, but at least no body data is transferred.
Using Cache-Control and max-age is the best way to save time, but it is much harder to use with web APIs. If you set a max-age of 1 day, you need to ensure that any update sent to the server resets the expiration date and the cached data. This requires coordination between client and server and can be very error-prone.
Caching is all about fine-tuning for your needs. Some applications cannot work with some delay (e.g. banking apps), while others will be fine if data is updated every hour (e.g. currency conversion app). It’s up to you to decide how small or big the expiration time of your representations should be.
Now it’s time to implement everything we’ve just learned. The good news is that Sinatra has good support for HTTP caching.
The first thing we are going to do is set a value for the ETag header using the following code:
etag Digest::SHA1.hexdigest(users.to_s)
Generated SHA1 hash:
d2cf37f13ac29d46e89e5b1771d134a8dd5399b6
We are using the Sinatra etag method, giving it a SHA1 digest of the users hash (after converting it to a string). Sinatra will now automatically check whether the client has sent an If-None-Match header. If the tokens match, it will halt the execution and return 304 Not Modified.
If we change anything in the users hash, the SHA1 digest won’t match anymore and Sinatra will let the route continue its execution instead of halting. The client will then receive 200 OK and the new, updated representation.
For example, if we change John to Johnny, the SHA1 hash becomes:
70c69492e91b298235a37340ea0a2b13627b5e83
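You can check this behavior yourself in plain Ruby, outside of Sinatra. Note that the exact digests depend on how Hash#to_s formats the hash in your Ruby version, so they may differ from the ones printed above; what matters is that any change to the data yields a different token:

```ruby
require 'digest/sha1'

users = {
  thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
  simon: { first_name: 'Simon', last_name: 'Random', age: 26 },
  john: { first_name: 'John', last_name: 'Smith', age: 28 }
}

token = Digest::SHA1.hexdigest(users.to_s)
token.length # => 40, SHA1 digests are 40 hex characters

# Changing any value produces a different token
users[:john][:first_name] = 'Johnny'
new_token = Digest::SHA1.hexdigest(users.to_s)
new_token == token # => false
```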
We will also set the Cache-Control header with the max-age directive. We don’t want the client to keep outdated values for the users list, but caching it for 1 minute on the client is fine. After all, we are not building a highly interactive API shared between multiple users. We can use the Sinatra method cache_control to define it.
Note that when the client itself sends a change (POST, PUT, etc.), it should invalidate its cached copy of the users list.
It all comes together in the simple API you can see below. Create the folder module_01/chapter_08 and the file webapi_caching.rb inside it for the code below:
# webapi_caching.rb
require 'sinatra'
require 'json'
require 'digest/sha1'

users = {
  thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
  simon: { first_name: 'Simon', last_name: 'Random', age: 26 },
  john: { first_name: 'John', last_name: 'Smith', age: 28 }
}

before do
  content_type 'application/json'
  cache_control max_age: 60
end

get '/users' do
  etag Digest::SHA1.hexdigest(users.to_s)
  users.map { |name, data| data }.to_json
end
Sadly, we cannot easily test the cache using curl, since curl does not keep a cache between requests. Instead, we are going to test with something almost everyone has: a web browser. All browsers have HTTP caching implemented, some better than others, and we will use that.
Start the server with ruby webapi_caching.rb and access http://localhost:4567/users for the first time. I recommend using Chrome’s Incognito mode to avoid unwanted information in the logs. See Figure 1 for the returned JSON document and Figure 2 for the request result.
We can see that our API returned 200 OK and set the validation token correctly in the ETag header.
If we make the same request again, the browser will get the response from the cache, at least for the next 60 seconds, as you can see in Figure 3. Look at the status code: 200 OK (from cache).
Awesome, it’s working. Now let’s see if the validation token prevents data transfer by waiting one minute and sending the request again. See Figure 4: we are getting 304 Not Modified back, great! No body was transferred (note that the Content-Length header is not present).
We have set up some pretty simple HTTP caching in our API. We are not going to dive deeper into it for now.
We have just seen how to use HTTP caching to allow a client to cache responses and the server to send data only if something has changed. We can go one step further on the server by caching the representations to boost the requests that have not been HTTP cached yet.
Building representations can be costly for a server and you want to be able to cache as many as possible. Of course, not everything can be cached. Let’s look at an example.
Let’s say we have users and books. Each user can have a list of books, like a reading list. These entities are stored inside an SQL database.
We also have the books resource for a specific user, available at /users/1/books. From there, the following representation is returned:
{
  "books": [
    { "title": "Something", "ISBN": "123" },
    { "title": "Something Else", "ISBN": "456" },
    { ... },
    { ... }
  ]
}
To create this representation, we had to make at least two SQL queries: one to get the user with id 1 and one to get all the books that belong to this user. It might not take long for one user, but what if we had one million?
The thing with our fake API is that users don’t frequently update their reading list: the writing ratio is much lower than the reading ratio.
What if we decided to generate the representation as soon as a change was made?
POST (or LINK) request to add a book
This approach is nice, but it has a problem: there is often more than one representation, and you don’t want to have to cache them all before they are accessed. What we can do instead is use lazy loading. Any time the server receives a GET request, it looks in the cache to see if the representation is there. If it is, the server simply sends it back. If it’s not, the server first generates it, stores it and then sends it back. The keys for the cache entries can be a combination of identifiers, media types, last-update timestamps and language.
It’s pretty similar to simply putting an HTTP cache in front of your API. The big difference, however, is that you are pretty much in control of the whole thing. For example, you can do caching using the Russian Doll strategy, as seen in Ruby on Rails.
This approach associated with a real HTTP cache on your server and the right headers for caching on the client can make your APIs blazing fast.
Unfortunately, caching strategies work best for applications with more reading than writing. If your application just spends its time writing to the database without querying much, caching will be much less effective.
The example we took in the code sample above was trivial, but in real applications, some representations can be very costly to build. Think about the representation for a financial quote for example. There are many computations required, but once the quote is done, it doesn’t change much. This is a perfect example for caching.
Caching is awesome if your scenario allows it.
Let’s create a simple implementation of server-side caching. Since we don’t store updated_at attributes for our users, we cannot use the same strategy as Rails. Instead, we are just going to use a revision number.
Our users hash now looks like this:
users = {
  revision: 1,
  list: {
    thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
    simon: { first_name: 'Simon', last_name: 'Random', age: 26 },
    john: { first_name: 'John', last_name: 'Smith', age: 28 }
  }
}
The revision will be updated any time a change is made to this hash. I’ve decided to define it for the entire list of users, but in a real application it would make more sense to keep a revision per user.
Next, we need a way to cache data. We don’t have memcached or Redis installed, so we are just going to use a Ruby hash.
cached_data = {}
We also need a method that will cache and/or retrieve data.
def cache_and_return(cached_data, key, &block)
  cached_data[key] ||= block.call
  cached_data[key]
end
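To see the memoization at work, here is a quick usage sketch: the block only runs on the first call for a given key, and the cached value is returned afterwards.

```ruby
def cache_and_return(cached_data, key, &block)
  cached_data[key] ||= block.call
  cached_data[key]
end

cached_data = {}
calls = 0

first  = cache_and_return(cached_data, 'users:1') { calls += 1; 'expensive result' }
second = cache_and_return(cached_data, 'users:1') { calls += 1; 'expensive result' }

first == second # => true, same cached value
calls           # => 1, the block only ran once
```

One caveat of using ||= here: if a block legitimately returns nil or false, the result will be recomputed on every call, since ||= treats those values as "not cached".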
Finally, for fun, we will generate a JSON document with 3000 users with the following code:
(1..1000).each_with_object([]) do |i, array|
  users[:list].each do |name, data|
    array << data
  end
end.to_json
Let’s see how it all comes together.
Create a new file named webapi_server_caching.rb in the chapter_08 folder and put the following code in it:
# webapi_server_caching.rb
require 'sinatra'
require 'json'
require 'digest/sha1'

users = {
  revision: 1,
  list: {
    thibault: { first_name: 'Thibault', last_name: 'Denizet', age: 25 },
    simon: { first_name: 'Simon', last_name: 'Random', age: 26 },
    john: { first_name: 'John', last_name: 'Smith', age: 28 }
  }
}

cached_data = {}

helpers do
  def cache_and_return(cached_data, key, &block)
    cached_data[key] ||= block.call
    cached_data[key]
  end
end

before do
  content_type 'application/json'
end

get '/users' do
  key = "users:#{users[:revision]}"
  cache_and_return(cached_data, key) do
    (1..1000).each_with_object([]) do |i, array|
      users[:list].each do |name, data|
        array << data
      end
    end.to_json
  end
end

put '/users/:first_name' do |first_name|
  user = JSON.parse(request.body.read)
  existing = users[:list][first_name.to_sym]
  users[:list][first_name.to_sym] = user
  users[:revision] += 1
  status existing ? 204 : 201
end
See how we use users:#{users[:revision]} as the cache key for our representation. If we supported more media types, we would probably need to use users:#{users[:revision]}:#{media_type} instead. Using this type of key means we don’t need to worry about invalidating data: once the revision is updated, we forget we even had the previous revision cached and instead generate and store the new revision under an updated key.
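The revision idea can be reduced to a few lines, showing that a write bumps the revision and the next read naturally lands on a fresh key (a toy sketch using a plain hash as the cache):

```ruby
require 'json'

cached_data = {}
users = { revision: 1, list: { john: { first_name: 'John' } } }

key = "users:#{users[:revision]}"
cached_data[key] ||= users[:list].values.to_json   # stored under "users:1"

users[:list][:john][:first_name] = 'Johnny'
users[:revision] += 1                              # every write bumps the revision

key = "users:#{users[:revision]}"                  # now "users:2"
cached_data[key] ||= users[:list].values.to_json   # regenerated with fresh data

cached_data.keys # => ["users:1", "users:2"]
```

The old entry for users:1 is simply never read again; in a real cache like memcached it would eventually be evicted.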
The PUT route was also included to show you that we need to manually update the revision with users[:revision] += 1 to keep our cached data up to date.
Start the server so we can run some curl requests.
ruby webapi_server_caching.rb
Let’s use the following curl request to see how much time we can save by caching our huge representation:
curl -i -s -w "\n%{time_total}s \n" -o /dev/null http://localhost:4567/users
There are new options in this one, so let’s take a quick look at each one of them.
-s: Silent mode. Don’t show progress or error messages.
-w: Set what to display after a successful request. Here, we display the total time (in seconds) the request took.
-o: Write the output to the specified location. Here, we discard the output by sending it to /dev/null.
Run it a first time.
0.039s
And a second time.
0.008s
Wow, nice improvement! Almost 5 times faster.
Although caching is a powerful tool, it should be used with caution. It can easily create bugs if you implement it incorrectly. For example, users could get outdated data or data that doesn’t match what they asked for (different languages, etc).
With great power comes great responsibility.