About Web Caches
This article draws primarily from two sources: AskApache.com - Speed Up Your Site With Caching, and Mark Nottingham's Caching Tutorial. The purpose of this article is to help clarify what caching is and how to do it. Think of it as a brief overview of those two caching articles.
Let's go over some terminology...
Content Any type of file, e.g. HTML, images, and sounds.
Representation A copy of the content stored in a cache.
Cache A storage location meant to be closer or faster than an origin server. Used to improve performance.
Origin Server The server where the content originates.
Fresh Content is considered fresh if it can be sent to a client without checking with the origin server. A fresh representation will be available instantly from the cache. A cached representation is considered fresh if:
- It has an expiry time or other age-controlling header, and is still within the fresh period.
- The cache has seen the representation recently, and it was modified relatively long ago.
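The two freshness rules above can be sketched in a few lines of Python. This is only an illustration, not how any particular cache is implemented: the function name is_fresh and its parameters are made up for this example, age is measured simply from the moment the cache stored the copy, and the 10% heuristic fraction is an assumed, commonly cited default rather than something every cache uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(
    now: datetime,
    stored_at: datetime,
    max_age: Optional[int] = None,
    expires: Optional[datetime] = None,
    last_modified: Optional[datetime] = None,
    heuristic_fraction: float = 0.1,  # assumed default, not universal
) -> bool:
    """Return True if a cached copy may be served without asking the origin server."""
    age = now - stored_at

    # Rule 1: explicit freshness information. max-age takes priority over Expires.
    if max_age is not None:
        return age < timedelta(seconds=max_age)
    if expires is not None:
        return now < expires

    # Rule 2: no explicit information, so guess. The longer the content had gone
    # unchanged when we stored it, the longer we are willing to treat it as fresh.
    if last_modified is not None:
        lifetime = (stored_at - last_modified) * heuristic_fraction
        return age < lifetime

    # Nothing to go on: treat the copy as stale.
    return False


if __name__ == "__main__":
    now = datetime(2011, 3, 23, 20, 0, 0, tzinfo=timezone.utc)
    stored = now - timedelta(minutes=10)
    print(is_fresh(now, stored, max_age=3600))
    print(is_fresh(now, stored, last_modified=stored - timedelta(days=10)))
```

A copy stored ten minutes ago with max-age=3600 is fresh under the first rule. With no explicit headers, a copy that had been unchanged for ten days when it was stored gets a one-day heuristic lifetime here, so it is also still fresh.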
To control how long content stays fresh, you can use the HTTP/1.1 Cache-Control response header. If the origin server is running Apache, you can set the Cache-Control header by editing your .htaccess file and using mod_headers.
Another option to control freshness is to set the Expires HTTP header. It gives all caches (browser, proxy, and gateway) a date and time until which the associated representation is fresh. Once the content has gone stale, caches will check back with the origin server to see if the content has changed. The Expires header is supported by both HTTP/1.0 and HTTP/1.1. It has two problems, however: the clocks on the origin server and the cache must be synchronized, and it is easy to forget to update the Expires time. For these reasons, using Cache-Control headers is recommended over the Expires header.
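To make the difference between the two headers concrete, here is a small Python sketch that builds both for the same freshness period. The helper name freshness_headers is hypothetical, and the code assumes the server clock is passed in as a UTC datetime. Cache-Control: max-age is a relative number of seconds, while Expires is an absolute date, which is why Expires depends on the clocks agreeing.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def freshness_headers(now: datetime, max_age_seconds: int) -> dict:
    """Build Cache-Control and Expires headers describing the same fresh period.

    `now` must be a timezone-aware datetime in UTC.
    """
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        # Relative lifetime: does not depend on the cache's clock matching the server's.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute date in the HTTP date format, always expressed in GMT.
        "Expires": format_datetime(expires_at, usegmt=True),
    }


if __name__ == "__main__":
    now = datetime(2011, 3, 23, 19, 26, 45, tzinfo=timezone.utc)
    for name, value in freshness_headers(now, 3600).items():
        print(f"{name}: {value}")
```

For a one-hour lifetime starting at 19:26:45 GMT on 23 March 2011, this produces Cache-Control: max-age=3600 and Expires: Wed, 23 Mar 2011 20:26:45 GMT.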
Validator An ETag or Last-Modified header in an HTTP response. Validation is used by servers and caches to communicate whether a representation has changed. With it, a cache that already has a local copy but isn't sure whether the copy is still fresh can avoid downloading the entire representation again if it hasn't changed. If a validator isn't present (no ETag or Last-Modified header), and there isn't an Expires or Cache-Control header available, caches will usually not store the representation at all. Most modern Web servers will generate both ETag and Last-Modified headers to use as validators for static content (i.e., files) automatically.
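As a rough illustration of ETag validation, the Python sketch below derives an ETag from the content and compares it with the value a cache sends back in the If-None-Match request header. The function names make_etag and respond are invented for this example, and hashing the body is just one assumed way to produce an ETag; real servers often build it from file metadata instead. The comparison is also simplified: it handles a single ETag only, not lists, weak validators, or "*".

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, payload) for a GET that may carry an If-None-Match header."""
    etag = make_etag(body)
    # The cache's copy matches the current content: no need to resend the body.
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, b""
    # No validator sent, or the content has changed: send the full representation.
    return 200, body


if __name__ == "__main__":
    cached_etag = make_etag(b"hello")
    print(respond(b"hello", cached_etag)[0])          # content unchanged
    print(respond(b"hello, again", cached_etag)[0])   # content changed
```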
The most common validator is the time that the document last changed, as communicated in the Last-Modified header. When a cache has a stored representation that includes a Last-Modified header, it can use it to ask the server if the representation has changed. To do that, the cache sends a request with an If-Modified-Since header.
For instance, if a client sends a GET request for an image, that request might contain an If-Modified-Since header with the value: Wed, 23 Mar 2011 19:26:45 GMT. This is the time the image last changed, as known by the client's cache. If the image on the server hasn't changed since then, the response from the server is an HTTP 304 Not Modified, meaning that the cached copy of the image is still good. If the image has changed, the response is an HTTP 200 "OK" containing the new image, which then replaces the old copy in the client's cache.
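The server side of that exchange can be sketched in Python as follows. This is a simplified illustration under stated assumptions: conditional_status is a hypothetical helper, the server is assumed to know the image's last-modified time as a timezone-aware UTC datetime, and a header that can't be parsed is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still good, otherwise 200."""
    if if_modified_since is None:
        return 200  # ordinary GET, send the full representation
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    header = "Wed, 23 Mar 2011 19:26:45 GMT"
    unchanged = datetime(2011, 3, 23, 19, 26, 45, tzinfo=timezone.utc)
    changed = datetime(2011, 3, 24, 8, 0, 0, tzinfo=timezone.utc)
    print(conditional_status(unchanged, header))
    print(conditional_status(changed, header))
```

With the header value from the example above, an image last changed at that same time yields 304, and an image changed the next morning yields 200.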
You can use Redbot.org to check your cache settings.