Setting infinite Expires headers is a great solution to the problem: everything downloaded from the server is simply kept in the browser’s cache! It is very effective, ranking third on the best practices list from Yahoo!’s Exceptional Performance team and first on Google PageSpeed’s list.
You can clearly see the difference on this graph below:
Scripts and CSS in the document head load from cache, eliminating the rendering delay, and images load from cache as well, so the document load event fires much sooner.
Infinite expiration has a catch, though: the browser never checks back with the server, so it will not notice when an asset changes. To solve this problem, a simple technique sometimes called “fingerprinting” or “cache busting” (a term that usually implies the technique is being used for the wrong reasons) can be used. It replaces Entity Tags and Conditional GET, both of which require server requests and will not work with infinite expiration.
The idea is simple: the URL of an asset should be unique for each version of that asset, effectively changing every time you update your script or image. This way the browser will not find the asset in its cache and will request the new one from the server, storing the result in another cache entry and eventually pushing the old one out.
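As a minimal sketch of the idea, the Python snippet below derives a URL from the asset's content, so any change to the bytes yields a new URL. This is only an illustration of fingerprinting in general: the function name `fingerprinted_url`, the use of a content hash, and the 8-character digest length are assumptions, not something taken from SVN Assets (which relies on Subversion revisions instead).

```python
import hashlib
import posixpath


def fingerprinted_url(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return `path` with a short content hash inserted before the extension.

    Hypothetical helper: the name and the hash-based scheme are illustrative.
    """
    digest = hashlib.sha1(content).hexdigest()[:digest_len]
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"


# Two versions of the same script end up at two different URLs,
# so a browser holding the old one in cache has to fetch the new one.
v1 = fingerprinted_url("/js/app.js", b"console.log('v1');")
v2 = fingerprinted_url("/js/app.js", b"console.log('v2');")
```

The same content always maps to the same URL, so unchanged assets stay cached across deployments.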
Perceived as complex
The difficulty in implementing this lies in the status quo: web servers serve files from a file system that has no idea about file versions. To be fair, the one-to-one URL<->file paradigm is plain simple to understand, which helped the web grow as fast as it did, but it has shaped how all developers think, and almost no systems are built with versioning in mind.
My experience with many teams shows that this technique is perceived as overly complex and people avoid it until the last moment, even though such an obvious solution should be common functionality in web servers.
Another problem, I believe, is that this solution affects many “layers of the pie”: the asset publishing process, HTML changes by designers and front-end developers, software changes by backend developers, and web server configuration by system administrators. Many groups are involved, and the more people are involved, the harder it is to push through, so nobody wants to bring it up.
Solution: SVN Assets
Building HowDoable mostly alone, I don’t have the “multi-layer” problem. Moreover, being a performance geek, I’m probably more concerned with performance than is needed at this stage of development. So I spent some time early in the project making sure my builds and upgrades don’t suffer from the “caching problem” and still perform as well as they can with cached content.
What I did was simple: I used the most basic and obvious source of versioning information there is, source control software (namely Subversion), and made sure that my code never inserts an asset URL into HTML without appending its Subversion version to it.
Since many projects use Subversion and PHP for their content management, I thought I’d share it with the community. Being a strong believer in open source, I try to release as much infrastructure for free as possible.
So, welcome SVN Assets, a set of tools that makes your assets cacheable using Subversion and a simple build process.
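To make the "version from source control" step concrete, here is a rough Python sketch that builds a path-to-revision map from the XML that `svn info --xml -R` prints. It is not how SVN Assets itself collects the data (the post does not say). The function name `revisions_from_svn_info`, the choice of the last-changed revision, and the sample repository URL are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def revisions_from_svn_info(xml_text: str) -> dict:
    """Map each file path to its last-changed revision.

    Expects XML shaped like the output of `svn info --xml -R`;
    directory entries are skipped. Hypothetical helper.
    """
    versions = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        if entry.get("kind") != "file":
            continue
        commit = entry.find("commit")
        if commit is not None:
            versions[entry.get("path")] = int(commit.get("revision"))
    return versions


# Trimmed-down sample of the XML structure (values are made up).
SAMPLE = """<info>
  <entry kind="file" path="images/my.jpg" revision="130">
    <url>http://svn.example.com/repo/trunk/images/my.jpg</url>
    <commit revision="123"><author>alice</author></commit>
  </entry>
  <entry kind="dir" path="images" revision="130">
    <commit revision="128"/>
  </entry>
</info>"""

versions = revisions_from_svn_info(SAMPLE)
```

Using the revision of the last commit that touched each file, rather than the working copy revision, means a file's version only changes when that file changes.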
I’ve built it for PHP, but will be happy to see it used by developers on other platforms.
Usage is described in the README file: just generate the data file from Subversion and use assetURL('images/my.jpg') to insert your assets in the code. There is also a command line script that updates all CSS files to point to the proper versions of the files, so you don’t have to maintain them by hand and can still serve them directly from the file system.
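For readers on other platforms, the two pieces described above can be sketched roughly as follows in Python. This is not the SVN Assets code: the in-memory `ASSET_VERSIONS` map stands in for the generated data file, `asset_url` and `version_css` are hypothetical names, and the sketch assumes assets are served from the site root using a filename.123.ext scheme.

```python
import posixpath
import re

# Hypothetical stand-in for the generated data file: asset path -> revision.
ASSET_VERSIONS = {"images/my.jpg": 123, "css/site.css": 131}


def asset_url(path: str, versions: dict = ASSET_VERSIONS) -> str:
    """Insert the file's revision before its extension (filename.123.ext)."""
    revision = versions.get(path)
    if revision is None:
        return "/" + path  # unknown asset: fall back to the plain URL
    root, ext = posixpath.splitext(path)
    return f"/{root}.{revision}{ext}"


# Matches url(...) with optional single or double quotes around the reference.
CSS_URL = re.compile(r"url\(\s*(['\"]?)([^'\")]+)\1\s*\)")


def version_css(css_text: str, css_dir: str, versions: dict = ASSET_VERSIONS) -> str:
    """Rewrite relative url(...) references in a stylesheet to versioned URLs."""
    def replace(match):
        ref = match.group(2)
        if ref.startswith(("/", "http:", "https:", "data:")):
            return match.group(0)  # leave absolute and data: URLs untouched
        path = posixpath.normpath(posixpath.join(css_dir, ref))
        return f"url({asset_url(path, versions)})"
    return CSS_URL.sub(replace, css_text)


html_src = asset_url("images/my.jpg")
css_out = version_css("body { background: url('../images/my.jpg'); }", "css")
```

Rewriting the stylesheets at build time keeps them as static files, so the web server can still serve them without going through application code.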
Please, go check it out and let me know what you think about it!
You can also subscribe to a mailing list if you’d like to ask questions and discuss possible improvements:
5 thoughts on ““Caching problem” no more: SVN Assets”
Great suggestion. When I load this blog post I see assets with the version number appended in the querystring, eg:
Many proxies ignore the querystring when it comes to caching, so they will serve an outdated file when you change the version in the querystring. The fix for this is to embed the version in the filename, like this
Thanks for all the work you do to promote web performance.
Yeah, I actually read your post about that and SVN Assets is using the filename.123.ext schema for files.
This blog runs WordPress with a theme that could be improved in many ways ;)
Moreover, WordPress has that “multi-layer” challenge I mentioned in the post: one of its goals is easy setup, and ?ver=123 is easier to install everywhere, while filename.123.ext requires an additional “layer” of rewrite rules to be involved.
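To show what that extra rewrite layer has to do, here is a small Python sketch of the mapping from a versioned URL back to the file on disk. It only illustrates the logic. The pattern and the name `strip_version` are assumptions, and an actual web server rewrite rule may be written quite differently.

```python
import re

# stem, then a purely numeric version segment, then the extension
VERSIONED = re.compile(r"^(?P<stem>.+)\.(?P<rev>\d+)\.(?P<ext>[A-Za-z0-9]+)$")


def strip_version(url_path: str) -> str:
    """Map /images/my.123.jpg back to /images/my.jpg; other paths pass through."""
    match = VERSIONED.match(url_path)
    if match is None:
        return url_path
    return f"{match['stem']}.{match['ext']}"


on_disk = strip_version("/images/my.123.jpg")
untouched = strip_version("/images/my.jpg")
```

One thing to watch for with a naive pattern like this: a real file name that already contains a numeric segment before the extension would be rewritten too, so a production rule probably needs an extra guard, such as only rewriting when the requested file does not exist.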
I was planning to have some Open Source Speedup sessions as part of NY Web Performance Meetup aimed at improving tools like WordPress or Drupal or writing tools like SVN Assets, but didn’t have time to organize one yet.
Sounds great, I know you have been pushing this idea for a while. When I get the chance, I will give it a try with one of my WordPress MU clients.
Specifying the file version at the directory level is even better than using the query string or file name.
This is especially useful for CSS images.
Let’s say we have a CSS file /static123/style.css with
Path to image is /static123/bg.jpg
Now, if we do file versioning at the directory level, the bg.jpg file will be reloaded.
When we update the CSS file, the new path to the CSS is /static124/style.css and the path to the image is /static124/bg.jpg
For images we don’t want to cachebust we can use absolute paths:
We can go even further with data: URIs.
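The path behaviour described in this comment follows from how relative URLs are resolved against the stylesheet's own URL. A short Python sketch using the standard library's `urljoin` shows it; the `resolve` wrapper and the `/img/logo.png` path are made up for illustration.

```python
from urllib.parse import urljoin


def resolve(css_url: str, ref: str) -> str:
    """Resolve a url(...) reference relative to the stylesheet's URL."""
    return urljoin(css_url, ref)


# A relative reference follows the versioned directory of the stylesheet.
old_image = resolve("/static123/style.css", "bg.jpg")
new_image = resolve("/static124/style.css", "bg.jpg")

# An absolute reference ignores the stylesheet's directory entirely.
fixed_image = resolve("/static124/style.css", "/img/logo.png")
```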
Actually, SVN Assets now uses filename rewriting and has a drop-in .htaccess file that helps with that.
Just uploading to a new folder is not a very good idea, as not all files change at the same time and invalidating all of them will reload too many items. This is why the version is attached directly to the SVN version of each specific file instead.
data: URIs are another powerful tool, and there are many other techniques that can be used; it’s just that this particular issue mapped directly to SVN, which is why I created it as a separate library.