“Caching problem” no more: SVN Assets

Useless requests
There is a fundamental problem with static assets on web pages, such as images, CSS files and, most importantly, JavaScript files: they are requested from the server over and over again, even if they have not changed between page requests. This slows down page rendering, quite dramatically in the case of JavaScript, because by default JavaScript downloads block downloads of the rest of the page’s assets.

Infinite Expiration
Setting infinite Expires headers is a great solution to the problem: this way everything downloaded from the server is simply kept in the browser’s cache! It is very effective and is ranked 3rd on the best practices list published by the Yahoo! Exceptional Performance team and even first on Google PageSpeed’s list.
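A minimal sketch of what such response headers look like, written in Python purely for illustration (the post does not tie this to any particular server). It uses a one-year lifetime because RFC 2616 says HTTP/1.1 servers should not send Expires dates more than one year in the future; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# One year: RFC 2616 (section 14.21) treats this as the practical "never expires".
ONE_YEAR = timedelta(days=365)

def far_future_headers(now=None):
    """Return response headers that let browsers cache an asset for a year."""
    now = now or datetime.now(timezone.utc)
    return {
        # Absolute expiration date, formatted as an HTTP date in GMT
        "Expires": format_datetime(now + ONE_YEAR, usegmt=True),
        # HTTP/1.1 equivalent, relative to the time the response is sent
        "Cache-Control": "public, max-age=%d" % int(ONE_YEAR.total_seconds()),
    }
```

For a response sent at midnight UTC on March 1, 2010, this yields an Expires value of "Tue, 01 Mar 2011 00:00:00 GMT" and max-age=31536000.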

You can clearly see the difference in the graph below:
Waterfall diagram
Scripts and CSS in the page head load from cache, eliminating the rendering delay, and images load from cache as well, making the document’s load event fire much sooner.

Caching problem
But there is a flip side: when assets change during normal development (an image gets updated, or JavaScript or CSS is modified to fix a bug or add a new feature), the new version must reach the user’s browser, but the browser no longer requests it and keeps using the copy saved in its cache. Developers are used to pressing Shift+Reload or Shift+F5 to force the browser to refresh the page, but users don’t do that. That’s why many developers just don’t use infinite caching: they prefer peaceful development without the “caching problem” and pay the price with a degraded user experience.

URL fingerprinting
To solve this problem, a simple technique, sometimes called “fingerprinting” or “cache busting” (the latter term usually indicates that the technique is used for the wrong reasons), can replace Entity Tags and Conditional GET, both of which require a request to the server and therefore do not work with infinite expiration.

The idea is simple: the URL of an asset should be unique for each version of the asset, changing every time you update your script or image. This way the browser will not find the asset in its cache and will request the new one from the server, storing it in a separate cache entry and eventually pushing the old one out.
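To make the idea concrete, here is a minimal Python sketch that derives the version from a hash of the file contents, which is one common way to fingerprint (the approach described later in this post uses the Subversion revision instead). The function name and the name.hash.ext URL layout are illustrative assumptions; you would either publish files under these names or have the web server map them back to the real file.

```python
import hashlib
import posixpath

def fingerprinted_url(path, content):
    """Return a URL unique to this exact version of the asset.

    A short digest of the file contents goes before the extension,
    e.g. images/logo.png -> images/logo.<8 hex chars>.png, so any change
    to the file produces a new URL and therefore a new cache entry.
    MD5 is used only to detect changes here, not for security.
    """
    digest = hashlib.md5(content).hexdigest()[:8]
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"
```

Because unchanged files keep the same URL, they stay in the browser cache; only modified files get a new URL.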

Perceived as complex
The difficulty in implementing this lies in the current status quo: web servers serve files from a file system that has no idea about file versions. To be fair, the one-to-one URL<->file paradigm is just plain simple to understand and helped the web grow as fast as it did, but it shapes how all developers think, and almost no systems are built with versioning in mind.

My experience with many teams shows that this technique is perceived as overly complex and people avoid it until the last moment, even though such an obvious solution should be common functionality in web servers.

Multi-layer problem
Another problem, I believe, is that this solution affects many “layers of the pie”: from the asset publishing process and HTML changes by designers and front-end developers, to software changes by back-end developers, to web server configuration by system administrators. This means many groups are involved, and the more people are involved, the harder it is to push a change through, so nobody wants to bring it up.

Solution: SVN Assets
Building HowDoable mostly alone, I don’t have the “multi-layer” problem. Moreover, being a performance geek, I’m probably more concerned with performance than needed at this stage of development. So early in the project I spent some time making sure my builds and upgrades don’t suffer from the “caching problem” and still perform as well as they can with cached content.

What I did was simple: I used the most basic and obvious source of versioning information there is, source control software (namely Subversion), and made sure that my code never inserts an asset URL into HTML without appending the asset’s Subversion revision to it.
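One way to collect those revisions, sketched in Python under the assumption that you run svn info -R --xml in the web root: the revision attribute of each entry’s commit element is the revision in which that file last changed, which, unlike the working-copy revision, only moves when that particular file is committed. SVN Assets’ own data file format may differ; the function below is a hypothetical illustration.

```python
import xml.etree.ElementTree as ET

def last_changed_revisions(svn_info_xml):
    """Map each versioned file path to the revision in which it last changed.

    Expects the XML printed by `svn info -R --xml`.
    """
    revisions = {}
    for entry in ET.fromstring(svn_info_xml).iter("entry"):
        if entry.get("kind") != "file":
            continue  # directories are not served as assets
        commit = entry.find("commit")
        rev = commit.get("revision") if commit is not None else None
        if rev is not None and int(rev) > 0:  # skip files with no committed revision
            revisions[entry.get("path")] = int(rev)
    return revisions
```

In a build script, the input could come from subprocess.run(["svn", "info", "-R", "--xml"], capture_output=True, text=True).stdout, with the resulting dictionary saved as the data file your templates read from.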

Since many projects use Subversion and PHP for content management, I thought I’d share it with the community; being a strong believer in open source, I try to release as much of my infrastructure code for free as possible.

So, welcome SVN Assets, a set of tools that make your assets cacheable using Subversion and a simple build process.

I’ve built it for PHP, but I’ll be happy to see it used by developers on other platforms.

Usage is described in the README file: just generate the data file from Subversion and use assetURL('images/my.jpg') to insert your assets in the code. There is also a command-line script that updates all CSS files to point to the proper versions of the files they reference, so you don’t have to maintain them by hand and can still serve them directly from the file system.
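The two pieces described above can be approximated as follows: a Python stand-in for assetURL() that looks the path up in a path-to-revision map, and a CSS rewriter that applies it to every local url(...) in a stylesheet. This is a sketch, not SVN Assets’ code: the function names and the name.vREV.ext URL scheme are hypothetical, and the web server would need a matching rewrite rule (not shown) that maps such URLs back to the real file.

```python
import posixpath
import re

# url(...) with optional single or double quotes around the address.
CSS_URL = re.compile(r"""url\(\s*(['"]?)([^'")]+?)\1\s*\)""")

def versioned_url(path, revisions):
    """Stand-in for assetURL(): embed the file's revision in its URL.

    Hypothetical scheme: images/my.jpg at r37 becomes /images/my.v37.jpg.
    Paths missing from `revisions` are returned unchanged.
    """
    key = path.lstrip("/")
    rev = revisions.get(key)
    if rev is None:
        return path
    root, ext = posixpath.splitext(key)
    return f"/{root}.v{rev}{ext}"

def rewrite_css(css, css_path, revisions):
    """Point every local url(...) in a stylesheet at its versioned URL."""
    css_dir = posixpath.dirname(css_path)

    def replace(match):
        quote, url = match.group(1), match.group(2)
        if url.startswith(("http:", "https:", "data:", "//")):
            return match.group(0)  # leave external and inline resources alone
        if url.startswith("/"):
            path = url.lstrip("/")  # already relative to the web root
        else:
            # Relative URLs are resolved against the stylesheet's directory.
            path = posixpath.normpath(posixpath.join(css_dir, url))
        new_url = versioned_url(path, revisions)
        if new_url == path:
            return match.group(0)  # unknown file: keep the original text
        return f"url({quote}{new_url}{quote})"

    return CSS_URL.sub(replace, css)
```

For example, with revisions {"images/bg.png": 37}, the rule url('../images/bg.png') in css/style.css becomes url('/images/bg.v37.png'), while external URLs are left untouched.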

Please go check it out and let me know what you think about it!

You can also subscribe to the mailing list if you’d like to ask questions and discuss possible improvements.

ShowSlow presentation at NY Web Performance Meetup

Yesterday I gave an extended presentation about ShowSlow at the New York Web Performance Meetup, and I hope I was able to inspire people to start tracking their web sites’ performance over time and to think about the metrics they want to collect for their business.

There were a couple of questions about automation, so here’s the blog post I wrote about it a couple of months ago: Automating Page Speed and YSlow monitoring.

Also, here’s a simple script that can be used with ShowSlow: http://code.google.com/p/showslow/source/browse/trunk/showslow.sh. Just by looking at it you can tell that it’s relatively simple to do.

You might also want to look at Firefox’s about:config preferences to see whether any of them affect you, e.g. forcing Firebug to open and grade each page, cache storage, automatic extension updates and so on. I hope to have this documented better in the near future.

The best way to learn more about ShowSlow configuration is to read the documentation on ShowSlow.org.

If you’d just like to try the demo instance, go ahead and configure YSlow to post there, or simply use the export menu in PageSpeed or NetExport.

I’d also like to thank our host, Logicworks, and Stephanie personally for the hospitality; it’s great to feel welcome!

At our next session, on April 15th, Nicholas Tang is going to demo WebPageTest.org, a free, web-based performance analysis site that uses AOL Pagetest, an open source tool developed by Patrick Meenan. This will be the first in a series of “Tools” sessions where I hope we’ll cover many useful details of each of them.

I hope to see you all next month!

P.S. The few slides I had are on SlideShare; the rest was a demo and, unfortunately, is not available.

2010 Talks at Velocity

Wednesday’s talk at the O’Reilly Velocity Online Conference was great, even though it was very short and I had lots more to cover.

I’m doing an extended talk and demo of ShowSlow next Tuesday at the New York Web Performance Group. I’ll be happy to see you there and chat about the performance of your site!

I’m also excited about the summer Velocity 2010 conference, as I’m giving a talk there as well. It’s the main industry conference on web operations and performance, so don’t wait: register, and I’ll see you in CA.

Limit URLs, DBUpgrade and HAR beacon in ShowSlow

I added a few features to ShowSlow over the past couple of weeks, but since I’ve also been busy with HowDoable, I didn’t have time to write about them, so here you go, a short digest:

Limit URLs using PCRE regexes

You can now limit the URLs accepted by ShowSlow using PCRE regular expressions (thanks to Aaron for the initial patch). All you need to do is put a regex instead of a plain prefix into the $limitURLs array; ShowSlow will automatically detect that it’s not a URL prefix and match against it using preg_match.
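The post does not say exactly how ShowSlow tells a prefix from a regex, so the following Python sketch assumes a hypothetical rule: entries starting with http:// or https:// are prefixes, and anything else is a preg_match-style pattern wrapped in delimiters with optional trailing flags. Python’s re module is close to, but not identical with, PCRE, and treating an empty list as “accept everything” is also an assumption.

```python
import re

def url_allowed(url, limits):
    """Check a URL against a $limitURLs-style list of prefixes and regexes.

    Hypothetical detection rule: entries starting with http:// or https://
    are plain prefixes; anything else is a delimited regex such as
    "/example\\.org/i" (delimiter, pattern, delimiter, optional flags).
    """
    if not limits:
        return True  # assumption: an empty list means "accept everything"
    for limit in limits:
        if limit.startswith(("http://", "https://")):
            if url.startswith(limit):
                return True
            continue
        delimiter = limit[0]
        end = limit.rindex(delimiter)
        pattern, flags = limit[1:end], limit[end + 1:]
        # Like preg_match, search anywhere in the URL unless the pattern is anchored.
        if re.search(pattern, url, re.IGNORECASE if "i" in flags else 0):
            return True
    return False
```

Mixing both kinds in one list lets you keep simple prefixes for most sites and reach for a regex only when a prefix is not expressive enough.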

DBUpgrade for easy data schema upgrades

I’ve started using another open source project of mine called DBUpgrade to help with database schema upgrades from version to version. Going forward, if you need to upgrade the schema, all you’ll have to do is run php dbupgrade.php or just make (which runs svn update too). tables.sql will still contain the latest schema, so feel free to update manually.
DBUpgrade requires the MySQLi module to be configured in your PHP; MySQLi is recommended anyway if you’re using MySQL 4.1.3 or later (and you should be).
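The general pattern behind such upgrade tools is to record the current schema version in the database and apply only the newer steps. Here is a minimal Python sketch of that pattern using the standard sqlite3 module so it runs anywhere; the db_version table, the migration list and the function name are hypothetical and do not reflect DBUpgrade’s actual files or its MySQL setup.

```python
import sqlite3

# Hypothetical ordered schema changes; list position + 1 is the schema version.
MIGRATIONS = [
    "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)",
    "ALTER TABLE urls ADD COLUMN last_update TEXT",
]

def upgrade(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the version stored in the database.

    `conn` must be opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below control the transactions.
    Returns the schema version after the upgrade.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS db_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM db_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO db_version (version) VALUES (0)")
        current = 0
    else:
        current = row[0]
    for version, sql in enumerate(migrations[current:], start=current + 1):
        # SQLite supports transactional DDL, so each step and its version
        # bump either both happen or neither does.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("UPDATE db_version SET version = ?", (version,))
        except Exception:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
    return max(current, len(migrations))
```

Running it repeatedly is safe: once the stored version matches the length of the list, there is nothing left to apply, and appending a new step later upgrades only that step.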

HAR beacon

And finally, I worked with Jan “Honza” Odvarko, one of the lead developers of Firebug and related extensions, to add HTTP Archive (HAR) support to ShowSlow and beacon support to NetExport (use v0.7b12+), the Firebug extension that lets you save the contents of the Net panel and view them later. HAR is also supported by HttpWatch, the tool that created the original XML-based export format HAR was based on.

By default, beacons are sent to http://www.showslow.com/beacon/har/, but you can point them to your own instance using the extensions.firebug.netexport.beaconServerURL Firefox preference (in about:config).
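For illustration, a minimal HAR 1.2 document and a POST request to a configurable beacon URL can be built like this in Python. It assumes the beacon accepts the raw HAR JSON as the POST body; the exact request NetExport sends is not described here, so treat the wire format, the creator values and the function name as assumptions.

```python
import json
import urllib.request

# Default from the post; override it to point at your own ShowSlow instance.
DEFAULT_BEACON_URL = "http://www.showslow.com/beacon/har/"

def har_beacon_request(entries, beacon_url=DEFAULT_BEACON_URL):
    """Wrap HAR entries in a minimal HAR 1.2 log and prepare a POST to a beacon.

    The request is only built here; send it with urllib.request.urlopen(req).
    """
    har = {
        "log": {
            "version": "1.2",
            # creator is required by HAR 1.2; these values are placeholders
            "creator": {"name": "har-beacon-sketch", "version": "0.1"},
            "entries": entries,
        }
    }
    body = json.dumps(har).encode("utf-8")
    return urllib.request.Request(
        beacon_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing a different beacon_url plays the same role as changing the beaconServerURL preference: the HAR payload stays the same, only the destination changes.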

NetExport beacon menu screenshot

You can see a sample result here. For configuration options, see the documentation on showslow.org.

That was a lot of stuff; I should report on developments more often.

Go ahead and try it all, and let me know how it works and whether you have any trouble using these features. If you need more features in ShowSlow, submit them to our UserVoice forum and email the mailing list to discuss them and gain supporters: http://groups.google.com/group/showslow.

Update: See also Honza’s post about this: http://www.softwareishard.com/blog/firebug/share-har-logs-online/