I have no idea why feed publishers don’t bother to return a Last-Modified header, so that readers could use conditional GET and receive a 304 instead of the full feed over and over again when its contents haven’t changed. I believe they just don’t care about the money they spend on RSS traffic for these absolutely useless data transfers. They also don’t care about the speed of the client’s reader (which has to reparse the data every time for no reason).
Anyway, in addition to using conditional GET, the fetcher now checks whether the content has really changed (by comparing a hash of the raw feed content to the previous one) and skips the parsing phase if it hasn’t. Hopefully this will make it significantly faster and save us some CPU cycles.
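The two checks described above could be sketched roughly as follows. This is an illustration in Python, not the fetcher’s actual code: the function names, the choice of SHA-1, and the way the previous digest is stored are all assumptions.

```python
import hashlib


def conditional_headers(last_modified):
    """Build request headers for a conditional GET.

    `last_modified` is the Last-Modified value saved from the previous
    response (hypothetical storage), or None if the server never sent one.
    """
    return {"If-Modified-Since": last_modified} if last_modified else {}


def needs_parsing(status, raw_feed, previous_digest):
    """Decide whether the feed body has to be parsed.

    Returns (parse, digest_to_store). A 304 means the server already told
    us nothing changed; otherwise fall back to comparing a hash of the raw
    bytes with the digest saved after the previous fetch.
    """
    if status == 304:
        return False, previous_digest
    digest = hashlib.sha1(raw_feed).hexdigest()  # SHA-1 is an assumed choice
    return digest != previous_digest, digest
```

The hash comparison only helps with servers that ignore If-Modified-Since and always answer 200 with the same body; the download still happens, but the parsing step is skipped.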
I’m thinking about writing an article about popular blogs (e.g. Engadget, Joel on Software and some more) that don’t support conditional GET. Check out this yellow-press-ish headline: “Bloggers are wasting investors’ money”.
Links in descriptions are now made absolute, relative to the original feed URL, upon fetching. This is supposed to fix the missing images and broken links that appear when a feed item is moved away from its original feed (into a preprocessed feed, then a friend’s feed and so on).
I wrote a Perl class that allows changing all links in HTML code in one call (it will be used for some other features as well) and a subclass that makes them absolute based on a given URL.
It will be my first open-source Perl module to go up on CPAN; it will be available soon at http://search.cpan.org/~sergeyche/
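To illustrate the idea of a generic link rewriter with a subclass that makes links absolute, here is a minimal sketch in Python. It is not the Perl module mentioned above: the class names are hypothetical, and it uses a simple regular expression over href and src attributes where a real implementation would more likely use an HTML parser.

```python
import re
from urllib.parse import urljoin


class LinkRewriter:
    """Rewrites every href/src value in an HTML fragment in one call."""

    # Matches href="..." or src='...' (quoted values only; a simplification).
    _ATTR = re.compile(r'(\b(?:href|src)\s*=\s*)(["\'])(.*?)\2',
                       re.IGNORECASE | re.DOTALL)

    def rewrite_link(self, url):
        """Hook for subclasses; the base class leaves links untouched."""
        return url

    def rewrite(self, html):
        def replace(match):
            prefix, quote, url = match.groups()
            return f"{prefix}{quote}{self.rewrite_link(url)}{quote}"
        return self._ATTR.sub(replace, html)


class LinkAbsolutizer(LinkRewriter):
    """Resolves relative links against a base URL (e.g. the feed URL)."""

    def __init__(self, base_url):
        self.base_url = base_url

    def rewrite_link(self, url):
        return urljoin(self.base_url, url)
```

For example, with a made-up feed URL:

```python
fixer = LinkAbsolutizer("http://example.com/blog/feed.xml")
print(fixer.rewrite('<a href="/post/1">x</a><img src="img/a.png">'))
```

Links that are already absolute pass through urljoin unchanged. Unquoted attribute values are not handled by this sketch, and attributes whose names merely end in src (such as data-src) would be rewritten too.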
OPML generation is fixed, so I hope you’ll get correct results.
You can regenerate OPML without changing the list of feeds by just clicking here.
If you still see bugs, please let me know by commenting on this post, including a description of the bug and the OPML feed URL.
I’m sorry to say this, but all items shared so far (all 21 of them) were lost while testing the old-item deletion process.
Luckily, only 2 of them were not mine. Sorry, Dima, I hope that the overall system performance will make you happy and you’ll forget those two tiny messages.
P.S. I’ll be making daily backups of the database to avoid this problem in the future.
XML::RSS::LibXML now supports the content namespace, and I fixed the preprocessor so that it adds in-feed controls to both <description> and <content:encoded>, so feeds that use the latter will look fine once updated.
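A rough sketch of what adding the controls to both elements might look like, written in Python with the standard library rather than XML::RSS::LibXML. The function name and the controls markup are placeholders, and the sketch assumes an RSS 2.0 feed whose item elements are not namespaced.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS content module (the "content:" prefix).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)


def add_controls(rss_xml, controls_html):
    """Append `controls_html` to <description> and, where present,
    to <content:encoded> of every item in the feed."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        for tag in ("description", f"{{{CONTENT_NS}}}encoded"):
            node = item.find(tag)
            if node is not None:
                node.text = (node.text or "") + controls_html
    return ET.tostring(root, encoding="unicode")
```

Readers that prefer <content:encoded> over <description> would otherwise show the item without the controls, which is why both elements are touched.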