The Making Of LOL Feeds

Last week I wrote and released my LOL Feeds site. It takes RSS or Atom feeds from the web and makes a series of lolcat-style images on a web page. It’s really way funnier than it sounds.

Initially I wanted to auto-generate Jerk City comic strips based on my friends’ twitters, but when that seemed hard I opted for lolcat-style images instead. After all, we’d been seeing a lot of lolcats on Twitter – they’re displayed when the site is undergoing maintenance.

The original version of the script was very, very clever. It used the Google AJAX Feed API and the Flickr API to pull in feeds and random images of cats from Flickr, then combined them using a PHP script I wrote that rendered transparent PNGs of text live onto the page. It relied on the browser’s own text-flowing algorithms to lay out the text. It was, however, amazingly slow.

Browsers only allow a low number of concurrent connections to one site – four or eight, I think – and this made the text crawl in. And while the Google AJAX Feed API and Flickr API are pretty snappy, they’re still way slower than doing the work server-side. I was sad about this because I’m kind of in love with fully dynamic client-side applications (just look at my home page), but I actually wanted this one to see the light of day.

So a rewrite ensued. I pulled down a static set of cute cats from the cute cat group on Flickr, filtered by creative commons license of course. I removed a ton I didn’t like and added a couple of pictures of cats I know well. I reworked the script that generated single words into one that could place words on a JPEG. I had to write my own text layout algorithms but it turned out to be pretty simple.

I used Magpie RSS to generate a page that referenced my image generation script. To make the image generation stable (so that everyone looking at the same feed would get the same images for the same text) the image is selected based on a hash of the message. Nothing is actually random.

Then I glued it all together with a good helping of mod_rewrite.

This version worked pretty well. There were some issues with non-ASCII characters – and even ordinary characters like ? and & – due to URL encoding problems, but on the whole it held up. I posted about it, told some friends and waited to see what would happen.

Things were going fine until I hit MetaFilter. Dreamhost, my friendly cheap hosting provider (100% carbon neutral – yay!), noticed my image generation script was slowing down my shared machine and turned it off.

Adding an image caching layer was pretty straight-forward. I turned the image generation script from a .php to a .inc and included it in my feed processing script. Instead of generating an image every time it generates an MD5 of image parameters and uses a cached image if one exists. I was able to update the existing image generation script to use the same code and send a redirect to the cached version. In the past 5 days the script has generated just over 50,000 unique images which have been accessed almost half a million times. That’s a lot of cats saying stupid things. And a lot of saved CPU cycles.

I also wrote a Facebook Platform application, but that’s a story for another day.

Out with the old, in with the goo(gle)

Some time ago I reworked my home page to feature content from various other sites I post to (blogs, flickr, delicious) by using some JSON tricks to pull in their feeds. I blogged about how to do this with Feedburner’s JSON API, so that my actual page was just static HTML and all the work was done client-side.

Last week I decided to revisit this using Google’s new AJAX Feed API. FeedBurner’s API never seemed to be well supported (it came out of a hackathon) and it forced me to serialize my requests. In the process I neatened up a bunch of the code.

Google’s API is pretty straight-forward. It uses a loader model that is similar to Dojo’s dojo.require, so you load the main Google library:

<script src=""></script>

and then ask it to load their feed library:

google.load('feeds', '1');

They have a handy way of setting a callback to be called when the libraries are loaded:

google.setOnLoadCallback(function () { /* just add code */ });

By putting all three of these together we have a straight-forward way to execute code at the right time.
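A minimal page skeleton combining the three steps might look like this. The loader URL and the Feed class usage reflect Google’s documented API of the time, but treat the details – especially the example feed URL – as a sketch:

```html
<!-- load the main Google library (the jsapi loader) -->
<script type="text/javascript" src="http://www.google.com/jsapi"></script>
<script type="text/javascript">
// ask it to load the feed library
google.load('feeds', '1');
// run our code once everything is ready
google.setOnLoadCallback(function () {
  var feed = new google.feeds.Feed('http://example.com/feed.xml');
  feed.load(function (result) {
    if (!result.error) {
      // result.feed.entries holds the parsed feed items
    }
  });
});
</script>
```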

I refactored the code that inserts the feed data into the page a lot. I fleshed out the concept of input filters from simply filtering the title to filtering the whole item objects. This allows for a more flexible transformation from the information that is presented in the RSS feeds to information that I want to present to visitors to my page. In practice I only used it to remove my name from Twitter updates. Instead of hard-coding the DOM node creation like I did in the previous version of the code I moved to a theme model. The theme function takes a feed entry and returns a DOM node to append to the target DOM node.
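The filter idea can be sketched in a few lines. A hypothetical example – the names are mine, not the real script’s – where a filter receives a whole feed item and returns a transformed one, stripping an author prefix the way my Twitter filter does:

```javascript
// An input filter takes a whole feed item and returns a transformed
// item; stripAuthorPrefix builds one that removes "name: " from titles.
function stripAuthorPrefix(name) {
  return function (item) {
    var prefix = name + ': ';
    if (item.title.indexOf(prefix) === 0) {
      item.title = item.title.substring(prefix.length);
    }
    return item;
  };
}

var twitterFilter = stripAuthorPrefix('ianloic');
var item = { title: 'ianloic: hacking on feeds', link: 'http://example.com/1' };
console.log(twitterFilter(item).title); // → 'hacking on feeds'
```

Filtering whole items rather than just titles means a filter can also rewrite links, dates or any other field before the theme function ever sees the entry.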

The flexibility of Google’s API let me abandon my separate code path for displaying my Flickr images. Previously I used Flickr’s own JSON feed API, but since Google’s feed API supports returning RSS extensions I could use Flickr’s MediaRSS-compliant feeds to show thumbnails and links. They even provide a handy cross-browser implementation of getElementsByTagNameNS (google.feeds.getElementsByTagNameNS) for us to use.

I’m tempted to write a client-side implementation of Jeremy Keith’s lifestream thing using this API.

Take a look at the code running on my home page or check out the script.

Insecurity is Ruby on Rails Best Practice

Update: This post is really really out of date. Please disregard most of what’s written here.

Ruby on Rails by default encourages developers to build insecure web applications. While it’s certainly possible to develop secure sites using the Rails framework, you need to be aware of the issues at hand, and many of the technologies that make Rails a powerful, easy-to-use platform will work against you.

Cross Site Request Forgery
CSRF is the new bad guy in web application security. Everyone has worked out how to protect their SQL database from malicious input, and RoR saves you from ever having to worry about this. Cross site scripting attacks are dying and the web community even managed to nip most JSON data leaks in the bud.

Cross Site Request Forgery is very simple. A malicious site asks the user’s browser to carry out an action on a site that the user has an active session on and the victim site carries out that action believing that the user intended that action to occur. In other words the problem arises when a web application relies purely on session cookies to authenticate requests.

Let’s look at a simple example. I want my site to add me to your 37Signals Highrise contact list.

37 Signals Highrise
Working out how to do this is pretty simple. I go to Highrise and take a look at the “Add a person” form in Firebug or using View Source. It’s a fairly straightforward form that submits to and has a bunch of fields. I can easily recreate that form on my own server. Since Highrise uses a different domain for each user I ask the user to enter their domain name. I use the same username I use everywhere else so it should be pretty easy to fool my user into giving me their Highrise domain name. Most other sites do not have unique per-user URLs.

The page is here and the source is here. Provided you’re logged in and you enter your highrise domain correctly clicking “Add me” will add my name to your contacts list because 37Signals’ servers have no idea that the request is not coming from their application.

It’s fairly straight-forward to modify a form like this to automatically submit on page load. It’s pretty easy to put it in a hidden IFRAME so the user doesn’t even know what’s going on.

By using a unique per-user URL the guys at 37Signals have made this exploit non-trivial. This is not the default behavior of the Ruby on Rails framework. By default action URLs are very predictable. For example, if we take a look at the social bookmarking site Magnolia we see predictable URLs all over the place.

As soon as you log in to Magnolia you are presented with a form for adding a bookmark by typing in its URL. This is a fantastic user experience. Unfortunately for Magnolia’s users anyone that submits this form on their behalf can add bookmarks and the form’s action ( is the same for all users. A trivial page that adds a bookmark to a visitor’s Magnolia account would look like this (try it):

    <body onload="document.getElementById('f').submit()">
        <form id="f" method="post" action="">
            <input type="hidden" name="url" id="url" value="">
        </form>
    </body>

But it gets even worse. Since by default Rails allows GET as well as POST submissions you can call an action from an IMG tag (try it):

<img src="">

Magnolia is not unique in this behavior. Unless a site’s developer has gone out of their way to prevent it, this class of attack will affect every Rails site. Most of the popular sites I’ve looked at exhibit some vulnerabilities.

Other Rails sites I’ve looked at attempt to do their input validation in JavaScript rather than in Ruby which leaves them open to JavaScript injection and hence XSS attacks. This is a far more serious attack that I can cover separately if there is interest.

Easy Solutions
There aren’t any good easy solutions to this. A first step is to check the referrer on every request and to block GET requests in form actions. Simply checking the domain of the referrer may not be enough, though: if there’s any chance that an attacker could post HTML somewhere on the same domain, the application would be vulnerable again.

Better Solutions
Ideally we want a shared secret between the HTML that contains the form and the Rails code in the action. We don’t want this to be accessible to third parties, so serving it as JavaScript isn’t an option. The way other platforms like Drupal achieve this is by inserting into every generated form a hidden field containing a secret token – either unique to the current user’s session or (for the more paranoid) also unique to the action. The action then has to check that the hidden token is correct before allowing processing to continue.

This is a pain in the arse to write by hand.

What is really required at the Rails level is a form API that can generate and consume forms securely. In Drupal all form HTML is generated and parsed by the framework. This allows application developers to protect themselves from XSRF without even knowing it.

There’s a plugin called Secure Action but I’m not sure how well it works. The dependence on a static shared salt rather than a randomly generated secret in the user’s session concerns me. The way it puts the signature in the URL makes me nervous too. It’s better than nothing though.

OpenID for the mathematically challenged

The other day I got the OpenID bee in my bonnet, grabbed James Walker’s module and installed it on my server. Actually, I grabbed it from CVS, and then discovered that the CVS version is half-ported to the new Drupal 6 form API, so I ended up using the DRUPAL-5 tag.

Anyway, I use Dreamhost, which I love for many many reasons (primarily it’s really cheap and seems to work really well). Unfortunately they don’t build their PHP with BCMath or even GMP, which means my PHP can’t do the hard math that’s required for crypto. Luckily there’s a mode of OpenID that doesn’t require any of that math on the relying party’s side. So I made a small change that allows James’ module to work in this “dumb” mode.

Index: openid.install
RCS file: /cvs/drupal-contrib/contributions/modules/openid/openid.install,v
retrieving revision 1.2
diff -u -p -r1.2 openid.install
--- openid.install      25 Mar 2007 06:38:00 -0000      1.2
+++ openid.install      16 May 2007 22:59:56 -0000
@@ -2,24 +2,6 @@

-/**
- * OpenID module requires bcmath
- */
-function openid_requirements($phase) {
-  if ($phase == 'runtime') {
-    $requirements['bcmath']['title'] = t('BCMath');
-    if (function_exists('bcadd')) {
-      $requirements['bcmath']['severity'] = REQUIREMENT_OK;
-      $requirements['bcmath']['value'] = t('Enabled');
-    }
-    else {
-      $requirements['bcmath']['severity'] = REQUIREMENT_ERROR;
-      $requirements['bcmath']['description'] = t('OpenID needs the bcmath extension for encryption.');
-    }
-  }
-  return $requirements;
-}
 /**
  * Implementation of hook_install
  */
 function openid_install() {
Index: openid.module
RCS file: /cvs/drupal-contrib/contributions/modules/openid/openid.module,v
retrieving revision 1.2
diff -u -p -r1.2 openid.module
--- openid.module       25 Mar 2007 06:38:00 -0000      1.2
+++ openid.module       16 May 2007 22:59:56 -0000
@@ -133,10 +133,14 @@ function openid_login_form_submit($formi

$idp_endpoint = $services[0]['uri'];
$_SESSION['openid_idp_endpoint'] = $idp_endpoint;
-  $assoc_handle = openid_association($claimed_id, $idp_endpoint);
-  if (empty($assoc_handle)) {
-    drupal_set_message(t('OpenID Association failed'), 'error');
-    return;
+  // if we have BCMath, we should use OpenID smart mode
+  if (function_exists('bcadd')) {
+      $assoc_handle = openid_association($claimed_id, $idp_endpoint);
+      if (empty($assoc_handle)) {
+        drupal_set_message(t('OpenID Association failed'), 'error');
+        return;
+      }

Also, I put the patch up on

The Sidekick ID and the iPhone

There were two interesting announcements today. First, the Sidekick ID, which had been previously leaked, was formally announced and reviews have started to show up. Second, Apple announced that OS X Leopard will ship three months late – more than two years after the previous release of OS X. This slip is being seen as evidence that Apple is having trouble building as many products at once as it wants to.

In the four years I was at Danger we were building exactly one product at a time. We failed to separate the development of the hardware, the OS and the applications. Separating the client and server schedules was a slow and painful process. In the two years since I’ve left things seem to have improved. The fact that they’re able to ship two products (even if they are quite similar) is really exciting. That Danger is succeeding where Apple, with their 30 years of experience, is beginning to stumble is cause for congratulations.

Making dynamic static pages

UPDATE: I changed how this works and blogged about it.
I wanted my home page to reflect what was going on in my life, or at least what content I was generating. There’s the concept of a lifestream floating around at the moment, but I was happy just to have a few sources (a couple of blogs, my bookmarks, my twitters and my Flickr stream) shown, split out by source. The catch was I wanted to do it without implementing a web service to back it.

If you want to pull content from a bunch of different servers without writing any server-side code, your only options are Flash and JSON. They’re both ways of getting around the web security model that’s protected us for so long. Flash is kind of complicated and requires proprietary, expensive tools to work with, while JSON comes for free. The idea behind JSON is that we can use the browser’s <script> tag to load a script from another server that contains data encoded as JavaScript data structures rather than code.

A few key services like Flickr and del.icio.us offer JSON versions of their feeds, but most do not. In steps FeedBurner, who in January added a JSON version of the feeds they serve – and since you can ask FeedBurner to host any feed you please, we can use them as our high-availability, standardizing, caching feed proxy. I’d already set up FeedBurner for this blog, so I just added feeds for my LiveJournal and Twitter accounts and looked up how to get access to the JSON feeds for my Flickr and del.icio.us accounts.

When you call a JSON script you can often pass in the name of a callback function to be called when the data is returned. I wrote a simple one to process a JSON response from FeedBurner and turn some of the most recent items into HTML list items:

var max_items = 3;
var target = null;
var filter = null;
function fb(o) {
  if (!target) return;
  for (var i=0; i<o.feed.items.length && i<max_items; i++) {
    var item = o.feed.items[i];
    var li = document.createElement('li');
    var a = document.createElement('a');
    if (!item.title) item.title = item.link;
    if (filter) item.title = filter(item.title);
    a.href = item.link;
    a.appendChild(document.createTextNode(item.title));
    li.appendChild(a);
    target.appendChild(li);
  }
}

The function depends on a couple of external variables that are set before the JSON feed is called. They are target, the DOM node to append the list items to, and filter, an optional function to post-process titles. I had to add filter since my Twitter feed comes back a little weird.

Getting the most recent posts from this blog into a bullet list is now pretty straight-forward:

<ul id="ianloic-list"></ul>
<script type="text/javascript">
target = document.getElementById('ianloic-list');
</script>
<script type="text/javascript" src=""></script>

Getting my LiveJournal in was identical, and Twitter just required me to write a filter function to chop up the title a little and do some unescaping. For some reason the Twitter feed is both HTML-entity encoded and JavaScript string escaped when I get it back from FeedBurner.

Flickr and del.icio.us each require some custom code since I want to handle each of them specially. For del.icio.us I link to the tag pages of each tag on each bookmark, and for Flickr I embed a thumbnail for each image. Take a look at the source code of my home page to see how that’s done, or drop me a line if you’d like more explanation.

Burning your Drupal feed in two easy steps

FeedBurner provides all kinds of neat stats, but it didn’t seem straight-forward to “burn” my blog feed since I’m using Drupal 5. After a little fiddling I think I’ve got a pretty good idea of how to make it work in probably the simplest way possible. In fact, it doesn’t require any Drupal configuration at all.

  1. First I set up a FeedBurner account and burned my feed. The feed Drupal produces for me is: Now when I access I get the contents of that feed. It’s pretty simple, but so far nobody is going to see that feed.
  2. Then I simply told Apache to redirect all requests for that feed, except the ones from the FeedBurner bot, to my FeedBurner feed. With the sleight-of-hand magic of mod_rewrite this is pretty straight-forward. In the root of every Drupal install there’s an .htaccess file containing a bunch of stuff. I just added a few lines to the mod_rewrite.c block of that file:
      # Rewrite rss.xml to
      # unless FeedBurner is requesting the feed
      RewriteCond %{HTTP_HOST} ^ianloic\.com$ [NC]
      RewriteCond %{HTTP_USER_AGENT} !FeedBurner.*
      RewriteRule ^rss.xml$ [L,R=301]

    This will cause Apache to send a 301 redirect to any time anyone requests, unless their HTTP User Agent begins with FeedBurner.

    Now I’ve got access to all the FeedBurner statistics and fun features. Since I didn’t actually touch the Drupal configuration I’m pretty sure a similar approach can be taken to applying FeedBurner to any feed.
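Filled out with a hypothetical FeedBurner address (the real URLs are missing above), the complete addition to .htaccess would look something like this:

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on
  # Redirect rss.xml to the burned feed,
  # unless FeedBurner itself is requesting it.
  # feeds.feedburner.com/example is a placeholder – use your own feed.
  RewriteCond %{HTTP_HOST} ^ianloic\.com$ [NC]
  RewriteCond %{HTTP_USER_AGENT} !FeedBurner.*
  RewriteRule ^rss.xml$ http://feeds.feedburner.com/example [L,R=301]
</IfModule>
```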

Tag Clouds Two Point Oh?

[flickr-photo:id=15085782,size=m] Tag clouds bore me. They’re a relatively effective way of quickly indicating what topics are popular, but that’s it. From del.icio.us’ cloud I can see that the site is for nerds – web nerds specifically. Flickr’s tag cloud tells me that people tag events and place names, but that’s about it. My personal tag clouds on these sites tell me even less. My del.icio.us tag cloud tells me almost nothing – it’s a huge block of dark-blue and light-blue text. The Flickr one isn’t much better – it tells me mostly that I took a bunch of photos kayaking in the Queen Charlotte Islands, or perhaps more specifically, that I got around to tagging my kayaking photos.

I’m more interested in seeing what’s going on right now and how these topics are related. Since this is a graph visualization exercise I threw graphviz at the problem. After a bit of preliminary experimentation I ended up defining a graph based on recent tags pulled from an RSS feed. Each tag is represented as a node, and any tags which appear together on the same post have arcs between them. Tag text gets scaled up a little with frequency. The effect isn’t perfect. It’s pretty boring when there isn’t much data, like on this site:

With a bit more data, like from my recent delicious feed, things can get cluttered, but we can see what I’m interested in right now:

This idea isn’t fully developed. The complexity of laying these graphs out in a sensible manner increases pretty rapidly as the number of nodes and arcs grows, and so does the visual clutter. I’d like to experiment with client-side graph layout (i.e. implementing graphviz in JavaScript) and with doing something more sensible with synonym tags – tags which always appear together. Synonym tags are somewhat interesting, but can distract from the relationships between concepts. Treating all tags that are coincident over a small number of posts as synonyms may often produce false synonyms, but collapsing synonyms will make it easier to scale to more posts, so I expect that may be a productive path to go down in scaling these visualizations up.
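The graph construction described above is simple to sketch. Here’s an illustrative version that emits Graphviz DOT – the tag data and the exact font-size scaling are made up:

```javascript
// Build a DOT graph from a list of posts, where each post is an array
// of tags: each tag becomes a node (sized by frequency) and every pair
// of tags that co-occur on a post gets an edge.
function tagGraph(posts) {
  var freq = {};
  var edges = {};
  posts.forEach(function (tags) {
    tags.forEach(function (t) { freq[t] = (freq[t] || 0) + 1; });
    for (var i = 0; i < tags.length; i++) {
      for (var j = i + 1; j < tags.length; j++) {
        edges[tags[i] + ' -- ' + tags[j]] = true;
      }
    }
  });
  var lines = ['graph tags {'];
  Object.keys(freq).forEach(function (t) {
    // scale the label up a little with frequency
    lines.push('  "' + t + '" [fontsize=' + (10 + 2 * freq[t]) + '];');
  });
  Object.keys(edges).forEach(function (e) {
    var parts = e.split(' -- ');
    lines.push('  "' + parts[0] + '" -- "' + parts[1] + '";');
  });
  lines.push('}');
  return lines.join('\n');
}

console.log(tagGraph([['javascript', 'json'], ['javascript', 'graphviz']]));
```

The resulting text can be fed straight to graphviz’s `neato` or `dot` layout engines; all the hard work of placing the nodes sensibly happens there.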

Oh, and the final demonstration – my friend Dan is looking for an apartment and is a Ruby on Rails web application developer: