Tag Clouds Two Point Oh?

[flickr-photo:id=15085782,size=m] Tag clouds bore me. They’re a relatively effective way of quickly indicating which topics are popular, but that’s it. From del.icio.us’ cloud I can see that the site is for nerds, web nerds specifically. Flickr’s tag cloud tells me that people tag events and place names, and not much else. My personal tag clouds on these sites tell me even less. My del.icio.us tag cloud tells me almost nothing: it’s a huge block of dark-blue and light-blue text. The Flickr one isn’t much better: it mostly tells me that I took a bunch of photos kayaking in the Queen Charlotte Islands, or more precisely, that I got around to tagging my kayaking photos.

I’m more interested in seeing what’s going on right now and how these topics are related. Since this is a graph visualization exercise, I threw Graphviz at the problem. After a bit of preliminary experimentation I ended up defining a graph based on recent tags pulled from an RSS feed. Each tag is represented as a node, and any tags that appear together on the same post have arcs between them. Tag text is scaled up a little with frequency. The effect isn’t perfect. It’s pretty boring when there isn’t much data, like on this site.
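The graph described above can be sketched in a few lines of Python that emit Graphviz DOT source. This is a minimal sketch, not the original code: it assumes the RSS feed has already been parsed into one list of tags per post, and the function names and the linear font-size rule (`base_size`, `step`) are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def quote(tag):
    """Quote a tag as a DOT identifier, escaping embedded double quotes."""
    return '"' + tag.replace('"', '\\"') + '"'


def tag_graph_dot(posts, base_size=10.0, step=2.0):
    """Return DOT source for the tags in `posts` (a list of tag lists).

    Each distinct tag becomes a node; each pair of tags that share a post
    is joined by one undirected edge. Font size grows linearly with the
    number of posts carrying the tag (an arbitrary, hypothetical rule).
    """
    counts = Counter()
    edges = set()
    for tags in posts:
        unique = sorted(set(tags))
        counts.update(unique)
        # One edge per co-occurring pair, however many posts share it.
        edges.update(combinations(unique, 2))

    lines = ["graph tags {"]
    for tag in sorted(counts):
        size = base_size + step * (counts[tag] - 1)
        lines.append(f"  {quote(tag)} [fontsize={size:g}];")
    for a, b in sorted(edges):
        lines.append(f"  {quote(a)} -- {quote(b)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    posts = [["rdf", "mozilla", "flock"], ["flickr", "javascript"], ["flock", "flickr"]]
    print(tag_graph_dot(posts))
```

Saving the output as `tags.dot` and running `dot -Tpng tags.dot -o tags.png` (or `neato` for a spring-model layout) renders the picture.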

With a bit more data, like my recent del.icio.us feed, things can get cluttered, but we can see what I’m interested in right now.

This idea isn’t fully developed. The complexity of laying these graphs out sensibly increases rapidly with the number of nodes and arcs, and so does the visual clutter. I’d like to experiment with client-side graph layout (i.e., implementing Graphviz in JavaScript) and with doing something more sensible with synonym tags, i.e., tags that always appear together. Synonym tags are somewhat interesting, but they can distract from the relationships between concepts. Treating all tags that co-occur over a small number of posts as synonyms will often produce false synonyms, but collapsing synonyms will make it easier to scale to more posts, so I expect it to be a productive path for scaling these visualizations up to encompass more posts.
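One way to find synonym tags as defined above (tags that always appear together) is to group tags by the exact set of posts they appear on. A minimal Python sketch, assuming posts are lists of tags; the `a/b` naming for a collapsed tag is a hypothetical choice. As noted, with only a few posts many of these groups will be false synonyms.

```python
from collections import defaultdict


def synonym_groups(posts):
    """Return sorted groups (2+ tags each) that appear on exactly the same posts."""
    post_sets = defaultdict(set)
    for index, tags in enumerate(posts):
        for tag in tags:
            post_sets[tag].add(index)

    # Tags with identical post sets always co-occur: treat them as synonyms.
    groups = defaultdict(list)
    for tag, indices in post_sets.items():
        groups[frozenset(indices)].append(tag)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def collapse(posts, groups):
    """Replace every tag in a synonym group with one combined tag, e.g. 'a/b'."""
    alias = {}
    for group in groups:
        for tag in group:
            alias[tag] = "/".join(group)
    return [sorted({alias.get(t, t) for t in tags}) for tags in posts]


if __name__ == "__main__":
    posts = [["ruby", "rails", "web"], ["ruby", "rails"], ["web", "css"]]
    groups = synonym_groups(posts)  # [['rails', 'ruby']]
    print(collapse(posts, groups))
```

Collapsing shrinks both the node count and the number of arcs before layout, which is where the scaling benefit would come from.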

Oh, and the final demonstration: my friend Dan is looking for an apartment and is a Ruby on Rails web application developer.

Syntax Highlighting for Drupal

[flickr-photo:id=252312738,size=m] While writing my last post, I wanted to include some source code examples, and I wanted them to be pretty. Looking around drupal.org, I failed to find what I wanted. There were a few options: the codefilter module, but it only supports PHP highlighting; the geshifilter module, but it doesn’t support Drupal 5.x, which I’m running; or patches against codefilter that add GeSHi support.

So I did what was probably the wrong thing and wrote my own. At least I didn’t write it from scratch: I based it largely on codefilter, with some inspiration from the patches to codefilter that add GeSHi support.

I hacked up GeSHi a little, because it wants to link keywords in most languages to reference sites. That sounds like a good idea in theory, but it was linking HTML keywords off to a site I didn’t think was very good, so I disabled that functionality.

Using the module is pretty straightforward. You wrap your source code in tags that look like

<code language="LANGUAGE">...</code>

where LANGUAGE is a supported language. If the code contains a line break, it is rendered as a block; otherwise it is rendered inline. Some whitespace is also trimmed, so you can force a single line to be treated as a block by putting a line break at its start or end.
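The block-versus-inline rule above can be sketched as a small text filter. This is a hypothetical Python re-implementation of that rule only, not the module’s PHP code: it escapes the code instead of highlighting it (the real module hands it to GeSHi), and the `language-…` CSS class names are assumptions.

```python
import html
import re

# Matches <code language="...">...</code>, the tag syntax described above.
CODE_TAG = re.compile(r'<code language="([^"]*)">(.*?)</code>', re.DOTALL)


def _render(match):
    language, body = match.group(1), match.group(2)
    # Decide block vs. inline *before* trimming, so a leading or trailing
    # line break still forces a one-liner to render as a block.
    is_block = "\n" in body
    body = html.escape(body.strip("\r\n"))
    css_class = "language-" + html.escape(language)
    if is_block:
        return f'<pre class="{css_class}">{body}</pre>'
    return f'<code class="{css_class}">{body}</code>'


def filter_code(text):
    """Replace every <code language="..."> span in text with escaped HTML."""
    return CODE_TAG.sub(_render, text)


if __name__ == "__main__":
    print(filter_code('Call <code language="php">foo();</code> inline.'))
    print(filter_code('<code language="php">\nfoo();</code>'))
```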

Right now it’s maintained in the same source control repository as my web site, but I’ll move it into Trac and Subversion eventually. For the time being, it’s attached.

Flickr for Dojo

I’ve been working on a little Dojo-based application that talks to Flickr, so I put together a little library that uses Dojo to talk to Flickr through its REST JSON interface.

Use

It’s pretty simple to use. Just include the JavaScript file:

<script src="flickr.js"></script>

Tell the library what your keys are:

flickr.keys(API_KEY, SECRET_KEY);

And […]
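The library’s code isn’t shown in this excerpt. As a rough illustration of the REST JSON interface it talks to, here is a Python sketch that builds a request URL and parses a response. By default Flickr wraps JSON output in a `jsonFlickrApi(...)` callback (unless `nojsoncallback=1` is passed), which is what lets a web page load responses through a script tag. The helper names and the `YOUR_API_KEY` placeholder are hypothetical; `flickr.test.echo` is just an example method.

```python
import json
from urllib.parse import urlencode

# Flickr's REST endpoint (current HTTPS form).
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def rest_url(method, api_key, **params):
    """Build a REST request URL that asks for a JSON response."""
    query = dict(params, method=method, api_key=api_key, format="json")
    return REST_ENDPOINT + "?" + urlencode(sorted(query.items()))


def parse_response(text):
    """Parse a JSON response, removing the jsonFlickrApi(...) wrapper if present."""
    text = text.strip()
    prefix = "jsonFlickrApi("
    if text.startswith(prefix) and text.endswith(")"):
        text = text[len(prefix):-1]
    return json.loads(text)


if __name__ == "__main__":
    print(rest_url("flickr.test.echo", "YOUR_API_KEY", foo="bar"))
    print(parse_response('jsonFlickrApi({"stat": "ok"})'))
```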

Flickr Authentication Security

[flickr-photo:id=1187679,size=m] Recently Flickr closed a little security hole I found in their API authentication. I was able to convince their servers to hand out a token to me based on a user’s cookies and the API key and secret key of an application the user had used. Then with the JSON form of the Flickr API I had full access to the user’s account.

There were two flaws in Flickr’s security that exposed this problem. The first was that the security is based on the assumption that applications can keep a key secret. This is easy for web applications that make server-to-server API calls, but for anything a user downloads, and especially for open source software, it’s impossible to keep the key secret. My experiment used the secret key from Flock, which is open source (the secret key can be found in its Subversion repository), and the secret key from Flickr’s own Mac OS X uploader application, which can be easily extracted from the download on their site. Second, the Flickr server was giving out new authentication tokens without requiring user approval.
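To see why extracting the secret key is enough, consider how Flickr’s original authentication API (since superseded by OAuth) signed calls: the `api_sig` argument was the MD5 hex digest of the shared secret followed by every argument name and value, sorted by argument name. A minimal Python sketch of that signing step; `"SECRET"` and the argument values are placeholders. Nothing in the signature ties it to the application’s own binary or server, so any program holding the secret can sign requests as that application.

```python
import hashlib


def api_sig(secret, params):
    """MD5 hex digest of the secret plus name/value pairs sorted by name."""
    payload = secret + "".join(name + str(params[name]) for name in sorted(params))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    params = {"api_key": "KEY", "method": "flickr.auth.getFrob"}
    # The same signature results wherever this runs, given the same secret.
    print(api_sig("SECRET", params))
```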

The exploit itself is a little state machine that makes a series of Flickr API calls and uses one IFRAME. It goes like this:

  • Request a frob (via JSON)
  • Request authentication (via an IFRAME)
  • Request the auth token (via JSON)
  • Do evil (via JSON)
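Because each step above has to wait for the result of the previous one (in a browser, via a callback), the sequence maps naturally onto a state machine. A Python sketch of that control flow only, with no network access: `flickr.auth.getFrob` and `flickr.auth.getToken` are the methods of Flickr’s old authentication flow, while the step names and the `perform` callback, which stands in for issuing a request and waiting for it, are hypothetical.

```python
from enum import Enum, auto


class Step(Enum):
    GET_FROB = auto()
    AUTHORIZE = auto()
    GET_TOKEN = auto()
    USE_TOKEN = auto()
    DONE = auto()


# step -> (what is requested, how it is requested, next step)
PLAN = {
    Step.GET_FROB: ("flickr.auth.getFrob", "JSON", Step.AUTHORIZE),
    Step.AUTHORIZE: ("authentication page", "IFRAME", Step.GET_TOKEN),
    Step.GET_TOKEN: ("flickr.auth.getToken", "JSON", Step.USE_TOKEN),
    Step.USE_TOKEN: ("authenticated API call", "JSON", Step.DONE),
}


def run(perform):
    """Walk the steps in order; perform(step, target, transport) returning
    False stops the machine. Returns the list of completed steps."""
    step = Step.GET_FROB
    completed = []
    while step is not Step.DONE:
        target, transport, next_step = PLAN[step]
        if not perform(step, target, transport):
            break
        completed.append(step)
        step = next_step
    return completed


if __name__ == "__main__":
    run(lambda step, target, transport: print(step.name, target, "via", transport) or True)
```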

In my case the evil consisted of posting a comment on the user’s most recent photo.

The security hole is now closed, but if you’re interested in seeing how to access the Flickr API entirely from JavaScript in a web page, take a look at the attached exploit. You’ll also need the MD5 library.

Rules of RDF

[flickr-photo:id=100583394,size=m] At Flock I’ve become the RDF expert. It turns out that, when building on the Mozilla platform, RDF can be a really flexible, advanced and performant way of modelling data and binding it to UI; however, it can be very confusing. Here are some rules I’ve found handy:

  1. There are no nodes. There are only arcs. Nodes only exist in terms of being the source or target of an arc.
  2. There are no arcs. We all agree to interpret RDF triples as a directed graph, but really, they’re just triples, just statements.
  3. There is no XML. Most of the time when we see a representation of RDF triples, it’s in a serialized XML form. There are many different, valid ways to express the same RDF graph as XML, and the tree of the XML document (usually) doesn’t match the RDF graph. Don’t try to treat RDF as XML.
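Rules 1 and 2 become concrete if a graph is stored as nothing but a set of (subject, predicate, object) tuples. A minimal Python sketch; it is not Mozilla’s RDF API, and the `urn:` URIs are made up for illustration.

```python
def nodes(triples):
    """Rule 1: a 'node' exists only as the subject or object of some triple."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def arcs_from(triples, subject):
    """Rule 2: an 'arc' is just our reading of a triple as a directed edge."""
    return sorted((p, o) for s, p, o in triples if s == subject)


if __name__ == "__main__":
    graph = {
        ("urn:photo:1", "urn:title", "Kayaking"),
        ("urn:photo:1", "urn:taggedWith", "urn:tag:kayak"),
    }
    print(sorted(nodes(graph)))
    # Removing the only triple that mentions a resource removes its "node" too.
    graph.discard(("urn:photo:1", "urn:taggedWith", "urn:tag:kayak"))
    print(sorted(nodes(graph)))
```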

I originally posted this in my LiveJournal, but it didn’t make sense to most of my non-technical readers, and probably not much sense to most of my technical readers. Hence this blog.