
Not solving the wrong problem

I like a great deal of what Google does for the open web. They sponsor standards work, they are working on an open source browser, they are building documentation on the state of the web for web developers. It’s all really great. Today they posted what they called A Proposal For Making AJAX Crawlable. It seems like a great idea. More and more of the web isn’t reached by users clicking on a conventional <a href="http://…"> link but by executing JavaScript that dynamically loads content off of the server. It’s somewhere between really hard and impossible for web crawlers to fully and correctly index sites that work that way without the sites’ developers taking crawlers into account.

Google’s proposal is to define a convention for URLs that carry state information in the fragment, and a convention for retrieving the canonical, indexable contents of a URL with such a fragment. First, let me dismiss out of hand the suggestion that you make a headless browser available over HTTP to render your AJAX pages to HTML. If it’s so easy for HtmlUnit to render your AJAX to HTML, surely Google can do it. And offering HtmlUnit as a web service on your server doesn’t sound that secure or scalable to me.
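For concreteness: as I read the proposal, a stateful URL and the crawler-fetchable URL it maps to look something like this (mystate is a placeholder):

      http://www.example.com/ajax.html#!mystate
      http://www.example.com/ajax.html?_escaped_fragment_=mystate

The server is expected to answer the second form with the full, indexable HTML for that state.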

The bigger question: if your solution requires the server to be able to serve the correct HTML for any state, would you come up with the same solution as Google? There is a simple, straight-forward solution that works today and is used on sites all over the internet. If the content you serve includes the static, non-AJAX URLs in anchor HREFs but uses JS click handlers to do the AJAX loads, then crawlers can scrape all of your pages, users of modern browsers get the full shiny experience, and users on old mobile browsers that don’t support JS get a site that still works – all for free!

To do this you can either make your AJAX templates include onclick handlers or you can write a simple piece of JS to do the right thing when any link is clicked on. A contrived example using jQuery might look like:

      $(function() {
        $('body').click(function(event) {
          var href = $(event.target).attr('href');
          // ignore clicks on anything that isn't a link
          if (!href) return;
          // don't try to AJAX absolute URLs
          if (href.match(/^https?:\/\//)) return;
          // don't let the normal browser navigation operate
          event.preventDefault();
          // based on the link's href, decide what AJAX URL to load
          $('#ajaxframe').load('/load-fragment', {path: href});
          // update the URL bar
          document.location.hash = href;
        });
      });

This will intercept clicks on relative anchor tags and let your page JS do its AJAX magic. It doesn’t require special conventions. If you build your site this way you’ll probably find that the state in your URL fragments is just the relative URL of the page on your site, so http://www.example.com/random/page and http://www.example.com/#/random/page have the same meaning. That turns out to be a pretty good convention. After all, aren’t our URLs supposed to refer to resources anyway?
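One loose end: when a visitor arrives directly at a hash URL like http://www.example.com/#/random/page, you need to read the fragment on page load and fire the same AJAX load. A sketch, reusing the #ajaxframe container and /load-fragment endpoint from the example above:

      $(function() {
        // if the visitor landed on e.g. /#/random/page, strip the
        // leading '#' and load that relative path via AJAX
        var href = document.location.hash.replace(/^#/, '');
        if (href) {
          $('#ajaxframe').load('/load-fragment', {path: href});
        }
      });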

Google AJAX APIs outside the browser

Google just announced their new Language API this morning. Unfortunately it’s another one of their AJAX APIs – the kind designed to be used from JavaScript in web pages. These APIs are pretty cool for building client-side web applications – I used their AJAX Feeds API on my home page – but I had some ideas for server software that could use a translation API.

I remembered John Resig’s hack from a few months back, getting enough of the DOM in Rhino to run some of the jQuery unit tests. I pulled that down, wrote the bits of DOM that were missing, and now I’ve got a Java and JavaScript environment for accessing Google’s AJAX APIs. Apart from creating stubs for some methods that get called, the main functionality I had to implement was turning Google’s APIs’ asynchronous script loading into the Rhino shell’s load(url) calls. They use <script src="... and document.createElement("script"), but both are pretty easy to catch. The upshot is that everything is synchronous, which subtly changes a lot of the interface. For example, my Language API example looks like this:

load('ajax-environment.js');
load('http://www.google.com/jsapi');
google.load('language', '1');
google.language.translate("Hello world", "en", "es", 
  function(result) { 
    print(result.translation);
  });

It of course prints: Hola mundo.
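The interesting part is the script-loading shim. A minimal sketch of the idea – this is hypothetical, the real ajax-environment.js stubs out much more of the DOM – catching both of the injection paths mentioned above and turning them into synchronous load() calls:

// enough fake DOM for Google's loader in the Rhino shell
var document = {
  // catch document.write('<script src="..."></script>')
  write: function(html) {
    var match = html.match(/src="([^"]+)"/);
    if (match) load(match[1]);
  },
  // catch document.createElement('script') followed by
  // head.appendChild(script), and fetch the src synchronously
  createElement: function(name) {
    return { nodeName: name.toUpperCase() };
  },
  getElementsByTagName: function(name) {
    return [{
      appendChild: function(el) {
        if (el.nodeName == 'SCRIPT' && el.src) load(el.src);
      }
    }];
  }
};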

I’ve put the source up on github. Have a play, tell me what you think.

Out with the old, in with the goo(gle)

Some time ago I reworked my home page to feature content from various other sites I post to (blogs, flickr, delicious) by using some JSON tricks to pull in their feeds. I blogged about how to do this with Feedburner’s JSON API, so that my actual page was just static HTML and all the work was done client-side.

Last week I decided to revisit this using Google’s new AJAX feeds API. Feedburner‘s API never seemed to be well supported (it came out of a hackathon) and it forced me to serialize my requests. In the process I neatened up a bunch of the code.

Google’s API is pretty straight-forward. It uses a loader model that is similar to Dojo‘s dojo.require, so you load the main Google library:

<script src="http://www.google.com/jsapi?key=YOURAPIKEY"
  type="text/javascript"></script>

and then ask it to load their feed library:

google.load('feeds', '1');

They have a handy way of setting a callback to be called when the libraries are loaded:

google.setOnLoadCallback(function () { /* just add code */ });

By putting all three of these together we have a straight-forward way to execute code at the right time.
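Putting it together, a minimal sketch – the feed URL and the #entries element are placeholders – that fetches a feed once the library is ready and lists its entry titles:

<script type="text/javascript">
  google.load('feeds', '1');
  google.setOnLoadCallback(function () {
    var feed = new google.feeds.Feed('http://example.com/feed');
    feed.load(function (result) {
      if (result.error) return;
      for (var i = 0; i < result.feed.entries.length; i++) {
        // append one list item per feed entry
        var li = document.createElement('li');
        li.appendChild(document.createTextNode(result.feed.entries[i].title));
        document.getElementById('entries').appendChild(li);
      }
    });
  });
</script>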

I heavily refactored the code that inserts the feed data into the page. I fleshed out the concept of input filters, from simply filtering the title to filtering whole item objects. This allows a more flexible transformation from the information presented in the RSS feeds to the information I want to present to visitors to my page. In practice I only used it to remove my name from Twitter updates. Instead of hard-coding the DOM node creation like I did in the previous version of the code, I moved to a theme model: the theme function takes a feed entry and returns a DOM node to append to the target DOM node.
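To illustrate the shape of it – the names and bodies here are mine, not the actual code from my page – a filter takes a feed entry and returns a cleaned-up entry, and a theme function takes an entry and returns a DOM node:

// hypothetical filter: strip a leading "username: " from Twitter updates
function twitterFilter(entry) {
  entry.title = entry.title.replace(/^\w+:\s*/, '');
  return entry;
}

// hypothetical theme: turn a feed entry into a link node
function defaultTheme(entry) {
  var link = document.createElement('a');
  link.href = entry.link;
  link.appendChild(document.createTextNode(entry.title));
  return link;
}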

The flexibility of Google’s API let me abandon my separate code path for displaying my Flickr images. Previously I used Flickr’s own JSON feed API, but since Google’s feed API supports returning RSS extensions I could use Flickr’s MediaRSS-compliant feeds to show thumbnails and links. They even provide a handy cross-browser implementation of getElementsByTagNameNS (google.feeds.getElementsByTagNameNS) for us to use.
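A sketch of what that looks like – the Flickr feed URL is a placeholder and I’m going from memory on the getElementsByTagNameNS signature – using MIXED_FORMAT so the API returns the raw XML alongside the JSON:

var MEDIA_NS = 'http://search.yahoo.com/mrss/';
var feed = new google.feeds.Feed('http://api.flickr.com/services/feeds/photos_public.gne');
feed.setResultFormat(google.feeds.Feed.MIXED_FORMAT);
feed.load(function (result) {
  if (result.error) return;
  for (var i = 0; i < result.feed.entries.length; i++) {
    // in mixed format each entry carries its raw XML node
    var thumbs = google.feeds.getElementsByTagNameNS(
        result.feed.entries[i].xmlNode, MEDIA_NS, 'thumbnail');
    if (thumbs.length > 0) {
      var img = document.createElement('img');
      img.src = thumbs[0].getAttribute('url');
      document.getElementById('photos').appendChild(img);
    }
  }
});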

I’m tempted to write a client-side implementation of Jeremy Keith‘s lifestream thing using this API.

Take a look at the code running on my home page or check out the script.