Detecting advanced CSS features

I decided to do some fancy typography in my résumé, but I wanted to have a sane fallback for browsers that don’t support the latest features. I have my name rotated 90 degrees using CSS3 transforms, but simply applying the transform: rotate(-90deg) CSS property isn’t enough to get the effect I want. To make it look right I also need to measure the text (since its dimensions will change depending on the viewer’s fonts) and absolutely position it based on that. Standard CSS fallback (where unrecognized rules are ignored) doesn’t give a good result. At all.

After a bunch of fiddling and cross-browser testing I found that testing the existence (!==undefined) of properties on element.style worked well. So I put all of my rotation rules in a CSS class that’s applied only if the style properties exist.

<style type="text/css">
  h1.rotated {
    -webkit-transform-origin: left bottom;
    -moz-transform-origin: left bottom;
    -o-transform-origin: left bottom;
    transform-origin: left bottom;

    -webkit-transform: rotate(-90deg);
    -moz-transform: rotate(-90deg);
    -o-transform: rotate(-90deg);
    transform: rotate(-90deg);

    position: absolute;
    left: 0px;
    top: 0px;
    margin-top: 16px;
  }
</style>
<script type="text/javascript">
    // rotate the title, if that's even possible
    var h1 = document.getElementsByTagName('h1')[0];
    var s = document.body.style;

    // test for the presence of CSS3 transform properties
    if (s.transform !== undefined ||
        s.WebkitTransform !== undefined ||
        s.MozTransform !== undefined ||
        s.OTransform !== undefined) {
      // move
      h1.setAttribute('style',
          'top:' + ( h1.offsetWidth - h1.offsetHeight) + 'px');
      // and rotate
      h1.setAttribute('class', 'rotated');
    }
</script>

Of course, if I could just check for transform: support rather than -webkit-transform:, -moz-transform: and -o-transform: but we’re not there yet. Anyway, based on browsershots.org it seems to do the right thing on every known browser. It shows rotated text in very recent Webkit and Gecko browsers, and in Opera nightlies, and unrotated, correctly text everywhere else.

Implementing Information Retrieval Systems

I believe that it’s valuable to implement software components that you use from scratch merely to gain a better understanding of the systems you use. I’ve never implemented an operating system from scratch or designed and implemented a programming language like many of my friends but I’d really like to one day. Part of the problem is that looking at a large, complicated, system from the outside is intimidating.

A couple of years ago I spent many months working with CLucene, the C++ port of the Lucene IR (full text search). Both CLucene and Lucene are huge, complicated and often appear to be buggy. While learning about the Lucenes I did a bunch of reading about full text search theory. It never seemed that complicated, but all of the implementations I came across clearly were.

A basic full text search engine works like this:

  • collect all of the words in the input documents
  • simplify the words so that similar ones are the same, so I can search for “look” and get “looked”, “looking” and “looker”
  • organize the words in a special tree data structure called a trie that lets you look up the words in linear time, the leaves of the trie for each word indicate which documents it’s present in
  • to search for a word you walk the trie and get the list of documents

Yesterday I spent a couple of hours walking (Tel Aviv public transport was failing me) and I worked through most of how to implement this in my head. I also thought about how to implement phrase matching. I’m not really sure how phrase matching works in other systems (like I said, they’re too complicated for me to understand easily) but I represent each instance of a word in a document and then link from one instance to the instance of the next word in the document. That way after walking down the trie to find the first word in the phrase I can walk along each word in the phrase, comparing it to the next word in the document.

Last night I implemented it in Python. I called it Tripe and it’s less than 250 lines of Python. It doesn’t have a stemmer worth talking about and it’s pretty space inefficient but it seems to generally work. There’s a library file called tripe.py and then some test utilities: tripe_add.py (add documents), tripe_search.py (search the index), tripe_dot.py (visualize the index with Graphviz). Take a look:

% echo "The cat sat on the mat." | ./tripe_add.py test.tripe 1
% ./tripe_dot.py test.tripe|dot -Tpng -o test1.png

% echo "The quick brown fox..." | ./tripe_add.py test.tripe 2
% ./tripe_dot.py test.tripe|dot -Tpng -o test2.png

% echo "There is a light that never goes out." | ./tripe_add.py test.tripe 3
% ./tripe_dot.py test.tripe|dot -Tpng -o test3.png

% ./tripe_search.py test.tripe the
matched in document 1 at 15
matched in document 1 at 0
matched in document 2 at 0
% ./tripe_search.py test.tripe cat
matched in document 1 at 4

I first implemented this as a tree in memory, but I found that implementing persistent storage was really easy too. I’ve always been hesitant to implement structured binary file formats, and I know that efficient implementation can be really complicated, but a naive implementation like I did was relatively straight-forward.

So now what? There are already a bunch of full-text search engines out there, but they all seem relatively complicated. I wonder if it’s worth tidying up what I have, extending it enough to be useful for site search applications and publishing it. In the meantime it’s on github.