Python generator fun

I know Python’s iterators and generators aren’t that new anymore, but at heart I’m still a Python 1.5 programmer. I’ve come to iterators and generators in Python from my experience in JavaScript.

This morning I wanted to generate a list of names (for nodes in my NFA/DFA pattern matching code) that looked like: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, AA, AB, AC, AD, AE, AF, AG, AH, AI, AJ, AK, AL, AM, AN, AO, AP, AQ, AR, AS, AT, AU, AV, AW, AX, AY, AZ, BA, BB, BC, BD, etc… A couple of minutes of fiddling I came up with:

def name_generator():
  from string import uppercase
  from itertools import chain
  for n in chain([''], name_generator()):
    for c in uppercase:
      yield n+c

It think the code is pretty simple, pretty readable, and pretty efficient. It’s not rocket science, but I feel that iterators and generators have paid off well for Python.

Posted in Default | Tagged | 1 Comment

Implementing text pattern matching languages in LLVM

We use pattern matching languages all day long. From shell filename matching rules (fnmatch) in our shells and shell utilities like find and locate to regular expression matching in programming languages, configuration files and shell utilities like grep and sed. These have typically been implemented by parsing the pattern into data structures and walking those data structures as input is processed. Obviously, these work fairly well – we couldn’t live without them, but can we do better? I’m thinking particularly about patterns that are matched against many times like the input to GNU locate (matched against every path on your system) or the routing table of your web application (matched for every request)?

Pattern matching languages are programming languages. If we were looking to speed up long-running conventional programs an obvious approach would be to use a virtual machine with a JIT to end up with efficient native code at the cost of slower startup.

I’ve been experimenting with this and the LLVM compiler infrastructure project. LLVM is a set of libraries and UNIX utilities that provide an assembler, optimizer, interpreter, and native compiler for a high level SSA intermediate form. So far I’ve implemented a simplified subset of the POSIX fnmatch function’s functionality as a proof of concept. It’s pretty hacky, but it’s up on github.

The performance isn’t great, it’s 20% slower than GLIBC in my trivial testcase, but so far all I’ve got is a really naiive implementation. I’m going to implement an NFA / DFA based algorithm which should be more efficient, and easier extend to full regular expressions. Hopefully it’ll look more useful then.

Posted in Default | Tagged , , | 1 Comment

Understanding the OAuth vulnerability

Last night’s OAuth Security Advisory 2009.1 was a little light on the details. The blog post wasn’t much better. I was peripherally involved in the OAuth spec development and I couldn’t work out what the advisory meant without a bunch of thinking and spec reading so I thought I’d try to explain it in simpler terms here.

For my example I’ll use the real service Twitter and a theoretical service Twitten that lets users post to to Twitter in LOL-speak and authenticates via OAuth. Alice and Bob will be my attacker and victim.

Alice’s normal authentication process goes like this:

  1. Alice loads twitten.com/login
  2. Twitten creates a regular HTTP session for Alice
  3. Twitten asks Twitter for an unauthorized token
  4. Twitten redirects Alice to an URL on the Twitter servers that will allow her to authorize the token
  5. Alice clicks OK to authorize the token
  6. Twitter redirects Alice back to Twitten
  7. Twitten exchanges its unauthorized token for an access token (associated with Alice’s account) with Twitter and stores it in Alice’s session
  8. Alice makes inane posts on Twitter via Twitten

The vulnerability here is in step 4. If instead of going to the authorization URL Alice convinces Bob to go there and authorize Twitten she can gain access to his account. Like this:

  1. Alice loads twitten.com/login
  2. Twitten creates a regular HTTP session for Alice
  3. Twitten asks Twitter for an unauthorized token
  4. Twitten tells Alice what URL to go to to authorize the token, but she doesn’t go there
  5. Alice tells Bob, “if you love Twitter and kittens try out Twitten – go to http://twitter.com/oauth/authorize/….” (the authorization URL from Twitten)
  6. Bob loads the authorization URL with his Twitter credentials and authorizes the token
  7. Twitten requests Twitter to exchange the unauthorized token for an access token (associated with Bob’s Twitter account) and stores it in Alice’s session
  8. Alice goes to twitten.com and posts “OMG PWNd” to Bob’s twitter account

I’m not really sure how to address this issue. It’s fundamentally hard to establish trust between three parties over insecure communications. Hopefully more experienced people than me will come up with clever answers.

Update: changed wording to match Eran’s suggestion, his blog post on the subject is excellent reading.

Posted in Default | Tagged , | 6 Comments

Restarting an AIR application

For reasons too complicated and secret to go into here I’m writing an Adobe AIR application that needs to restart itself occasionally. I didn’t find any clear documents describing how to do this but after some reverse engineering and experimentation, here’s what I came up with.

The air.swf movie that’s used by web pages to install and launch AIR applications calls some internal, undocumented APIs to do this, and so can you!

namespace {
  import adobe.utils.ProductManager;
  import mx.core.Application;
  public class Restart {
    public static function restart() : void {
      // request that a new instance of the application be launched
      new ProductManager('airappinstaller').launch('-launch ' +
        Application.application.nativeApplication.applicationID + ' ' +
        Application.application.nativeApplication.publisherID);
      // exit the current instance
      Application.application.nativeApplication.exit(0);
    }
  }
}

I don’t know if or when this will break but it’s pretty straight-forward, once you work it out. It feels to me like there might be some kind of race condition, but it seems to work alright. Oh also, I’ve only tried this on Mac.

Posted in Default | Tagged , | Leave a comment

Mozilla and WebKit, browser platform wars.

This post began as a comment on Matthew Gertner’s blog post The Browser Platform Wars. It’s a rant not an article, don’t take it personally.

In my experience (8 years building Mozilla based products and playing with WebKit since it was first released as WebCore in 2003) there are a few clear technical and social differences that can make WebKit a more attractive platform for developers than Mozilla. There are plenty of reasons that Firefox is a better product than Safari (I definitely prefer Firefox over Safari on my Mac), but that’s a different story.

The scale and complexity of the Mozilla codebase is daunting. Mozilla advocates will say that that’s because Mozilla provides more functionality, but the reality is that even if you don’t want all that functionality you still have to dig through and around it to get your work done. Much of the Mozilla platform is poorly documented, poorly understood and incomplete (the C++/JS binding security stuff was the most recent example I’ve looked at) while WebKit is smaller, simpler and newer. They use common c++ idioms instead of proprietary systems like XPCOM.

The scale of the Mozilla organization is also daunting. Mozilla’s web presence is vast and is filled with inaccurate, outdated content. Their goals are vague and mostly irrelevant to developers. By contrast WebKit’s web site is simple and straight-forward. Its audience is developers, it sets out goals that matter and make sense to developers, it explains clearly the process for participating and contributing in the project.

WebKit is designed for embedding. Within Apple there are several customers for the WebKit library – Desktop Safari, iPhone Safari, Dashboard, AppKit and more. Since WebKit already serves a variety of purposes it’s likely to work for other applications which third party developers will want to build. By comparison the Mozilla platform really only has one first-class customer – Firefox.

The WebKit community has welcomed non-employee contributors. They’ve even welcomed contributors who work for Apple’s competitors. There are WebKit reviewers from Google, Nokia and the open source community. By comparison, Songbird and Flock don’t have any Mozilla committers or reviewers who weren’t previously Mozilla Corporation employees even though they are two of the largest non-MoCo platform customers.

Perhaps I’m short-sighted, but I don’t see a clear path forward for Mozilla in competing with WebKit as a platform for web content display. The long history of Mozilla have left them with a large, complicated codebase that’s not getting smaller. The rapid growth and defensive attitude of the organization (probably brought on by the Netscape / IE wars) has left it without a culture that welcomes friendly competition. I think that Mozilla’s focus on the product above the platform is the right decision for them. I’m just glad we have an alternative web content platform.

Posted in Default | Tagged , , , , , , | 8 Comments

Three Months of ActionScript

I’ve been working largely in ActionScript 3 for the past three months. After spending four years working primarily in JavaScript I didn’t expect to encounter too many problems with her cousin. That expectation was pretty well borne out.

I was particularly excited at the chance to play with ActionScript’s optional strict typing since that has been under consideration for future versions of JavaScript / ECMAScript. AS3’s strict typing hasn’t felt optional to me because I’ve been largely using the Flex Builder IDE and it excitedly warns me every time I fail to specify a type on a variable or function signature, so I put them everywhere. Instead of implementing encapsulation using conventions like I do in JS I use AS3’s Interfaces. This felt good at first but I quickly realized that I couldn’t do the kinds of tricks that I’d become accustomed to in JS. I can’t create a typed object in a function and return it without defining a class for it in its own file. This has made the higher level structure of my software far closer to Java than to JavaScript.

On the other hand, inside my functions I’m pretty much writing JavaScript. All of the standard JS built-in objects (Object, Array, Math, etc) behave like modern JS implementations. I use anonymous functions to manipulate and filter arrays. I use Objects and Arrays as my primary data structures.

At this point I’m pretty used to AS3. I know how to design my code and I can be pretty productive, not needing to rewrite everything too many times. I’d like to see Adobe mix the Java-style classes and interfaces up with the JavaScript habit of declaring anonymous functions and objects as you go. I’d be thrilled if I could write something like:

var x : IMyInterface = new IMyInterface {
method: function() : void { ... },
field: 42
};

I see that they’re extending the language to support limited generics support so hopefully that won’t be the end of it. I’m looking forward to experimenting more on the platform.

Posted in Default | Tagged , , , , | Leave a comment

A Different Model For Web Services Authorization

In my last post I set out to describe how easy it is to extract private keys from desktop software. As I was concluding I stumbled on an alternative approach that might be more secure in some circumstances. I didn’t really go into details, so here’s an expansion of the idea.

Current API authentication mechanisms including Flickr Auth, OAuth, Yahoo BBAuth and Google AuthSub work by allowing users to grant an application the right to act for the user on the site. Some implementations allow the user to grant access to only a subset of the functionality — Flickr lets users grant just read access, Google lets users grant access to just one class of data (for example Google Docs but not Google Calendar). But still, the user doesn’t know which specific operations they’re allowing to be carried out. This wouldn’t matter so much if the user can trust the application that they’re granting access to, but there’s no way to provide for that completely.

Another approach might be to give third-party applications the ability to submit sets of changes to be applied to a user’s account, but require the user to approve them through the service provider’s web site before they’re applied to the the user’s real data. For example, a Flickr upload might look like this:

  • Client makes API request flickr.transaction.create(total_files=20, total_data=20M)
  • Flickr responds with transaction id X and a maximum lifetime for the transaction
  • Flickr responds with an URL to direct the user’s browser to to:
    • associate the transaction with an account, and
    • preview, permit and commit the changes

To a user this isn’t going to look or feel to different from the current Flickr upload scenario where on completion they’re presented with a set of uploaded photos and offered the opportunity to make metadata changes.

The key advantage here from a security standpoint is that users approve actions not applications. The challenge of expressing a set of changes to a user clearly is not small but unlike the challenge of identifying and trusting applications it’s solvable.

This model probably only makes sense for write-oriented sessions, and for infrequent, large write sessions (such as Flickr uploads) rather than frequent, small writes (like Twitter posting).

These ideas aren’t well thought out in my head, they’re fresh from me trying to come up with a workable model for API authentication and authorization that doesn’t depend on trusting the identity of clients. I’d really welcome feedback and ideas on how to take this forward and feedback implementers on the feasibility of adopting a model like this.

Posted in Default | Tagged , , , | 8 Comments

No More Secrets

Using secret keys to identify applications communicating across the internet has become popular as people have copied the very successful Flickr authentication API. Unfortunately people trust that they can keep these keys secret from attackers, even as they distribute applications that contain the secret keys to other people. I decided to see how hard it would be to extract the secret key and API key from Flickr’s new XULRunner based Uploadr.

While the Uploadr is nominally open source, they don’t include the key in the source code. In the README they instruct anyone building it to acquire their own key and modify the source code to include that key:

You'll need your own API key and secret from Flickr to build Uploadr.
These can be obtained at http://flickr.com/services/api/.  The key
and secret must be placed in flKey.cpp as indicated.  This file is at:
  UPLOADR/MacUploadr.app/Contents/Resources/components/flKey.cpp

The API key is stored as a string.  The secret is stored as individual
characters so it is not easily readable from the binary.

Such noble goals.

Getting the API key was as simple as running tcpdump:

sudo tcpdump -i en1 -vvv -n -s 0 -w /tmp/DumpFile.dmp

running the Uploadr, and then looking through the dump file to find where the API key is passed. Look:

api_key=a5c4c07991a15c6b56b88005a76e7774

In theory getting the secret key is harder, but it turned out to be even easier. It’s stored in the source code for the flKey::Sign method:

	// The shared secret
#ifdef XP_MACOSX
	bytes[0] = '-';
	bytes[1] = '-';
	bytes[2] = '-';
	bytes[3] = '-';
	bytes[4] = '-';
	bytes[5] = '-';
	bytes[6] = '-';
	bytes[7] = '-';
	bytes[8] = '-';
	bytes[9] = '-';
	bytes[10] = '-';
	bytes[11] = '-';
	bytes[12] = '-';
	bytes[13] = '-';
	bytes[14] = '-';
	bytes[15] = '-';
#endif

All I need to do is find that code, disassemble it and reassemble the string. I ran the Uploadr under gdb:

gdb '/Applications/Flickr Uploadr.app/Contents/MacOS/xulrunner'

and once it was up and running tried to find something to break on that would expose the secret key. There weren’t symbols for the flKey::Sign method that’s defined in the source file but I found I could break on MD5Init that’s called from that function. So:

break MD5Init
kill
run

And now I stop in the middle of the signature function – and from the stack trace I can see where flKey::Sign begins:

Breakpoint 1, 0x149a086b in MD5Init ()
(gdb) where
#0  0x149a086b in MD5Init ()
#1  0x149a1588 in flKey::Sign ()
#2  0x017bb45d in NS_InvokeByIndex_P ()
...

All I have to do is disassemble flKey::Sign:

(gdb) disassemble 0x149a1588
Dump of assembler code for function _ZN5flKey4SignERK9nsAStringRS0_:
0x149a1498 <_ZN5flKey4SignERK9nsAStringRS0_+0>:	push   %ebp
0x149a1499 <_ZN5flKey4SignERK9nsAStringRS0_+1>:	mov    %esp,%ebp
0x149a149b <_ZN5flKey4SignERK9nsAStringRS0_+3>:	push   %edi
0x149a149c <_ZN5flKey4SignERK9nsAStringRS0_+4>:	push   %esi
...
0x149a153e <_ZN5flKey4SignERK9nsAStringRS0_+166>:	movb   $0xXX,(%eax)
0x149a1541 <_ZN5flKey4SignERK9nsAStringRS0_+169>:	movb   $0xXX,0x1(%eax)
0x149a1545 <_ZN5flKey4SignERK9nsAStringRS0_+173>:	movb   $0xXX,0x2(%eax)
0x149a1549 <_ZN5flKey4SignERK9nsAStringRS0_+177>:	movb   $0xXX,0x3(%eax)
0x149a154d <_ZN5flKey4SignERK9nsAStringRS0_+181>:	movb   $0xXX,0x4(%eax)
0x149a1551 <_ZN5flKey4SignERK9nsAStringRS0_+185>:	movb   $0xXX,0x5(%eax)
0x149a1555 <_ZN5flKey4SignERK9nsAStringRS0_+189>:	movb   $0xXX,0x6(%eax)
0x149a1559 <_ZN5flKey4SignERK9nsAStringRS0_+193>:	movb   $0xXX,0x7(%eax)
0x149a155d <_ZN5flKey4SignERK9nsAStringRS0_+197>:	movb   $0xXX,0x8(%eax)
0x149a1561 <_ZN5flKey4SignERK9nsAStringRS0_+201>:	movb   $0xXX,0x9(%eax)
0x149a1565 <_ZN5flKey4SignERK9nsAStringRS0_+205>:	movb   $0xXX,0xa(%eax)
0x149a1569 <_ZN5flKey4SignERK9nsAStringRS0_+209>:	movb   $0xXX,0xb(%eax)
0x149a156d <_ZN5flKey4SignERK9nsAStringRS0_+213>:	movb   $0xXX,0xc(%eax)
0x149a1571 <_ZN5flKey4SignERK9nsAStringRS0_+217>:	movb   $0xXX,0xd(%eax)
0x149a1575 <_ZN5flKey4SignERK9nsAStringRS0_+221>:	movb   $0xXX,0xe(%eax)
0x149a1579 <_ZN5flKey4SignERK9nsAStringRS0_+225>:	movb   $0xXX,0xf(%eax)
...
0x149a168e <_ZN5flKey4SignERK9nsAStringRS0_+502>:	pop    %esi
0x149a168f <_ZN5flKey4SignERK9nsAStringRS0_+503>:	pop    %edi
0x149a1690 <_ZN5flKey4SignERK9nsAStringRS0_+504>:	pop    %ebp
0x149a1691 <_ZN5flKey4SignERK9nsAStringRS0_+505>:	ret
End of assembler dump.

And sure enough if I reconstruct those bytes, the secret key is: REDACTEDREDACTED

That wasn’t too hard was it? The fact that the code around the key was open source made it easier to find but even if it was closed source if I knew what kind of algorithm I was looking for it wouldn’t be hard to find and examine. It’s pretty easy to trace forward from user input or backward from network IO to find the signature algorithm, and the secret’s always the input to that.

Unfortunately OAuth follows the same model as Flickr, placing some value in a consumer’s “private” key. They do mention in Appendix B.7 that it’s often possible for attackers to access a secret it’s still a vital part of the protocol.

Since it’s not possible to securely identify clients we should move to a model where users approve actions against their accounts rather than granting carte-blanche actions to clients. For example Flickr could put photos that are uploaded into a staging area and require approval on the web site before including them in a user’s photo stream. This would allow users to make a informed decisions about third-party access to their accounts.

Updated: at Flickr’s request I’ve removed the “secret” key.

Updated again: I made another post about this topic, expanding on the final paragraph of this post.

Posted in Default | Tagged , , | 8 Comments

Closed Source

So now that I’m working on something that’s proprietary, closed source and in stealth mode, I’m finally doing stuff and learning how to do things that are really cool! Typical. Perhaps I should just start queuing up blog posts about the stuff I’ve discovered to push out live once we launch something other people could look at.

Posted in Default | Tagged | 3 Comments

Flash Development with Flex Builder

Dear Lazyweb, I’ve started doing Flash and Flex development. For me the Flex Builder IDE is significantly better than the Flash CS4 IDE, but when you build a SWF in Flex Builder it includes all of the MX widgetry. That’s too heavyweight for building simple Flash applets. Is it possible to get around that so I can use Flex Builder for my non-visual Flash development without bringing in MX?

Update: David Zuckerman a Flex Builder Developer dropped in to provide this invaluable advice:

Hey Ian, have you tried an ActionScript project in Flex Builder? It sheds most of the Flex weight, but you’ll still need to modify your project settings. In your setttings, go to the “ActionScript Build Path” tab, and open up the Flex SDK that’s listed there. Remove everything but playerglobal.swc (you can keep utilities.swc if you want). That should give you just the language definition, and none of Flex

I haven’t moved the project over to this fully but it looks like it works perfectly. Thanks David!

Posted in Default | Tagged , , | 5 Comments