Cloudy with a chance of downtime

AWS went down again last Friday. I wouldn’t normally care, since I only run non-critical toy projects out of their infrastructure, but I know that it disrupted a friend’s wedding and that’s just not cool.

Amazon’s public statement about the event is fairly detailed and fairly believable. In one of their northern Virginia datacenters “each generator independently failed”. They don’t state how many generators they have, but their vagueness and references to “primary and backup power generators” seem to indicate that they have two.

Since they had UPS systems, a power outage with generator failure from 7:24pm PDT meant that the datacenter only lost power between 8:04pm PDT and 8:24pm PDT, and apparently many systems had power restored from 8:14pm PDT. So why was the outage for customers so long?

The majority of EBS servers had been brought up by 12:25am PDT on Saturday. However, for EBS data volumes that had in-flight writes at the time of the power loss, those volumes had the potential to be in an inconsistent state.

I always understood that the value of having a UPS was twofold: you could survive small power interruptions, and you could shut down safely so that when power was restored your systems would return without requiring manual intervention. The Amazon cloud does not seem to be good at the latter.

At the most basic level it would seem prudent to force EBS servers to switch to a more cautious mode as soon as grid power is lost. If a server is running on batteries or even on a generator then forcing disks to remain in a consistent state is a pretty basic precaution. How hard is it to mount -o remount,sync automatically? Obviously there’s performance degradation with that, but it seems a small price to pay in the rare occasion when there’s clear and present risk of data loss. Who wouldn’t take an occasional performance hit in exchange for reliable disks and shorter outages?
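As a hypothetical sketch of that precaution, here is the shape of a hook script fired by UPS monitoring software such as NUT. The mount points and the ONBATT/ONLINE event names are illustrative assumptions, not anything from Amazon’s actual setup:

```shell
#!/bin/sh
# Remount data filesystems synchronously when grid power is lost, and back
# to async when it returns. Meant to be invoked by a UPS monitor with the
# power event name as its first argument.

MOUNTS="/ebs/vol0 /ebs/vol1"   # illustrative mount points

remount() {
    # $1 is "sync" while on battery, "async" once grid power returns
    for m in $MOUNTS; do
        echo mount -o "remount,$1" "$m"   # drop `echo` to actually remount
    done
}

case "${1:-}" in
    ONBATT) remount sync ;;    # grid lost: trade throughput for consistency
    ONLINE) remount async ;;   # grid restored: normal performance again
esac
```

The `echo` makes it a dry run; the point is only that switching a filesystem into sync mode is a one-line operation once you know power is at risk.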

Bringing back EC2 instances is a harder problem. Fundamentally the machines that run EC2 instances don’t know or care much about the VM images that run on them. That’s what makes them easy to manage, and that’s what makes it easy to spin up new images. On the other hand my simple web service that went down for hours last week does simply boot up; because it’s deployed into this automatically managed cloud, it has to. Had I been running on my own hardware in the exact same datacenter my downtime would have been on the order of 20 minutes rather than hours.

Because we’re building on top of a system involving half a million servers for compute alone we’re subject to the complexities of very large scale systems, even for our very simple systems. Each time a set of cascading failures causes extensive downtime we have to ask ourselves if the benefits of such complicated systems outweigh the cost.

Seven Inches

This year at Google I/O I got a Nexus 7, the new tablet from Google and Asus. First of all Android 4.1 Jelly Bean is great – it has a ton of incremental improvements over the already excellent ICS, plus Google Now, which promises to be a really useful daily tool.

Last year I got the iPad styled Galaxy Tab 10.1 at I/O. It’s a beautiful piece of hardware and the Honeycomb OS it came with was lovely. Honeycomb’s spectacular GMail and Calendar apps have only improved slightly in ICS and JB and remain one of the main reasons I like Android so much. Nonetheless I never found myself actually using my Galaxy Tab. I would take it on planes to play games or watch movies and use it to read books in hotels or occasionally at home but it never became part of my daily life.

I’m writing this on my Nexus 7. I’ve taken it pretty much everywhere I’ve been since I first opened it up. It fits into every bag I carry, can squeeze into my back pocket and is comfortable to use one handed on a busy BART train in the morning. It has the first touch keyboard that feels like the right size – larger tablets fail at thumb typing, and Google’s predictive keyboard is invaluable.

Overall I feel like I have a new tool, not just another gadget.

Simply logging JavaScript calls

When debugging complicated JavaScript one thing I find myself constantly doing is using console.log() to print out what functions are being called in what order. JavaScript is single-threaded and event driven so it’s often not entirely clear what functions will be called in what order.

Traditionally I’ve done something like this:

function foo(bar) {
  console.log('foo(' + bar + ')');
  // ...
}

but last night I came up with something better. It’s probably not completely portable but it seems to work fine in recent Chrome / Safari / Firefox, which is really all I’m going to be using for debugging anyway:

function logCall() {
  console.log(logCall.caller.name + '(' +
    Array.prototype.slice.call(logCall.caller.arguments)
      .map(JSON.stringify).join(', ') + ')');
}

Just add logCall() to the start of functions and the function call (including serialized arguments) will be logged to the console. Easy and foolproof.

LXC on Ubuntu 11.04 Server

For a while I’ve been interested in Linux Containers (LXC), a new way of providing Linux virtual machines on Linux hosts. Unlike machine virtualization systems like Xen, VMware, KVM and VirtualBox, LXC is an OS-level virtualization system. Instead of booting a complete disk image, Linux Containers share the same kernel as the host and typically use a filesystem that is a sub-tree of the host’s. I’d previously tried to get this running on a machine that was connected by WiFi, but basically that doesn’t work: bridged network interfaces don’t play nicely with wireless network interfaces. I just set up a machine on wired ethernet, found a great introductory article, and I’m up and running.

Stéphane Graber’s article is great, but it hides a bunch of the details away in a script he wrote. Here I’m going to explain how I got LXC up and running on my Ubuntu 11.04 system as simply as possible.

Install the required packages
Everything you need to set up and run LXC is part of Ubuntu these days, but it’s not all installed by default. Specifically you need the LXC tools, debootstrap (which can create new Ubuntu / Debian instances) and the Linux bridge utilities.

apt-get install lxc debootstrap bridge-utils

Set up the network
To put the containers on the network we use a bridge. This is a virtual ethernet network in the kernel that passes ethernet frames back and forth between the containers and the physical network. Once we’ve set this up our current primary network interface (eth0) becomes just a conduit for the bridge (br0) so the bridge should be used as the new primary interface.

Add the bridge in /etc/network/interfaces:

# LXC bridge
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

And change eth0 to be manually configured, ie: it doesn’t DHCP:

auto eth0
iface eth0 inet manual

Now bring up the interface:

ifup br0

Set up Control Groups
Control Groups are a fairly new Linux mechanism for isolating groups of processes as well as managing the resources allocated to them. The feature is exposed to userland via a filesystem that must be mounted, so put this in your /etc/fstab:

cgroup          /cgroup         cgroup          defaults        0       0

Create the mount point, and mount it:

mkdir /cgroup
mount /cgroup

Now we’re ready to create containers
An LXC container is a directory under /var/lib/lxc. That directory contains a configuration file named config, a filesystem table called fstab and a root filesystem called rootfs. The easiest way to create one is to use the lxc-create script. First we create a configuration file that describes the networking configuration; let’s call it network.conf:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0

and then run lxc-create to build the container:

lxc-create -n mycontainer -t natty -f network.conf

Note: -n lets you specify the container name (ie: the directory under /var/lib/lxc), -f lets you specify a base configuration file and -t lets you indicate which template to use.

Container templates are implemented as scripts under /usr/lib/lxc/templates. Given a destination path they will install a base Linux system customized to run in LXC. There are template scripts to install Fedora, Debian and recent Ubuntus as well as minimal containers with just busybox or an sshd. Template scripts cache completed root filesystems under /var/cache/lxc. I’ve found the container template scripts to be interesting and quite readable.

Starting a container
Starting the container is as simple as:

lxc-start --name mycontainer --daemon

If you leave off the --daemon then you’ll be presented with the container’s console. Getting a console is as simple as:

lxc-console --name mycontainer

Keeping containers straight
Each time a container is started it will be allocated a random MAC (ethernet) address. This means that it’ll be seen by your network as a different machine each time it’s started and it’ll be DHCPed a new address each time. That’s probably not what you want. When it requests an address from the DHCP server your container will pass along its hostname (eg: mycontainer). If your DHCP server can be configured to allocate addresses based on hostname then you can use that. Mine can’t, so I assign static ethernet addresses to my containers. There’s a class of ethernet addresses that are “locally administered addresses”. These have a most-significant byte matching the bit pattern xxxxxx10, ie: a second hex digit of 2, 6, A or E. These addresses will never be programmed into a network card by a manufacturer. I chose an arbitrary locally administered address block and started allocating addresses to containers. They’re allocated by adding the following line after the network configuration in the /var/lib/lxc/mycontainer/config file:

lxc.network.hwaddr = 66:5c:a1:ab:1e:01
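If you want to sanity-check an address before handing it to a container, the locally-administered test is just a pattern match on the first octet. This helper is an illustrative sketch, not an lxc tool:

```shell
# A MAC is locally administered when bit 1 of its first octet is set and
# bit 0 (the multicast bit) is clear, i.e. the second hex digit is
# 2, 6, A or E.
is_local_mac() {
    case "$1" in
        ?2:*|?6:*|?[Aa]:*|?[Ee]:*) return 0 ;;
        *) return 1 ;;
    esac
}

is_local_mac 66:5c:a1:ab:1e:01 && echo "locally administered"
is_local_mac 00:16:3e:00:00:01 || echo "vendor-assigned"
```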

Managing your containers
There are a few handy tools for container management, beyond the lxc-start and lxc-console mentioned before.

lxc-stop --name mycontainer does what you’d probably expect.

lxc-ls lists available containers on one line and running containers on the next. It’s a shell script so you can read it and work out how to find which containers are running in your own scripts.
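As a sketch of the kind of script you can derive from reading it: a container counts as running once lxc has created a cgroup for it under the cgroup mount, so comparing the two directory trees is enough. The directory arguments here default to the layout described above and are parameterized purely for illustration:

```shell
# List running containers: names present under /var/lib/lxc that also
# have a live cgroup under /cgroup.
running_containers() {
    lxc_dir="${1:-/var/lib/lxc}"
    cgroup_dir="${2:-/cgroup}"
    for c in "$lxc_dir"/*/; do
        [ -d "$c" ] || continue
        name=$(basename "$c")
        # a cgroup directory named after the container means it's running
        [ -d "$cgroup_dir/$name" ] && echo "$name"
    done
    return 0
}
```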

lxc-ps --lxc lists all processes across all containers.

There are more like lxc-checkpoint, lxc-freeze and lxc-unfreeze that look pretty exciting but I haven’t had a chance to play with them. There are also a set of tools for restricting a container’s access to resources so that you can prioritize some above others. That’s a story for another day.

We already have information, why do we need more data?

Jawbone Up
I love gadgets and metrics and pretty graphs. I’ve been using Endomondo to track my cycling, primarily my commute. I know it’s about 6.5km and I ride it in between 22 and 26 minutes. I can even share or embed my workout here. I love that shit. When I get to work in 21 minutes I feel awesome. What it doesn’t tell me is that riding to work is good for my health. I don’t need a fancy app on my $500 mobile phone to tell me that.

Fancy pedometers like the Fitbit or Jawbone’s newly announced Up are neat gadgets, but are they really going to help anyone become healthier? More importantly are they going to help stem the slide of some countries into unhealthy patterns like obesity?

The target market for these devices is affluent and health conscious. They already know that they should be eating less fat, more fiber and exercising more. They can afford a good bicycle, a personal trainer or fancy running shoes. Anyone who is forking over a hundred dollars for a pedometer already knows more about the impact of their lifestyle on their health than these gadgets will give them.

These gadgets aren’t doing anything to educate less health literate Americans about living healthier lifestyles. They aren’t doing anything to address childhood nutrition here or around the world. They’re a distraction from the real problems we face.

I still kind of want one.

The full Bradley Manning / Adrian Lamo logs

(06:08:53 PM) What’s your greatest fear?
(06:09:24 PM) bradass87: dying without ever truly living
via Manning-Lamo Chat Logs Revealed | Threat Level |

Everyone should read this (apparently full) chat log. In it Manning, a depressed 22 year old questioning his gender, reached out to Lamo for someone to talk to. While Manning poured out his story of growing up with abusive, alcoholic parents, Lamo repeatedly assured him that he was safe to talk to. Manning denied having a “doctrine”, but he did have a clear sense of morality:
(1:11:54 PM) bradass87: and… its important that it gets out… i feel, for some bizarre reason
(1:12:02 PM) bradass87: it might actually change something
(1:13:10 PM) bradass87: i just… dont wish to be a part of it… at least not now… im not ready… i wouldn’t mind going to prison for the rest of my life, or being executed so much, if it wasn’t for the possibility of having pictures of me… plastered all over the world press… as a boy…

via Manning-Lamo Chat Logs Revealed | Threat Level |

When Lamo was chatting to him, Manning had been demoted and had lost his access to classified networks. Any good or ill he had done by leaking to WikiLeaks was over. He was going to be discharged. He’d started to set up his identity as a woman. He was just 22, trying to serve his country, serve humanity and work out who he was.

Emissions taxing and trading in Australia

After many years of proposals, counter-proposals, coups and general disappointment the Australian Government has announced its scheme to allow the economic impact of carbon pollution to be managed by the free market.

I’m really really proud that as a country we’ve reached the point where there’s a plan in place. It’s taken longer than it should have. In 2007 it was a policy of both major parties yet for a while more recently it had been a policy of neither. We’re one of the first countries in the world to pull this off.

The plan announced by Julia Gillard sets an initial price of AUD $23 per ton of CO2 produced, along with subsidies for many industries and tax cuts to low and middle income Australians. A few heavily polluting industries will take a hit, but that’s the idea. Businesses that aren’t pollution-centric seem to largely support the scheme.

There will be a transition in 2015 to a free market emissions trading scheme. Amusingly Tony Abbott, leader of the conservative, supposedly free market Liberal Party, opposes the idea of allowing markets to determine prices. There’s been some shrill, populist freak-out over new taxes, but the impact on people’s lives is likely to be small enough that it won’t be an issue at the next election. I’m expecting the bursting housing bubble will be more of a worry.

As we sunset foursquare APIv1 and announce some amazing new milestones for APIv2, now seemed like as good a time as any to reflect on some of the decisions we made in APIv2 and see how they’re holding up. Fortunately, we were able to draw on both our experience with APIv1 and from tight iteration with our client developers (especially Anoop, who is awesome!). We’re pretty happy with the result, but we still made a few mistakes. Hopefully there are some lessons for anybody else out there who gets stuck designing an API.

via APIv2: Woulda, coulda, shoulda | Foursquare Engineering Blog.