
The Problem With Democracy

The problem with democracy is that the people of a country lose the right to consider themselves independent of government policies. If a government does or says awful things, then the people of the democracy have to bear the responsibility.

Citizens of totalitarian regimes can’t really be held responsible for what their governments do. I don’t consider the Chinese people as a whole responsible for the repression of Uyghurs or Ai Weiwei. I don’t hold the Saudi people responsible for the domestic repression of religious minorities or for their government’s support for the coup in Egypt.

Australians have elected governments that lock up asylum seekers in private prisons in Australia and in poorer neighboring countries. The conditions are awful, and there is endemic violence (including sexual violence) against the asylum seekers. It’s an embarrassment, but more than that it’s a national shame – a stain on our character as a people.

Israelis just reelected Likud and Benjamin Netanyahu, right after he declared his opposition to ending the occupation and expressed deeply racist concerns about non-Jewish Israeli citizens actually voting in the election. Not all Israelis voted for him, but as a democracy, Israel as a whole carries the responsibility for his words and actions.

As citizens of a democracy we take on responsibility for the actions of our government, whether we are in the majority or not. We have the responsibility to speak up and advocate for what is just and right and true. But we can’t escape our shared responsibility for our nations’ crimes and mistakes.

The Kevin Bacon of Music

Who is the most connected musician? Mathematics had Paul Erdős, film has Kevin Bacon, but who is the center of the musical world? At some point I flippantly suggested Brian Eno because of his long career as a producer. Rich Trott brought this up again recently when linking to his awesome Music Routes site.

Rich has now published the data that backs Music Routes, which let me write some code to calculate who the most connected musician is. The criterion I chose was to find the artist with the lowest average distance from all other musicians in the largest network of connected musicians. I started researching shortest-path algorithms and quickly discovered that the Floyd-Warshall algorithm computes the shortest paths between all pairs of nodes in a graph in only O(n³) time. And scipy has an implementation.
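To make the criterion concrete, here is a minimal pure-Python sketch: link two artists if they appear on the same track, run Floyd-Warshall over the resulting graph, take the largest connected component, and pick the artist with the lowest average distance to everyone else in it. The `tracks` dictionary format and the toy data are hypothetical, not Rich’s actual data layout, and this is not the notebook’s code. For real data you would want scipy’s compiled `scipy.sparse.csgraph.floyd_warshall` rather than Python loops.

```python
import math
from itertools import combinations


def floyd_warshall(n, edges):
    """All-pairs shortest path lengths for an unweighted, undirected graph of n nodes."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for a, b in edges:
        dist[a][b] = dist[b][a] = 1
    # Classic O(n^3) triple loop: allow node k as an intermediate stop.
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue  # i can't reach k, so k can't shorten any path from i
            row_i = dist[i]
            for j in range(n):
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]
    return dist


def most_connected(tracks):
    """tracks: dict mapping a track title to the artists on it (hypothetical format).

    Returns (artist, average distance, size of largest component), or None if
    no component has more than one artist.
    """
    artists = sorted({a for names in tracks.values() for a in names})
    index = {a: i for i, a in enumerate(artists)}
    # Two artists are linked if they played on the same track.
    edges = {
        (index[a], index[b])
        for names in tracks.values()
        for a, b in combinations(sorted(set(names)), 2)
    }
    dist = floyd_warshall(len(artists), edges)

    # Finite distances identify each artist's connected component.
    reach = [[j for j, d in enumerate(row) if d != math.inf] for row in dist]
    largest = max(reach, key=len)
    if len(largest) < 2:
        return None

    def average(i):
        return sum(dist[i][j] for j in largest if j != i) / (len(largest) - 1)

    best = min(largest, key=average)
    return artists[best], average(best), len(largest)


if __name__ == "__main__":
    toy_tracks = {
        "t1": ["A", "B"],
        "t2": ["B", "C"],
        "t3": ["C", "D"],
        "t4": ["E", "F"],
    }
    print(most_connected(toy_tracks))  # ('B', 1.3333333333333333, 4)
```

In the toy graph B and C tie at an average distance of 4/3, and `min()` keeps the first one it sees. The triple loop is also why this step is slow: with roughly ten thousand artists it runs on the order of 10¹² inner-loop steps.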

I ended up with an IPython notebook to calculate the most connected artist. Rich’s data includes 10,479 people on 4,835 tracks. He admits that it ultimately reflects his tastes because he has entered most (perhaps all) of the data himself, but that’s still a serious number of data points. The Floyd-Warshall step was the slowest part of the calculation. I left it running overnight and, once it was complete, saved the results in case I ever wanted to run this again.

The result? Of the 10,065 artists in the largest group of connected artists, the one with the shortest average distance to every other artist in the network was Jim Keltner, a session drummer, with an average distance of 3.226. He’s followed closely by Paul McCartney (3.239), Bob Dylan (3.322) and Elvis Costello (3.333).

This is an alright result, but it really feels kind of like “small data”. I downloaded the discogs.com database dump with 3.6M artists on 5.3M releases. I need to work out how to run an all-pairs shortest-path algorithm for that without needing terabytes of RAM.
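For a sense of scale: a full distance matrix for 3.6 million artists has about 3.6M² ≈ 1.3 × 10¹³ entries, which is roughly 104 TB at 8 bytes per entry and still about 13 TB at one byte each. One way around that, sketched here as an assumption rather than anything this post settles on, is to swap Floyd-Warshall for a breadth-first search (BFS) from each artist in turn. On an unweighted graph BFS gives exact shortest-path distances, and only one source’s distances (O(n) memory) need to be held at a time. The adjacency dictionary format is hypothetical.

```python
from collections import deque


def bfs_average_distance(adjacency, start):
    """Breadth-first search from start on an unweighted graph.

    Returns (average distance to every other reachable node, component size).
    Only this one source's distances are kept in memory.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    others = len(dist) - 1
    average = sum(dist.values()) / others if others else float("inf")
    return average, len(dist)


def most_connected_bfs(adjacency):
    """adjacency: dict mapping each artist to a set of collaborators (hypothetical
    format; every collaborator must also appear as a key).

    Returns (artist, average distance, component size) for the artist with the
    lowest average distance inside the largest connected component.
    """
    best = None
    for artist in adjacency:
        average, size = bfs_average_distance(adjacency, artist)
        # Prefer bigger components first, then shorter average distances.
        key = (size, -average)
        if best is None or key > best[0]:
            best = (key, artist, average)
    if best is None:
        return None
    (size, _), artist, average = best
    return artist, average, size


if __name__ == "__main__":
    toy = {
        "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"},
        "E": {"F"}, "F": {"E"},
    }
    print(most_connected_bfs(toy))
```

Total time is still O(n·(n + m)) for n artists and m collaboration links, so this trades RAM for CPU. The per-artist searches are independent, though, so they could be split across processes or machines.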

 

Streaming royalties: taking over, but what are the details?

So TechCrunch reported that Kobalt, a company that collects digital music publishing revenue for artists, has announced that Spotify revenue has overtaken iTunes revenue in Europe by 13%. That’s interesting, if not surprising given the trends, but the report leaves a few key questions unanswered:

  • What are the overall revenues? Does this represent an overall drop in income, with iTunes revenue falling faster than Spotify revenue grows, or are artists’ incomes holding steady?
  • How does the buy vs stream ratio look across the spectrum of music popularity? Pop vs long tail?

TC, of course, failed to link to Kobalt’s own blog post, which doesn’t contain much more detail anyway.

Horse Meat

I like horse meat. It’s delicious and healthy, and not so different from beef. I’m really enjoying watching the unfolding European horse meat scandal. Even countries like France, where horse is regularly eaten, are outraged that they’ve been lied to.


The scandal has exposed the complicated supply chain in the European cheap meat trade. It’s also exposed other lovely details, such as the fact that a “beef burger” in the UK only needs to be 47% beef. What’s the rest of it? Pretty much anything, but generally protein powder and highly processed meat off-cuts.

In the 1990s the UK banned mechanically recovered meat, commonly referred to as “pink slime”, after it was linked to the spread of variant CJD, the human form of mad cow disease. Pink slime was replaced by “de-sinewed meat” in cheap meat products until last year, when de-sinewed meat was reclassified and no longer allowed in cheap burgers. The way I see it, since we’re still in the process of eliminating pink slime here in the US, we’re about 15 years away from this kind of scandal.

I think it’s great whenever people are exposed to their food chain. We need to demand more accountability, transparency and integrity. If that means we can’t afford to eat meat every day, but the meat we do eat is of higher quality, then that’s a fine outcome.

I’d go so far as to say I’m Lovin’ It.

2012, The Year of the Linux Personal Computer

2012 Q3 PC sales: 87.5M
2012 Q3 Android sales: 122.5M

For sure, many would-be PC buyers were holding off for Windows 8 and the refreshed models due to launch with it, but that still means that last quarter 1.4 times as many Linux computers were sold as Windows computers (122.5M ÷ 87.5M = 1.4).

You might try to argue that an Android device isn’t a personal computer, but apart from writing software, everything I do on my computer I do on my Android devices. You might argue that Android isn’t “Linux” enough, but it’s certainly largely open source and runs a Linux kernel. There’s plenty I don’t like about the way Android is put together compared to a traditional stock Unix system, but hey, look at the numbers – 122.5M Linux computers shipped last quarter. That’s a whole lot of Freedom!

Cloudy with a chance of downtime

AWS went down again last Friday. I wouldn’t normally care, since I only run non-critical toy projects on their infrastructure, but I know that it disrupted a friend’s wedding and that’s just not cool.

Amazon’s public statement about the event is fairly detailed and fairly believable. In one of their northern Virginia datacenters “each generator independently failed”. They don’t state how many generators they have, but their vagueness and references to “primary and backup power generators” seem to indicate that they have two.

Because they had UPS systems, a grid power outage at 7:24pm PDT combined with the generator failures meant that the datacenter only lost power between 8:04pm and 8:24pm PDT, about 20 minutes, and apparently many systems had power restored from 8:14pm PDT. So why was the outage for customers so long?

The majority of EBS servers had been brought up by 12:25am PDT on Saturday. However, for EBS data volumes that had in-flight writes at the time of the power loss, those volumes had the potential to be in an inconsistent state.

I always understood that the value of having a UPS was two-fold: you could survive short power interruptions, and you could shut down safely so that when power was restored your systems would come back without requiring manual intervention. The Amazon cloud does not seem to be good at the latter.

At the most basic level it would seem prudent to force EBS servers into a more cautious mode as soon as grid power is lost. If a server is running on batteries, or even on a generator, then forcing disks to remain in a consistent state is a pretty basic precaution. How hard is it to run mount -o remount,sync automatically? Obviously that degrades performance, but it seems a small price to pay on the rare occasions when there’s a clear and present risk of data loss. Who wouldn’t take an occasional performance hit in exchange for reliable disks and shorter outages?
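A minimal sketch of that precaution, assuming some UPS monitoring daemon can run a command when the machine switches to battery and again when grid power returns (Network UPS Tools’ upsmon, for example, can run a NOTIFYCMD on power events; that wiring is not shown). The `onbattery` / `online` event names and the mount point list are hypothetical, and this is not how EBS actually works. It flushes dirty pages with `sync`, remounts the listed filesystems with the `sync` option, and switches back to the default `async` writes when power returns.

```python
import subprocess
import sys

# Hypothetical mount points holding volume data; adjust for your own system.
DATA_MOUNTS = ["/srv/volumes"]


def build_commands(event, mounts):
    """Return the commands to run for a power event.

    event is "onbattery" (grid power lost) or "online" (grid power restored);
    these names are assumptions, not any particular UPS daemon's API.
    """
    if event == "onbattery":
        # Flush dirty pages first, then make all further writes synchronous.
        return [["sync"]] + [["mount", "-o", "remount,sync", m] for m in mounts]
    if event == "online":
        # Back on grid power: return to normal asynchronous writes.
        return [["mount", "-o", "remount,async", m] for m in mounts]
    raise ValueError(f"unknown power event: {event!r}")


def main(argv):
    dry_run = "--dry-run" in argv
    args = [a for a in argv if a != "--dry-run"]
    if len(args) != 1:
        print("usage: power_event.py [--dry-run] onbattery|online", file=sys.stderr)
        return 2
    for cmd in build_commands(args[0], DATA_MOUNTS):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # remounting requires root
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Running it with `--dry-run onbattery` only prints the commands, which is a safe way to check the list before letting it touch real mounts.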

Bringing back EC2 instances is a harder problem. Fundamentally, the machines that run EC2 instances don’t know or care much about the VM images that run on them. That’s what makes them easy to manage, and what makes it easy to spin up new images. On the other hand, my simple web service that went down for hours last week boots up without any intervention; because it’s deployed into this automatically managed cloud, it has to. Had I been running on my own hardware in the exact same datacenter, my downtime would have been on the order of 20 minutes rather than hours.

Because we’re building on top of a system with half a million servers for compute alone, even our very simple services are subject to the complexities of very large-scale systems. Each time a set of cascading failures causes extensive downtime, we have to ask ourselves whether the benefits of such complicated systems outweigh the costs.

Seven Inches

This year at Google I/O I got a Nexus 7, the new tablet from Google and Asus. First of all, Android 4.1 Jelly Bean is great – it has a ton of incremental improvements over the already excellent ICS (Ice Cream Sandwich), plus Google Now, which promises to be a really useful daily tool.

Last year I got the iPad-styled Galaxy Tab 10.1 at I/O. It’s a beautiful piece of hardware and the Honeycomb OS it came with was lovely. Honeycomb’s spectacular Gmail and Calendar apps have only improved slightly in ICS and JB, and they remain one of the main reasons I like Android so much. Nonetheless, I never found myself actually using my Galaxy Tab. I would take it on planes to play games or watch movies, and use it to read books in hotels or occasionally at home, but it never became part of my daily life.

I’m writing this on my Nexus 7. I’ve taken it pretty much everywhere I’ve been since I first opened it up. It fits into every bag I carry, can squeeze into my back pocket and is comfortable to use one-handed on a busy BART train in the morning. It has the first touch keyboard that feels like the right size (larger tablets fail at thumb typing), and Google’s predictive keyboard is invaluable.

Overall I feel like I have a new tool, not just another gadget.

LXC on Ubuntu 11.04 Server

For a while I’ve been interested in Linux Containers (LXC), a new way of providing Linux virtual machines on Linux hosts. Unlike machine virtualization systems like Xen, VMware, KVM and VirtualBox, LXC is an OS-level virtualization system. Instead of booting a complete disk image, Linux Containers share the same kernel as the host and typically use a filesystem that is a sub-tree of the host’s. I’d previously tried to get this running on a machine that was connected by WiFi, but basically that doesn’t work: bridged network interfaces don’t play nicely with wireless network interfaces. I just set up a machine on wired ethernet, found a great introductory article, and I’m up and running.

Stéphane Graber’s article is great, but it hides a bunch of the details away in a script he wrote. Here I’m going to explain how I got LXC up and running on my Ubuntu 11.04 system as simply as possible.

Install the required packages
Everything you need to set up and run LXC is part of Ubuntu these days, but it’s not all installed by default. Specifically you need the LXC tools, debootstrap (which can create new Ubuntu / Debian instances) and the Linux bridge utilities.

apt-get install lxc debootstrap bridge-utils

Set up the network
To put the containers on the network we use a bridge. This is a virtual ethernet switch in the kernel that passes ethernet frames back and forth between the containers and the physical network. Once we’ve set this up, our current primary network interface (eth0) becomes just a conduit for the bridge (br0), so the bridge should be used as the new primary interface.

Add the bridge in /etc/network/interfaces:

# LXC bridge
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

And change eth0 to be manually configured, i.e. so that it doesn’t use DHCP:

auto eth0
iface eth0 inet manual

Now bring up the interface:

ifup br0

Set up Control Groups
Control Groups are a fairly new Linux mechanism for isolating groups of processes as well as managing the resources allocated to them. The feature is exposed to userland via a filesystem that must be mounted, so put this in your /etc/fstab:

cgroup          /cgroup         cgroup          defaults        0       0

Create the mount point, and mount it:

mkdir /cgroup
mount /cgroup

Now we’re ready to create containers
An LXC container is a directory under /var/lib/lxc. That directory contains a configuration file named config, a filesystem table called fstab and a root filesystem called rootfs. The easiest way to create one is to use the lxc-create script. First we create a configuration file that describes the networking configuration; let’s call it network.conf:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0

and then run lxc-create to build the container:

lxc-create -n mycontainer -t natty -f network.conf

Note: -n lets you specify the container name (i.e. the directory under /var/lib/lxc), -f lets you specify a base configuration file and -t lets you indicate which template to use.

Container templates are implemented as scripts under /usr/lib/lxc/templates. Given a destination path, they will install a base Linux system customized to run in LXC. There are template scripts to install Fedora, Debian and recent Ubuntu releases, as well as minimal containers with just busybox or an sshd. Template scripts cache completed root filesystems under /var/cache/lxc. I’ve found the container template scripts to be interesting and quite readable.

Starting a container
Starting the container is as simple as:

lxc-start --name mycontainer --daemon

If you leave off the --daemon, you’ll be presented with the container’s console. Getting a console on a container that’s already running in the background is as simple as:

lxc-console --name mycontainer

Keeping containers straight
Each time a container is started it will be allocated a random MAC (ethernet) address. This means that your network will see it as a different machine each time it’s started, and it’ll be DHCPed a new address each time. That’s probably not what you want. When it requests an address from the DHCP server, your container will pass along its hostname (e.g. mycontainer). If your DHCP server can be configured to allocate addresses based on hostname then you can use that. Mine can’t, so I assign static MAC addresses to my containers. There’s a class of ethernet addresses called “locally administered addresses”. Locally administered unicast addresses have a first (most-significant) byte of the form xxxxxx10 in binary, i.e. a second hex digit of 2, 6, A or E. Manufacturers never burn these addresses into network cards. I chose an arbitrary MAC address block and started allocating addresses to containers. They’re allocated by adding the following line after the network configuration in the /var/lib/lxc/mycontainer/config file:

lxc.network.hwaddr = 66:5c:a1:ab:1e:01
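A small helper can check those bits and hand out sequential addresses from a block. This is a sketch under assumptions: the five-octet prefix and the one-byte container numbering scheme are hypothetical, and it simply prints lxc.network.hwaddr lines to paste into each container’s config.

```python
import random


def is_local_unicast(mac):
    """True if the first octet has the locally administered bit (0x02) set and
    the multicast bit (0x01) clear, i.e. its second hex digit is 2, 6, A or E."""
    first = int(mac.split(":")[0], 16)
    return (first & 0x02) == 0x02 and (first & 0x01) == 0


def container_mac(prefix, n):
    """Append container number n (0-255) to a five-octet prefix (hypothetical scheme)."""
    if not 0 <= n <= 0xFF:
        raise ValueError("container number must fit in one byte")
    mac = f"{prefix}:{n:02x}"
    if not is_local_unicast(mac):
        raise ValueError(f"{prefix} is not a locally administered unicast prefix")
    return mac


def random_local_mac(rng=random):
    """Pick a random locally administered unicast address, e.g. to start a new block."""
    octets = [rng.randrange(256) for _ in range(6)]
    octets[0] = (octets[0] & 0xFC) | 0x02  # clear multicast bit, set local bit
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    for n in range(1, 4):
        print(f"lxc.network.hwaddr = {container_mac('66:5c:a1:ab:1e', n)}")
```

The example address above starts with 0x66 (binary 01100110), so it passes the check; an address with a vendor prefix such as 00:16:3e does not.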

Managing your containers
There are a few handy tools for container management, beyond the lxc-start and lxc-console mentioned before.

lxc-stop --name mycontainer does what you’d probably expect.

lxc-ls lists available containers on one line and running containers on the next. It’s a shell script so you can read it and work out how to find which containers are running in your own scripts.
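As a rough sketch of doing the same from your own scripts, the function below lists available and running containers. It rests on two assumptions worth checking against lxc-ls on your own system: that every container is a directory under /var/lib/lxc holding a config file, and that a running container appears as a same-named directory under the /cgroup mount set up earlier. Both paths are parameters so they can be changed.

```python
import os


def list_containers(lxc_path="/var/lib/lxc", cgroup_path="/cgroup"):
    """Return (available, running) container names.

    Assumptions: each container is a directory under lxc_path containing a
    'config' file, and a running container shows up as a directory of the
    same name under the cgroup mount.
    """
    available = sorted(
        name for name in os.listdir(lxc_path)
        if os.path.isfile(os.path.join(lxc_path, name, "config"))
    )
    running = [
        name for name in available
        if os.path.isdir(os.path.join(cgroup_path, name))
    ]
    return available, running


if __name__ == "__main__":
    available, running = list_containers()
    print("available:", " ".join(available))
    print("running:  ", " ".join(running))
```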

lxc-ps --lxc lists all processes across all containers.

There are more, like lxc-checkpoint, lxc-freeze and lxc-unfreeze, that look pretty exciting but I haven’t had a chance to play with them. There is also a set of tools for restricting a container’s access to resources so that you can prioritize some above others. That’s a story for another day.

We already have information, why do we need more data?

Jawbone Up
I love gadgets and metrics and pretty graphs. I’ve been using Endomondo to track my cycling, primarily my commute. I know it’s about 6.5km and I ride it in between 22 and 26 minutes. I can even share or embed my workout here. I love that shit. When I get to work in 21 minutes I feel awesome. What it doesn’t tell me is that riding to work is good for my health. I don’t need a fancy app on my $500 mobile phone to tell me that.

Fancy pedometers like the Fitbit or Jawbone’s newly announced Up are neat gadgets, but are they really going to help anyone become healthier? More importantly, are they going to help stem the slide of some countries into unhealthy patterns like obesity?

The target market for these devices is affluent and health conscious. They already know that they should be eating less fat, more fiber and exercising more. They can afford a good bicycle, a personal trainer or fancy running shoes. Anyone who is forking over a hundred dollars for a pedometer already knows more about the impact of their lifestyle on their health than these gadgets will ever tell them.

These gadgets aren’t doing anything to educate less health literate Americans about living healthier lifestyles. They aren’t doing anything to address childhood nutrition here or around the world. They’re a distraction from the real problems we face.

I still kind of want one.