Python's collections.defaultdict

Today I again came across code that I was able to make simpler, clearer and safer using collections.defaultdict. I keep meeting experienced Python programmers who don't know about it. Perhaps it's time to spread the good word.

The defaultdict type is a dict subclass that takes a factory function, which it calls to supply a default value for any key that hasn't been set yet. For example:

from collections import defaultdict

# int() returns 0, so every missing key starts with a count of 0.
frequency = defaultdict(int)
for c in 'the quick brown fox jumps over the lazy dog':
    frequency[c] += 1

This counts how often each character occurs in the string.
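
One detail is worth seeing in action. As far as I know, in standard Python the factory is called, with no arguments, only when a missing key is looked up with square brackets, and the value it returns is then stored in the dict. Membership tests and .get() leave the dict untouched. A small sketch (the key 'x' is just an illustration):

```python
from collections import defaultdict

frequency = defaultdict(int)  # int() returns 0

# Membership tests and .get() do not call the factory or add keys.
assert 'x' not in frequency
assert frequency.get('x') is None
assert len(frequency) == 0

# Indexing a missing key calls the factory and stores the result.
value = frequency['x']
assert value == 0
assert 'x' in frequency
assert dict(frequency) == {'x': 0}
```

So merely reading a missing key adds it, which is something to keep in mind if you later iterate over the dict or check its length.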

I often use defaultdict for dicts of dicts (defaultdict(dict)) and dicts of lists (defaultdict(list)).
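
Here is a minimal sketch of both patterns. The sample data and the names by_first_letter and scores are made up for illustration:

```python
from collections import defaultdict

# Dict of lists: group words by their first letter.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_first_letter = defaultdict(list)
for word in words:
    # A missing letter gets a fresh empty list, so append always works.
    by_first_letter[word[0]].append(word)

# Dict of dicts: record a score for each (student, subject) pair.
results = [('ann', 'maths', 90), ('ann', 'art', 75), ('bob', 'maths', 60)]
scores = defaultdict(dict)
for student, subject, score in results:
    # A missing student gets a fresh empty dict to hold their subjects.
    scores[student][subject] = score

print(dict(by_first_letter))
print(dict(scores))
```

In both cases there is no need to check whether the outer key exists before adding to the inner container.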

defaultdict replaces some pretty simple code. For example, the code above could be written without it as:

frequency = {}
for c in 'the quick brown fox jumps over the lazy dog':
    if c in frequency:
        # Seen before: increment the existing count.
        frequency[c] += 1
    else:
        # First occurrence: start the count at 1.
        frequency[c] = 1
but I find using defaultdict is not just shorter but also much clearer.

The other classes in the collections module, especially OrderedDict and Counter (which implements the counting pattern I just wrote here with defaultdict), seem useful, but I've never found myself actually using them, whereas defaultdict is a common part of my repertoire these days.
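
For comparison, here is a rough sketch of the same character count done with Counter. It should give the same counts as a defaultdict loop; the variable names are mine:

```python
from collections import Counter, defaultdict

text = 'the quick brown fox jumps over the lazy dog'

# Counter counts the items of any iterable in a single call.
counted = Counter(text)

# The equivalent defaultdict loop, for comparison.
frequency = defaultdict(int)
for c in text:
    frequency[c] += 1

# Both approaches produce the same mapping of character to count.
assert dict(counted) == dict(frequency)

# Counter also offers helpers such as most_common().
print(counted.most_common(3))
```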