Today I again came across code that I was able to make simpler, clearer
and safer using collections.defaultdict. I keep coming across experienced
Python programmers who don't know about it. Perhaps it's time to spread
the good word.
The defaultdict type is a dict subclass that takes a factory function to
supply default values for keys that haven't been set yet. For example
    from collections import defaultdict

    frequency = defaultdict(lambda: 0)
    for c in 'the quick brown fox jumps over the lazy dog':
        frequency[c] = frequency[c] + 1

will count the frequency of characters in a string.
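To make the factory behaviour concrete, here is a minimal sketch. The variable names are illustrative only. It uses int as the factory, which returns 0 just as lambda: 0 does, and shows that reading a missing key with square brackets stores the default, while `in` and .get() leave the dict untouched.

```python
from collections import defaultdict

# int() returns 0, so int works as a factory just like lambda: 0
counts = defaultdict(int)

# Indexing a missing key calls the factory and stores the result
first_lookup = counts['x']
keys_after_lookup = list(counts)

# Membership tests and .get() do not call the factory
has_y = 'y' in counts
got_y = counts.get('y')
keys_at_end = list(counts)
```

The side effect of indexing is worth keeping in mind: checking for a key with `in` is the way to ask whether it has really been set.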
I often use defaultdict for dicts of dicts (defaultdict(dict)) and dicts
of lists (defaultdict(list)).
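As a rough illustration of both patterns, the sketch below groups words by their first letter. The word list and the names by_letter and lengths are made up for this example.

```python
from collections import defaultdict

words = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

# Dict of lists: each new key starts out as an empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# Dict of dicts: each new key starts out as an empty dict
lengths = defaultdict(dict)
for word in words:
    lengths[word[0]][word] = len(word)
```

In both loops there is no need to check whether the outer key exists before appending or assigning.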
defaultdict replaces some pretty simple code. For example, the above code
could be written:
    frequency = dict()
    for c in 'the quick brown fox jumps over the lazy dog':
        if c in frequency:
            frequency[c] = frequency[c] + 1
        else:
            frequency[c] = 1

but I find using defaultdict is not just shorter but also much clearer.
The other classes in the collections module, especially OrderedDict and
Counter (which is an implementation of the pattern I just implemented here
on top of defaultdict), seem useful, but I've never found myself actually
using them, whereas defaultdict is a common part of my repertoire these
days.
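For comparison, here is a short sketch of the same character count done with Counter. The variable names are chosen for this example.

```python
from collections import Counter

# Counter accepts any iterable and tallies its items
frequency = Counter('the quick brown fox jumps over the lazy dog')

# most_common(n) returns the n highest counts as (item, count) pairs
top = frequency.most_common(1)

# Missing items read as 0 and are not added to the Counter
missing = frequency['!']
```

Unlike defaultdict, reading a missing key from a Counter does not insert it.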