| 1 | +# Collections Deep Dive — Part 1: defaultdict, Counter, OrderedDict |
| 2 | + |
| 3 | +[← Back to Overview](./collections-deep-dive.md) · [Part 2: deque, namedtuple, ChainMap →](./collections-deep-dive-part2.md) |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +Python's `collections` module provides specialized container types that go beyond the built-in `list`, `dict`, and `set`. This part covers the three dict-like types: `Counter`, `defaultdict`, and `OrderedDict`. |
| 8 | + |
| 9 | +## Why This Matters |
| 10 | + |
| 11 | +Every program needs to store and organize data. The built-in types handle most cases, but they have gaps. Need to count how often each word appears? `Counter`. Need a dict that automatically handles missing keys? `defaultdict`. Learning these tools saves you from writing (and debugging) boilerplate code. |
| 12 | + |
| 13 | +## `Counter` — count things |
| 14 | + |
| 15 | +The most intuitive way to count occurrences: |
| 16 | + |
| 17 | +```python |
| 18 | +from collections import Counter |
| 19 | + |
| 20 | +# Count letters in a string: |
| 21 | +letter_counts = Counter("mississippi") |
| 22 | +print(letter_counts) |
| 23 | +# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1}) |
| 24 | + |
| 25 | +# Count words in a list: |
| 26 | +words = ["apple", "banana", "apple", "cherry", "banana", "apple"] |
| 27 | +word_counts = Counter(words) |
| 28 | +print(word_counts) |
| 29 | +# Counter({'apple': 3, 'banana': 2, 'cherry': 1}) |
| 30 | + |
| 31 | +# Most common items: |
| 32 | +word_counts.most_common(2) |
| 33 | +# [('apple', 3), ('banana', 2)] |
| 34 | +``` |
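
Two more `Counter` behaviors are worth knowing alongside the example above: reading a missing key returns `0` without adding it, and `update()` adds to existing counts rather than replacing them. A minimal sketch, using a made-up word list:

```python
from collections import Counter

word_counts = Counter(["apple", "banana", "apple"])

# A missing key reads as 0 and is not added to the counter:
missing = word_counts["durian"]
has_durian = "durian" in word_counts

# update() adds to the existing counts instead of replacing them:
word_counts.update(["banana", "cherry"])
total = sum(word_counts.values())

print(missing, has_durian, total)
```

This is why a `Counter` rarely needs a membership check before reading a count.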
| 35 | + |
| 36 | +`Counter` supports arithmetic and set-style operators: |
| 37 | + |
| 38 | +```python |
| 39 | +a = Counter("aabbcc") |
| 40 | +b = Counter("aabbd") |
| 41 | + |
| 42 | +a + b # Counter({'a': 4, 'b': 4, 'c': 2, 'd': 1}) |
| 43 | +a - b # Counter({'c': 2}) — only positive counts |
| 44 | +a & b # Counter({'a': 2, 'b': 2}) — minimum of each |
| 45 | +a | b # Counter({'a': 2, 'b': 2, 'c': 2, 'd': 1}) — maximum of each |
| 46 | +``` |
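
The `-` operator above discards zero and negative results. When negative counts carry meaning, the in-place `subtract()` method keeps them. A short sketch with hypothetical `stock` and `sold` counters:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
sold = Counter(apples=1, pears=4)

# The - operator drops zero and negative results:
remaining = stock - sold

# subtract() works in place and keeps negative counts:
balance = Counter(stock)
balance.subtract(sold)

print(remaining)
print(balance["pears"])
```

Here `remaining` has no `pears` entry at all, while `balance` records the shortfall as a negative number.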
| 47 | + |
| 48 | +## `defaultdict` — dicts with automatic defaults |
| 49 | + |
| 50 | +A `defaultdict` does not raise `KeyError` when you index a missing key with `d[key]`. Instead, it calls its factory function, stores the result under that key, and returns it: |
| 51 | + |
| 52 | +```python |
| 53 | +from collections import defaultdict |
| 54 | + |
| 55 | +# Group items by category: |
| 56 | +animals = [("cat", "Felix"), ("dog", "Rex"), ("cat", "Whiskers"), ("dog", "Buddy")] |
| 57 | + |
| 58 | +groups = defaultdict(list) # Missing keys get an empty list |
| 59 | +for category, name in animals: |
| 60 | + groups[category].append(name) |
| 61 | + |
| 62 | +print(groups) |
| 63 | +# defaultdict(<class 'list'>, {'cat': ['Felix', 'Whiskers'], 'dog': ['Rex', 'Buddy']}) |
| 64 | +``` |
| 65 | + |
| 66 | +Compare with a regular dict: |
| 67 | +```python |
| 68 | +# Without defaultdict — verbose: |
| 69 | +groups = {} |
| 70 | +for category, name in animals: |
| 71 | + if category not in groups: |
| 72 | + groups[category] = [] |
| 73 | + groups[category].append(name) |
| 74 | + |
| 75 | +# With defaultdict — clean: |
| 76 | +groups = defaultdict(list) |
| 77 | +for category, name in animals: |
| 78 | + groups[category].append(name) |
| 79 | +``` |
| 80 | + |
| 81 | +Common default factories: |
| 82 | +```python |
| 83 | +defaultdict(list) # Missing keys → empty list [] |
| 84 | +defaultdict(int) # Missing keys → 0 |
| 85 | +defaultdict(set) # Missing keys → empty set set() |
| 86 | +defaultdict(str) # Missing keys → empty string "" |
| 87 | +defaultdict(dict) # Missing keys → empty dict {} |
| 88 | +``` |
| 89 | + |
| 90 | +Counting with `defaultdict(int)`: |
| 91 | +```python |
| 92 | +word_count = defaultdict(int) |
| 93 | +for word in "the cat sat on the mat".split(): |
| 94 | + word_count[word] += 1 |
| 95 | +# word_count now holds: {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1} |
| 96 | +``` |
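
The factory does not have to be a built-in type. Any callable that takes no arguments works, so a `lambda` can supply a custom default or build nested structures. A minimal sketch, where the names `scores` and `by_kind_and_initial` are invented for illustration:

```python
from collections import defaultdict

# A lambda gives a custom starting value for missing keys:
scores = defaultdict(lambda: 100)
scores["alice"] -= 10  # starts from 100, ends at 90

# Nested grouping: each missing outer key gets its own defaultdict(list)
by_kind_and_initial = defaultdict(lambda: defaultdict(list))
for kind, name in [("cat", "Felix"), ("dog", "Rex"), ("cat", "Fluffy")]:
    by_kind_and_initial[kind][name[0]].append(name)

print(scores["alice"])
print(by_kind_and_initial["cat"]["F"])
```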
| 97 | + |
| 98 | +## `OrderedDict` — dict that remembers insertion order |
| 99 | + |
| 100 | +Since Python 3.7, regular dicts maintain insertion order. So when is `OrderedDict` still useful? |
| 101 | + |
| 102 | +```python |
| 103 | +from collections import OrderedDict |
| 104 | + |
| 105 | +# OrderedDict considers order in equality checks: |
| 106 | +d1 = OrderedDict([("a", 1), ("b", 2)]) |
| 107 | +d2 = OrderedDict([("b", 2), ("a", 1)]) |
| 108 | +d1 == d2 # False — different order! |
| 109 | + |
| 110 | +# Regular dicts do not: |
| 111 | +{"a": 1, "b": 2} == {"b": 2, "a": 1} # True |
| 112 | + |
| 113 | +# OrderedDict has move_to_end: |
| 114 | +od = OrderedDict([("a", 1), ("b", 2), ("c", 3)]) |
| 115 | +od.move_to_end("a") # a moves to end: OrderedDict([('b', 2), ('c', 3), ('a', 1)]) |
| 116 | +od.move_to_end("c", last=False) # c moves to start: OrderedDict([('c', 3), ('b', 2), ('a', 1)]) |
| 117 | +``` |
| 118 | + |
| 119 | +Use `OrderedDict` when order matters for equality comparison or when you need `move_to_end()`. Otherwise, use a regular dict. |
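
One place where `move_to_end()` is useful is a small least-recently-used cache. The sketch below is only an illustration: `LRUCache` and its methods are hypothetical names, and it pairs `move_to_end()` with `popitem(last=False)`, which removes the oldest entry.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical helper: keeps at most `capacity` items, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # evicts "b", the least recently used
remaining_keys = list(cache._items)
print(remaining_keys)
```

For caching function results, `functools.lru_cache` is usually the simpler choice. The class above is only meant to show the reordering methods at work.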
| 120 | + |
| 121 | +## Common Mistakes |
| 122 | + |
| 123 | +**Forgetting that `defaultdict` creates entries on `d[key]` lookups:** |
| 124 | +```python |
| 125 | +d = defaultdict(list) |
| 126 | +if d["missing_key"]: # This CREATES the key with an empty list! |
| 127 | + pass |
| 128 | + |
| 129 | +# Use "key in d" to check without creating: |
| 130 | +if "missing_key" in d: |
| 131 | + pass |
| 132 | +``` |
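
Besides `in`, the `get()` method also reads without creating a key, because only `d[key]` indexing calls the factory. A short sketch of the difference:

```python
from collections import defaultdict

d = defaultdict(list)

value = d.get("missing_key")          # None; the factory is not called
fallback = d.get("missing_key", [])   # explicit fallback, still no new key
keys_after_get = list(d)

d["missing_key"]                      # indexing does call the factory
keys_after_index = list(d)

print(keys_after_get, keys_after_index)
```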
| 133 | + |
| 134 | +**Using Counter with non-hashable items:** |
| 135 | +```python |
| 136 | +# Lists are not hashable: |
| 137 | +Counter([[1, 2], [3, 4]]) # TypeError! |
| 138 | +# Convert to tuples first: |
| 139 | +Counter([(1, 2), (3, 4)]) # OK |
| 140 | +``` |
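
When the data already arrives as lists, the conversion can happen inline with a generator expression. A minimal sketch, assuming a hypothetical `pairs` input:

```python
from collections import Counter

pairs = [[1, 2], [3, 4], [1, 2]]  # hypothetical input containing lists

# Convert each inner list to a hashable tuple while counting:
pair_counts = Counter(tuple(pair) for pair in pairs)
top_pair, top_count = pair_counts.most_common(1)[0]

print(top_pair, top_count)
```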
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +| [← Overview](./collections-deep-dive.md) | [Part 2: deque, namedtuple, ChainMap →](./collections-deep-dive-part2.md) | |
| 145 | +|:---|---:| |