Skip to content

Latest commit

 

History

History
123 lines (93 loc) · 6.24 KB

File metadata and controls

123 lines (93 loc) · 6.24 KB

🧰 How to Choose a Data Structure

PATTERNS.md tells you which technique a problem wants. This page is one level lower: once you know the technique, which container do you put the data in so the operations you need are fast? Picking the wrong structure is what turns a clean O(n) idea into a TLE.

The whole skill comes down to three questions. Ask them in order:

  1. What operation must be fast? (lookup? min/max? insert at ends? sorted order?)
  2. Do I need ordering? (insertion order, sorted order, or none?)
  3. Do I need uniqueness or key→value mapping?

Your answers point at one structure almost every time. The tables below are just that logic, pre-computed.


1. When the problem says this → reach for this

The fastest way to choose: match the wording of the problem to a structure.

The problem mentions… Reach for Why
"unique", "seen before", "contains duplicate" Set (set) O(1) membership test
"count", "frequency", "most common", "anagram" Hash map / Counter (Counter) O(1) tally per element
"key → value", "look up by id", "cache" Hash map (dict) O(1) keyed access
"top K", "K largest/smallest", "median of a stream" Heap (heapq) O(log n) to keep the best K
"next greater", "nearest smaller", "valid parentheses" Stack LIFO matches the access order
"sliding window max/min", "first negative in window" Monotonic Deque (deque) O(1) push/pop at both ends
"process in arrival order", "level by level", "BFS" Queue (deque) FIFO
"shortest path", "connected", "network", "grid" Graph (adjacency list = dict[node] → list) models relationships
"starts with", "prefix", "autocomplete", "dictionary" Trie O(L) prefix lookup
"merge groups", "are these connected", "count islands by union" Union-Find (DSU) near-O(1) grouping
"keep sorted", "floor/ceil", "kth smallest with inserts" Balanced BST / sorted list (SortedList) O(log n) ordered ops
"range sum/min with updates" Fenwick / Segment Tree O(log n) query + update
"LRU cache", "most-recently-used" Hash map + Doubly Linked List O(1) move-to-front

2. When you need this operation fast → use this

The same idea, indexed by capability instead of wording.

I need fast… Best structure Operation cost
Lookup / insert / delete by key Hash map (dict) O(1) average
Membership test Hash set (set) O(1) average
Index access (a[i]) Dynamic array (list) O(1)
Insert / remove at both ends deque O(1)
Insert / remove at one end (LIFO/FIFO) Stack / Queue O(1)
Repeated min or max retrieval Heap (heapq) O(log n) push/pop, O(1) peek
Elements always in sorted order Balanced BST / SortedList O(log n)
Prefix matching on strings Trie O(L)
Grouping / connectivity queries Union-Find ~O(1) amortized
Range queries with updates Segment Tree / Fenwick O(log n)

Rule of thumb: if you find yourself scanning the whole collection (for x in data) inside another loop just to find one element, you almost certainly want a hash map, a heap, or a sorted structure instead.


3. The Python toolbox (what to actually import)

Since this repo is Python 3, here's the concrete mapping from "I want X" to the standard-library type.

You want Use Note
Hash map dict also collections.defaultdict for auto-default values
Hash set set frozenset if you need it hashable
Frequency count collections.Counter .most_common(k) is handy
Stack list (append / pop) already O(1)
Queue / Deque collections.deque popleft() is O(1) (a list is O(n))
Min-heap heapq for a max-heap, push -value
Sorted container sortedcontainers.SortedList 3rd-party; or bisect on a list for mostly-static data
LRU ordering collections.OrderedDict .move_to_end(); or functools.lru_cache for memoization
Tuple as key dict[(r, c)] great for grids and visited-states

4. Complexity cheat-sheet

Keep this in your head — it's what the choice ultimately rests on.

Structure Access Search Insert Delete
Array / list O(1) O(n) O(1)* end / O(n) middle O(n)
Hash map / set O(1) avg O(1) avg O(1) avg
Stack / Queue (deque) O(n) middle O(n) O(1) ends O(1) ends
Heap (heapq) O(1) peek O(n) O(log n) O(log n) pop-min
Balanced BST / SortedList O(log n) O(log n) O(log n) O(log n)
Trie O(L) O(L) O(L)
Union-Find ~O(1) ~O(1)
Linked List O(n) O(n) O(1) at node O(1) at node

* amortized


5. Composite structures (the ones that confuse people)

Some classic problems aren't one structure — they're two working together. These pairings come up constantly:

  • LRU Cache = Hash map (for O(1) lookup) + Doubly Linked List (for O(1) reordering).
  • Median of a data stream = a max-heap for the lower half + a min-heap for the upper half (the "two heaps" trick).
  • Dijkstra / weighted shortest path = Graph (adjacency list) + a min-heap of (distance, node).
  • Sliding-window maximum = a monotonic deque that drops elements that can never be the max.
  • Counting islands / friend circles = Grid traversal + Union-Find, or just BFS/DFS with a visited set.
  • Detect a cycle in a linked list = the list + two pointers (slow/fast), no extra space.

TL;DR decision flow

Need key → value lookup?           → Hash map
Just membership / uniqueness?      → Hash set
Counting things?                   → Counter
Repeated min/max or top-K?         → Heap
LIFO / FIFO / both ends?           → Stack / Queue / Deque
Must stay in sorted order?         → Balanced BST / SortedList
Prefixes / autocomplete?           → Trie
Connectivity / grouping?           → Union-Find
Relationships / paths?             → Graph
Range queries with updates?        → Segment Tree / Fenwick

Pair this with PATTERNS.md: patterns tell you the plan, data structures tell you the tools.