Python code that builds large strings by repeated concatenation using += in a loop. The metric was time to construct a string from N parts. Since Python strings are immutable, each += creates a new string and copies all previous content, making repeated concatenation O(N²).
Using ''.join(list_of_parts) instead of repeated +=. The join method pre-scans the list to determine the total length, allocates once, and copies each part into the final buffer — O(N) total.
For concatenating 1M short strings (avg 10 chars), join was 280x faster than +=.
| Method | 10K parts (ms) | 100K parts (ms) | 1M parts (ms) |
|---|---|---|---|
+= in loop |
2.1 | 210 | 28,000 |
''.join(list) |
0.4 | 3.8 | 100 |
io.StringIO |
0.5 | 4.2 | 112 |
# BAD: O(N²) — each += copies the entire string built so far
result = ""
for chunk in data: # 1M chunks
result += chunk # copies all of result each time
# GOOD: O(N) — single allocation
result = ''.join(data)
# When building conditionally (can't use a simple list):
parts = []
for chunk in data:
if should_include(chunk):
parts.append(transform(chunk))
result = ''.join(parts)
# For very large string building (100MB+), io.StringIO is equivalent
import io
buf = io.StringIO()
for chunk in data:
buf.write(chunk)
result = buf.getvalue()
# For bytes: b''.join() or io.BytesIO- CPython optimization for
+=: CPython (3.x) has a hack that detectss += xwhenshas refcount 1 and resizes in-place. This makes small examples appear fast, but it's fragile — any additional reference tos(e.g., passing to a function, appending to a list) defeats the optimization. Never rely on it. - f-strings in loops:
result = f"{result}{chunk}"is equivalent to+=— same O(N²) problem. f-strings are for formatting, not accumulation.
CPython 3.8+. The O(N²) behavior is a fundamental property of immutable strings in any language — the same advice applies to Java String (use StringBuilder), Go string (use strings.Builder), etc.