Commit 9b4703b

connorshea and claude committed
Use per-attribute node.attribute() for cell t/s/r
Follow-up to the rows_generator optimization. Fetch the three cell attributes with individual node.attribute('t'/'s'/'r') C calls instead of building a per-cell attribute_hash and indexing it. With hundreds of thousands of cells the per-cell Hash allocation dominates, so three cheap lookups are both faster and leaner.

Same 20,000-row x 40-col benchmark, vs the attribute_hash version:

| Cell-attr approach     | Median  | Allocations |
|------------------------|---------|-------------|
| node.attribute_hash    | ~1.10s  | 9,226,711   |
| node.attribute() x3    | ~1.08s  | 7,626,711   |

=> ~3-4% faster, 1.6M (~17%) fewer allocations (deterministic: ~640K cells x the per-cell hash no longer built).

Output byte-identical across rows, simple_rows, and rows_with_meta_data on both default- and prefixed-namespace files. Suite green (47 examples).

The row-opener node keeps attribute_hash (only ~one per row, and it preserves the exact meta-data row hash).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 710d0ee commit 9b4703b

1 file changed

Lines changed: 7 additions & 6 deletions

lib/creek/sheet.rb

@@ -131,12 +131,13 @@ def rows_generator(include_meta_data = false, use_simple_rows_format = false)
     cells[cell] = convert(node.value, cell_type, cell_style_idx)
   end
 elsif node_name == name_c && node_type == opener
-  # attribute_hash avoids the namespaces lookup + merge that
-  # Reader#attributes performs on every call; we only need t/s/r.
-  attributes = node.attribute_hash
-  cell_type = attributes['t']
-  cell_style_idx = attributes['s']
-  cell = attributes['r']
+  # Fetch the three attributes individually rather than via
+  # attribute_hash/attributes: with hundreds of thousands of cells
+  # the per-cell Hash allocation dominates, so three cheap C lookups
+  # are both faster and leaner than building and indexing a hash.
+  cell_type = node.attribute('t')
+  cell_style_idx = node.attribute('s')
+  cell = node.attribute('r')
 elsif node_name == name_row && node_type == opener
   row = node.attribute_hash
   row['cells'] = {}
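
The allocation argument behind this change can be illustrated with a minimal, stdlib-only sketch. It does not use Nokogiri or the creek benchmark: `FakeNode`, `allocations`, `CELLS`, and the `KEY_*` constants are hypothetical stand-ins introduced here, and the counts it prints say nothing about the figures in the commit message. It only shows that building a Hash per cell costs at least one object per cell, while direct lookups need not.

```ruby
# Hypothetical stand-in for a reader node; NOT Nokogiri's implementation.
class FakeNode
  def initialize(attrs)
    @attrs = attrs
  end

  # Returns a fresh Hash on every call, mimicking a per-cell attribute hash.
  def attribute_hash
    @attrs.dup
  end

  # Returns one stored value without building a container.
  def attribute(name)
    @attrs[name]
  end
end

# Frozen keys so the lookups themselves do not allocate new Strings.
KEY_T = 't'.freeze
KEY_S = 's'.freeze
KEY_R = 'r'.freeze

# Number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

CELLS = 10_000 # illustrative size, far smaller than the commit's benchmark
node = FakeNode.new({ KEY_T => 's', KEY_S => '1', KEY_R => 'A1' }.freeze)

# Old shape: one Hash per cell, then three index operations.
via_hash = allocations do
  CELLS.times do
    attrs = node.attribute_hash
    _t = attrs[KEY_T]
    _s = attrs[KEY_S]
    _r = attrs[KEY_R]
  end
end

# New shape: three direct lookups, no intermediate Hash.
via_direct = allocations do
  CELLS.times do
    _t = node.attribute(KEY_T)
    _s = node.attribute(KEY_S)
    _r = node.attribute(KEY_R)
  end
end

puts "attribute_hash per cell: #{via_hash} objects"
puts "attribute() x3 per cell: #{via_direct} objects"
```

In the real reader the individual lookups may still allocate the returned attribute strings, so the saving there is the container Hash and whatever it holds beyond the three values actually used.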
