Commit c236168

committed
add stack overflow guards
1 parent 5c2d616 commit c236168

3 files changed

Lines changed: 230 additions & 118 deletions

File tree

README.md

Lines changed: 79 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -19,9 +19,45 @@
1919
| **Streaming** | `CBOR.stream` for CBOR sequence reading |
2020
| **Performance** | ~30% faster than msgpack; 1.3–3× faster than simdjson for selective access |
2121

22-
### ⚠️ Limitations
22+
### ⚠️ Limitations & Design Decisions
2323

24-
- No indefinite-length item support (RFC 8949 Section 4.2.1)
24+
| Limitation | Reason |
25+
|-----------|--------|
26+
| No indefinite-length items | Use `CBOR.stream` mode instead. |
27+
28+
**Determinism Guarantees:**
29+
- Encoding is deterministic *within a single mruby build*
30+
- Hash field order follows insertion order (per mruby hash impl)
31+
- Float width (16/32/64) is compile-time fixed via `MRB_USE_FLOAT32`
32+
- Symbol encoding strategy is global; don't mix `no_symbols` / `symbols_as_string` / `symbols_as_uint32` in the same program
33+
- **Not deterministic across builds** if you rebuild mruby with different CFLAGS or config
34+
35+
**Recursion Depth Limits:**
36+
The default `CBOR_MAX_DEPTH` depends on the mruby profile:
37+
- `MRB_PROFILE_MAIN` / `MRB_PROFILE_HIGH`: 512
38+
- `MRB_PROFILE_BASELINE`: 64
39+
- Constrained / other: 32
40+
41+
Exceeding the limit raises `RuntimeError` with the message `"CBOR nesting depth exceeded"`. Override the default by defining `CBOR_MAX_DEPTH` at compile time.
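
The profile-dependent default described above could be selected with a preprocessor chain along the following lines. This is only a sketch: the `MRB_PROFILE_*` names are copied from the README text, the header layout is an assumption, and `cbor_max_depth()` is a hypothetical helper added so the result can be inspected.

```c
#include <assert.h>

/* Hypothetical sketch: honour a user-supplied -DCBOR_MAX_DEPTH=<n> first,
   otherwise fall back to a profile-dependent default. The macro names are
   taken from the README; the gem's real header may be organised differently. */
#ifndef CBOR_MAX_DEPTH
#  if defined(MRB_PROFILE_MAIN) || defined(MRB_PROFILE_HIGH)
#    define CBOR_MAX_DEPTH 512
#  elif defined(MRB_PROFILE_BASELINE)
#    define CBOR_MAX_DEPTH 64
#  else
#    define CBOR_MAX_DEPTH 32   /* constrained / other */
#  endif
#endif

/* Reject nonsensical overrides at compile time (C11 static_assert). */
static_assert(CBOR_MAX_DEPTH > 0, "CBOR_MAX_DEPTH must be positive");

/* Hypothetical helper: expose the effective limit to callers. */
static int cbor_max_depth(void)
{
  return CBOR_MAX_DEPTH;
}
```

Compiled without any profile macro and without an override, `cbor_max_depth()` yields the constrained default of 32; passing `-DCBOR_MAX_DEPTH=128` to the compiler would take precedence over every profile branch.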
42+
43+
---
44+
45+
## 📊 Performance Notes
46+
47+
- **Encoding:** ~30% faster than msgpack (small-buffer optimization plus incremental writes)
48+
- **Lazy decoding:** 1.3–3× faster than simdjson for selective access
49+
- **Shared refs:** Tags 28/29 deduplication is O(1) amortized
50+
- **Float encoding:** No overhead; width selection happens once per value at encode time
51+
52+
**When to use lazy decoding:**
53+
- Decoding large payloads where you only access a subset of fields
54+
- Streaming/telemetry where you care about specific fields
55+
- Network protocols where you validate before deserializing
56+
57+
**When to use eager decoding:**
58+
- Small payloads
59+
- Cases where you need the full object in memory immediately
60+
- Code where simplicity matters more than optimization
2561

2662
---
2763

@@ -47,7 +83,7 @@ lazy["hello"][1].value # => 2 (constant-time after first access)
4783

4884
### CBOR::Lazy – On-Demand Access
4985

50-
`decode_lazy` returns a `CBOR::Lazy` object wrapping the raw buffer **without decoding**. Navigate with `[]` or `dig`, then call `.value` when you need the actual value.
86+
`decode_lazy` returns a `CBOR::Lazy` object wrapping the raw buffer **without decoding the value**. Navigate with `[]` or `dig`, then call `.value` when you need the actual value.
5187
```ruby
5288
lazy = CBOR.decode_lazy(big_payload)
5389

@@ -159,21 +195,41 @@ Convenience Types:
159195
### Symbol Handling
160196

161197
Three strategies for encoding Ruby symbols:
198+
162199
```ruby
163-
# 1. Default: convert to strings (no tag)
200+
# 1. Default: encode symbols as plain strings (no tag, no round-trip)
164201
CBOR.no_symbols
202+
sym = :hello
203+
encoded = CBOR.encode(sym) # Encodes as plain string "hello"
204+
decoded = CBOR.decode(encoded) # => "hello" (not a symbol!)
165205

166-
# 2. Use tag 39 + string
206+
# 2. Use tag 39 + string (IANA-registered tag, interoperable)
167207
CBOR.symbols_as_string
208+
sym = :hello
209+
encoded = CBOR.encode(sym)
210+
decoded = CBOR.decode(encoded) # => :hello (symbol preserved)
168211

169-
# 3. Use tag 39 + uint32 (mruby-to-mruby only)
212+
# 3. Use tag 39 + uint32 (mruby presym only, fastest)
170213
CBOR.symbols_as_uint32
171214
sym = :hello
172215
encoded = CBOR.encode(sym)
173-
decoded = CBOR.decode(encoded) # => :hello
216+
decoded = CBOR.decode(encoded) # => :hello (symbol preserved, same mruby only)
174217
```
175218

176-
> **⚠️ Warning:** `symbols_as_uint32` is mruby instance–specific. Only use it when both encoder and decoder run on the same mruby executable and when all symbols are interned at compile time, see https://github.com/mruby/mruby/blob/master/doc/guides/symbol.md#preallocate-symbols
219+
**Mode Comparison:**
220+
221+
| Mode | Encoding | Interop | Round-trip | Speed |
222+
|------|----------|---------|-----------|-------|
223+
| `no_symbols` | Plain string | ✅ All | ❌ No | Fast |
224+
| `symbols_as_string` | Tag 39 + string | ✅ All | ✅ Yes | Good |
225+
| `symbols_as_uint32` | Tag 39 + uint32 | ❌ mruby only | ✅ Yes | Fastest |
226+
227+
> **⚠️ Warning:** `symbols_as_uint32` requires:
228+
> - **Same mruby build** — encoder and decoder must use the same mruby executable (same `libmruby.a`)
229+
> - **Compile-time symbols** — all symbols must be interned at build time (see [presym docs](https://github.com/mruby/mruby/blob/master/doc/guides/symbol.md#preallocate-symbols))
230+
> - **No runtime symbol creation** — decoding fails if the symbol ID doesn't exist in the decoder's presym table
231+
>
232+
> Use only for internal mruby-to-mruby IPC on the same build. For external/user data, use `symbols_as_string`.
177233
178234
---
179235

@@ -208,6 +264,20 @@ rake test
208264

209265
---
210266

267+
## ⚠️ Error Handling
268+
269+
| Error | When | Example |
270+
|-------|------|---------|
271+
| `ArgumentError` | Invalid encode options | `CBOR.encode(obj, bad_option: true)` |
272+
| `RangeError` | Integer out of bounds | Encoding a Bigint larger than uint64 |
273+
| `RuntimeError` | Nesting depth exceeded | Deeply nested structures beyond `CBOR_MAX_DEPTH` |
274+
| `RuntimeError` | Truncated/invalid CBOR | `CBOR.decode(incomplete_buffer)` |
275+
| `TypeError` | Type mismatch in registered tags | Field marked as String gets an Array |
276+
| `KeyError` | Lazy access to missing key | `lazy["nonexistent"]` (use `.dig` to get nil instead) |
277+
| `NotImplementedError` | Presym on non-presym mruby | `CBOR.symbols_as_uint32` on build without presym |
278+
279+
---
280+
211281
## 🔗 Specification
212282

213283
- **CBOR (RFC 8949):** https://tools.ietf.org/html/rfc8949
@@ -217,4 +287,4 @@ rake test
217287

218288
## 📄 License
219289

220-
Apache License 2.0
290+
Apache License 2.0

src/mrb_cbor.c

Lines changed: 73 additions & 109 deletions
Original file line number | Diff line number | Diff line change
@@ -101,6 +101,9 @@ typedef struct {
101101
mrb_undef_value() = disabled
102102
mrb_hash = obj -> share_idx mapping */
103103
mrb_value seen;
104+
105+
/* Recursion depth counter to prevent stack overflow */
106+
mrb_int depth;
104107
} CborWriter;
105108

106109
/* Forward declarations */
@@ -816,6 +819,7 @@ cbor_writer_init(CborWriter *w, mrb_state *mrb)
816819
w->arena_index = mrb_gc_arena_save(mrb);
817820

818821
w->seen = mrb_undef_value();
822+
w->depth = 0;
819823
}
820824

821825
static size_t next_pow2(size_t x)
@@ -1253,8 +1257,16 @@ static void
12531257
encode_value(CborWriter* w, mrb_value obj)
12541258
{
12551259
mrb_state* mrb = w->mrb;
1260+
1261+
if (likely(w->depth < CBOR_MAX_DEPTH)) {
1262+
w->depth++;
1263+
} else {
1264+
mrb_raise(mrb, E_RUNTIME_ERROR, "CBOR nesting depth exceeded");
1265+
}
1266+
12561267
if (!mrb_undef_p(w->seen)) {
12571268
if (!mrb_immediate_p(obj) && encode_check_shared(w, obj)) {
1269+
w->depth--;
12581270
return;
12591271
}
12601272
}
@@ -1309,6 +1321,8 @@ encode_value(CborWriter* w, mrb_value obj)
13091321
}
13101322
} break;
13111323
}
1324+
1325+
w->depth--;
13121326
}
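
The guard added to `encode_value` pairs one increment on entry with one decrement on every exit path. The standalone sketch below illustrates that pattern outside mruby. The `Node`, `Walker`, `walk`, and `walk_chain` names and the tiny `SKETCH_MAX_DEPTH` limit are hypothetical, and a return code stands in for `mrb_raise`.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEPTH 4   /* hypothetical small limit for demonstration */

/* Hypothetical nested value: a container holding at most one child. */
typedef struct Node { const struct Node *child; } Node;

/* Hypothetical writer state: only the depth counter matters here. */
typedef struct { int depth; } Walker;

/* Returns 0 on success, -1 when nesting exceeds SKETCH_MAX_DEPTH.
   The real encoder raises a RuntimeError instead of returning -1. */
static int walk(Walker *w, const Node *n)
{
  if (w->depth >= SKETCH_MAX_DEPTH) {
    return -1;               /* limit hit: do not descend further */
  }
  w->depth++;                /* entering one nesting level */
  int rc = 0;
  if (n->child != NULL) {
    rc = walk(w, n->child);
  }
  w->depth--;                /* leaving the level, also on failure below */
  return rc;
}

/* Build a chain of `levels` nested nodes on the stack and walk it. */
static int walk_chain(int levels)
{
  Node nodes[16];
  Walker w = { 0 };
  assert(levels >= 1 && levels <= 16);
  for (int i = 0; i < levels; i++) {
    nodes[i].child = (i + 1 < levels) ? &nodes[i + 1] : NULL;
  }
  int rc = walk(&w, &nodes[0]);
  assert(w.depth == 0);      /* counter is balanced after the walk */
  return rc;
}
```

With the limit set to 4, `walk_chain(4)` succeeds and `walk_chain(5)` reports failure. Because this sketch unwinds through ordinary returns, the counter is back at zero in both cases; the diff's version instead leaves the counter untouched when it raises.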
13131327

13141328

@@ -1472,6 +1486,45 @@ mrb_cbor_register_tag(mrb_state *mrb, mrb_value self)
14721486

14731487
static void encode_value(CborWriter *w, mrb_value obj); /* forward */
14741488

1489+
/* Map Ruby type to CBOR major type for schema validation */
1490+
static uint8_t
1491+
cbor_type_major(mrb_state *mrb, mrb_value val)
1492+
{
1493+
switch (mrb_type(val)) {
1494+
case MRB_TT_INTEGER:
1495+
return mrb_integer(val) >= 0 ? 0 : 1;
1496+
case MRB_TT_STRING:
1497+
return mrb_str_is_utf8(val) ? 3 : 2;
1498+
case MRB_TT_ARRAY:
1499+
return 4;
1500+
case MRB_TT_HASH:
1501+
return 5;
1502+
case MRB_TT_FALSE:
1503+
case MRB_TT_TRUE:
1504+
#ifndef MRB_NO_FLOAT
1505+
case MRB_TT_FLOAT:
1506+
#endif
1507+
return 7;
1508+
#ifdef MRB_USE_BIGINT
1509+
case MRB_TT_BIGINT:
1510+
if (mrb_bint_size(mrb, val) <= 8) {
1511+
return mrb_bint_sign(mrb, val) >= 0 ? 0 : 6;
1512+
}
1513+
return 6;
1514+
#endif
1515+
case MRB_TT_SYMBOL: {
1516+
mrb_value mode = cbor_sym_strategy(mrb);
1517+
if (mrb_cmp(mrb, mode, mrb_fixnum_value(2)) == 0) {
1518+
return 6;
1519+
}
1520+
mrb_value str = mrb_sym2str(mrb, mrb_symbol(val));
1521+
return mrb_str_is_utf8(str) ? 3 : 2;
1522+
}
1523+
default:
1524+
return 6;
1525+
}
1526+
}
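
The value returned by `cbor_type_major` is used as a bit index into a per-field mask of allowed major types, via the `(allowed >> actual_major) & 1` test in the two foreach callbacks. A minimal standalone sketch of that check follows. `allow_major` and `major_allowed` are hypothetical helper names, and how the gem actually builds its masks is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a mask with only the bit for `major` (0..7) set. */
static uint32_t allow_major(uint8_t major)
{
  assert(major <= 7);        /* CBOR has eight major types */
  return (uint32_t)1 << major;
}

/* Same test as in the diff: the field is accepted when bit `major`
   of the `allowed` mask is set. Returns 1 if allowed, 0 otherwise. */
static int major_allowed(uint32_t allowed, uint8_t major)
{
  assert(major <= 7);
  return (int)((allowed >> major) & 1u);
}
```

For example, a field that may hold either a byte string (major type 2) or a text string (major type 3) would use `allow_major(2) | allow_major(3)`. An array (major type 4) then fails the check, which corresponds to the `TypeError` path in the callbacks.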
1527+
14751528
typedef struct {
14761529
CborWriter *w;
14771530
mrb_value obj;
@@ -1490,66 +1543,21 @@ encode_registered_tag_foreach(mrb_state *mrb, mrb_value sym, mrb_value mask, voi
14901543

14911544
mrb_value val = mrb_iv_get(mrb, obj, mrb_symbol(sym));
14921545

1493-
if (mrb_integer_p(mask)) {
1494-
/* actual_major is an internal bitmask index, not a CBOR value — uint8_t is correct here */
1495-
uint8_t actual_major = 6;
1496-
1497-
switch (mrb_type(val)) {
1498-
case MRB_TT_INTEGER: {
1499-
mrb_int i = mrb_integer(val);
1500-
actual_major = (i >= 0) ? 0 : 1;
1501-
break;
1502-
}
1503-
case MRB_TT_STRING: {
1504-
actual_major = mrb_str_is_utf8(val) ? 3 : 2;
1505-
break;
1506-
}
1507-
case MRB_TT_ARRAY:
1508-
actual_major = 4;
1509-
break;
1510-
case MRB_TT_HASH:
1511-
actual_major = 5;
1512-
break;
1513-
case MRB_TT_FALSE:
1514-
case MRB_TT_TRUE:
1515-
#ifndef MRB_NO_FLOAT
1516-
case MRB_TT_FLOAT:
1517-
#endif
1518-
actual_major = 7;
1519-
break;
1520-
#ifdef MRB_USE_BIGINT
1521-
case MRB_TT_BIGINT:
1522-
if (mrb_bint_size(mrb, val) <= 8) {
1523-
actual_major = (mrb_bint_sign(mrb, val) >= 0) ? 0 : 1;
1524-
} else {
1525-
actual_major = 6;
1526-
}
1527-
break;
1528-
#endif
1529-
case MRB_TT_SYMBOL: {
1530-
mrb_value mode = cbor_sym_strategy(mrb);
1531-
if (mrb_cmp(mrb, mode, mrb_fixnum_value(2)) == 0) {
1532-
actual_major = 6;
1533-
} else {
1534-
mrb_value str = mrb_sym2str(mrb, mrb_symbol(val));
1535-
actual_major = mrb_str_is_utf8(str) ? 3 : 2;
1536-
}
1537-
break;
1538-
}
1539-
default:
1540-
actual_major = 6;
1541-
}
1542-
1546+
if (likely(mrb_integer_p(mask))) {
1547+
uint8_t actual_major = cbor_type_major(mrb, val);
15431548
mrb_int allowed = mrb_integer(mask);
1544-
if (unlikely(!((allowed >> actual_major) & 1))) {
1549+
if (likely(((allowed >> actual_major) & 1))) {
1550+
encode_len(w, 3, (uint64_t)slen);
1551+
cbor_writer_write(w, (const uint8_t*)sname, (size_t)slen);
1552+
encode_value(w, val);
1553+
} else {
15451554
mrb_raisef(mrb, E_TYPE_ERROR,
15461555
"CBOR tag field type mismatch for ivar %v", sym);
15471556
}
1557+
} else {
1558+
mrb_raise(mrb, E_TYPE_ERROR, "mask is not an Integer");
15481559
}
15491560

1550-
encode_len(w, 3, (uint64_t)slen);
1551-
cbor_writer_write(w, (const uint8_t*)sname, (size_t)slen);
1552-
encode_value(w, val);
15531561
return 0;
15541562
}
15551563

@@ -1594,65 +1602,21 @@ decode_registered_tag_foreach(mrb_state *mrb, mrb_value sym, mrb_value mask, voi
15941602
mrb_value map_key = mrb_str_new_static(mrb, sname, slen);
15951603
mrb_value val = mrb_hash_fetch(mrb, ctx->payload, map_key, mrb_undef_value());
15961604

1597-
if (!mrb_undef_p(val) && mrb_integer_p(mask)) {
1598-
uint8_t actual_major = 6;
1599-
1600-
switch (mrb_type(val)) {
1601-
case MRB_TT_INTEGER: {
1602-
mrb_int i = mrb_integer(val);
1603-
actual_major = (i >= 0) ? 0 : 1;
1604-
break;
1605-
}
1606-
case MRB_TT_STRING: {
1607-
actual_major = mrb_str_is_utf8(val) ? 3 : 2;
1608-
break;
1609-
}
1610-
case MRB_TT_ARRAY:
1611-
actual_major = 4;
1612-
break;
1613-
case MRB_TT_HASH:
1614-
actual_major = 5;
1615-
break;
1616-
case MRB_TT_FALSE:
1617-
case MRB_TT_TRUE:
1618-
#ifndef MRB_NO_FLOAT
1619-
case MRB_TT_FLOAT:
1620-
#endif
1621-
actual_major = 7;
1622-
break;
1623-
#ifdef MRB_USE_BIGINT
1624-
case MRB_TT_BIGINT:
1625-
if (mrb_bint_size(mrb, val) <= 8) {
1626-
actual_major = (mrb_bint_sign(mrb, val) >= 0) ? 0 : 1;
1627-
} else {
1628-
actual_major = 6;
1629-
}
1630-
break;
1631-
#endif
1632-
case MRB_TT_SYMBOL: {
1633-
mrb_value mode = cbor_sym_strategy(mrb);
1634-
if (mrb_cmp(mrb, mode, mrb_fixnum_value(2)) == 0) {
1635-
actual_major = 6;
1636-
} else {
1637-
mrb_value str = mrb_sym2str(mrb, mrb_symbol(val));
1638-
actual_major = mrb_str_is_utf8(str) ? 3 : 2;
1639-
}
1640-
break;
1605+
if (!mrb_undef_p(val)) {
1606+
if (likely(mrb_integer_p(mask))) {
1607+
uint8_t actual_major = cbor_type_major(mrb, val);
1608+
mrb_int allowed = mrb_integer(mask);
1609+
if (likely(((allowed >> actual_major) & 1))) {
1610+
mrb_iv_set(mrb, ctx->obj, mrb_symbol(sym), val);
1611+
} else {
1612+
mrb_raisef(mrb, E_TYPE_ERROR,
1613+
"CBOR tag field type mismatch for ivar %v", sym);
16411614
}
1642-
default:
1643-
actual_major = 6;
1644-
}
1645-
1646-
mrb_int allowed = mrb_integer(mask);
1647-
if (unlikely(!((allowed >> actual_major) & 1))) {
1648-
mrb_raisef(mrb, E_TYPE_ERROR,
1649-
"CBOR tag field type mismatch for ivar %v", sym);
1615+
} else {
1616+
mrb_raise(mrb, E_TYPE_ERROR, "mask is not an Integer");
16501617
}
16511618
}
16521619

1653-
mrb_iv_set(mrb, ctx->obj, mrb_symbol(sym),
1654-
mrb_undef_p(val) ? mrb_nil_value() : val);
1655-
16561620
return 0;
16571621
}
16581622
