NameError: tree_reduce is not defined in mlx_lm/models/cache.py _BaseCache.nbytes

## Summary

`mlx_lm/models/cache.py` imports `tree_flatten, tree_map, tree_unflatten` from `mlx.utils`, but the `_BaseCache.nbytes` property (line 322) calls `tree_reduce`, which is never imported. Any code path that reads `.nbytes` on a `_BaseCache` instance (or on a subclass that inherits the property) raises:

```
NameError: name 'tree_reduce' is not defined
```

## Repro

Running `qwen3.5-397b-a17b-mlx` (a Qwen3.5 MoE, `model_type: qwen3_5_moe`) in LM Studio 0.4.12+1 with its bundled `mlx-lm==0.31.3`:

```bash
curl -sS -X POST http://127.0.0.1:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5-397b-a17b-mlx","stream":false,"max_tokens":20,
       "messages":[{"role":"user","content":"Say hello."}]}'
```

returns:

```json
{"error":"Error in iterating prediction stream: NameError: name 'tree_reduce' is not defined"}
```

Both the streaming and non-streaming paths fail, because `_BaseCache.nbytes` is read before the first token is produced.

## Why MoE models surface this

Dense architectures use cache subclasses that override `nbytes` with their own implementation and never hit `_BaseCache.nbytes`. MoE architectures (Qwen3.5 MoE, etc.) inherit the base implementation, which is why MoE users see it and dense-model users don't.
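The mechanism can be sketched in plain Python, without MLX installed. The class names below (`SketchBaseCache`, `OverridingCache`, `InheritingCache`) are hypothetical stand-ins, not the real classes in `cache.py`. The point is only that Python resolves `tree_reduce` when the property body runs, so the module imports cleanly and the error appears solely for subclasses that fall through to the base property:

```python
class SketchBaseCache:
    """Hypothetical stand-in for _BaseCache; not the real mlx-lm class."""

    def __init__(self, keys=None, values=None):
        self.keys = keys
        self.values = values

    @property
    def nbytes(self):
        # tree_reduce is deliberately neither imported nor defined in this
        # sketch. Python only looks the name up when this line executes.
        return tree_reduce(lambda a, x: a + x.nbytes, (self.keys, self.values), 0)


class OverridingCache(SketchBaseCache):
    """Stand-in for a cache that supplies its own nbytes."""

    @property
    def nbytes(self):
        return 0


class InheritingCache(SketchBaseCache):
    """Stand-in for a cache that inherits the base nbytes."""


# Constructing either cache works; only reading the inherited property fails.
overriding_result = OverridingCache().nbytes

try:
    InheritingCache().nbytes
    inherited_error = None
except NameError as exc:
    inherited_error = str(exc)

print(overriding_result)
print(inherited_error)
```

This is also why the bug can go unnoticed by import-time checks and by tests that only exercise caches with their own `nbytes`.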

## Root cause

`mlx_lm/models/cache.py` line 10:

```python
from mlx.utils import tree_flatten, tree_map, tree_unflatten
```

`mlx_lm/models/cache.py` line 322:

```python
@property
def nbytes(self):
    return tree_reduce(lambda a, x: a + x.nbytes, (self.keys, self.values), 0)
```

`tree_reduce` is called but never imported. The fix is one line: add `tree_reduce` to the `mlx.utils` import on line 10. PR to follow.
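For reference, here is a rough sketch of what the property is presumably meant to compute once the name resolves. It assumes the fix is simply adding `tree_reduce` to the existing `mlx.utils` import. To keep the example runnable without MLX, it uses a simplified local `tree_reduce` that only walks lists, tuples, and dicts, plus a hypothetical `FakeArray` type that exposes `nbytes`; neither is the library's implementation.

```python
from dataclasses import dataclass

# Presumed one-line fix in mlx_lm/models/cache.py (not executed here):
# from mlx.utils import tree_flatten, tree_map, tree_reduce, tree_unflatten


@dataclass
class FakeArray:
    """Hypothetical stand-in for an array object that reports its size."""
    nbytes: int


def tree_reduce(fn, tree, initializer):
    """Simplified stand-in: fold fn over every leaf of a nested container."""
    acc = initializer
    if isinstance(tree, (list, tuple)):
        for child in tree:
            acc = tree_reduce(fn, child, acc)
        return acc
    if isinstance(tree, dict):
        for child in tree.values():
            acc = tree_reduce(fn, child, acc)
        return acc
    # Anything that is not a container is treated as a leaf.
    return fn(acc, tree)


class SketchCache:
    """Hypothetical cache holding keys and values, mirroring the issue's property."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values

    @property
    def nbytes(self):
        # Same expression as in the issue; it works once tree_reduce is in scope.
        return tree_reduce(lambda a, x: a + x.nbytes, (self.keys, self.values), 0)


total = SketchCache(FakeArray(1024), FakeArray(2048)).nbytes
print(total)  # 3072
```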

## Environment

- macOS 15.x, Apple Silicon (M3 Ultra)
- `mlx==0.31.1`, `mlx-lm==0.31.3`
- Confirmed on `main` at `62f38ae`
- Also reproduces in LM Studio bundled runtimes 1.4.0, 1.5.0, 1.6.0
