Version: v1.0.0-PREDICTIVE-AUTOGRAD-GODCORE
Backend: NumPy (CPU)
Paradigm: Reverse-mode autodiff (dynamic computation graphs)
PAE is engineered for:
- Correctness first (broadcasting gradients, stable softmax/xent, numerical checks)
- Debuggability (graph capture, anomaly detection, gradient health)
- Extensibility (clear op registry, single-file drop-in, backend abstraction points)
- Determinism (seed control and explicit graph lifetimes)
A Tensor wraps:
- `data: np.ndarray`
- `grad: Optional[np.ndarray]`
- `requires_grad: bool`
- graph metadata: `_ctx` (operation context), `_parents`, `_op`
- `Tensor(data, requires_grad=False, name=None, dtype=None)`
- `Tensor.zeros(shape, requires_grad=False)`
- `Tensor.ones(shape, requires_grad=False)`
- `Tensor.randn(shape, requires_grad=False, seed=None, scale=1.0)`
- `Tensor.uniform(shape, low=0.0, high=1.0, requires_grad=False, seed=None)`
- `t.shape`, `t.ndim`, `t.dtype`
- `t.T`: transpose (2D shortcut)
- `t.detach()` returns a Tensor not connected to the graph
- `Tensor.no_grad()` context manager disables graph building
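A context manager that disables graph building usually amounts to flipping a flag and restoring it on exit. The sketch below is an illustration of that pattern only, not PAE's code: the `Engine` class and its `grad_enabled` flag are hypothetical names.

```python
from contextlib import contextmanager


class Engine:
    """Hypothetical stand-in for the flag that controls graph recording."""

    grad_enabled = True

    @staticmethod
    @contextmanager
    def no_grad():
        previous = Engine.grad_enabled
        Engine.grad_enabled = False
        try:
            yield
        finally:
            # Restore the previous state even if the body raises,
            # so nested no_grad() blocks behave correctly.
            Engine.grad_enabled = previous


states = [Engine.grad_enabled]
with Engine.no_grad():
    states.append(Engine.grad_enabled)
states.append(Engine.grad_enabled)
print(states)
```

Ops built under such a block would check the flag and skip recording parents, which is why results computed there carry no gradient history.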
- `t.graph_summary()` prints node counts and op histogram
- `Tensor.set_anomaly_mode(True/False)` enables NaN/Inf checks with op context
- `Tensor.set_predictive_mode(True/False)` enables op-stat collection & fusion hints
- `Tensor.get_predictive_report()` returns a report of common patterns and suggestions
`t.backward(grad=None, retain_graph=False)`

Rules:
- If `t` is scalar and `grad` is None, uses `1.0`.
- Accumulates gradients into `leaf.grad` (NumPy arrays).
- Topological traversal over recorded parents.
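The three rules above can be illustrated with a toy scalar engine. This is a minimal sketch under stated assumptions, not PAE's implementation: the `Node` class, its `parents` tuple, and the `_backward` closures are hypothetical names chosen for the example.

```python
class Node:
    """Hypothetical scalar node; not PAE's Tensor."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.grad = 0.0
        self.parents = tuple(parents)
        self._backward = lambda: None  # pushes self.grad into the parents

    def __add__(self, other):
        out = Node(self.value + other.value, (self, other))

        def _backward():
            # Gradients accumulate (+=) because a node may be used more than once.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Node(self.value * other.value, (self, other))

        def _backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        out._backward = _backward
        return out

    def backward(self, grad=None):
        # A scalar output defaults to a seed gradient of 1.0.
        self.grad = 1.0 if grad is None else float(grad)
        order, seen = [], set()

        def visit(node):
            # Post-order DFS: parents are appended before the node itself.
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Reversed post-order processes each node before any of its parents.
        for node in reversed(order):
            node._backward()


x, y = Node(2.0), Node(3.0)
z = x * y + x  # dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
print(x.grad, y.grad)
```

Because `x` feeds both the product and the sum, its gradient is the sum of two contributions, which is the accumulation behavior described for leaves.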
PAE follows NumPy broadcasting:
- For an operation `z = x + y` where `y` is broadcast, PAE reduces `dz/dy` along the broadcast axes, summing the incoming gradient so that it matches `y.shape`.
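The reduction over broadcast axes can be sketched in plain NumPy. The helper name `unbroadcast` is hypothetical and is not claimed to be how PAE names or structures this step; it only shows the standard rule of summing over axes that broadcasting added or stretched.

```python
import numpy as np


def unbroadcast(grad, shape):
    """Sum `grad` down to `shape`, undoing NumPy broadcasting (hypothetical helper)."""
    # Axes that broadcasting prepended on the left are summed away.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Axes that had size 1 in the original operand are summed with keepdims.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


x = np.ones((4, 3))
y = np.ones((3,))            # broadcast across the 4 rows in x + y
upstream = np.ones((4, 3))   # dL/dz for z = x + y
grad_y = unbroadcast(upstream, y.shape)
grad_col = unbroadcast(upstream, (4, 1))
print(grad_y.shape, grad_col.shape)
```

Each entry of `y` was used once per row of `x`, so its gradient collects one contribution per row.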
- `+`, `-`, `*`, `/`, `**`
- `neg()`
- `exp()`, `log()`
- `tanh()`, `sigmoid()`, `relu()`, `gelu()`
- `sum(axis=None, keepdims=False)`
- `mean(axis=None, keepdims=False)`
- `var(axis=None, keepdims=False, unbiased=False)`
- `std(axis=None, keepdims=False, unbiased=False)`
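For reference, here is how the `keepdims` and `unbiased` options are conventionally interpreted, shown with plain NumPy. This assumes `unbiased=True` means dividing by `N - 1` (Bessel's correction), which corresponds to NumPy's `ddof=1`; PAE's exact behavior should be confirmed against its source.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
biased = x.var(ddof=0)    # divides by N (assumed to match unbiased=False)
unbiased = x.var(ddof=1)  # divides by N - 1 (assumed to match unbiased=True)

m = np.ones((2, 3))
dropped = m.sum(axis=1)               # reduced axis removed
kept = m.sum(axis=1, keepdims=True)   # reduced axis kept with size 1
print(biased, unbiased, dropped.shape, kept.shape)
```

Keeping the reduced axis makes the result broadcast cleanly against the input, for example when subtracting a per-row mean.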
- `reshape(*shape)`
- `transpose(*axes)` / `permute(*axes)`
- `squeeze(axis=None)`
- `unsqueeze(axis)`
- slicing: `t[a:b, ...]`
- `concat([t1, t2, ...], axis=...)`
- `matmul` via `@`
- `softmax(axis=-1)` (stable)
- `log_softmax(axis=-1)` (stable)
- `cross_entropy(logits, targets, reduction='mean')` (stable; fused path recommended)
- `layer_norm(normalized_shape, eps=1e-5)`
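The "stable" label on the softmax family typically refers to the max-subtraction (log-sum-exp) trick. The NumPy sketch below shows that technique, along with the closed-form gradient that a fused softmax plus cross-entropy path commonly uses. These free functions are illustrative assumptions, not PAE's internals, and `targets` is assumed to hold integer class indices.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    # Subtracting the max leaves the result unchanged but keeps exp() from overflowing.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cross_entropy(logits, targets):
    """Mean negative log-likelihood over a batch of shape (batch, classes)."""
    logp = log_softmax(logits, axis=-1)
    picked = logp[np.arange(logits.shape[0]), targets]
    return -picked.mean()


def cross_entropy_grad(logits, targets):
    """Gradient of the mean loss w.r.t. logits: (softmax - one_hot) / batch."""
    probs = np.exp(log_softmax(logits, axis=-1))
    probs[np.arange(logits.shape[0]), targets] -= 1.0
    return probs / logits.shape[0]


# Logits this large would overflow a naive exp(logits) implementation.
logits = np.array([[1000.0, 0.0, -1000.0], [2.0, 1.0, 0.0]])
targets = np.array([0, 1])
loss = cross_entropy(logits, targets)
grad = cross_entropy_grad(logits, targets)
print(np.isfinite(loss), np.isfinite(grad).all())
```

Computing the gradient directly as softmax minus one-hot avoids differentiating through a separate `log` and `softmax`, which is one reason a fused path is often preferred.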
- `dropout(p=0.5, training=True, seed=None)` (deterministic with seed)
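Seeded dropout is deterministic because the mask comes from a generator initialized with a fixed seed. The sketch below assumes inverted dropout (rescaling kept values by `1 / (1 - p)`); whether PAE rescales this way is an assumption, and the free function here is a stand-in rather than the library's method.

```python
import numpy as np


def dropout(x, p=0.5, training=True, seed=None):
    """Inverted dropout on a NumPy array (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng(seed)
    # Keep each element with probability 1 - p, then rescale so the
    # expected value of the output matches the input.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


x = np.ones((2, 4))
a = dropout(x, p=0.5, seed=0)
b = dropout(x, p=0.5, seed=0)
print(np.array_equal(a, b))  # same seed, same mask
```

With `training=False` the input passes through unchanged, so evaluation runs need no rescaling.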
- `Tensor.zero_grads(params)`
- `Tensor.clip_grad_norm(params, max_norm, eps=1e-12)`
(You can implement Adam/SGD externally; this engine keeps gradients correct and stable.)
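As a rough guide to what an external optimizer step might look like, the sketch below implements global-norm clipping followed by a plain SGD update on raw NumPy arrays. It assumes clipping scales all gradients by `max_norm / (total_norm + eps)` capped at 1, which is a common convention; the function names and the list-of-arrays interface are hypothetical and do not mirror PAE's `Tensor` API.

```python
import numpy as np


def clip_grad_norm(grads, max_norm, eps=1e-12):
    """Scale gradient arrays in place so their global L2 norm is at most max_norm."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, max_norm / (total + eps))
    for g in grads:
        g *= scale
    return total  # the norm before clipping, useful for logging


def sgd_step(params, grads, lr=0.1):
    """Vanilla SGD: move each parameter against its gradient, in place."""
    for p, g in zip(params, grads):
        p -= lr * g


params = [np.array([1.0, 2.0]), np.array([[3.0]])]
grads = [np.array([3.0, 0.0]), np.array([[4.0]])]
norm_before = clip_grad_norm(grads, max_norm=1.0)  # global norm is 5.0
sgd_step(params, grads, lr=0.1)
print(norm_before, params)
```

Clipping by the global norm preserves the direction of the combined gradient, unlike clipping each array independently.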
- Ensure `requires_grad=True` on leaves you want gradients for.
- Ensure you didn't call `.detach()` or run under `Tensor.no_grad()`.
- Turn on anomaly mode: `Tensor.set_anomaly_mode(True)`
- Use gradient clipping: `Tensor.clip_grad_norm(params, 1.0)`
- Prefer stable losses:
  - use `Tensor.cross_entropy` (it uses log-sum-exp stabilization)
- use
- Verify shapes. If the forward uses implicit broadcasting, the backward will reduce over broadcast axes.
- If you reshaped or summed, confirm `axis` and `keepdims` usage.
This library embeds a minimal SAVE3-compatible scaffold (TrustModelBeta + HandoverEnvelope) to comply with your file standards. It is inert unless you call it. It exists to standardize provenance and handover metadata when you later split engine responsibilities across agents/modules.
Run: `python predictive_autograd_engine.py`

It performs:
- Numerical gradient checks (finite differences) on key ops
- Softmax + cross-entropy sanity checks
- Broadcasting gradient checks
- Matmul gradient checks
If a test fails, it prints the op context and the offending node.
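A finite-difference gradient check of the kind listed above can be sketched as follows. This is a generic central-difference checker written against plain NumPy, not the test code that ships in the file; `numerical_grad`, the step size `h`, and the tolerance are illustrative assumptions.

```python
import numpy as np


def numerical_grad(f, x, h=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2.0 * h)
    return grad


def f(x):
    return float((np.tanh(x) ** 2).sum())


def analytic_grad(x):
    # d/dx tanh(x)^2 = 2 * tanh(x) * (1 - tanh(x)^2)
    t = np.tanh(x)
    return 2.0 * t * (1.0 - t ** 2)


x = np.random.default_rng(0).standard_normal((3, 2))
num = numerical_grad(f, x)
ana = analytic_grad(x)
max_err = np.abs(num - ana).max()
print(max_err < 1e-6)
```

Use float64 inputs for checks like this; in float32 the rounding error of the difference quotient can exceed a tight tolerance.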