Unify K-FAC and autocorrelation Hessian save layout#276
Open
jammastergirish wants to merge 9 commits intomainfrom
Open
Unify K-FAC and autocorrelation Hessian save layout#276jammastergirish wants to merge 9 commits intomainfrom
jammastergirish wants to merge 9 commits intomainfrom
Conversation
Both `bergson hessian` paths now write under `run_path/<method>/` with a `hessian_config.yaml` recording the method, and a single helper makes it easy to discover which approximation produced a saved directory. Changes: - `bergson hessian --method autocorrelation runs/foo` now writes to `runs/foo/autocorrelation/` (was: `runs/foo/`). K-FAC family methods already use this layout — autocorrelation now matches. - New module `bergson.hessians.io` with `load_hessian_config(path)` and `hessian_method(path)` helpers. Reads `hessian_config.yaml` from a saved Hessian directory and returns the parsed config / method string. - `assert_autocorrelation_hessian` in `process_grads.py` now uses the new helper so there's a single read path for hessian metadata. Behavior **not** changed: - On-disk artifact formats (sharded safetensors for K-FAC family, GradientProcessor `.pth` files for autocorrelation) are kept as-is. They encode different math; converging them would be a much larger change with no clear benefit. - `bergson build` (with `skip_hessians=False`) still writes its autocorrelation Hessian alongside the gradient index at `run_path/`. The Hessian is coupled with the index in this mode, so namespacing it under `<method>/` would split the run. - `trackstar.py` and `bergson/hessians/pipeline.py` call `build()` / `approximate_hessians()` directly with their own custom subdirectories (e.g. `value_hessian/`, `query_hessian/`); those layouts are unchanged. Breaking change: - Users invoking `bergson hessian --method autocorrelation` and then pointing a downstream `--preprocess_cfg.hessian_path` at the run directory must update the path to include the trailing `/autocorrelation` segment. This matches the K-FAC convention that has always required `--hessian_method_path runs/foo/kfac`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-trips a HessianConfig through save_yaml + load_hessian_config for each method, verifies hessian_method returns the right string, and checks that a missing config file raises FileNotFoundError with the expected filename in the message. Pure-python, no GPU required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
luciaquirke
reviewed
May 6, 2026
luciaquirke
reviewed
May 6, 2026
luciaquirke
reviewed
May 6, 2026
luciaquirke
reviewed
May 6, 2026
Collaborator
|
Seems like there's some claude bloat in this implementation, could you please do a pass for conciseness before getting human reviews? I want to avoid the code base getting too big and complex |
luciaquirke
reviewed
May 7, 2026
luciaquirke
reviewed
May 7, 2026
luciaquirke
reviewed
May 7, 2026
…relation_hessian Addresses Lucia's review feedback on PR #276: - Push the `<run_path>/<method>` mutation out of `Hessian.execute` and into `build()` itself, gated on a new optional `hessian_cfg` parameter and `skip_index=True`. Trackstar still calls `build()` without hessian_cfg, so its routing is unchanged. - Inline `assert_autocorrelation_hessian` into its only caller (`mix_autocorrelation_matrices`); it was used in 2 places. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
luciaquirke
reviewed
May 8, 2026
| preprocess_cfg.save_yaml(index_cfg.partial_run_path / "preprocess_config.yaml") | ||
| if not index_cfg.skip_hessians: | ||
| HessianConfig(method="autocorrelation").save_yaml( | ||
| (hessian_cfg or HessianConfig(method="autocorrelation")).save_yaml( |
Collaborator
There was a problem hiding this comment.
I think we can just save it if it's not None here? so no if not index_cfg.skip_hessians: check
luciaquirke
reviewed
May 8, 2026
| assert method == "autocorrelation", ( | ||
| f"Hessian at '{path}' was computed with method '{method}'; " | ||
| f"mix_autocorrelation_matrices only supports autocorrelation." | ||
| ) |
luciaquirke
approved these changes
May 8, 2026
Collaborator
luciaquirke
left a comment
There was a problem hiding this comment.
Awesome, LGTM with YAML saving logic fix
luciaquirke
reviewed
May 8, 2026
| setup_reproducibility() | ||
|
|
||
| if hessian_cfg is not None and index_cfg.skip_index: | ||
| index_cfg.run_path = index_cfg.run_path + f"/{hessian_cfg.method}" |
Collaborator
There was a problem hiding this comment.
Mutating the run path is a red flag right
luciaquirke
reviewed
May 8, 2026
Collaborator
luciaquirke
left a comment
There was a problem hiding this comment.
I think I'm going to pause this one on reflection, I think the unification needs more thought because it's not really clear which run path strategy we want to commit to between k-fac and autocorrelation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Linear task
Summary
bergson hessianhad two different save layouts depending on the method:kfac/tkfac/shampoo) wrote torun_path/<method>/.run_path/(because it piggy-backs onbergson build).Both already serialize a
hessian_config.yamlrecording the method, but at different relative locations. This PR aligns them under the K-FAC convention so the savedhessian_config.yamlalways lives at<run_path>/<method>/hessian_config.yamland unambiguously identifies which method produced the directory.What changed
Layout
bergson hessian --method kfac runs/fooruns/foo/kfac/runs/foo/kfac/(unchanged)bergson hessian --method tkfac runs/fooruns/foo/tkfac/runs/foo/tkfac/(unchanged)bergson hessian --method shampoo runs/fooruns/foo/shampoo/runs/foo/shampoo/(unchanged)bergson hessian --method autocorrelation runs/fooruns/foo/runs/foo/autocorrelation/Where the routing lives
K-FAC-family writes mutate `run_path` inside `approximate_hessians` (`bergson/hessians/hessian_approximations.py`). The autocorrelation flow piggybacks on `build()`, which now takes an optional `hessian_cfg` parameter; when supplied alongside `skip_index=True`, `build` performs the same `<run_path>//` mutation and serializes the supplied `HessianConfig` to `hessian_config.yaml`. The `Hessian` CLI command no longer rewrites `index_cfg.run_path` itself — it just passes `hessian_cfg` through to `build`. `trackstar.py` and `bergson/hessians/pipeline.py` continue to call `build()` without a `hessian_cfg`, so their custom subdir layouts are unchanged.
Reading back the method
The canonical way to identify the method that produced a saved directory is to load the on-disk config directly:
```python
from bergson.config import HessianConfig
method = HessianConfig.load_yaml(str(path / "hessian_config.yaml")).method
```
`mix_autocorrelation_matrices` uses this same one-liner inline (loop over `query_path` and `index_path`) — no extra helper is kept, since it was only needed in two places.
What is not changed
Breaking change
Users invoking `bergson hessian --method autocorrelation` and then pointing a downstream `--preprocess_cfg.hessian_path` at the run directory must update the path to include the trailing `/autocorrelation` segment, e.g.:
```diff
bergson hessian runs/foo --method autocorrelation
```
This matches the K-FAC convention which has always required `--hessian_method_path runs/foo/kfac`.
Verification
Ran on the cluster against Pythia-14m / pile-10k: