Thank you for considering contributing to Neural. Every contribution helps make cache optimization more accessible to developers who can't afford to rewrite their applications from scratch.
Neural currently targets Intel/AMD x86 via `perf mem`. The biggest gaps in coverage are:
- ARM SPE — Cortex-A75+ has Statistical Profiling Extension; the plumbing exists but needs a real device to validate
- AMD IBS — Instruction-Based Sampling differs from Intel PEBS; parser edge cases likely
- Virtualized environments — perf counter availability varies wildly across hypervisors; better detection and fallback logic is needed
- Older kernels — the `perf mem` API changed across kernel versions; compatibility testing welcome
- New advice types — the 6 current types in `morph/advice.py` are a starting point; NUMA locality, TLB pressure, and prefetch-distance tuning are good candidates
- Struct-level analysis — `observe/dwarf.py` has pyelftools integration, but field-offset advice (`STRUCT_REORDER`, `AOS_TO_SOA`) is only emitted when DWARF info is present; improving robustness here has high impact
- Bug fixes — especially around perf output parsing, which is fragile against kernel version differences
- Documentation — real-world examples with measured before/after numbers are more useful than anything we can write ourselves
- Testing — the shim and DWARF paths are hard to test without real hardware; contributions of test fixtures (perf output samples, ELF binaries with DWARF) are valuable
- Performance — `AccessPatternAnalyzer` is O(N·W) and fast enough for 50k events, but large workloads with millions of samples could use profiling
- ARM SPE backend validation on real hardware
- Better counter-only mode: symbol-level model from `perf report` even without `perf mem`
- Smarter arena sizing using DWARF object sizes when available
- `neural diff` — compare two runs side-by-side
- Multi-process support (profiling servers, not just short-lived commands)
- NUMA-aware arena placement
- Export to established profiler formats (Firefox Profiler, Brendan Gregg flamegraphs)
- IDE integration for inline advice
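On the flamegraph idea: the collapsed/folded stack format consumed by Brendan Gregg's `flamegraph.pl` is simple enough to sketch. The `to_folded_stacks` name and the sample shape below are assumptions for illustration, not existing Neural API:

```python
def to_folded_stacks(samples: dict[tuple[str, ...], int]) -> str:
    """Render call-stack sample counts in the folded format used by
    flamegraph.pl: one 'frame;frame;frame count' line per stack."""
    return "\n".join(
        ";".join(stack) + f" {count}"
        for stack, count in sorted(samples.items())
    )


print(to_folded_stacks({("main", "alloc", "memset"): 42, ("main", "free"): 7}))
```

The output can be piped straight into `flamegraph.pl` to get an SVG, which is part of what makes this export target attractive.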
```bash
git clone https://github.com/your-org/neural.git
cd neural

# Install (no runtime deps — stdlib only)
pip install -e .

# Optional: struct-level DWARF resolution
pip install pyelftools

# Run tests (requires gcc for shim tests)
python3 -m unittest discover -s tests -p "test_*.py" -v

# Verify your perf setup
perf --version
cat /proc/sys/kernel/perf_event_paranoid  # should be ≤ 1 for full sampling
```

- Pure Python 3.10+, zero required runtime dependencies — keep it that way
- New optional dependencies are fine but must degrade gracefully with a one-time warning (see `observe/dwarf.py` for the pattern)
- Type annotations on all public functions
- Tests for any new logic — `tests/` uses `unittest` with no external test runner required
- The shim is C; keep it C99-compatible and free of external libraries beyond `libc` and `pthreads`
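The one-time-warning pattern for an optional dependency can be sketched like this. It is illustrative only; the actual implementation in `observe/dwarf.py` may differ in naming and detail:

```python
import warnings

_warned = False  # module-level flag so the warning fires at most once


def load_dwarf_backend():
    """Try to load the optional pyelftools backend.

    Returns the ELFFile class when pyelftools is installed, otherwise
    warns once and returns None so callers can degrade gracefully.
    """
    global _warned
    try:
        from elftools.elf.elffile import ELFFile  # optional dependency
        return ELFFile
    except ImportError:
        if not _warned:
            warnings.warn(
                "pyelftools not installed; struct-level advice disabled",
                stacklevel=2,
            )
            _warned = True
        return None
```

Callers check for `None` and simply skip the struct-level pass, so the core pipeline never hard-depends on the extra package.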
Follow conventional commits:
```
feat: add ARM SPE sampling backend
fix: handle perf stat output with scaled counts on busy CPUs
test: add fixture for AMD IBS perf mem dump format
docs: add real-world Redis benchmark in USAGE.md
```
- Fork and create a branch from `main`
- Make changes with focused, descriptive commits
- Ensure `python3 -m unittest discover -s tests -p "test_*.py"` passes
- If touching the C shim, confirm `tests/test_shim.py` passes (requires `gcc`)
- Update `ASSUMPTIONS.md` if you made a design call the spec left ambiguous
- Open a PR with a description that includes:
  - What problem this solves
  - How you tested it (hardware used, kernel version, perf version if relevant)
  - Any known limitations
- Add the detection method to `morph/advice.py` following the existing pattern
- Add a trigger test and a false-positive test to `tests/test_advice.py`
- Document the trigger conditions in `CHECKLIST.md` under §3.4
- Add a parser in `profiler/core/parser.py`
- Add detection in `profiler/core/perf_checker.py`
- Wire it into `CacheProfiler._extract_samples()` in `profiler/core/profiler.py`
- Add a test fixture in `tests/` — even a small synthetic perf dump is better than nothing
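A new parser might follow this shape. The regex and line format below are synthetic stand-ins (real `perf mem` dumps vary by kernel and vendor, which is exactly why fixtures matter), not the actual format handled in `profiler/core/parser.py`:

```python
import re

# Hypothetical line shape for a perf-mem-report-style dump:
#   <overhead%> <symbol> <memory level> ...
_SAMPLE_RE = re.compile(
    r"^\s*(?P<pct>[\d.]+)%\s+(?P<symbol>\S+)\s+(?P<level>L1|LFB|L2|L3|RAM)\b"
)


def parse_samples(text: str) -> list[dict]:
    """Parse one fixture-style dump into sample dicts, skipping lines
    that do not match (headers, comments, kernel noise)."""
    samples = []
    for line in text.splitlines():
        m = _SAMPLE_RE.match(line)
        if m:
            samples.append({
                "pct": float(m.group("pct")),
                "symbol": m.group("symbol"),
                "level": m.group("level"),
            })
    return samples
```

Skip-on-no-match keeps the parser tolerant of format drift, at the cost of silently dropping unknown lines, so a fixture-backed test should assert the expected sample count.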
Open a GitHub issue and include:
- `perf --version` and `uname -r`
- `cat /proc/sys/kernel/perf_event_paranoid`
- CPU model (`lscpu | grep "Model name"`)
- The exact `neural` command you ran
- Full terminal output (the `[neural]` log lines are especially useful)
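If it helps, most of these details can be collected with a short script along these lines (the `gather_bug_report_info` helper is a sketch of ours, not part of the `neural` CLI, and it tolerates missing tools):

```python
import platform
import shutil
import subprocess


def gather_bug_report_info() -> str:
    """Collect kernel, perf, and paranoia-level details for a bug report,
    degrading gracefully when a tool or proc file is unavailable."""
    lines = [f"kernel: {platform.release()}"]
    if shutil.which("perf"):
        out = subprocess.run(["perf", "--version"],
                             capture_output=True, text=True)
        lines.append(out.stdout.strip() or out.stderr.strip())
    else:
        lines.append("perf: not found")
    try:
        with open("/proc/sys/kernel/perf_event_paranoid") as f:
            lines.append(f"perf_event_paranoid: {f.read().strip()}")
    except OSError:
        lines.append("perf_event_paranoid: unavailable")
    return "\n".join(lines)


print(gather_bug_report_info())
```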
Perf output format varies across kernel versions and CPU vendors. The more environment detail you include, the faster we can reproduce it.
- GitHub Issues — bug reports, feature requests, hardware compatibility reports
- Discussions — design questions, "is this the right approach" conversations
| File | Why |
|---|---|
| `ASSUMPTIONS.md` | Design decisions and open questions — good starting points for contribution |
| `neural/profiler/core/profiler.py` | The main pipeline; understand this before touching anything else |
| `neural/observe/pattern.py` | Core learning logic |
| `neural/morph/advice.py` | Where new advice types go |
| `neural/morph/shim/neural_alloc.c` | The C shim — read this before touching arena or free-list logic |
Questions? Open an issue — hardware-specific bugs especially benefit from back-and-forth.