
Contributing to Neural

Thank you for considering contributing to Neural. Every contribution helps make cache optimization more accessible to developers who can't afford to rewrite their applications from scratch.

🌟 How You Can Help

🔬 Perf Backend & Hardware Support (Most Valuable Contribution!)

Neural currently targets Intel/AMD x86 via perf mem. The biggest gaps in coverage are:

  • ARM SPE — Cortex-A75+ has Statistical Profiling Extension; the plumbing exists but needs a real device to validate
  • AMD IBS — Instruction-Based Sampling differs from Intel PEBS; parser edge cases likely
  • Virtualized environments — perf counter availability varies wildly across hypervisors; better detection and fallback logic is needed
  • Older kernels — perf mem API changed across kernel versions; compatibility testing welcome
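If you want to test one of these environments, it helps to probe up front whether `perf mem` sampling works at all. The sketch below is a hypothetical helper (the function name and return values are assumptions; Neural's real detection lives in profiler/core/perf_checker.py) that distinguishes "perf mem works" from "counters only":

```python
import shutil
import subprocess

def detect_mem_sampling_backend() -> str:
    """Best-effort probe of memory-sampling support on this machine.

    Hypothetical sketch: returns "none" (no perf binary), "perf-mem"
    (PEBS/IBS/SPE reachable via `perf mem`), or "counters-only".
    """
    if shutil.which("perf") is None:
        return "none"
    # Record a trivial command; discard the perf.data output.
    probe = subprocess.run(
        ["perf", "mem", "record", "-o", "/dev/null", "--", "true"],
        capture_output=True, text=True,
    )
    if probe.returncode == 0:
        return "perf-mem"
    return "counters-only"   # fall back to plain perf stat counters
```

A helper like this is also where virtualization detection and paranoid-level checks would hang off, so failures degrade to counter-only mode instead of erroring out.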

🛠️ Other Ways to Contribute

  • New advice types — the 6 current types in morph/advice.py are a starting point; NUMA locality, TLB pressure, and prefetch-distance tuning are good candidates
  • Struct-level analysis — observe/dwarf.py has pyelftools integration, but field-offset advice (STRUCT_REORDER, AOS_TO_SOA) is only emitted when DWARF info is present; improving robustness here has high impact
  • Bug fixes — especially around perf output parsing, which is fragile against kernel version differences
  • Documentation — real-world examples with measured before/after numbers are more useful than anything we can write ourselves
  • Testing — the shim and DWARF paths are hard to test without real hardware; contributions of test fixtures (perf output samples, ELF binaries with DWARF) are valuable
  • Performance — AccessPatternAnalyzer is O(N·W) and fast enough for 50k events, but large workloads with millions of samples could use profiling

🗺️ Roadmap

Near-term

  • ARM SPE backend validation on real hardware
  • Better counter-only mode: symbol-level model from perf report even without perf mem
  • Smarter arena sizing using DWARF object sizes when available
  • neural diff — compare two runs side-by-side

Longer-term

  • Multi-process support (profiling servers, not just short-lived commands)
  • NUMA-aware arena placement
  • Export to established profiler formats (Firefox Profiler, Brendan Gregg flamegraphs)
  • IDE integration for inline advice

🚀 Getting Started

git clone https://github.com/your-org/neural.git
cd neural

# Install (no runtime deps — stdlib only)
pip install -e .

# Optional: struct-level DWARF resolution
pip install pyelftools

# Run tests (requires gcc for shim tests)
python3 -m unittest discover -s tests -p "test_*.py" -v

# Verify your perf setup
perf --version
cat /proc/sys/kernel/perf_event_paranoid  # should be ≤ 1 for full sampling

📋 Contribution Guidelines

Code Standards

  • Pure Python 3.10+, zero required runtime dependencies — keep it that way
  • New optional dependencies are fine but must degrade gracefully with a one-time warning (see observe/dwarf.py for the pattern)
  • Type annotations on all public functions
  • Tests for any new logic — tests/ uses unittest with no external test runner required
  • The shim is C; keep it C99-compatible and free of external libraries beyond libc and pthreads
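The graceful-degradation rule for optional dependencies looks roughly like this. observe/dwarf.py is the canonical example in the repo; the names below (`struct_field_offsets`, the module-level flag) are illustrative, not the real API:

```python
import warnings

_warned_missing_pyelftools = False

try:
    from elftools.elf.elffile import ELFFile  # optional: struct-level DWARF advice
    HAVE_PYELFTOOLS = True
except ImportError:
    HAVE_PYELFTOOLS = False

def struct_field_offsets(binary_path: str):
    """Return DWARF info for a binary, or None when pyelftools is absent.

    Hypothetical sketch of the one-time-warning pattern; the real code
    lives in observe/dwarf.py.
    """
    global _warned_missing_pyelftools
    if not HAVE_PYELFTOOLS:
        if not _warned_missing_pyelftools:
            warnings.warn(
                "pyelftools not installed; struct-level advice disabled "
                "(pip install pyelftools to enable)"
            )
            _warned_missing_pyelftools = True
        return None
    with open(binary_path, "rb") as f:
        return ELFFile(f).has_dwarf_info()
```

The key points: the import failure is caught once at module load, the warning fires at most once per process, and every caller gets a sentinel (None) it can handle instead of an exception.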

Commit Style

Follow conventional commits:

feat: add ARM SPE sampling backend
fix: handle perf stat output with scaled counts on busy CPUs
test: add fixture for AMD IBS perf mem dump format
docs: add real-world Redis benchmark in USAGE.md

Pull Request Process

  1. Fork and create a branch from main
  2. Make changes with focused, descriptive commits
  3. Ensure python3 -m unittest discover -s tests -p "test_*.py" passes
  4. If touching the C shim, confirm tests/test_shim.py passes (requires gcc)
  5. Update ASSUMPTIONS.md if you made a design call the spec left ambiguous
  6. Open a PR with a description that includes:
    • What problem this solves
    • How you tested it (hardware used, kernel version, perf version if relevant)
    • Any known limitations

Adding a New Advice Type

  1. Add the detection method to morph/advice.py following the existing pattern
  2. Add a trigger test and a false-positive test to tests/test_advice.py
  3. Document the trigger conditions in CHECKLIST.md under §3.4
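As a rough shape for step 1, a new detector might look like the following. This is a hypothetical sketch: the `Advice` dataclass, the method signature, the sample format, and the thresholds are all assumptions — follow the actual pattern in morph/advice.py, not this:

```python
from dataclasses import dataclass

@dataclass
class Advice:
    kind: str      # e.g. "NUMA_LOCALITY" (illustrative; real fields may differ)
    symbol: str
    reason: str

def detect_numa_locality(samples) -> list:
    """Flag symbols whose memory accesses mostly hit a remote NUMA node.

    `samples` is assumed to be an iterable of (symbol, is_remote_node)
    pairs; the >=100-sample and >50%-remote thresholds are illustrative.
    """
    per_symbol = {}
    for symbol, is_remote in samples:
        total_remote = per_symbol.setdefault(symbol, [0, 0])
        total_remote[0] += 1
        total_remote[1] += int(is_remote)
    advice = []
    for symbol, (total, remote) in per_symbol.items():
        if total >= 100 and remote / total > 0.5:
            advice.append(Advice(
                "NUMA_LOCALITY", symbol,
                f"{remote}/{total} sampled accesses hit a remote NUMA node",
            ))
    return advice
```

Note the step-2 requirement maps directly onto this: one test feeding samples that cross the thresholds (trigger), and one feeding samples that are close but below them (false positive).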

Adding a New Perf Backend

  1. Add a parser in profiler/core/parser.py
  2. Add detection in profiler/core/perf_checker.py
  3. Wire it into CacheProfiler._extract_samples() in profiler/core/profiler.py
  4. Add a test fixture in tests/ — even a small synthetic perf dump is better than nothing
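For step 1, a parser is essentially a line-to-sample function that returns None on anything it does not recognize, so malformed kernel output degrades to dropped samples rather than crashes. The sketch below uses an invented, simplified line format purely for illustration — real perf mem output varies by kernel and CPU vendor, and the real parsers live in profiler/core/parser.py:

```python
from typing import NamedTuple, Optional

class MemSample(NamedTuple):
    symbol: str
    address: int
    latency: int   # access latency in cycles

def parse_perf_mem_line(line: str) -> Optional[MemSample]:
    """Parse one line of an assumed synthetic format:

        <symbol> <hex address> <latency cycles>

    Returns None for lines that do not match, so callers can skip
    headers, comments, and vendor-specific noise.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    try:
        return MemSample(parts[0], int(parts[1], 16), int(parts[2]))
    except ValueError:
        return None
```

The test fixture in step 4 then becomes a handful of captured (or synthetic) lines plus assertions that good lines parse and bad lines return None.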

🐛 Reporting Bugs

Open a GitHub issue and include:

  • perf --version and uname -r
  • cat /proc/sys/kernel/perf_event_paranoid
  • CPU model (lscpu | grep "Model name")
  • The exact neural command you ran
  • Full terminal output (the [neural] log lines are especially useful)

Perf output format varies across kernel versions and CPU vendors. The more environment detail you include, the faster we can reproduce it.

🤝 Community

  • GitHub Issues — bug reports, feature requests, hardware compatibility reports
  • Discussions — design questions, "is this the right approach" conversations

📚 Key Files to Read First

  • ASSUMPTIONS.md — design decisions and open questions; good starting points for contribution
  • neural/profiler/core/profiler.py — the main pipeline; understand this before touching anything else
  • neural/observe/pattern.py — core learning logic
  • neural/morph/advice.py — where new advice types go
  • neural/morph/shim/neural_alloc.c — the C shim; read this before touching arena or free-list logic

Questions? Open an issue — hardware-specific bugs especially benefit from back-and-forth.