[Deeploy PR] NE16 Linear Layer Kernels #184
Closed
pauloohaha wants to merge 1 commit into
- Add NE16 linear layer kernels, including a topology pass, NE16 templates, parsers, tile constraints, and bindings
- The topology pass recognizes NE16-compatible GEMM layers, adjusts the weight layout for the NE16, and converts the requant shift/scale to the NE16 format
- The template detects whether the input is signed; if so, it adds a +128 offset to the input during C runtime and compensates via the bias
- Add GAP9 SDK-based Dequant/Quant templates using CNN_Copy.c kernels, replacing the generic templates
- Add a generic DequantQuantMergePass that folds adjacent Dequant→Quant pairs into identity or RequantShift
- Add a GAP9-specific TopologyOptimizer (GAP9Optimizer) to replace PULPOptimizer

Bug fixes:
- Add output signedness check in QuantChecker
- Fix L3 DMA template (add proper casts) and remove the blocking L3 DMA hack
- Isolate dory memory functions from other libraries in CMakeLists so they compile with -Og while compute kernels compile with -O3
- Disable PULPAddRequantMergePass due to incorrect pattern matching when Add has multiple consumers
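The signed-input handling described above rests on a simple identity: shifting every input by +128 adds `128 * sum(w)` to each accumulator, so subtracting that term from the bias restores the original result. The following is a minimal sketch of that arithmetic for a single output channel, not the PR's template code; the helper names `dot_plus_bias` and `compensate_bias` are hypothetical.

```python
def dot_plus_bias(x, w, bias):
    # Plain integer dot product plus bias for one output channel.
    return sum(xi * wi for xi, wi in zip(x, w)) + bias


def compensate_bias(w, bias, offset=128):
    # sum((x + offset) * w) == sum(x * w) + offset * sum(w),
    # so the extra term is folded into the bias ahead of time.
    return bias - offset * sum(w)


# Hypothetical int8 inputs and weights, chosen only for illustration.
x_signed = [-128, -3, 0, 57, 127]
w = [2, -5, 7, 1, -4]
bias = 10

# What the runtime conversion would produce: int8 -> uint8 by adding 128.
x_unsigned = [xi + 128 for xi in x_signed]

reference = dot_plus_bias(x_signed, w, bias)
shifted = dot_plus_bias(x_unsigned, w, compensate_bias(w, bias))
assert reference == shifted
```

Because the compensation depends only on the weights, it can be computed once at code-generation time rather than per inference.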
Caution: Review failed. Pull request was closed or merged during review.
Walkthrough

This PR introduces NE16 (Neural Engine 16) backend support for GEMM operations on GAP9, replaces the L3 DMA blocking adapter pattern with direct instantiation, adds GAP9 SDK-specific quantization templates, and updates the build system to include NE16 kernel sources with a separate dory memory/DMA library.
Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant Parser as NE16GEMMParser
participant Tiler as NE16GEMMTiler
participant Template as NE16GEMMTemplate
participant Kernel as NE16 SDK Kernel
Client->>Parser: Parse RequantizedGemm node (A, B, C, mul, scale_n)
Parser->>Parser: Validate 5 inputs & shift attribute
Parser-->>Client: Return parse success with context mapping
Client->>Tiler: Apply NE16GEMMTileConstraint
Tiler->>Tiler: Add geometrical constraints (M, O, N dimensions)
Tiler->>Tiler: Add policy constraints (N untiled, O divisible by 32)
Tiler-->>Client: Return tiling solution with cubes
Client->>Template: Align to context with operatorRepresentation
Template->>Template: Derive signedness from type metadata
Template->>Template: Compute weight layout (bitplane-pack with +128 offset)
Template->>Template: Apply input bias compensation (128 * w_sum)
Template->>Template: Rescale per-channel scales
Template-->>Client: Return updated context & schedule
Client->>Kernel: Execute generated code
Kernel->>Kernel: Convert int8 inputs to uint8 (+128 offset)
Kernel->>Kernel: Perform NE16 1x1 GEMM
Kernel->>Kernel: Apply requantization
Kernel-->>Client: Return output
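The policy constraints in the diagram above (N left untiled, O divisible by 32) can be illustrated with a brute-force check. This is only a sketch: the diagram does not say how the tiler expresses its constraints, and the function names, the int8 one-byte-per-element footprint model, and the `l1_budget_bytes` parameter are all assumptions made for illustration.

```python
def is_valid_ne16_tile(tile_m, tile_n, tile_o, dim_n, l1_budget_bytes):
    # Policy from the walkthrough: the reduction dimension N is not tiled.
    if tile_n != dim_n:
        return False
    # Policy from the walkthrough: the output-channel tile is a multiple of 32.
    if tile_o % 32 != 0:
        return False
    # Assumed footprint model: int8 input, weight, and output tiles, 1 byte each.
    footprint = tile_m * tile_n + tile_n * tile_o + tile_m * tile_o
    return footprint <= l1_budget_bytes


def largest_o_tile(dim_m, dim_n, dim_o, l1_budget_bytes):
    # Try output-channel tiles from largest to smallest in steps of 32.
    for tile_o in range(dim_o - dim_o % 32, 0, -32):
        if is_valid_ne16_tile(dim_m, dim_n, tile_o, dim_n, l1_budget_bytes):
            return tile_o
    return None  # no tile satisfies the policy within the budget


print(largest_o_tile(4, 64, 128, 4096))
```

With these hypothetical sizes, the 128-, 96-, and 64-channel tiles exceed the 4096-byte budget, so the search settles on a 32-channel tile.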
sequenceDiagram
participant Graph as Computation Graph
participant Pass as DequantQuantMergePass
participant Merger as Scale/ZeroPoint Analyzer
participant Optimizer as Requantization Builder
Graph->>Pass: Match Dequant→Quant pattern
Pass->>Merger: Compute effective scaling ratio
Merger->>Merger: Extract Dequant scale/zero_point
Merger->>Merger: Extract Quant scale/zero_point
Merger-->>Pass: Return scaling ratio & zero_point deltas
alt Ratio ≈ 1.0 & zero_points ≈ 0 & signed==true
Pass->>Graph: Rewire Quant consumers to Dequant input
Pass->>Graph: Remove Dequant & Quant nodes
Pass-->>Graph: Identity path (no intermediate compute)
else Fallback to requantization
Pass->>Optimizer: Compute integer mul/add from scales
Optimizer->>Optimizer: Use right shift 2**16 for quantization
Optimizer-->>Pass: Return mul/add constants
Pass->>Graph: Insert RequantShift node
Pass-->>Graph: Requantized path
end
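The decision shown in the second diagram can be sketched as follows. This assumes the common affine convention `real = (q - zero_point) * scale` for both nodes and a fixed shift of 16 bits; the function names, the tolerance, and the clipping range are hypothetical, and the pass in the PR may round or handle zero points differently.

```python
import math

SHIFT = 16  # the walkthrough mentions a right shift by 2**16


def merge_dequant_quant(dq_scale, dq_zp, q_scale, q_zp, signed, tol=1e-6):
    # Effective scaling ratio of Dequant followed by Quant.
    ratio = dq_scale / q_scale
    if signed and math.isclose(ratio, 1.0, abs_tol=tol) and dq_zp == 0 and q_zp == 0:
        return ("identity", None)
    # Fallback: integer multiplier and addend for (x * mul + add) >> SHIFT.
    mul = round(ratio * (1 << SHIFT))
    add = round((q_zp - dq_zp * ratio) * (1 << SHIFT))
    return ("requant", (mul, add))


def requant_shift(q_in, mul, add, lo=-128, hi=127):
    # Integer requantization followed by clipping to the assumed int8 range.
    return max(lo, min(hi, (q_in * mul + add) >> SHIFT))


kind, params = merge_dequant_quant(0.5, 0, 0.25, 0, signed=True)
assert kind == "requant"
assert requant_shift(10, *params) == 20
```

A pair with equal scales and zero zero-points on a signed tensor takes the identity branch instead, which corresponds to rewiring the Quant consumers directly to the Dequant input.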
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)