Problem
The Exp (exponential) function currently has only a pure Go implementation built on math.Exp. It has no SIMD kernels for either AMD64 or ARM64.
Current status:
- ✅ f32/Exp and f64/Exp exposed in public API
- ✅ Pure Go implementation with overflow protection
- ❌ No AVX implementation (AMD64)
- ❌ No NEON implementation (ARM64)
Implementation Details
Current Pure Go Code
// e^x with saturation to prevent overflow
// Inputs beyond ±709 (f64) or ±88 (f32) saturate to +Inf or 0
func exp32Go(dst, src []float32) {
	for i := range dst {
		x := src[i]
		// Saturate extreme values
		if x > 88.0 {
			dst[i] = float32(math.Inf(1)) // math.Inf returns float64, so convert
		} else if x < -88.0 {
			dst[i] = 0
		} else {
			dst[i] = float32(math.Exp(float64(x)))
		}
	}
}
Performance Opportunity
Based on benchmarks of similar activation functions:
- AMD64 AVX: Potential 10-30x speedup
- ARM64 NEON: Potential 5-15x speedup
Reference Performance (Sigmoid at 1024 elements)
- AMD64 AVX: 43x speedup @ 59.3 GB/s
- ARM64 NEON: Better throughput with vector operations
Implementation Requirements
AMD64 AVX (f32)
- Load source values into YMM registers (8x float32)
- Clamp values to ±88.0 using VCMPPS + VBLENDVPS
- Compute e^x using polynomial approximation or exp2 + scale
- Handle out-of-range inputs: return +Inf above the upper bound and 0 below the lower bound, matching the pure Go code
- Store results with VMOVUPS
- Include scalar remainder handling
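The control flow in the list above can be modeled in plain Go before any assembly is written. The sketch below is a scalar stand-in, not the proposed kernel: the names lanes, scalarExp32, expLanes, and exp32Blocked are hypothetical, and math.Exp stands in for the vectorized core. It only illustrates the block-of-8 main loop, the compare-then-blend handling of out-of-range inputs, and the scalar remainder loop.

```go
package main

import "math"

// lanes is the number of float32 values in one 256-bit YMM register.
const lanes = 8

// scalarExp32 mirrors the pure Go fallback for a single element.
func scalarExp32(x float32) float32 {
	switch {
	case x > 88.0:
		return float32(math.Inf(1))
	case x < -88.0:
		return 0
	}
	return float32(math.Exp(float64(x)))
}

// expLanes models one vector iteration over exactly `lanes` elements.
// Each lane computes masks first and selects the final value last,
// the way a VCMPPS + VBLENDVPS sequence would, instead of branching early.
func expLanes(dst, src []float32) {
	for j := 0; j < lanes; j++ {
		x := src[j]
		tooBig := x > 88.0    // models the "greater than" compare mask
		tooSmall := x < -88.0 // models the "less than" compare mask

		// Clamp the input so the core computation never overflows.
		c := x
		if tooBig {
			c = 88.0
		}
		if tooSmall {
			c = -88.0
		}
		y := float32(math.Exp(float64(c))) // stand-in for the SIMD core

		// Blend in the saturated results for the masked lanes.
		if tooBig {
			y = float32(math.Inf(1))
		}
		if tooSmall {
			y = 0
		}
		dst[j] = y
	}
}

// exp32Blocked processes full blocks of `lanes` elements, then finishes
// the remaining 0..lanes-1 elements with the scalar path.
func exp32Blocked(dst, src []float32) {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	i := 0
	for ; i+lanes <= n; i += lanes {
		expLanes(dst[i:i+lanes], src[i:i+lanes])
	}
	for ; i < n; i++ {
		dst[i] = scalarExp32(src[i])
	}
}
```

A model like this can double as the reference in tests: an assembly kernel should agree with exp32Blocked for lengths that are not a multiple of 8, which is where remainder bugs tend to appear.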
AMD64 AVX (f64)
- Similar to f32 but with 4x float64 per YMM register (an XMM register holds only 2x float64)
- Clamp values to ±709.0
- Compute e^x with higher precision
ARM64 NEON (both f32 and f64)
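- Vector clamping with FCMGT (the A64 FCMLT form only compares against zero, so swap the operands for a less-than test) or with FMIN/FMAX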
- Polynomial approximation for e^x
- Handle extreme values with saturation
- Scalar remainder handling
Math Approximation
For SIMD implementations, consider:
- Polynomial approximation (fast, good accuracy on a small input range)
  e^x ≈ 1 + x + x²/2! + x³/3! + ... + x^n/n!
  Evaluate with Horner's method to keep the multiplication count and rounding error low
- Exp2-based approach (alternative)
  e^x = 2^(x / ln(2))
  Decompose x / ln(2) = k + f where k is an integer and 0 ≤ f < 1
  e^x = 2^k * 2^f
- Use existing SIMD exp functions (if available in SVML or similar)
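As a rough illustration of how the first two options combine, here is a scalar Go sketch that a SIMD kernel could be checked against. It assumes one particular scheme that the issue does not mandate: write x = k*ln(2) + r with k rounded to the nearest integer, so |r| ≤ ln(2)/2, then evaluate a degree-6 Taylor polynomial for e^r with Horner's method and scale by 2^k. The names expApprox32 and maxRelErr are hypothetical, and the degree and coefficients are illustrative rather than tuned (a minimax polynomial would likely do better at the same degree).

```go
package main

import "math"

const (
	ln2    = 0.6931471805599453 // ln(2)
	invLn2 = 1.4426950408889634 // 1 / ln(2)
)

// expApprox32 approximates e^x for float32 inputs.
// Out-of-range inputs saturate the same way the pure Go code does.
func expApprox32(x float32) float32 {
	if x > 88.0 {
		return float32(math.Inf(1))
	}
	if x < -88.0 {
		return 0
	}

	// Range reduction: x = k*ln2 + r with |r| <= ln2/2.
	k := math.Round(float64(x) * invLn2)
	r := float32(float64(x) - k*ln2)

	// Degree-6 Taylor polynomial for e^r, evaluated with Horner's method:
	// 1 + r(1 + r(1/2 + r(1/6 + r(1/24 + r(1/120 + r/720))))).
	p := float32(1.0 / 720)
	p = p*r + 1.0/120
	p = p*r + 1.0/24
	p = p*r + 1.0/6
	p = p*r + 0.5
	p = p*r + 1
	p = p*r + 1

	// Scale by 2^k. A SIMD kernel would more likely add k to the
	// exponent bits directly; Ldexp keeps this sketch simple.
	return float32(math.Ldexp(float64(p), int(k)))
}

// maxRelErr returns the largest relative error of expApprox32 against
// math.Exp over the samples lo, lo+step, ..., up to hi.
func maxRelErr(lo, hi, step float32) float64 {
	worst := 0.0
	for x := lo; x <= hi; x += step {
		want := math.Exp(float64(x))
		got := float64(expApprox32(x))
		if e := math.Abs(got-want) / want; e > worst {
			worst = e
		}
	}
	return worst
}
```

Calling maxRelErr(-87, 88, 0.125) gives a quick accuracy check for a candidate polynomial degree before committing to it in assembly. Rounding k to nearest rather than flooring halves the range the polynomial has to cover, which is why this sketch departs from the 0 ≤ f < 1 convention.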
References
Related Issues
Priority
Medium - Exp is less commonly used than Sigmoid/Tanh/ReLU in neural networks, but still valuable for certain applications. Pure Go fallback is acceptable.
Acceptance Criteria