Exp function lacks SIMD implementation - requires AVX optimization #7

@tphakala

Problem

The Exp (exponential) function currently only has a Pure Go implementation using math.Exp. It lacks SIMD optimizations for both AMD64 and ARM64 platforms.

Current status:

  • ✅ f32/Exp and f64/Exp exposed in public API
  • ✅ Pure Go implementation with overflow protection
  • ❌ No AVX implementation (AMD64)
  • ❌ No NEON implementation (ARM64)

Implementation Details

Current Pure Go Code

import "math"

// exp32Go computes e^x element-wise.
// Inputs beyond ±88 (f32; ±709 for f64) bypass math.Exp to prevent overflow:
// the result is +Inf above the bound and 0 below it.
func exp32Go(dst, src []float32) {
    for i := range dst {
        x := src[i]
        // Handle extreme values
        if x > 88.0 {
            dst[i] = float32(math.Inf(1)) // math.Inf returns float64
        } else if x < -88.0 {
            dst[i] = 0
        } else {
            dst[i] = float32(math.Exp(float64(x)))
        }
    }
}

Performance Opportunity

Based on benchmarks of similar activation functions:

  • AMD64 AVX: Potential 10-30x speedup
  • ARM64 NEON: Potential 5-15x speedup

Reference Performance (Sigmoid at 1024 elements)

  • AMD64 AVX: 43x speedup @ 59.3 GB/s
  • ARM64 NEON: Better throughput with vector operations

Implementation Requirements

AMD64 AVX (f32)

- Load source values into YMM registers (8x float32)
- Clamp values to ±88.0 using VCMPPS + VBLENDVPS
- Compute e^x using polynomial approximation or exp2 + scale
- Handle out-of-range inputs: +Inf above +88.0, 0 below -88.0 (matching the Pure Go code)
- Store results with VMOVUPS
- Include scalar remainder handling
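
The control flow of the steps above can be modeled in plain Go before writing any assembly. This is only a sketch: `expBlock8`, `expSlice`, and `expScalar` are hypothetical names, each "lane" is an ordinary loop iteration, and `math.Exp` stands in for the polynomial kernel that the real AVX code would use.

```go
package main

import (
	"fmt"
	"math"
)

const (
	lanes  = 8     // float32 values per 256-bit YMM register
	maxArg = 88.0  // upper bound from the issue text
	minArg = -88.0 // lower bound from the issue text
)

// expScalar mirrors the per-element behavior of the Pure Go fallback.
func expScalar(x float32) float32 {
	switch {
	case x > maxArg:
		return float32(math.Inf(1))
	case x < minArg:
		return 0
	}
	return float32(math.Exp(float64(x)))
}

// expBlock8 models one vector iteration over 8 lanes. In assembly each
// step would be one instruction applied to the whole register.
func expBlock8(dst, src []float32) {
	inf := float32(math.Inf(1))
	for l := 0; l < lanes; l++ {
		x := src[l]
		over := x > maxArg  // models a compare mask (VCMPPS)
		under := x < minArg // models a second compare mask
		// Clamp so the kernel only ever sees in-range inputs.
		xc := x
		if over {
			xc = maxArg
		}
		if under {
			xc = minArg
		}
		y := float32(math.Exp(float64(xc))) // stand-in for the polynomial kernel
		// Select the special results by mask (VBLENDVPS-style).
		if over {
			y = inf
		}
		if under {
			y = 0
		}
		dst[l] = y
	}
}

// expSlice runs full blocks of 8, then finishes the tail one element at a time.
func expSlice(dst, src []float32) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	i := 0
	for ; i+lanes <= n; i += lanes {
		expBlock8(dst[i:i+lanes], src[i:i+lanes])
	}
	for ; i < n; i++ { // scalar remainder
		dst[i] = expScalar(src[i])
	}
}

func main() {
	src := []float32{-100, -1, 0, 1, 2, 3, 4, 5, 100, 0.5}
	dst := make([]float32, len(src))
	expSlice(dst, src)
	fmt.Println(dst)
}
```

Clamping before the kernel and blending afterwards keeps the vector path branch-free, and NaN inputs fail both comparisons so they pass through to the kernel unchanged.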

AMD64 AVX (f64)

- Similar to f32 but with 4x float64 per YMM register
- Clamp values to ±709.0
- Compute e^x with higher precision

ARM64 NEON (both f32 and f64)

- Vector clamping with FCMGT (swap operands for less-than; FCMLT only compares against zero) or FMIN/FMAX
- Polynomial approximation for e^x
- Handle extreme values: +Inf on overflow, 0 on underflow
- Scalar remainder handling

Math Approximation

For SIMD implementations, consider:

  1. Polynomial approximation (fast, good accuracy)

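    e^x ≈ 1 + x + x²/2! + x³/3! + ... + x^n/n!

    Evaluate with Horner's method (one multiply-add per term) for numerical stability. The truncated series is only accurate for small |x|, so it needs range reduction first.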

  2. Exp2-based approach (alternative)

    e^x = 2^(x / ln(2))
    Decompose x / ln(2) = k + f where k is an integer, 0 ≤ f < 1
    e^x = 2^k * 2^f
    
  3. Use existing SIMD exp functions (if available in SVML or similar)
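
Options 1 and 2 combine naturally: reduce the argument, then run a short Horner polynomial on the remainder. The scalar Go sketch below uses the equivalent form x = k·ln(2) + r with |r| ≤ ln(2)/2, so that e^x = 2^k · e^r. Everything in it is an assumption for illustration: the name `expApprox32`, the degree-6 Taylor coefficients (a minimax fit would do better), the two-constant split of ln(2), and `math.Ldexp` standing in for building 2^k directly in the exponent bits.

```go
package main

import (
	"fmt"
	"math"
)

const (
	log2e = 1.4426950408889634 // 1 / ln(2)
	ln2Hi = 0.693359375        // high part of ln(2), exact in float32
	ln2Lo = -2.12194440e-4     // low part: ln2Hi + ln2Lo ≈ ln(2)
)

// expApprox32 approximates e^x using float32 arithmetic, as a SIMD lane would.
func expApprox32(x float32) float32 {
	if x > 88 {
		return float32(math.Inf(1))
	}
	if x < -88 {
		return 0
	}
	// Range reduction: k = round(x / ln 2), r = x - k*ln 2.
	// Subtracting ln 2 in two parts keeps r accurate for large |k|.
	k := float32(math.Round(float64(x * log2e)))
	r := x - k*ln2Hi
	r -= k * ln2Lo

	// Degree-6 Taylor polynomial for e^r in Horner form:
	// each line is one multiply-add.
	p := float32(1.0 / 720)
	p = p*r + 1.0/120
	p = p*r + 1.0/24
	p = p*r + 1.0/6
	p = p*r + 0.5
	p = p*r + 1
	p = p*r + 1

	// Scale by 2^k. Ldexp stands in for exponent-bit construction.
	return float32(math.Ldexp(float64(p), int(k)))
}

// maxRelErr scans [lo, hi] and returns the worst relative error vs math.Exp.
func maxRelErr(lo, hi float32, steps int) float64 {
	worst := 0.0
	for i := 0; i <= steps; i++ {
		x := lo + (hi-lo)*float32(i)/float32(steps)
		want := math.Exp(float64(x))
		got := float64(expApprox32(x))
		if e := math.Abs(got-want) / want; e > worst {
			worst = e
		}
	}
	return worst
}

func main() {
	fmt.Println("exp(1) ≈", expApprox32(1))
	fmt.Println("max relative error:", maxRelErr(-87, 88, 100000))
}
```

Each step maps to a lane-wise vector operation, which is why this shape is a common starting point for SIMD exp kernels. The polynomial degree is the main accuracy/speed trade-off.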

Priority

Medium - Exp is less commonly used than Sigmoid/Tanh/ReLU in neural networks, but still valuable for certain applications. Pure Go fallback is acceptable.

Acceptance Criteria

  • AVX implementation for f32 Exp (AMD64)
  • AVX implementation for f64 Exp (AMD64)
  • NEON implementation for f32 Exp (ARM64)
  • NEON implementation for f64 Exp (ARM64)
  • Benchmarks showing speedup over Pure Go
  • Tests passing on all platforms
  • Proper out-of-range handling (+Inf on overflow, 0 on underflow)
  • Documentation updated with performance data
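
For the benchmark criterion, a baseline for the Pure Go path could be measured roughly as follows. This is a sketch, not the project's benchmark suite: `benchExp` and `throughputGBps` are hypothetical helpers, the input pattern is arbitrary, and bytes per operation counts only the source slice (4 bytes per float32), which may differ from how the Sigmoid GB/s figure above was computed.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// exp32Go is the Pure Go fallback from this issue.
func exp32Go(dst, src []float32) {
	for i := range dst {
		x := src[i]
		if x > 88.0 {
			dst[i] = float32(math.Inf(1))
		} else if x < -88.0 {
			dst[i] = 0
		} else {
			dst[i] = float32(math.Exp(float64(x)))
		}
	}
}

// benchExp times fn on n elements using the standard testing harness.
func benchExp(fn func(dst, src []float32), n int) testing.BenchmarkResult {
	src := make([]float32, n)
	dst := make([]float32, n)
	for i := range src {
		src[i] = float32(i%160) - 80 // in-range inputs in [-80, 79]
	}
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(4 * n)) // assumption: count source bytes only
		for i := 0; i < b.N; i++ {
			fn(dst, src)
		}
	})
}

// throughputGBps converts a benchmark result into GB/s.
func throughputGBps(r testing.BenchmarkResult) float64 {
	if r.T <= 0 {
		return 0
	}
	return float64(r.Bytes) * float64(r.N) / r.T.Seconds() / 1e9
}

func main() {
	r := benchExp(exp32Go, 1024)
	fmt.Printf("%d iterations, %d ns/op, %.2f GB/s\n",
		r.N, r.NsPerOp(), throughputGBps(r))
}
```

Passing a SIMD implementation as `fn` with the same `n` gives a like-for-like speedup figure.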
