1.1x prefill and decode speedup (attention/activations) #629

Merged
copybara-service[bot] merged 1 commit into dev from test_773579903 on Jun 20, 2025
copybara-service[bot] commented Jun 20, 2025

1.1x prefill and decode speedup (attention/activations)

Optimizations

  • Better load-balancing in attention threading
    (Previously, clusters were limited by #heads)
  • Add MulByConstTo to avoid zero-init
  • Parallel activations
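
The first two optimizations can be illustrated with a minimal scalar sketch. It is not the code from this change: only the name `MulByConstTo` comes from the description, and its signature here, along with `MulByConstAndAdd`, `WeightedSum`, `Task`, and `TaskFromIndex`, are hypothetical stand-ins chosen for illustration. The sketch assumes that the zero-init being avoided is the clearing of an accumulator before a weighted sum, and that better load balancing means scheduling over (token, head) pairs rather than heads alone.

```cpp
#include <cstddef>

// Hypothetical scalar stand-in: out[i] = c * x[i].
// Overwrites `out`, so the caller does not need to zero it first.
void MulByConstTo(float c, const float* x, float* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = c * x[i];
}

// Hypothetical accumulate variant: out[i] += c * x[i].
void MulByConstAndAdd(float c, const float* x, float* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] += c * x[i];
}

// Weighted sum of `num` rows of length `dim` (row-major in `values`).
// `out` may hold garbage on entry: the first row is written with
// MulByConstTo, which replaces a separate zero-fill pass over `out`.
void WeightedSum(const float* weights, const float* values, size_t num,
                 size_t dim, float* out) {
  if (num == 0) {
    for (size_t i = 0; i < dim; ++i) out[i] = 0.0f;
    return;
  }
  MulByConstTo(weights[0], values, out, dim);
  for (size_t t = 1; t < num; ++t) {
    MulByConstAndAdd(weights[t], values + t * dim, out, dim);
  }
}

// Assumed load-balancing scheme: flatten (token, head) pairs into one task
// index so the number of parallel tasks is num_tokens * num_heads, not just
// num_heads.
struct Task {
  size_t token;
  size_t head;
};

Task TaskFromIndex(size_t idx, size_t num_heads) {
  return Task{idx / num_heads, idx % num_heads};
}
```

With 2 query tokens and 4 heads, for example, a scheduler would hand out task indices 0 through 7, and `TaskFromIndex(7, 4)` maps back to token 1, head 3. How the real change distributes such tasks across clusters is not stated in the description.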

Cleanup

  • Prepare for RowPtr in A or B
  • Pass through thread_id to ops
  • Avoid warning in bench_matmul

copybara-service[bot] force-pushed the test_773579903 branch 3 times, most recently from 3ea86cf to 9b3a001, on June 20, 2025 15:49
PiperOrigin-RevId: 773723423
copybara-service[bot] merged commit 0f70f28 into dev on Jun 20, 2025
copybara-service[bot] deleted the test_773579903 branch on June 20, 2025 15:59