Vector Masking by rgiunti · Pull Request #104 · pulp-platform/spatz

rgiunti · 2026-06-12T09:09:06Z

PR purpose

This PR introduces vector masking for the instructions supported by Spatz. It fixes and extends the open PR #45.

Main changes

The vector masking implementation follows the same scheme in all units; each unit then specializes it to its own datapath.

Mask fetch. On a masked instruction (vm=0) whose mask has not been fetched yet, a dedicated FSM state reuses the VRF read ports to fetch v0 and latches it into a local register.
Datapath freeze during the fetch. While v0 is being read, the unit is stalled. This both prevents operating with a stale mask and avoids conflicts on the shared VRF read ports. In-flight memory responses are never dropped: they are buffered in the reorder buffer until the flow resumes.
Byte-granular expansion. The latched v0 is combinationally expanded into a VLEN-wide, byte-granular mask (vm_masking): each v0 bit is replicated over 1/2/4/8 bytes according to vsew. The slice matching the instruction's current position in the vector is then selected from it (the slice base is derived from the progress counters and rounded to the VRF word size).
Mask application. For each destination word, the mask slice is ANDed into the write-byte-enable: masked-off bytes are simply not written, so they keep their previous value (mask-undisturbed semantics). This applies to loads (vrf_req_d.wbe = load_wbe & vm_wbe), arithmetic write-back (vreg_wbe), and slides (slide_wbe & vm_masking[...]). For stores, the same idea is applied to the memory strobe (mem_req_strb = store_strb & vm_strb). Since the store data is rotated for strided/indexed and unaligned accesses, the mask slice goes through the same rotations, so each mask bit always stays attached to its byte.
Mask-producing compares (VFCMP) are the only exception: the destination is a mask register, so the mask is ANDed into the result bits themselves (result & v0_bit) rather than into the write-byte-enable.
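
The expansion, write-enable gating, store rotation and compare exception described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the RTL: the function names, the list-of-bytes representation and the 8-byte word used in the usage note are assumptions made for illustration, and only vm_masking, the write-byte-enable ANDing and the result & v0_bit rule come from the description.

```python
# Behavioural sketch only. All names below are hypothetical and chosen for
# illustration; they do not mirror signal names in the Spatz RTL.

def expand_mask(v0, vl, sew_bytes):
    """Replicate each v0 bit over sew_bytes bytes (one 0/1 flag per byte)."""
    assert sew_bytes in (1, 2, 4, 8)
    byte_mask = []
    for elem in range(vl):
        bit = (v0 >> elem) & 1
        byte_mask.extend([bit] * sew_bytes)
    return byte_mask


def mask_slice(byte_mask, byte_offset, word_bytes):
    """Pick the slice for the word containing byte_offset.

    The slice base is rounded down to the word size, as in the description.
    """
    base = (byte_offset // word_bytes) * word_bytes
    return byte_mask[base:base + word_bytes]


def gate_enable(enable, vm_slice):
    """AND the mask slice into a write-byte-enable or a store strobe."""
    return [e & m for e, m in zip(enable, vm_slice)]


def write_word(old, new, wbe):
    """Mask-undisturbed write: bytes with wbe == 0 keep their old value."""
    return [n if e else o for o, n, e in zip(old, new, wbe)]


def rotate(xs, shift):
    """Rotate a byte list left by shift positions (used for data and mask)."""
    shift %= len(xs)
    return xs[shift:] + xs[:shift]


def masked_compare(result_bits, v0):
    """Compare exception: the mask is ANDed into the result bits themselves."""
    return result_bits & v0
```

For example, with v0 = 0b0101, four 16-bit elements and an assumed 8-byte word, expand_mask yields one enable flag per byte, and write_word leaves the bytes of elements 1 and 3 untouched. Applying rotate with the same shift to both the store data and the mask slice keeps each enable flag paired with its byte, which is the property the description relies on for strided, indexed and unaligned stores.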

DiyouS · 2026-06-15T08:44:27Z

One question, do you have some synthesis results on the area overhead for the mask?

rgiunti · 2026-06-15T08:48:25Z

I'm running the synthesis right now to compare it with the reports obtained from the current state of the main branch. Could you please tell me where the CI is failing right now? I ran my local CI before opening the draft PR and everything seemed OK.

DiyouS · 2026-06-15T09:39:41Z

It doesn't look like an actual failure to me. I will try to rerun the CI.

DiyouS · 2026-06-15T14:31:30Z

All spurious failures were resolved in the new CI run, but there is still an issue with the doublebw configuration in the FFT kernel:
10 - spatzBenchmarks-rtl-dp-fft_M128_N2 (Timeout)

rgiunti · 2026-06-16T07:17:07Z

OK, I'm going to debug the issue. Meanwhile, I obtained the synthesis results with GF22 at 1 GHz: there are no timing loops and no timing violations. The area overhead with masking is 1.54%.

DiyouS · 2026-06-16T07:18:24Z

Sounds great! Thanks a lot!

Navaneeth-KunhiPurayil · 2026-06-16T08:19:54Z

+    if(spatz_req.op_arith.is_reduction == 1'b1) begin
+      case(spatz_req.op)
+        VADD: // VREDSUM_VS, VFREDUSUM_VS, VFREDOSUM_VS
+          reduction_useless_value = '0;


Maybe we can rename it to reduction_neutral_value or reduction_identity_value?
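
To illustrate why "neutral" or "identity" describes this value better, here is a minimal Python sketch of a masked reduction. It assumes that masked-off elements are replaced by the operation's identity so they cannot change the result; the quoted diff only shows the value '0 for the VADD-based reductions, so the other table entries, the operation names and the helper functions are hypothetical and follow from ordinary arithmetic rather than from the PR.

```python
# Sketch under stated assumptions: only the sum identity (0) appears in the
# quoted diff. The remaining entries and all names here are illustrative.

def reduction_identity(op, width=32):
    """Return the value that leaves the accumulator unchanged for op."""
    all_ones = (1 << width) - 1
    table = {
        "sum": 0,          # x + 0 == x
        "or": 0,           # x | 0 == x
        "xor": 0,          # x ^ 0 == x
        "and": all_ones,   # x & 1...1 == x
        "maxu": 0,         # unsigned max(x, 0) == x
        "minu": all_ones,  # unsigned min(x, 1...1) == x
    }
    return table[op]


OPS = {
    "sum": lambda a, b: a + b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "maxu": max,
    "minu": min,
}


def masked_reduce(op, scalar, elems, v0, width=32):
    """Reduce elems into scalar; elements with a 0 bit in v0 are skipped
    by substituting the identity value for them."""
    wrap = (1 << width) - 1
    acc = scalar & wrap
    for i, elem in enumerate(elems):
        active = (v0 >> i) & 1
        operand = elem if active else reduction_identity(op, width)
        acc = OPS[op](acc, operand & wrap) & wrap
    return acc
```

With this model, masked_reduce("sum", 10, [1, 2, 3, 4], 0b0101) only accumulates elements 0 and 2, because the inactive elements contribute the identity 0.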

Navaneeth-KunhiPurayil · 2026-06-16T08:24:51Z

Thanks for the changes, looks good to me!

rgiunti · 2026-06-16T14:25:53Z

Perfect, thanks! I just need some time to fix the benchmark issue because I found an error in the indexed store. I hope to fix it soon, then we can perform another CI run.

DiyouS · 2026-06-26T07:17:29Z

The Ventaglio tests failed: the computed results are somehow all zeros.

rgiunti · 2026-06-26T08:15:50Z

My mistake was not adding the ventaglio cfg to my local CI when I tested Ventaglio with masking on top. Unfortunately, since I can't work with your CI, I'm exposed to this kind of error, and I have to remember to update my CI manually every time a new cfg is added. I'm sorry, my bad. I'll try to fix the issues as soon as I can.

DiyouS · 2026-06-26T08:18:09Z

Actually, we can think about moving some benchmarks to GitHub so that external people from Unibo and ChipsIT can also trigger the CI. This should make things much easier for you.

rgiunti requested a review from Navaneeth-KunhiPurayil June 12, 2026 09:09

rgiunti self-assigned this Jun 12, 2026

rgiunti force-pushed the test/maoyuan/masking branch from f173c36 to d3c5f33 Compare June 12, 2026 09:26

rgiunti requested a review from DiyouS June 12, 2026 09:43

Navaneeth-KunhiPurayil reviewed Jun 16, 2026

rgiunti force-pushed the test/maoyuan/masking branch 2 times, most recently from 4cfe2ee to 36d78f8 Compare June 22, 2026 10:38

rgiunti marked this pull request as ready for review June 22, 2026 10:39

rgiunti requested a review from Navaneeth-KunhiPurayil June 22, 2026 10:39

rgiunti force-pushed the test/maoyuan/masking branch from 36d78f8 to 87bc3e9 Compare June 22, 2026 15:00

MaoyuanCai and others added 12 commits June 23, 2026 09:17

add vector masking instructions and masking support for arithmetic instructions

c5a168d

add masking support for VLOAD instructions

e0e8c2b

add support for VSTORE masking on 64bit spatz

7d3dda5

Solve the v0_t_reading_done issues in VFU

1cb8400

add masking support for 64-bit spatz VSTORE

cf396e3

add masking support for VSLDU

69ada03

add 512-bit vm support for 4096-bit register groups

7de2d1a

[FIX] VFU masking

17d952e

[FIX] vlsu vm_masking

ca2c7a3

[FIX] vm_masking in vsldu

5613d50

[FIX/UPDATE] load/store tests

37fa32f

[FIX] reset vl=1 before VCMP

ef11028

rgiunti added 14 commits June 23, 2026 09:17

[TEST] uncomment mask test cases

99c379a

[FIX] masking for float comp instr

7c26d24

[OPT] removing unuseful read cycle

ee7a614

[FIX] rs1 val for vfslide1

5545fab

[FIX/TEST] fix masking tests for widening instr

27324db

[HW] add masking for idx store 32b cfg

942e678

[FIX/HW] vs2_elem_id_d trigger for idx instr

cbe4f15

[FIX/HW] vfu wbe generation for masking

9fdaa16

[FIX/HW] race btw v0 read vs. vd writeback

232f482

[FIX/SYNTH] fixed timing loop in VFU

d9174c5

[HW] doublebw VLSU masking

cbca251

[SW/FIX] VSET for masking in load/store tests

7318886

[LINT] trailing spaces, unuseful comments

e29d350

[FIX] err in rebase VFU

fce34f6

rgiunti force-pushed the test/maoyuan/masking branch from 87bc3e9 to fce34f6 Compare June 23, 2026 07:18
