x86 gather instructions whose index element size is larger than the data element size, such as the following, are decoded by DR differently from the x86 manual and other tools.
vpgatherqd/vgatherqps xmm0 {k1}, [xax + ymm1 * 4]
vpgatherqd/vgatherqps ymm0 {k1}, [xax + zmm1 * 4]
Here DR inflates the destination xmm0 or ymm0 to ymm0 or zmm0 size, respectively.
E.g.,
INSTR_CREATE_vgatherqps_mask(
dc,
opnd_create_reg(DR_REG_XMM0),
opnd_create_reg(DR_REG_K1),
opnd_create_base_disp(DR_REG_RAX, DR_REG_YMM1, 4, 0, OPSZ_4));
disassembles into
vgatherqps {%k1} (%rax,%ymm1,4)[4byte] -> %xmm0 %k1
but if it is encoded and then decoded, we get:
vgatherqps {%k1} (%rax,%ymm1,4)[4byte] -> %ymm0 %k1
where the xmm0 is inflated to ymm0.
This is still functionally the same because such gathers (with index size > data size) do not write to the full ymm: they write only its lower half (which is the xmm) and zero out the rest.
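To make the "functionally the same" point concrete, here is a minimal sketch that models the EVEX form of vgatherqps in plain C. It is not DR code: `model_vgatherqps` and `MAX_VL_BYTES` are hypothetical names, and the model assumes no faults and a single 8-bit mask register. It writes one float per qword index into the low bytes of a 64-byte register image and zeroes everything above the written elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VL_BYTES 64 /* Width of a zmm register in bytes. */

/* Hypothetical model of EVEX vgatherqps: num_elems qword indices select
 * num_elems 4-byte floats. Elements whose mask bit is clear keep their old
 * value; all bytes above the last element are zeroed. Assumes no faults. */
static void
model_vgatherqps(uint8_t dest[MAX_VL_BYTES], uint8_t *k, const uint8_t *base,
                 const int64_t *index, unsigned num_elems, unsigned scale)
{
    assert(num_elems * 4 <= MAX_VL_BYTES);
    for (unsigned j = 0; j < num_elems; j++) {
        if (*k & (1u << j)) {
            /* Load one float from base + index * scale. */
            memcpy(dest + j * 4, base + index[j] * (int64_t)scale, sizeof(float));
        }
    }
    /* Completed mask bits are cleared; the remaining bits were already 0. */
    *k = 0;
    /* Zero the rest of the register above the gathered elements. */
    memset(dest + num_elems * 4, 0, MAX_VL_BYTES - num_elems * 4);
}
```

With a ymm index (4 qword indices), only bytes 0..15 receive data and bytes 16..63 end up zero, so describing the destination as xmm0 or as ymm0 yields the same final register state in this model.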
vgatherqps uses OPSZ_16_vex32_evex64, which is 16B, 32B, or 64B based on the prefix (PREFIX_VEX_L, PREFIX_EVEX_LL). As I understand it, the ymm index register (ymm1) makes DR use a 32B vector length for the instr, so the following code then "inflates" the dest into a ymm too.
Xref
bool operand_is_ymm = (TEST(PREFIX_EVEX_LL, di->prefixes) &&
return (TEST(PREFIX_EVEX_LL, di->prefixes)
dynamorio/core/ir/x86/decode.c, line 1545 in af5c1cf
dynamorio/core/ir/x86/decode.c, line 294 in af5c1cf
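The mismatch described above can be sketched as two small functions. This is a simplified model under stated assumptions, not the code in decode.c: `resolved_opsz_bytes` and `manual_dest_bytes` are hypothetical names, the first assumes a size like OPSZ_16_vex32_evex64 resolves purely from the two prefix flags, and the second assumes the vector length of an index-size > data-size gather is set by the index register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a prefix-driven size such as OPSZ_16_vex32_evex64:
 * 64B with PREFIX_EVEX_LL, 32B with PREFIX_VEX_L, 16B otherwise. */
static unsigned
resolved_opsz_bytes(bool prefix_vex_l, bool prefix_evex_ll)
{
    if (prefix_evex_ll)
        return 64;
    if (prefix_vex_l)
        return 32;
    return 16;
}

/* Destination register width as listed in the x86 manual for gathers whose
 * index elements are wider than their data elements. One data element is
 * gathered per index element; the smallest vector register is 16B. */
static unsigned
manual_dest_bytes(unsigned index_reg_bytes, unsigned index_elem_bytes,
                  unsigned data_elem_bytes)
{
    assert(index_elem_bytes > data_elem_bytes);
    unsigned num_elems = index_reg_bytes / index_elem_bytes;
    unsigned written = num_elems * data_elem_bytes;
    return written < 16 ? 16 : written;
}
```

For vgatherqps with a ymm1 index, `resolved_opsz_bytes(true, false)` gives 32 (ymm0) while `manual_dest_bytes(32, 8, 4)` gives 16 (xmm0), which is the difference seen after the encode/decode round trip.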