This is just a query to clarify whether I am doing something wrong when compiling the examples.
With Pluto:
➜ matmul git:(master) ✗ make tiled
../../polycc matmul.c --noparallel --second-level-tile -o matmul.tiled.c
[pluto] compute_deps (isl)
[pluto] Number of statements: 1
[pluto] Total number of loops: 3
[pluto] Number of deps: 3
[pluto] Maximum domain dimensionality: 3
[pluto] Number of parameters: 3
[pluto] Diamond tiling not possible/useful
[pluto] Affine transformations [<iter coeff's> <param> <const>]
T(S1): (i, j, k)
loop types (loop, loop, loop)
[Pluto] After tiling:
T(S1): (zT3/16, zT4/2, zT5/16, zT3, zT4, zT5, i, j, k)
loop types (loop, loop, loop, loop, loop, loop, loop, loop, loop)
[Pluto] After intra-tile optimize
T(S1): (zT3/16, zT4/2, zT5/16, zT3, zT4, zT5, i, k, j)
loop types (loop, loop, loop, loop, loop, loop, loop, loop, loop)
[pluto] using statement-wise -fs/-ls options: S1(4,9),
[Pluto] Output written to matmul.tiled.c
[pluto] Timing statistics
[pluto] SCoP extraction + dependence analysis time: 0.000710s
[pluto] Auto-transformation time: 0.002295s
[pluto] Tile size selection time: 0.000000s
[pluto] Total constraint solving time (LP/MIP/ILP) time: 0.000482s
[pluto] Code generation time: 0.023760s
[pluto] Other/Misc time: 0.075358s
[pluto] Total time: 0.102123s
[pluto] All times: 0.000710 0.002295 0.023760 0.075358
gcc -O3 -march=native -mtune=native -ffast-math -DTIME matmul.tiled.c -o tiled -lm
➜ matmul git:(master) ✗ ./tiled
3.056028s
5.62 GFLOPS
With plain gcc -O3:
➜ matmul git:(master) ✗ gcc matmul.c -o matmul.gcc -ffast-math -lm -DTIME -O3 -march=native -mtune=native
➜ matmul git:(master) ✗ ./matmul.gcc
3.340085s
5.14 GFLOPS
And just to get the peak performance, below is the result of using OpenBLAS on the same machine:
➜ matmul git:(master) ✗ ./openblas
0.080084s
214.52 GFLOPS
I was following this tutorial: https://github.com/bondhugula/llvm-project/blob/hop/mlir/docs/HighPerfCodeGen.md. In the section comparing OpenBLAS/MKL with GCC, Clang, and Pluto, I was expecting a similar improvement of roughly 5x to 10x from the tiled schedule, but I do not see any performance improvement.

Please let me know in case any additional information is needed.