1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
build/
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -17,7 +17,7 @@ if(APPLE)
endif()


-add_executable(matmul main_ans.cpp)
+add_executable(matmul main.cpp)


if(OpenMP_CXX_FOUND)
34 changes: 34 additions & 0 deletions README.md
@@ -235,3 +235,37 @@ git push origin student-name
- Use small test cases to debug your blocked and parallel implementations.

Good luck, and enjoy optimizing your matrix multiplication!

---

## Performance Results

- **System**: Linux x86-64, GCC 14.2, OpenMP 4.5, 16-core CPU
- **Threads**: `OMP_NUM_THREADS=4`
- **Block size**: 32 (default for `blocked_matmul`)
- **Timing**: `omp_get_wtime()` wall-clock time
- **Validation**: all implementations validated against `output.raw`; all correct.

| Test Case | Dimensions (m × n × p) | Naive Time (s) | Blocked Time (s) | Parallel Time (s) | Blocked Speedup | Parallel Speedup |
|-----------|------------------------|----------------|------------------|-------------------|-----------------|------------------|
| 0 | 64 × 64 × 64 | 0.002058 | 0.001546 | 0.000737 | 1.33× | 2.79× |
| 1 | 128 × 64 × 128 | 0.004927 | 0.004216 | 0.002521 | 1.17× | 1.95× |
| 2 | 100 × 128 × 56 | 0.005118 | 0.003347 | 0.001811 | 1.53× | 2.83× |
| 3 | 128 × 64 × 128 | 0.003910 | 0.004345 | 0.002394 | 0.90× | 1.63× |
| 4 | 32 × 128 × 32 | 0.000465 | 0.000548 | 0.000589 | 0.85× | 0.79× |
| 5 | 200 × 100 × 256 | 0.023371 | 0.020785 | 0.011280 | 1.12× | 2.07× |
| 6 | 256 × 256 × 256 | 0.061720 | 0.066638 | 0.032397 | 0.93× | 1.91× |
| 7 | 256 × 300 × 256 | 0.073160 | 0.081120 | 0.036994 | 0.90× | 1.98× |
| 8 | 64 × 128 × 64 | 0.001940 | 0.002197 | 0.001349 | 0.88× | 1.44× |
| 9 | 256 × 256 × 257 | 0.060754 | 0.068660 | 0.031667 | 0.88× | 1.92× |
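
The speedup columns are the naive time divided by the optimized time. A minimal sketch of how such a measurement could be taken is shown here; the helper names `wall_seconds`, `time_once`, and `speedup` are hypothetical and not taken from `main.cpp`, and the `std::chrono` fallback is an assumption added so the snippet also builds without OpenMP.

```cpp
#include <cassert>
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// Wall-clock seconds. Uses omp_get_wtime() when compiled with OpenMP;
// the std::chrono fallback is an assumption of this sketch.
inline double wall_seconds() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
#endif
}

// Runs `fn` once and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double time_once(Fn&& fn) {
    const double start = wall_seconds();
    fn();
    return wall_seconds() - start;
}

// Speedup as reported in the table: naive time / optimized time.
inline double speedup(double naive_seconds, double optimized_seconds) {
    assert(optimized_seconds > 0.0);
    return naive_seconds / optimized_seconds;
}
```

For example, `speedup(0.002058, 0.000737)` reproduces the 2.79× parallel speedup listed for case 0. Single runs at the sub-millisecond scale are noisy, so repeating each measurement and taking the minimum or median would likely give more stable numbers.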

### Notes on Results

**Blocked matrix multiplication (block_size = 32)**
The blocked implementation shows a modest speedup in cases 0, 1, 2, and 5 (1.12×–1.53×) and runs slower than naive in the other six (0.85×–0.93×). The test matrices are relatively small, so most data likely already fits in L2/L3 cache even for the naive approach. The benefit of blocking is most pronounced for large matrices (e.g. 512×512 and above) where cache thrashing becomes significant. The inner loop order used is `i-k-j`, which hoists `A[i][k]` into a register and accesses `B[k][j]` and `C[i][j]` sequentially, giving better cache behaviour than the `i-j-k` order from the pseudocode.
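
The loop structure described above can be sketched as follows. This is an illustration under stated assumptions, not the code from `main.cpp`: the function name `blocked_matmul_sketch`, the flat row-major `std::vector<float>` storage, and the parameter order are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C (m x p) = A (m x n) * B (n x p); all matrices are row-major.
// Hypothetical signature, chosen for illustration only.
void blocked_matmul_sketch(const std::vector<float>& A,
                           const std::vector<float>& B,
                           std::vector<float>& C,
                           std::size_t m, std::size_t n, std::size_t p,
                           std::size_t block_size = 32) {
    assert(A.size() == m * n && B.size() == n * p && block_size > 0);
    C.assign(m * p, 0.0f);

    // Outer loops walk over tiles; std::min handles dimensions that are
    // not a multiple of block_size (e.g. 100 x 128 x 56).
    for (std::size_t ii = 0; ii < m; ii += block_size) {
        const std::size_t i_end = std::min(ii + block_size, m);
        for (std::size_t kk = 0; kk < n; kk += block_size) {
            const std::size_t k_end = std::min(kk + block_size, n);
            for (std::size_t jj = 0; jj < p; jj += block_size) {
                const std::size_t j_end = std::min(jj + block_size, p);
                // Inner i-k-j order: A[i][k] is loaded once per k, and
                // B[k][j] and C[i][j] are accessed with unit stride.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * p + j] += a * B[k * p + j];
                        }
                    }
                }
            }
        }
    }
}
```

With the default `block_size` of 32, three 32 × 32 `float` tiles occupy 12 KiB, which fits comfortably in a typical L1 data cache; that working-set argument is the usual motivation for choosing a block size in this range.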

**Parallel matrix multiplication (OpenMP, 4 threads)**
The parallel implementation outperforms naive in nine of the ten cases, with speedups of 1.44×–2.83× on 4 threads; the exception is case 4 (32 × 128 × 32), where it runs at 0.79×. The `#pragma omp parallel for schedule(static)` directive distributes rows of C across threads; since each thread writes only its own rows there are no race conditions, and false sharing is limited to cache lines that straddle the boundary between two threads' row ranges. With 8 threads the speedup rises to ~3.5× on the 256×256 cases.
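
A minimal sketch of the row-wise parallelization described above is given here. The function name `parallel_matmul_sketch` and the flat row-major `std::vector<float>` layout are assumptions for illustration; only the `#pragma omp parallel for schedule(static)` directive comes from the text. The loop index is signed because some OpenMP implementations only accept signed loop variables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C (m x p) = A (m x n) * B (n x p); all matrices are row-major.
// Hypothetical signature, chosen for illustration only.
void parallel_matmul_sketch(const std::vector<float>& A,
                            const std::vector<float>& B,
                            std::vector<float>& C,
                            std::size_t m, std::size_t n, std::size_t p) {
    assert(A.size() == m * n && B.size() == n * p);
    C.assign(m * p, 0.0f);

    const long long rows = static_cast<long long>(m);
    // Each thread receives a contiguous range of rows of C, so no two
    // threads ever write the same element. Without -fopenmp the pragma
    // is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < rows; ++i) {
        const std::size_t row = static_cast<std::size_t>(i);
        for (std::size_t k = 0; k < n; ++k) {
            const float a = A[row * n + k];
            for (std::size_t j = 0; j < p; ++j) {
                C[row * p + j] += a * B[k * p + j];
            }
        }
    }
}
```

Built with `-fopenmp` and run with `OMP_NUM_THREADS=4`, a 256-row matrix would be split into four chunks of 64 rows each under the static schedule.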

**Why parallel speedup < 4× with 4 threads?**
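Thread-launch and synchronization overhead dominate for small matrices (cases 4, 8), and by Amdahl's Law any serial portion of the run caps the achievable speedup. In these measurements the larger cases (5, 6, 7, 9) level off at about 1.9×–2.1×, roughly 50% parallel efficiency on 4 threads; matrices larger than those tested would be expected to move closer to the theoretical maximum.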
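
To make the Amdahl's Law argument concrete, the following sketch computes the ideal speedup for a given parallel fraction and the parallel efficiency of a measured speedup. The helper names `amdahl_speedup` and `parallel_efficiency` are hypothetical and not part of the assignment code.

```cpp
#include <cassert>

// Amdahl's Law: best-case speedup on `threads` workers when a fraction
// `parallel_fraction` (between 0 and 1) of the serial run time parallelizes.
inline double amdahl_speedup(double parallel_fraction, int threads) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0 && threads >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}

// Parallel efficiency: measured speedup divided by the thread count.
inline double parallel_efficiency(double measured_speedup, int threads) {
    assert(threads >= 1);
    return measured_speedup / threads;
}
```

For instance, `parallel_efficiency(1.91, 4)` gives about 0.48 for case 6, and `amdahl_speedup(0.9, 4)` shows that even a 10% serial portion would limit four threads to roughly 3.1×.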
65 changes: 65 additions & 0 deletions data/0/result.raw

Large diffs are not rendered by default.

129 changes: 129 additions & 0 deletions data/1/result.raw

Large diffs are not rendered by default.

101 changes: 101 additions & 0 deletions data/2/result.raw

Large diffs are not rendered by default.

129 changes: 129 additions & 0 deletions data/3/result.raw

Large diffs are not rendered by default.

33 changes: 33 additions & 0 deletions data/4/result.raw
@@ -0,0 +1,33 @@
32 32
475.48 468.33 428.82 481.55 506.37 507.78 537.87 463.84 506.76 414.39 512.99 487.33 465.36 468.75 485.41 480.73 459.31 454.19 438.14 440.61 469.26 463.56 467.42 469.99 442.77 452.45 489.42 432.51 484.39 448.8 490.93 473.63
563.96 559.42 499.7 551.2 587.21 583.6 606.83 511.74 591.58 471.29 558.71 579.73 558.13 563.36 499.87 564.94 547.51 510.83 560.9 510.42 542.97 542.44 554.6 528.53 508.94 483.98 568.68 521.48 587.48 520.53 602.32 575
526.89 504.29 463.09 525.55 544.22 557.04 582.19 454.76 569.98 461.31 547.72 574.63 545.01 547.69 474.66 514.61 514.37 502.19 497.54 490.26 550.37 485.94 493.84 518.77 478.02 455.41 513.96 487.73 565.01 521.46 534.48 541.33
527.69 501.52 431.58 505.09 538.45 569.18 539.08 478.47 546.58 437.63 564.85 549.75 517.79 523.51 472.08 532.48 493.99 511.72 506.92 473.54 534.28 503.49 514.13 526.81 479.08 490.39 502.89 470.83 525.65 525.2 551.06 550.77
521.43 511.76 454.23 495.91 508.85 527.29 538.7 444.46 540.78 456.87 531.08 559.75 508.71 507.2 466.09 498.8 524.41 519.46 519.81 503.67 503.47 503.98 507.03 514.94 468.69 486 529.06 466.93 540.25 465.04 573.17 526.66
516.59 482.96 462.42 505.3 549.25 542.29 526.66 451.21 524.93 451.58 535.14 548.72 522.8 524.97 450.6 508.86 498.94 479.53 524.67 452.56 517.53 516.33 512.98 530.95 488.64 459.11 515.22 468.33 553.04 492.32 559.73 524.01
554.14 542.7 476.12 515.42 555.24 568.66 585.5 464.31 561.49 468.58 550.58 564.56 542.6 528.67 493.69 562.81 495.41 490.55 520.68 507.3 539.17 529.15 510.29 504.64 478.61 460.08 528.51 492.25 568.51 515.03 585.8 535.72
535.47 534.2 476.11 491.1 563.8 539.43 566.03 511.29 575.43 458.74 559.46 585.64 535.67 543.67 493.8 537.73 527.94 472.77 516.49 479.29 531.96 527.25 508.61 510.15 491.98 451.28 558.5 462.3 540.5 499.05 568.2 533.94
504.86 508.32 474.81 505.95 553.57 561.24 594.8 468.55 566.61 455.2 554.8 561.19 520.53 543.19 492.65 549.69 533.97 493.87 494.95 489.46 539.15 522.58 508.3 523.41 488.31 508.7 540.94 464.27 551.35 524.16 541.52 558.8
513.49 514.67 442.63 532.94 504.83 553.06 561.27 463.98 558.56 462.44 555.69 582.45 509.16 513.36 466.79 520.7 520.12 479.48 479.13 489.83 497.55 527.59 511.66 486.07 468.67 494.62 504.64 463.82 543.11 492.82 525.12 506.16
468.81 448.08 406.67 457.58 487.39 508.43 519.29 419.94 495.78 400.22 466.01 453.41 427.29 462.35 436.53 474.76 454.06 445.47 450.22 419.92 481.98 450.29 451.45 433.57 425 437.53 442.25 416.21 488.42 477.04 461.83 498.17
509.81 481.11 433.9 472.62 509.25 514.42 533.8 475.82 511.07 414.89 513.16 562.52 489.08 500.41 480.73 484.38 455.12 464.45 497.08 488.99 528.63 461.29 506.07 498.52 477.87 461.54 482.87 457.14 515.94 483.2 531.67 508.39
583.14 575.92 509.59 561.34 608.86 638.31 648.9 528.51 634.43 511.6 614.87 604.46 574.19 590.26 530.75 587.03 602.11 555.17 574.93 543.08 585.71 585.49 595.45 540.06 535.64 568.89 594.57 520.25 604.18 559.27 617.51 596.06
498.17 485.09 450.29 483.99 546.45 550.83 564.12 479.21 539.18 464.1 545.02 545.05 495.44 504.09 460.43 498.28 510.73 522.57 486.51 441.93 555.5 483.02 496.36 461.5 469.49 476.05 501.89 472.57 530.46 521.82 531.14 551.67
521.48 474.59 459.22 489.14 494.89 546.13 533.78 446.01 530.67 431.26 517.06 483 489.08 530.55 484.21 510.7 479.65 494.09 484.22 478.68 502.32 461.4 507.06 502.8 452.32 470.17 468.3 454.65 528.87 501.22 507.6 498.58
539.56 519.44 511.98 522.8 567.99 576.72 602.07 494.86 600.39 468.32 560.76 570.01 556.43 537.99 520.96 552.86 533.1 483.73 521.24 512.49 531.88 555.02 541.73 523.65 494.02 486.02 557.46 484.28 569.13 515.01 568.72 548.91
499.13 491.77 444.68 489.88 477.95 528.96 551.4 456.04 550.95 426.85 530.07 546.66 496.53 506.1 501.17 517.78 501.36 452.43 470.17 460.23 518.04 469.35 502.29 484.3 451.27 462.98 506.23 451.99 549.18 495.47 499.05 524.56
510.41 514.57 432.8 497.66 524.03 526.07 569.52 440.96 546.9 425.71 509.29 545.24 506.96 521.64 456.93 528.41 533.05 483.69 498.76 494.05 507.62 481.75 473.87 496.75 450.29 453.51 521.27 455.21 521.48 478.97 544.38 562.27
484.66 475.57 409.5 478.58 508.52 510.89 522.22 450 503.99 409.55 507.84 495.76 494.6 468.68 461.59 499.23 476.62 445.97 481.07 470.59 488.8 471.38 475.21 480.36 418.17 452.41 511.95 431.47 493.17 465.76 529.12 498.23
465.72 494.62 410.83 495.06 509.45 492.76 528.56 440.89 540.17 411.82 512.76 487.04 472.91 468.51 479.56 503.65 469.65 441.4 477.23 397.61 492.65 470.97 484.67 475.3 439.85 448.36 452.75 449.17 472.73 442.78 498.11 483.87
490.2 471.19 412.85 426.13 477.28 490.09 523.4 466.82 473.83 415.53 497.44 481.5 463.41 478.42 445.49 468.81 466.76 457 460.85 450.65 480.61 440.46 464.97 447.23 458.11 421.43 476.14 410.54 442.82 450.53 484.67 505.4
569.29 552.8 482.64 551.93 550.29 571.53 634.22 484.46 597.08 481.79 593.92 599.19 555.44 552.82 530.36 569.72 556.89 534.45 557.33 546.03 538.65 571.86 557.16 549.47 515.62 519.85 550.22 524.96 584.62 545.14 579.94 576.74
610.89 594.67 499.65 547.87 611.23 615.66 589.49 537.15 598.72 514.93 589.19 636.07 563.96 586.69 547.19 585.13 604.06 563.87 538.81 542.92 600.44 544.32 572.56 555.58 532.21 542.49 589.79 541.22 590.62 577.58 632.63 612.54
534.72 488.14 420.4 482.6 515.09 530.22 557.14 500.47 538.45 432.27 526.36 552.9 518.56 496.33 452.91 519.8 489.99 491.14 494.72 491.37 514.77 480.23 497.92 501.12 472.37 455.08 542.21 446.94 497.88 496.19 532.3 505.75
537.51 530.01 467.78 525.36 538.95 539.67 572.65 466.25 564.36 459.51 537 541.93 501.71 492.27 498.64 542.55 518.82 442.58 469.67 472.01 547.83 498.29 498.86 473.48 486.27 484.42 521.37 489.52 544.69 487.91 534.65 528.82
486.96 471.09 423.73 476.49 516.28 528.93 521.82 437.57 504.82 420.98 511.42 528.16 511.45 497.65 439.61 479.45 479.19 462.38 458.77 461.98 517.09 470.63 519.76 475.13 457.02 442.9 453.4 432.31 518.16 479.11 510.35 501.77
507.38 486.02 457.1 489.74 511.57 505.54 568.87 440.42 544.02 443.98 542.61 566.87 508.68 512.97 479.04 507.41 506.15 461.57 490.76 484.41 551.57 514.06 506.82 476.86 473.06 472.7 486.31 458.62 557.86 503.79 518.5 545.75
543.03 502.49 463.87 496.4 513.75 527.98 564.68 459.89 551.61 452.32 533.23 539.88 517.65 516.3 478.04 543.05 543.31 471.26 490.59 485.11 500.81 515.12 509.74 509.18 505.03 503.9 518.88 479.92 533.24 493.2 536.26 523.15
492.73 489.15 404.26 494.58 499.12 547.25 512.92 480.52 520.84 440.2 540.02 513.55 484.39 483.82 438.03 496.36 525.23 489.12 475.1 479.17 515.01 490.01 507.99 505.14 471.49 473.51 492.3 461.67 536.08 482.45 536.69 511.38
470.88 469.49 445.51 468.28 481.9 468.55 517.38 417.82 524.07 436.13 497.58 480.71 462.88 509.88 444.71 484.97 462.46 451.58 436.21 435.57 454.28 459.33 466.95 452.7 449.5 428.71 472.57 417.98 510.16 440.92 476.38 487.37
489.98 485.12 443.22 497.87 532.14 557.1 560.34 490 536.71 425.54 525.71 525.49 507.46 535.97 508.98 512.06 519.52 489.35 471.04 477.69 548.62 474.13 503.94 515.43 489.06 475.79 510.13 432.11 523.08 511.93 518.67 556.13
510.04 493.35 453.56 497.04 526.66 562.02 570.9 459.72 548.44 432.93 551.48 500.69 521.26 497.8 462.95 523.29 494.65 471.26 490.03 450.88 505.25 487.01 492.41 483.57 465.25 475.45 498 485.69 571.22 489.51 538.17 534.01