HDDS-15612 allow incremental CRC calculation on the client side #10546

Draft
yandrey321 wants to merge 2 commits into apache:master from yandrey321:HDDS-15612

@yandrey321

Contributor

What changes were proposed in this pull request?

Use incremental CRC32/CRC32C to calculate the checksum inside BlockOutputStream.write(). This reduces the overhead of CRC calculation when multiple streams are used across multiple threads. In synthetic tests, the best results are achieved when the number of concurrent writers (threads) is between 1x and 3x the number of CPU cores on the system; with fewer threads there is no statistically significant difference.
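
For readers unfamiliar with the term, the sketch below shows what "incremental" CRC means in general: the checksum object is updated slice by slice as data arrives, and yields the same value as a single pass over the full buffer. It uses only the JDK's `java.util.zip.CRC32C` (Java 9+). The class name `IncrementalCrcSketch` and the slice sizes are hypothetical, and this is not the BlockOutputStream code from this PR.

```java
import java.util.Random;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical illustration only; not the BlockOutputStream implementation.
final class IncrementalCrcSketch {

  // One-shot: checksum the whole buffer in a single pass.
  static long oneShot(byte[] data) {
    Checksum crc = new CRC32C();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  // Incremental: feed the same bytes slice by slice, as successive
  // write() calls would deliver them, keeping one running checksum.
  static long incremental(byte[] data, int sliceSize) {
    Checksum crc = new CRC32C();
    for (int off = 0; off < data.length; off += sliceSize) {
      int len = Math.min(sliceSize, data.length - off);
      crc.update(data, off, len);
    }
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] data = new byte[1 << 20];
    new Random(42).nextBytes(data);
    // Both approaches must agree on the final checksum value.
    System.out.println(oneShot(data) == incremental(data, 64 * 1024));
  }
}
```

The point of the incremental form is that the checksum work is spread across the individual writes rather than done in a separate pass over the accumulated buffer; whether that matches the exact approach taken in this patch should be checked against the diff.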

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15612

How was this patch tested?

Ran the benchmark before and after the change with CRC checksums enabled (CRC32/CRC32C). Also tested with SHA, which has around a 90% penalty compared to NONE.

DN latency = 0:

write | thr | CRC32 before | CRC32 after | improvement
--- | ---: | ---: | ---: | ---:
64KB | 1 | 5,121 | 5,108 | -0.3%
64KB | 4 | 16,954 | 17,613 | +3.9%
64KB | 7 | 25,187 | 25,466 | +1.1%
64KB | 14 | 29,061 | 34,761 | +19.6%
64KB | 28 | 38,320 | 41,589 | +8.5%
64KB | 42 | 39,305 | 43,455 | +10.6%
256KB | 1 | 5,000 | 5,021 | +0.4%
256KB | 4 | 17,056 | 17,363 | +1.8%
256KB | 7 | 25,438 | 26,414 | +3.8%
256KB | 14 | 29,296 | 35,634 | +21.6%
256KB | 28 | 36,450 | 44,180 | +21.2%
256KB | 42 | 42,394 | 47,925 | +13.0%
512KB | 1 | 4,935 | 5,052 | +2.4%
512KB | 4 | 17,078 | 17,511 | +2.5%
512KB | 7 | 25,437 | 26,175 | +2.9%
512KB | 14 | 32,701 | 34,489 | +5.5%
512KB | 28 | 36,029 | 40,330 | +11.9%
512KB | 42 | 41,390 | 46,210 | +11.6%
1MB | 1 | 4,890 | 4,587 | -6.2%
1MB | 4 | 17,098 | 17,369 | +1.6%
1MB | 7 | 25,111 | 26,208 | +4.4%
1MB | 14 | 29,831 | 31,801 | +6.6%
1MB | 28 | 34,467 | 42,697 | +23.9%
1MB | 42 | 39,096 | 45,694 | +16.9%
2MB | 1 | 4,949 | 5,043 | +1.9%
2MB | 4 | 17,016 | 17,436 | +2.5%
2MB | 7 | 25,217 | 26,036 | +3.2%
2MB | 14 | 28,417 | 30,928 | +8.8%
2MB | 28 | 34,679 | 39,171 | +13.0%
2MB | 42 | 37,594 | 45,896 | +22.1%
4MB | 1 | 4,906 | 5,011 | +2.2%
4MB | 4 | 16,681 | 16,834 | +0.9%
4MB | 7 | 24,572 | 25,490 | +3.7%
4MB | 14 | 28,744 | 33,360 | +16.1%
4MB | 28 | 35,084 | 40,057 | +14.2%
4MB | 42 | 38,310 | 43,410 | +13.3%

DN latency = 2 ms:

write | thr | CRC32 before | CRC32 after | improvement
--- | ---: | ---: | ---: | ---:
64KB | 1 | 3,573 | 3,684 | +3.1%
64KB | 4 | 13,190 | 13,135 | -0.4%
64KB | 7 | 20,193 | 22,076 | +9.3%
64KB | 14 | 33,002 | 34,220 | +3.7%
64KB | 28 | 36,779 | 41,195 | +12.0%
64KB | 42 | 41,572 | 46,690 | +12.3%
256KB | 1 | 3,637 | 3,616 | -0.6%
256KB | 4 | 13,183 | 13,220 | +0.3%
256KB | 7 | 21,000 | 21,983 | +4.7%
256KB | 14 | 31,295 | 37,303 | +19.2%
256KB | 28 | 37,701 | 42,588 | +13.0%
256KB | 42 | 42,186 | 45,956 | +8.9%
512KB | 1 | 3,609 | 3,620 | +0.3%
512KB | 4 | 13,176 | 13,275 | +0.8%
512KB | 7 | 21,631 | 20,549 | -5.0%
512KB | 14 | 31,191 | 36,158 | +15.9%
512KB | 28 | 38,879 | 45,905 | +18.1%
512KB | 42 | 41,783 | 46,745 | +11.9%
1MB | 1 | 3,595 | 3,600 | +0.2%
1MB | 4 | 12,997 | 13,166 | +1.3%
1MB | 7 | 20,974 | 20,668 | -1.5%
1MB | 14 | 30,689 | 37,095 | +20.9%
1MB | 28 | 38,903 | 42,712 | +9.8%
1MB | 42 | 41,657 | 46,474 | +11.6%
2MB | 1 | 3,583 | 3,591 | +0.2%
2MB | 4 | 13,038 | 13,244 | +1.6%
2MB | 7 | 20,232 | 21,560 | +6.6%
2MB | 14 | 30,324 | 33,905 | +11.8%
2MB | 28 | 38,330 | 42,388 | +10.6%
2MB | 42 | 40,479 | 45,859 | +13.3%
4MB | 1 | 3,556 | 3,588 | +0.9%
4MB | 4 | 13,089 | 13,020 | -0.5%
4MB | 7 | 19,647 | 20,461 | +4.1%
4MB | 14 | 28,546 | 32,630 | +14.3%
4MB | 28 | 36,946 | 40,711 | +10.2%
4MB | 42 | 38,476 | 43,831 | +13.9%
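
The improvement column is consistent with a plain relative change, (after - before) / before, rounded to one decimal place. The sketch below reproduces two rows under that assumption; the class and method names are hypothetical and are not part of the benchmark code.

```java
import java.util.Locale;

// Hypothetical helper; assumes "improvement" means relative change in percent.
final class ImprovementSketch {

  // Relative change of 'after' versus 'before', in percent.
  static double improvementPercent(double before, double after) {
    return (after - before) / before * 100.0;
  }

  // Formats the change with an explicit sign and one decimal place.
  static String format(double before, double after) {
    return String.format(Locale.ROOT, "%+.1f%%", improvementPercent(before, after));
  }

  public static void main(String[] args) {
    // Row "64KB | 14" from the DN latency = 0 table.
    System.out.println(format(29_061, 34_761)); // prints "+19.6%"
    // Row "1MB | 1" from the DN latency = 0 table.
    System.out.println(format(4_890, 4_587));   // prints "-6.2%"
  }
}
```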

@adoroszlai marked this pull request as draft June 21, 2026 11:13
@adoroszlai

Contributor
hadoop-hdds/client/src/test/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStreamWriteBenchmark.java
 265: Line is longer than 120 characters (found 121).
 271: 'for' child has incorrect indentation level 14, expected level should be 12.
 272: 'checksumType' has incorrect indentation level 16, expected level should be 18.
 430: More than 7 parameters (found 8).
 609: More than 7 parameters (found 8).
Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.15.0:testCompile (default-testCompile) on project hdds-client: Compilation failure
hadoop-hdds/client/src/test/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStreamWriteBenchmark.java:[564,18] cannot find symbol
  symbol:   variable ALLOCATE_DIRECT
  location: interface org.apache.hadoop.ozone.common.ChunkBuffer

@errose28

Contributor

We were looking at this during the US community sync today and had a hard time following what is happening. Can you:

  • Get the markdown table to render correctly.
  • Provide units for all measurements. For example:
    • We weren't sure whether thr was threads or throughput. If throughput, it needs a unit.
    • It's unclear what the unit on CRC32 before/after is. ops/sec?
    • The stray DN latency lines in the description need more context.
  • Specify what this will look like in a real cluster outside of the simulated environment.
