Add small key type tests#812
Conversation
|
/ok to test 7b0beb1 |
|
/ok to test 323adf8 |
323adf8 to
e77f232
Compare
|
/ok to test 281721b |
|
/ok to test 281721b |
281721b to
390844b
Compare
|
/ok to test 61993aa |
61993aa to
22ad7ae
Compare
|
/ok to test 22ad7ae |
|
/ok to test 6dc9527 |
|
CI passes. Nice work! I'll give it a review on Monday or Tuesday depending on my workload. In the meantime, we can prepare follow up PRs: (1) replicate the same test extensions to all data structures, also sparsely testing combinations of differenty sized key/value types (e.g. 1B keys and 4B payloads), (2) go over the docs, readme, and asserts to make sure they reflect the new type constraints. |
|
Thanks @sleeepyjack. Is there an easier way to reproduce CI locally in the future? |
| cuda::proclaim_return_type<Key>(build_fn{})); | ||
|
|
||
| set.insert_async(keys_begin, keys_begin + num_keys); | ||
| CUCO_CUDA_TRY(cudaDeviceSynchronize()); |
There was a problem hiding this comment.
question: why is this needed?
There was a problem hiding this comment.
It shouldn't be required since it is running on the default stream IIUC. It's probably a red herring trying to fix open bug #804.
| template <> | ||
| struct packed<sizeof(uint8_t)> { | ||
| using type = uint8_t; ///< Packed type as `uint8_t` if the size of the object is 1 | ||
| }; |
There was a problem hiding this comment.
We pack paired data (e.g., a 32-bit key and 32-bit value) into a single 64-bit unit for CAS, used only in map scenarios. Therefore, the smallest CAS we need is 8 bits (1B), and the smallest packed type is 2B, making the 1B overload unnecessary.
| template <> | |
| struct packed<sizeof(uint8_t)> { | |
| using type = uint8_t; ///< Packed type as `uint8_t` if the size of the object is 1 | |
| }; |
| template <> | ||
| struct packed<sizeof(uint16_t)> { | ||
| using type = uint16_t; ///< Packed type as `uint16_t` if the size of the object is 2 | ||
| }; |
There was a problem hiding this comment.
could we add a 2B key-value pair in the map unit test to exercise this code path? One test case is sufficient.
@ksapru All the CI environment is shared via devcontainers. You could use them to repro the issue. |
@ksapru , yes, @PointKernel 's suggestion is the right way to go. Note that if CI fails, the associated test/build log will tell you at the end how to spin up the corresponding devcontainer environment to reproduce the bug. If you don't have access to a GPU system, you can still use the base images of the devcontainers (e.g., this image) directly to at least compile the code locally by running the |
| cuda::proclaim_return_type<Key>(build_fn{})); | ||
|
|
||
| set.insert_async(keys_begin, keys_begin + num_keys); | ||
| CUCO_CUDA_TRY(cudaDeviceSynchronize()); |
There was a problem hiding this comment.
It shouldn't be required since it is running on the default stream IIUC. It's probably a red herring trying to fix open bug #804.
| (uint8_t, cuco::test::probe_sequence::double_hashing, 1), | ||
| (uint8_t, cuco::test::probe_sequence::linear_probing, 1), | ||
| (uint16_t, cuco::test::probe_sequence::double_hashing, 1), | ||
| (uint16_t, cuco::test::probe_sequence::double_hashing, 2), | ||
| (uint16_t, cuco::test::probe_sequence::linear_probing, 1), | ||
| (uint16_t, cuco::test::probe_sequence::linear_probing, 2), |
There was a problem hiding this comment.
I wonder if using singed integers (as in the rest of the tests) would make sense for consistency. Plus, it would probably also catch some unintended overflow/wrap-around errors. Just a suggestion though - no hard requirement.
Extends the
static_setunit-test type matrices to cover sub-4-byte key types (uint8_tanduint16_t), as requested in the good first issue.Changes:
uint8_tanduint16_tto theTEMPLATE_TEST_CASE_SIGtype axes in all 8 templatedstatic_settest filesnum_keysguard (sizeof(Key) == 1 ? 100 : 400) to keep key sequences within each type's representable range, leaving room for the0xFF/0xFFFFempty sentinelgold_capacityvalues inunique_sequence_testandretrieve_all_testto be conditional onnum_keysfor_each_testto match the reduced key count foruint8_tThe underlying implementation (
packed_casviacuda::atomic_ref) already supports 1B and 2B atomic operations — these tests verify that correctness holds end-to-end for small key types.Addresses #805