Session 3 Summary

Overview
Data Structures and Matrix Formats
- Matrix Format Variants
Hyper Hash Data Structure
- Hyper Hash Concept and Motivation
- Hyper Hash Implementation
Hyper Hash Heuristics and Management
- When to Build Hyper Hash
- Hyper Hash and User API
Matrix Format Conversions
- Conform and Convert Mechanisms
- Hysteresis and Format Stability
Implications for Dense Operations
- Dense Vector Results
Matrix Multiply Variants
- Saxpy Methods Overview
Next Steps and JIT Discussion
- Transition to JIT Focus
Session Conclusion

Overview

This session focused on SuiteSparse GraphBLAS data structures, specifically exploring the hyper hash implementation and data format conversions. The discussion covered algorithmic variants, performance optimizations, and the design philosophy behind GraphBLAS's multiple data formats.

Data Structures and Matrix Formats

Matrix Format Variants [00:00:08 - 00:09:00]

Discussion: Dr. Davis explained the core data structures in SuiteSparse GraphBLAS, emphasizing the complexity introduced by multiple matrix formats and parallelism strategies.

Key Points:

Four fundamental matrix formats: full, bitmap, sparse, and hyper-sparse
Four parallelism variants for mxm operations (in saxpy-3 method): coarse Gustavson, fine Gustavson, coarse hash, fine hash
The Saxpy-3 method implements Gustavson's algorithm modified for sparse matrix multiplication
Different variants handle workspace management differently: coarse grain gives each thread its own workspace, fine grain uses atomics for shared access

Key Quote:

"How many matrix multiplies do I have? I don't even know how to count that high. Do you count these as different algorithms? They're kind of different algorithms. Do you have the mask present? Is the mask complemented or not? Yeah, it's a totally different algorithm now." [00:07:38]

Technical Details:

Template-based factory system handles variations at compile time without JIT for data-type-independent operations
Extensive debug scaffolding code validates parallel task execution
Each variant combination produces different factory methods (e.g., saxpy3_sim files)

Hyper Hash Data Structure

Hyper Hash Concept and Motivation [00:12:06 - 00:26:06]

Discussion: Introduction to the hyper hash data structure, which accelerates lookups in hyper-sparse matrices by inverting the hyper list.

Key Points:

Hyper list maps K (position) → J (vector name): J = A.h[K]
Hyper hash inverts this: given J, find K (or -1 if not present)
Traditional approach used binary search across the hyper list
Binary search is expensive on CUDA and for large hyper lists

Key Quote:

"I came up with this idea of well, gee, I could build a hash table, but you see, it's a static hash table. There's no inserts or deletions into this hash table because the hashing is just across the vectors that are present in the matrix." [00:17:39]

Design Decision:

Hyper hash is a sparse CSC matrix where:
- Rows represent J values (vector dimension)
- Columns represent hash buckets (64 in the example)
- Values are pairs of (J, K)
Load factor kept at ≤25% to minimize collisions
Hash table size always power of 2 for efficient modulo operations

Hyper Hash Implementation [00:26:06 - 00:35:20]

Discussion: Technical implementation details of the hyper hash structure as a GrB_Matrix object.

Key Points:

Hyper hash stored as a GrB_Matrix Y inside the matrix structure
Y is always sparse (never hyper-sparse itself)
Hash collisions handled by searching short bucket lists
If bucket has >256 entries, falls back to binary search within bucket

Key Quote:

"I have a very, very fast, very stupid hash function, frankly, because it's just supposed to be super fast. Yeah, there could be collisions. I make the hash table size a power of 2 so I can do bit-and operations to mask out the modular thing to know which hash bucket I'm in." [00:24:16]

Lookup Algorithm:

Compute hash function: F(J) to get bucket number
Access column F(J) of hyper hash matrix Y
Search bucket for J (linear or binary search depending on size)
Return K value from (J,K) pair if found, else -1

Code Location: [00:27:20 - 00:32:00]

Build: builder/GB_hyper_hash_build.c
Lookup: hyper/GB_hyper_hash_lookup (static inline function)
Free: matrix/ folder

Hyper Hash Heuristics and Management

When to Build Hyper Hash [00:39:00 - 00:49:00]

Discussion: Decision logic for when to construct a hyper hash versus using simple binary search.

Key Points:

Default threshold: 1024 non-empty vectors (configurable via GrB_get/set)
Hyper hash only built when needed by specific algorithms
Not built during matrix construction, only when operations require it
If matrix not hyper-sparse, no hyper hash needed

Algorithms that use hyper hash: [00:49:00 - 00:54:00]

Assign operations (zombie assignments)
Some add operations
Element-wise multiply (emult algorithm 8)
Submatrix operations
Saxpy-3 method (for M and A matrices, not B)
Dot product methods
Extract operations

Key Quote:

"The hyper hash build might decide, yes, that's a hyper-sparse matrix. You could in theory build a hyper hash for it. But why bother? You've only got 10 vectors in there. Just do the binary search across the 10, and you're fine." [00:48:05]

Performance Trade-offs:

Build time versus lookup time
For small vectors (<1024), binary search sufficiently fast
Hysteresis prevents thrashing when vector count oscillates near threshold

Hyper Hash and User API [00:55:00 - 01:00:00]

Discussion: How hyper hash interacts with pack/unpack functionality and API design.

NOTE: the pack/unpack methods were the primary methods for moving data in and out of a GrB_Matrix in O(1) time and space in GraphBLAS v9. Since this video was taken, I have since added a more flexible set of methods: load/unload into/from a GxB_Container. For that method, the hyper-hash is loaded/unloaded as its own GrB_Matrix.

Key Points:

Pack/unpack provides non-opaque access to matrix internals
Separate unpack_hyper_hash function added to preserve hyper hash across pack/unpack
Original design would delete hyper hash on unpack, requiring expensive rebuild
Decision made to add separate function rather than break API compatibility

Key Quote:

"I added a unpack hyper hash. If you're really desperate to preserve my hyper-sparse matrix object, you can unpack as a hyper-sparse matrix and also unpack its hyper hash, keep it, fiddle with it, and then put it back nicely, and then I've got back to where we started." [00:56:02]

API Design Consideration:

Could have broken backward compatibility with extended unpack parameters
Instead chose two separate functions for cleaner API
Pack/unpack somewhat experimental, difficult to standardize in spec

Matrix Format Conversions

Conform and Convert Mechanisms [01:02:00 - 01:09:00]

Discussion: How GraphBLAS automatically selects and converts between matrix formats.

Key Points:

"Conform" = heuristic decision about which format to use
Matrix format chosen dynamically after computation completes
16 possible format combinations (4 formats, each can be enabled/disabled)
Default: all formats enabled, automatic selection

Conversion Examples:

Sparse → Full: Just free integer arrays if all entries present
Full → Sparse: Add integer arrays, data stays in place
Full → Bitmap: Create bitmap of all true values, no data movement
Sparse → Bitmap: Requires moving data (needs templating for different types)

Key Quote:

"GraphBLAS must always respect the structural computation of its result. You always have to say it's not there or it's there, and you get the right answer." [01:03:36]

Hysteresis and Format Stability [01:04:41 - 01:05:09]

Discussion: How GraphBLAS prevents format thrashing.

Key Points:

Hysteresis region prevents constant conversion at format boundaries
If matrix near boundary (e.g., sparse vs hyper-sparse), keep current format
Only convert when clearly in one regime or another
User can constrain formats via GrB_set (though "full only" becomes "bitmap or full")

Implications for Dense Operations

Dense Vector Results [01:05:09 - 01:11:00]

Discussion: Why GraphBLAS sometimes produces sparse results even with dense inputs.

Key Points:

Y = A*X where A is sparse, X is dense does NOT guarantee dense Y
If A has empty rows, Y must have missing entries (cannot fake with zeros)
GraphBLAS builds output in sparse/bitmap form, then checks if full during conform
Converting sparse to full only requires freeing integer arrays (no data movement)

Performance Impact:

Extra overhead computing indices that may be discarded
Necessary for semantic correctness
Optimized path: use accumulator to ensure no deletions

Key Quote:

"What if it happens that you have no empty rows? Well, I have to compute that fact. I'll produce an output Y, and then I'm like, 'Oh look, vector Y, you're full.' I convert it to full by just freeing its integers." [01:05:56]

Workaround for Dense Output:

Use accumulator: Y += A*X where Y starts dense
With accumulator, no deletions possible
Allows use of specialized dense kernels (saxpy4, saxv5)
Faster execution, direct writes to output

Matrix Multiply Variants

Saxpy Methods Overview [01:12:00 - 01:13:05]

Discussion: Different matrix multiply algorithms in GraphBLAS.

Key Points:

Two major families: dot-product style and Saxpy style
Saxpy-3: Main sparse × sparse algorithm, uses Gustavson method
Saxpy-4: C += A*B where C is bitmap/full
Saxpy-5: C += A*B where A is bitmap/full, B is sparse
Saxpy-1 and Saxpy-2 deleted (old algorithms)
Dot-1 deleted, Dot-2 and Dot-3 remain
XP Dot: Additional dot product methods

Complexity:

Saxpy-3 is "one of the more complex matrix multiply methods"
Handles all combinations of sparse/hyper-sparse inputs
Different parallel strategies mixed in single operation

Next Steps and JIT Discussion

Transition to JIT Focus [01:13:05 - 01:20:00]

Discussion: Planning next session's topics and identifying JIT as critical foundation.

Key Points:

Matrix multiply too complex without understanding JIT first
JIT system shared between CPU (OpenMP) and CUDA
63-bit enumeration space for mxm variants (8 bits just for matrix formats)
Fast enumeration critical for hash table lookup of compiled kernels
Template system uses pound-define macros and pound-include

JIT Components:

Enumify: Fast encoding of kernel variants
Codify: Generate source code
Macrofy: Build macro definitions
Almost identical between CUDA and CPU JIT
Functions all have "fy" suffix in jitifer folder

Compilation Flow:

Enumerate kernel signature (very fast)
Check hash table for existing kernel
If not found: generate source, compile, load, cache
If found: retrieve function pointer (fast path)

Session Conclusion

Topics Covered:

Hyper hash data structure and implementation
Format conversion mechanisms
Pack/unpack API considerations
Performance heuristics and trade-offs
Matrix multiply algorithm variants

Next Session Plan: [01:19:25]

Deep dive into JIT system
Factory kernels vs JIT kernels
Macro system and template mechanisms
Enumeration and code generation
CUDA JIT integration

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Session 3 Summary

Table of Contents

Overview

Data Structures and Matrix Formats

Matrix Format Variants [00:00:08 - 00:09:00]

Hyper Hash Data Structure

Hyper Hash Concept and Motivation [00:12:06 - 00:26:06]

Hyper Hash Implementation [00:26:06 - 00:35:20]

Hyper Hash Heuristics and Management

When to Build Hyper Hash [00:39:00 - 00:49:00]

Hyper Hash and User API [00:55:00 - 01:00:00]

Matrix Format Conversions

Conform and Convert Mechanisms [01:02:00 - 01:09:00]

Hysteresis and Format Stability [01:04:41 - 01:05:09]

Implications for Dense Operations

Dense Vector Results [01:05:09 - 01:11:00]

Matrix Multiply Variants

Saxpy Methods Overview [01:12:00 - 01:13:05]

Next Steps and JIT Discussion

Transition to JIT Focus [01:13:05 - 01:20:00]

Session Conclusion

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Session 3 Summary

Table of Contents

Overview

Data Structures and Matrix Formats

Matrix Format Variants [00:00:08 - 00:09:00]

Hyper Hash Data Structure

Hyper Hash Concept and Motivation [00:12:06 - 00:26:06]

Hyper Hash Implementation [00:26:06 - 00:35:20]

Hyper Hash Heuristics and Management

When to Build Hyper Hash [00:39:00 - 00:49:00]

Hyper Hash and User API [00:55:00 - 01:00:00]

Matrix Format Conversions

Conform and Convert Mechanisms [01:02:00 - 01:09:00]

Hysteresis and Format Stability [01:04:41 - 01:05:09]

Implications for Dense Operations

Dense Vector Results [01:05:09 - 01:11:00]

Matrix Multiply Variants

Saxpy Methods Overview [01:12:00 - 01:13:05]

Next Steps and JIT Discussion

Transition to JIT Focus [01:13:05 - 01:20:00]

Session Conclusion