This repository contains a collection of debug use case examples for SST. These are small, artificial examples illustrating situations that might occur in an SST simulation where a debugger could be used to detect or analyze behavior. They are simple SST models with small topologies. Some examples demonstrate debugger features available today, but other cases might serve to inspire possible new debugger features or companion tools.
For each debug story we include a "use case report": a short write-up that explains the scenario, what behavior to observe, and how the SST debugger can be used to investigate it. Many reports also include thoughts and wishlist items for debugger improvements. The stories and links to their reports are in the story status table. For cross-cutting debugger ideas that come up across multiple stories, see the wish list document, which also includes a catalog of the wishlist items mentioned in individual stories.
- All stories are launched from a single SST simulation configuration script, `runStory.py`, which is passed the name of the particular story to run. Valid story names are listed in the first column of the story status table.
- This repository is still a work in progress. All of the use cases listed below are implemented, and our current effort is focused on hand-verifying each case and evaluating how it could currently be addressed using the SST debugger.
- All stories are built around a single SST component named `Node` (implemented in `Node.cpp` and `Node.h`) and use a unified simulation configuration file, `runStory.py`.
From this directory:

- Build and run in one step: `./doit <storyName>`
- Or run manually: `make clean && make`, then `sst --interactive-stop ./runStory.py <storyName>`

Where `<storyName>` is any valid story name from the story descriptions section.
This table lists the use case stories included in this repository and summarizes their status. For a short description of each story, see the story descriptions section.
All stories have been implemented, so we're now focused on ensuring that they have been implemented properly and on writing a "use case report" for each.
The "Verified?" column indicates whether a story has been hand-verified: ✅ means verified, ❌ means something is wrong, and ❓ means that I have read the code and believe it to be correct, but I don't know of an easy way to verify that it works as intended today.
In the "Use Case Report" column, ♦ symbols indicate how mature I believe the report is. One diamond means the report includes an example script showing how to use the SST debugger to address the case, but I haven't yet thought deeply about how effective it is. Two diamonds mean the report has more content and some thoughts on wishlist items for the SST debugger. Three diamonds mean I view the content as complete.
| Story | Verified? | Use Case Report | Notes |
|---|---|---|---|
| Event Tracing | |||
| wrongPath | ✅ | ♦♦ | works in debugger but requires advanced topology knowledge and the event to set a side effect on components |
| infiniteLoop | ✅ | ♦ | |
| unexpectedDisappear | ✅ | ♦ | |
| missedDeadline | ✅ | ♦ | |
| outOfOrderReceipt | ✅ | ♦ | |
| duplicateSepTimes | ✅ | ♦ | |
| duplicateSameTime | ✅ | ♦ | |
| Event Processing | |||
| broadcastStorm | ✅ | ♦ | |
| badMerge | ✅ | ♦ | |
| Incorrect Topology | |||
| missingLink | ✅ | ♦ | |
| wrongLink | ✅ | ♦ | |
| unexpectedDuplicateLink | ✅ | ♦ | |
| Deadlock | |||
| directDeadlock | ❓ | ♦ | |
| indirectDeadlock | ❓ | ♦ | |
| Fault Detection And Attribution | |||
| detectWhenComponentBecomesInvalid | ✅ | ♦ | |
| badInvariantBetweenComponents | ✅ | ♦ | |
| componentsLoseParity | ✅ | ♦ | |
| divergedModels | ✅ | ♦ | |
| componentCausesSegfault | ✅ | ♦ | |
| badInitialState | ✅ | ♦ | |
| badTerminatingState | ✅ | ♦ | |
| findFirstToComplete | ❓ | ♦ | |
| determineWhatNotComplete | ❓ | ♦ | |
| Load Imbalances | |||
| findEventHeavyComponent | ✅ | ♦ | |
| findSlowProcessingComponent | ❓ | ♦ | |
| findMemHeavyComponent | ❓ | ♦ | |
| findMemHeavyEvent | ❓ | ♦ | |
| findStarvedComponent | ✅ | ♦ |

The stories are grouped into the following categories:

| Category | Stories |
|---|---|
| Event Tracing | wrongPath, infiniteLoop, unexpectedDisappear, missedDeadline, outOfOrderReceipt, duplicateSepTimes, duplicateSameTime |
| Event Processing | broadcastStorm, badMerge |
| Incorrect Topology | missingLink, wrongLink, unexpectedDuplicateLink |
| Deadlock | directDeadlock, indirectDeadlock |
| Fault Detection And Attribution | detectWhenComponentBecomesInvalid, badInvariantBetweenComponents, componentsLoseParity, divergedModels (a pair: divergedModels_A and divergedModels_B), componentCausesSegfault, badInitialState, badTerminatingState, findFirstToComplete, determineWhatNotComplete |
| Load Imbalances | findEventHeavyComponent, findSlowProcessingComponent, findMemHeavyComponent, findMemHeavyEvent, findStarvedComponent |
Event Tracing:

- `wrongPath`: An event propagates through the model. Its intended path is A -> B -> C, but B misroutes the event to D instead.
- `infiniteLoop`: An event is supposed to move onward to D, but A, B, and C keep forwarding it in a cycle, creating an infinite loop.
- `unexpectedDisappear`: The intended path is A -> B -> C -> D, but the event vanishes at C because it is never forwarded onward.
- `missedDeadline`: D is expected to receive an event by a target time, but the links along the A -> B -> C -> D path add enough latency that it arrives late. The goal is to locate which link is causing the slowdown.
- `outOfOrderReceipt`: E is intended to see `ev1` before `ev2`, but the two events, launched on different branches, arrive in the opposite order because C starts at `3ns` while A starts at `5ns` (with all links at `1ns`).
- `duplicateSepTimes`: D is expected to receive a given event once, but A injects it at setup and again on later ticks, so repeated deliveries occur at different times.
- `duplicateSameTime`: B is expected to receive a given event once, but A injects it twice at setup.
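
The timing behind `outOfOrderReceipt` can be checked with a little arithmetic. The sketch below is plain Python rather than SST code. It assumes that `ev1` originates at A and `ev2` at C, and that both branches reach E over the same number of links; the hop count of 2 and the helper name `arrival_time_ns` are hypothetical.

```python
def arrival_time_ns(start_ns, hops, link_latency_ns=1):
    """Arrival time of an event sent at start_ns after crossing `hops` links."""
    return start_ns + hops * link_latency_ns


# Hypothetical: both branches are two links long. Start times and the 1ns
# link latency come from the story description.
HOPS_TO_E = 2

arrivals = {
    "ev1": arrival_time_ns(start_ns=5, hops=HOPS_TO_E),  # assumed to start at A
    "ev2": arrival_time_ns(start_ns=3, hops=HOPS_TO_E),  # assumed to start at C
}

# Sort event names by arrival time to get the order E observes.
observed_order = sorted(arrivals, key=arrivals.get)
print(arrivals, observed_order)
```

Under these assumptions, `ev2` reaches E two nanoseconds before `ev1` for any equal hop count, because the start-time gap is never made up on the links.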
Event Processing:

- `broadcastStorm`: An event is broadcast too broadly, going from A to all six of its neighbors at startup.
- `badMerge`: C receives values from A and B and should merge them correctly, but it computes `10 * 2` instead of performing the intended add-style merge before sending the result to D.
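
To make the `badMerge` symptom concrete, the following plain-Python sketch contrasts the two merge rules on the values from the description. It assumes the intended merge is a simple addition and that the inputs are 10 and 2; the `merge` helper is hypothetical and is not taken from `Node.cpp`.

```python
import operator


def merge(a_value, b_value, rule):
    """Combine the two incoming values with the given binary rule."""
    return rule(a_value, b_value)


a_value, b_value = 10, 2  # values from the story description

intended = merge(a_value, b_value, operator.add)  # assumed add-style merge
observed = merge(a_value, b_value, operator.mul)  # the buggy multiply

if observed != intended:
    print(f"bad merge: D received {observed}, expected {intended}")
```

A check like this is what a debugger watchpoint on the value sent to D would effectively be testing.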
Incorrect Topology:

- `missingLink`: The intended topology includes a B <-> C connection, but that link is absent.
- `wrongLink`: The intended topology is A -> B, but A is connected to C instead.
- `unexpectedDuplicateLink`: A and B are linked twice instead of once.
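
All three topology stories come down to comparing the links that were intended with the links that were actually built. The sketch below shows one way to do that comparison in plain Python. It treats links as undirected pairs of component names, which is an assumption, and `diff_topology` is a hypothetical helper, not part of `runStory.py` or the SST API.

```python
from collections import Counter


def link_key(a, b):
    """Normalize a link so that (A, B) and (B, A) compare equal."""
    return tuple(sorted((a, b)))


def diff_topology(intended, actual):
    """Return (missing, unexpected, duplicated) links as sorted lists."""
    want = Counter(link_key(a, b) for a, b in intended)
    have = Counter(link_key(a, b) for a, b in actual)
    missing = sorted(k for k in want if k not in have)
    unexpected = sorted(k for k in have if k not in want)
    # A link is duplicated if it appears more often than intended
    # (links that were never intended are allowed once before counting).
    duplicated = sorted(k for k, n in have.items() if n > want.get(k, 1))
    return missing, unexpected, duplicated


# missingLink: B <-> C was intended but never created.
print(diff_topology([("A", "B"), ("B", "C")], [("A", "B")]))
# wrongLink: A should connect to B but connects to C.
print(diff_topology([("A", "B")], [("A", "C")]))
# unexpectedDuplicateLink: A and B are linked twice.
print(diff_topology([("A", "B")], [("A", "B"), ("B", "A")]))
```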
Deadlock:

- `directDeadlock`: A waits for an event from B while B waits for an event from A, so neither side ever makes progress.
- `indirectDeadlock`: This is the same wait cycle as `directDeadlock`, but with B sitting between A and C as a relay, so the blocked endpoints are separated by an intermediate component.
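
Both deadlock stories can be viewed as a cycle in a "waits on" relation between components. The sketch below is a minimal plain-Python cycle check, assuming each blocked component waits on exactly one other component; `find_wait_cycle` is a hypothetical helper and the wait maps are illustrative, not extracted from the simulation.

```python
def find_wait_cycle(waits_on):
    """Return the components forming a wait cycle, or None if there is none.

    waits_on maps each blocked component to the component it is waiting for.
    """
    for start in waits_on:
        seen = []
        node = start
        # Follow the chain of waits until it ends or revisits a component.
        while node in waits_on and node not in seen:
            seen.append(node)
            node = waits_on[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# directDeadlock: A and B wait on each other.
print(find_wait_cycle({"A": "B", "B": "A"}))
# indirectDeadlock: the blocked endpoints are A and C. The relay B only
# forwards events, so it is assumed not to appear in the wait map.
print(find_wait_cycle({"A": "C", "C": "A"}))
# No deadlock: A waits on B, but B is not blocked.
print(find_wait_cycle({"A": "B"}))
```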
Fault Detection And Attribution:

- `detectWhenComponentBecomesInvalid`: A starts valid and then flips its `valid` flag to false on a 40ns clock tick, modeling a component whose state becomes invalid during execution.
- `badInvariantBetweenComponents`: A cross-component invariant is supposed to hold, but C follows a different update rule when it receives certain values, breaking the invariant.
- `componentsLoseParity`: A and B are expected to stay in matching state over time, but their scripted values diverge at cycle 40, when they become 5 and 7.
- `divergedModels`: This pair of stories represents separate models that are intended to retain parity with each other throughout execution, but at timestamp 40, `divergedModels_A` uses value 5 while `divergedModels_B` uses value 7.
- `componentCausesSegfault`: Component C asserts once its clock reaches cycle 50 or later. The goal is to identify which component is responsible for the segfault and at what point in time the segfault occurs.
- `badInitialState`: Four unconnected components are intended to initialize to the same state, but C starts with a different value than the others.
- `badTerminatingState`: Similar to `badInitialState`, but the issue is that C changes to a different value before the simulation terminates. The goal is to identify which component has the bad value just prior to termination.
- `findFirstToComplete`: The goal is to determine which component finishes first. The completion order is D first, then B, then C, then A.
- `determineWhatNotComplete`: The goal is to find components that never mark complete when the simulation ought to be done. Here A, D, and E finish, while B and C never do.
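
For `componentsLoseParity` and `divergedModels`, the question is when two value histories first disagree. The sketch below finds that point in plain Python, assuming each history has been recorded as a mapping from timestamp to value. The values before timestamp 40 are hypothetical placeholders; only the 5 versus 7 split at 40 comes from the story descriptions.

```python
def first_divergence(trace_a, trace_b):
    """Return (time, value_a, value_b) at the first mismatch, or None."""
    # Only timestamps recorded in both traces can be compared.
    for t in sorted(trace_a.keys() & trace_b.keys()):
        if trace_a[t] != trace_b[t]:
            return t, trace_a[t], trace_b[t]
    return None


# Hypothetical traces: identical until timestamp 40, then 5 versus 7.
trace_a = {10: 1, 20: 1, 30: 1, 40: 5}
trace_b = {10: 1, 20: 1, 30: 1, 40: 7}

print(first_divergence(trace_a, trace_b))
```

The same comparison applies whether the two traces come from two components in one run or from two separate runs.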
Load Imbalances:

- `findEventHeavyComponent`: The goal is to identify which component processes the most events. In this four-node ring, each component sends to its neighbor to the right.
- `findSlowProcessingComponent`: One component should be noticeably slower at processing than the others. All nodes send one event at startup to their right neighbor, but the event received by B takes much longer to process.
- `findMemHeavyComponent`: The goal is to spot a component with unusually high memory usage. Four unconnected components allocate different local buffer sizes, with B holding by far the largest payload.
- `findMemHeavyEvent`: The goal is to spot an unusually large event. Each node in a ring sends one rightward event with a payload buffer, and one of those messages is much larger than the others.
- `findStarvedComponent`: The intended pattern is that all components should receive work, but one does not. In the current ring with uneven send quotas, C receives no events while the others do.
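
One way to reason about `findEventHeavyComponent` and `findStarvedComponent` is to tally deliveries per receiver. The sketch below does this in plain Python over a list of `(sender, receiver)` pairs. The delivery list is hypothetical: it models a four-node ring A -> B -> C -> D -> A in which B sends nothing, so C is starved, but the specific counts are made up for illustration.

```python
from collections import Counter


def receive_counts(components, deliveries):
    """Count received events per component, keeping zero-count entries."""
    counts = Counter({name: 0 for name in components})
    counts.update(dst for _src, dst in deliveries)
    return counts


components = ["A", "B", "C", "D"]
# Hypothetical send quotas: A sends 3, B sends 0, C sends 2, D sends 1.
deliveries = [("A", "B")] * 3 + [("C", "D")] * 2 + [("D", "A")]

counts = receive_counts(components, deliveries)
heaviest = counts.most_common(1)[0][0]
starved = sorted(name for name, n in counts.items() if n == 0)
print(dict(counts), heaviest, starved)
```

Seeding the counter with every component matters here: a component that never receives anything would otherwise be missing from the tally rather than showing up with a count of zero.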
To add a new story:

- Add the story name to `NODE_STORY_LIST` in `Node.cpp`.
- Add `setup_<story>` and `handleEvent_<story>` in `Node.h` and `Node.cpp`.
- Add the story string to `VALID_STORIES` in `runStory.py`.
- Add a `story_<story>()` function in `runStory.py`.
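
The Python side of these steps can be pictured as a name-to-function lookup. The sketch below is only an illustration of that pattern, not the actual contents of `runStory.py`: it deliberately avoids the `sst` module so it can run standalone, and `myNewStory`, `STORY_BUILDERS`, `run_story`, and the returned strings are all hypothetical.

```python
# Hypothetical stand-in for the list in runStory.py; the real list is longer.
VALID_STORIES = ["wrongPath", "myNewStory"]


def story_wrongPath():
    # The real function would build SST components and links.
    return "configured wrongPath"


def story_myNewStory():
    # New stories follow the same story_<story>() naming pattern.
    return "configured myNewStory"


# Explicit lookup table from story name to its builder function.
STORY_BUILDERS = {
    "wrongPath": story_wrongPath,
    "myNewStory": story_myNewStory,
}


def run_story(name):
    """Validate the story name, then call its builder."""
    if name not in VALID_STORIES:
        raise ValueError(f"unknown story: {name!r}")
    return STORY_BUILDERS[name]()


print(run_story("myNewStory"))
```

Rejecting unknown names up front gives a clear error when a story has been added to one place but not the other.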
Older standalone cases are stored in:

- `old/infiniteLoopTest/`
- `old/loadImbalance/`