Bug/220 spice makes bsk sims not thread safe#994
Conversation
4a1af4d to
aeed048
Compare
8f6bf30 to
a9ef12b
Compare
schaubh
left a comment
There was a problem hiding this comment.
nice work! All tests passed on my system. Just some minor comments to address
|
Nice work, looks like we are there. @juan-g-bonilla , as you commented on the original issue with the suggestion to use a mutex, would you have time to take a look at this code? |
There was a problem hiding this comment.
Nicely done @Natsoulas ! Many of my comments are actually encouraging you to fix existing issues or even the original fix I proposed so we can get this class to its highest quality possible.
Thanks for the tag @schaubh .
a9ef12b to
6a063cf
Compare
|
@juan-g-bonilla |
13283c0 to
70d1575
Compare
schaubh
left a comment
There was a problem hiding this comment.
Some quick comments to address.
juan-g-bonilla
left a comment
There was a problem hiding this comment.
Left some minor suggestions + a warning about an edge cases when unloading happens before loading. Would be great to address those, but otherwise if you address Dr Shaub's comments LGTM.
70d1575 to
ae2c037
Compare
schaubh
left a comment
There was a problem hiding this comment.
good to go. Thanks for addressing this long-standing issue.
|
This PR is leading to errors in some of my unit tests in BSK-RL, but I believe that is no longer working. I'm not sure if it's newly trying to unload files that weren't explicitly loaded, or if it always tried that but the error is being passed in a way that's no longer caught by my code. The error is triggered when this object (https://github.com/AVSLab/bsk_rl/blob/f70bd259049f2bcdc40c76eb258736968bf98cf2/src/bsk_rl/utils/orbital.py#L294) is created then deleted. |
|
@Natsoulas , please see Mark's recent issue with your branch. Could you please take a look at this? |
|
I'll take a look asap. |
Description
The changes implement thread safety for the SPICE interface to address GitHub issue #220, which reported deadlocks and data corruption when running parallel Monte Carlo simulations.
This solution uses a mutex and reference counting approach to protect SPICE kernel loading and unloading operations (inspired by Juan's suggestion comment on the issue):
kernel_manipulation_mutexin the SpiceInterface classkernel_reference_counterto track kernel usageloadSpiceKernel()to only load kernels once and use reference countingunloadSpiceKernel()to only unload kernels when no longer neededclearKeeper()with the mutexThis ensures that when multiple simulations run in parallel, they won't encounter race conditions or deadlocks when accessing SPICE kernels.
Verification
The changes were verified with a dedicated thread safety unit test:
test_spiceThreadSafety.pyunit test in the SPICE interface's_UnitTestdirectoryThe test specifically reproduces the conditions described in issue #220.
Documentation
Added code comments explaining the thread safety mechanism in:
spiceInterface.hfor the static mutex and reference counterspiceInterface.cppfor the mutex implementation and kernel reference counting logicNo existing documentation was invalidated.
Future work