Add multi host and multi user support to the process manager#941
Conversation
e4cfb16 to
d452d58
Compare
|
so most things work. things to do now is to ensure that the -pmaas config actually works independently in the daqsystemtest repo and then later we can investigate if everything works for both ehn1 and local, using /log and not /log well the pmaas thing worked a treat |
c1bc926 to
3c980ac
Compare
|
Hello hello @PawelPlesniak this is the big boi now. Ready for review to get the multi host multi user support in! |
| # Remaps the localhost into the correct server | ||
| # and also grabs the correct port from the right environment from the config | ||
|
|
||
| connection_server_config = ( |
There was a problem hiding this comment.
This is the host name not the config, rename this var to be clearer.
| if item.className() == "Variable": | ||
| if item.name == "CONNECTION_PORT": | ||
| item.value = new_port | ||
| def change_localhost(self, session_dal): |
There was a problem hiding this comment.
Call this resolve_localhost for clarity
There was a problem hiding this comment.
There is also a resolve_localhost_to_hostname function, could these work in parallel? Seems like there are multiple functions to achieve similar functionality.
There was a problem hiding this comment.
resolved_localhost does use the resolve_localhost_to_hostname function. We simply wrap it in dal_localhost_mapping so it can get used in changing the dal (as defined in the resolve_localhost function cuz I think it needs a little bit of additional processing.
There is a case to be had to basically upgrade the existing resolve_localhost_to_hostname so that it now uses the dal_localhost_mapping code, but I haven't investigated the use of resolve_localhost_to_hostname elsewhere in the codebase and I'd like to not make any changes to resolve_localhost_to_hostname for the time being.
| if item.className() == "Variable": | ||
| if item.name == "CONNECTION_PORT": | ||
| item.value = new_port | ||
| def change_localhost(self, session_dal): |
There was a problem hiding this comment.
There is also a resolve_localhost_to_hostname function, could these work in parallel? Seems like there are multiple functions to achieve similar functionality.
|
Sorry for the split of revised changes to comments, that is an error on my behalf |
|
Other notes
This should be docstringed, with a statement that the run control will validate this in the sesssion manager in the future.
What was the motivation for this? |
|
Once these are all done, I will test with |
What do you mean by docstrings? I will add some comments on the code saying that the run control will validate this in the session manager in the future, but the blocking behaviour is pretty obvious from the code itself.
During the localhost mapping changes, I noticed that if the ports arent mapped properly drunc kinda just gets stuck until a 2 minute or so timeout engages. Having this extra check immediately checks if the controller can actually connect to the connectivity service in less than 10 seconds, and will fail quickly if said connectivity service cant be reached for whatever reason. Can't think of a good reason to not have that (even after I hopefully fixed the LCS so I've kept it in) |
|
Hi @PawelPlesniak, I think I've addressed most of your comments, either with a code change or a reply on the thread. Can you go over it and let me know what you think? |
This is sufficiently clear. I meant to include a note, this absolutely should exist for now, but in the future we will want to move this to the session manager.
Thanks |
|
Pytest failures are associated with changes in develop which are not accounted for in the working branch. As this is going into an intermediary branch, which will be used for development and testing prior to merging, this failure will be ignored for now and addressed once the branch that goes into develop will be ready. The same is true for the integration tests. |
|
This works as currently intended. Testing was completed as follows
Testing with The location for running the process manager is also dependent, as expeced (permissions) Follow up notes
Validated
Validated
Validated
Validated
Validated
Validated
Validated
Validated
Validated
Validated
It is not, but thats a later problem ;)
Validated Nice work! |
9a8d141
into
PawelPlesniakEmuhammad/SplitShellFixing
Add multi host and multi user support to the process manager
Add multi host and multi user support to the process manager
Add multi host and multi user support to the process manager
Description
Fixes #926
Fixes #937
Blocks boot behaviour in unified shell if another session (with the same name) is already booted
Asks for confirmation when booting in the process manager shell if user tries to spawn a session with the same name as an existing session
Multi host support works by mapping localhost instances to an actual host
Multi user support works
extra connectivity service is ready check in the controller
new touch_and_chmod behaviour to enable correct permissions for read and write
deletes the logfile in case there was a previous iteration of it at boot now
checks for working directory permissions before running
BIG NOTE PERMISSIONS SET TO RWX FOR EVERYONE (777)!!! THIS MUST BE ACCEPTABLE
all this works for localhost, ehn1, and the localhost-pmaas config (where logs go to /logs)
Note about multi user support
It works!
However it has to run at a very specific level of permissions. To recap:
Because of the above,
Type of change
List of required branches from other repositories
DUNE-DAQ/daqsystemtest#322
Suggested manual testing checklist
built off 260603 nightly
Some relevant commands include
The list of steps (vague, sorry) are
Developer checklist
Prior to marking this as "Ready for Review"
Tests ran on: NP04 - 028 from release 03 June 2026 nightly
Unit tests - some tests can't be ran on the CI. This is documented. If this PR checks a feature that can't be tested with CI, this has been marked appropriately.
Integration tests - the
daqsystemtest_integtest_bundlerequires a lot of resources, and connections to the EHN1 infrastructure. Check the cross referenced list if you can't run these. The developer needs to run at least the .pytest --marker) passeddaqsystemtest_integtest_bundle.sh -k minimal_system_quick_test.pydaqsystemtest_integtest_bundle.sh./scripts/drunc_integtest_bundle.sh)Final checklist prior to marking this as "Ready for Review"
Reviewer checklist
src/daqsystemtest_integtest_bundle.sh -k minimal_system_quick_test.pyifdaqsystemtest_integtest_bundle.shdruncare in the log filesdruncfailure appears:scripts/drunc_integtest_bundle.sh)Once the above boxes are checked, the PR(s) can be merged following the steps below.
Choose one of the following an complete all substepsPrior to merging
Once completed, the reviewer can merge the PR.
Notification message for a Slack channel
Note - this should be to #dunedaq-integration for general workflow that isn't during a release candidate period, and to #daq-release-prep otherwise.
For an single merge that changes the user workflow
For co-ordinated merge