Conversation
This is a different use case than flux-mcp, and arguably we should not be exposing extra information about the cluster.
And we can do this with other params in the future.
To be clear, this does not influence the experiment execution; it is just that we are missing the CPU affinity in the ground truth (but it is always there).
We want to be able to run dispatch experiments, and we need a hardened way to associate a very specific granularity of an actual command with a prompt. For example, here are four ways I can ask for a resource manager:
Those examples are prompt styles. The high-level research questions might be:
The other variable we have to model is the actual complexity of the command. For example:
To be explicit: asking for just running lammps (no flags) is a much simpler request than asking for CPU affinity, so we have to model that too.
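A minimal sketch of the two dimensions, with purely illustrative names (these are not the library's actual variables):

```python
# Purely illustrative sketch: the two dimensions we vary per experiment.
# Names and values here are hypothetical, not the library's actual API.

# Prompt styles: different ways of phrasing the same request.
PROMPT_STYLES = [
    "direct",          # "Submit lammps with flux."
    "conversational",  # "Hey, could you run a lammps job for me?"
    "technical",       # "Use the resource manager tool to submit lammps."
    "indirect",        # "I need lammps results; handle the submission."
]

# Command complexity: how much of the command the agent must get right.
# Asking for no flags is simpler than asking for citation suppression,
# which is simpler than also asking for CPU affinity.
COMPLEXITY_LEVELS = ["bare", "nocite", "affinity"]
```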
If you put the prompt style (first) together with the command complexity (second), you get a matrix of possible configurations. If you model each piece of the command, it can get large very quickly, but that's OK! As long as we can capture the exact choice and granularity for each dimension (to compare to the baseline), I think we can assess the relative contribution of a style/complexity to an outcome. The outcome can be a figure of merit, wall time, or the content of a log (e.g., telling the agent to use `-nocite` or "remove the citation"). This is what I implemented today, and I've run the base cases for it.

This is the basis for the dispatch experiments. We have baseline runs across the actual lammps commands, and we compare them to different (programmatically generated and controlled) prompts. We will (I hope) see when things start to fall off, which strategies and combinations are good, and maybe which specific parts of the request are harder for the agent to handle. Add that to what we modeled for negotiation (e.g., expected vs. actual tool calls, success rate to actually run it, reporting the correct job id, etc.), and I think it's a good assessment for answering the question "How well can an agent reliably submit jobs for us?" I've finished the runs for the base cases, and I am doing two problem sizes (a larger, longer-running one and a smaller one) because it occurred to me that one result could be biased (e.g., if a smaller running time has smaller variation, it might falsely look better).
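As a sketch of capturing the matrix and the outcomes we record per cell (again, hypothetical names, assuming the dimensions from the sketch above):

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical dimensions, continuing the sketch above.
PROMPT_STYLES = ["direct", "conversational", "technical", "indirect"]
COMPLEXITY_LEVELS = ["bare", "nocite", "affinity"]


@dataclass
class DispatchRun:
    """One cell of the style x complexity matrix, plus measured outcomes."""

    prompt_style: str
    complexity: str
    wall_time: float | None = None         # compared against the baseline run
    figure_of_merit: float | None = None
    log_checks: dict = field(default_factory=dict)  # e.g. {"citation_removed": True}


# The full matrix: every prompt style crossed with every complexity level.
matrix = [
    DispatchRun(prompt_style=style, complexity=level)
    for style, level in itertools.product(PROMPT_STYLES, COMPLEXITY_LEVELS)
]
print(f"{len(matrix)} configurations")  # 4 styles x 3 levels = 12
```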
The design of this is really cool in that we model an application akin to a provider, of which we have 51 across real and simulated systems (this was the SC26 paper). It's cool because most of the discovery and base class behavior I inherit for the app from the BaseProvider. The prompt generation that requires the workload manager (Flux, a provider) comes from the provider class - they are working together! We do a much better job here modeling the prompts, and I would like to make a more hardened prompt generator class under simulation that can be shared between the two, and then design the prompt generation for providers akin to apps. I might want to redo earlier experiments with this new strategy (the generation can be generalized to work with the providers). I'm really liking (and enjoying working on) this library.
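Roughly, the shape of that design is something like this (class and method names below are illustrative guesses, not the actual code):

```python
class BaseProvider:
    """Shared discovery and base-class behavior (illustrative sketch)."""

    def discover(self) -> dict:
        """Discover what is available on the cluster."""
        raise NotImplementedError


class FluxProvider(BaseProvider):
    """The workload manager provider: owns submission phrasing."""

    def submission_prompt(self, command: str, style: str) -> str:
        # Wrap the command in a style-specific request to the agent.
        templates = {
            "direct": f"Submit this job with flux: {command}",
            "indirect": f"I need the results of `{command}`; please handle the submission.",
        }
        return templates[style]


class LammpsApp(BaseProvider):
    """An application modeled akin to a provider: discovery is inherited,
    while prompt generation comes from the workload manager provider."""

    def __init__(self, provider: FluxProvider):
        self.provider = provider  # the two classes work together

    def command(self, complexity: str) -> str:
        # Map a complexity level to an actual command (-nocite is a real lammps flag).
        flags = {"bare": "", "nocite": " -nocite"}[complexity]
        return f"lmp{flags} -in in.lj"

    def prompt(self, style: str, complexity: str) -> str:
        return self.provider.submission_prompt(self.command(complexity), style)
```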
I usually like to test the experiment setup and tweak details before full runs, so I should be able to run the experiments this week. The orchestration is complete, and the cluster setup with the mcp-server is fully working. I added a new dual mode as a lazy man's way of saying "come up as both a hub and a worker," just for testing.