
Record Thor manager pod info early; preserve connected worker pod names on registration timeout #21155

Draft
Copilot wants to merge 2 commits into candidate-10.2.x from copilot/record-thor-manager-early


Copilot AI commented Mar 26, 2026

In containerized Thor, pod names (manager and workers) were published to the workunit only after all workers had fully registered, so a registration timeout left no pod info recorded at all.

Changes

thgraphmanager.hpp / thgraphmanager.cpp

  • Split publishPodNames() into two focused functions:
    • publishManagerPodName(workunit, graphName) — records the Thor manager pod immediately; also associates the graph name on subsequent calls (replacing the thorMain use of publishPodNames)
    • publishWorkerPodNames(workunit, workers) — records worker pod names and stores them in the static connectedWorkers for future graph-name associations

thmastermain.cpp: CRegistryServer::connect()

  • Call publishManagerPodName before the worker registration wait loop
  • Wrap the registration while loop in a try/catch so that whatever workers did connect are passed to publishWorkerPodNames regardless of a timeout exception
  • StWhenK8sReady timestamp is only recorded on full success (no change in semantics)

// Before: single publish after all workers registered; nothing recorded on timeout
publishPodNames(workunit, graphName, &connectedWorkers);

// After: manager recorded immediately; workers recorded even on partial registration
publishManagerPodName(workunit, graphName);          // early, before registration loop
// ... registration loop wrapped in try/catch ...
publishWorkerPodNames(workunit, connectedWorkers);   // whatever connected, always recorded
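
The control flow described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `FakeWorkunit`, `RegistrationTimeout`, `connectWorkers`, the pod name `thor-manager-0`, and the simplified function signatures are all hypothetical stand-ins, and the timeout is simulated by running out of arriving workers rather than by a real timer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the workunit; the real HPCC Platform interface differs.
struct FakeWorkunit
{
    std::string managerPod;
    std::vector<std::string> workerPods;
    bool k8sReadyRecorded = false;   // stands in for the StWhenK8sReady timestamp
};

// Hypothetical exception type representing a registration timeout.
struct RegistrationTimeout : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Simplified signatures (assumption): the PR's versions take
// (workunit, graphName) and (workunit, workers) respectively.
void publishManagerPodName(FakeWorkunit &wu, const std::string &pod)
{
    wu.managerPod = pod;
}

void publishWorkerPodNames(FakeWorkunit &wu, const std::vector<std::string> &workers)
{
    wu.workerPods = workers;
}

// 'arriving' lists the workers that register before the simulated timeout.
// Returns true only when all 'expected' workers registered.
bool connectWorkers(FakeWorkunit &wu, const std::vector<std::string> &arriving, std::size_t expected)
{
    publishManagerPodName(wu, "thor-manager-0");   // early, before waiting on any worker

    std::vector<std::string> connected;
    bool ok = false;
    try
    {
        while (connected.size() < expected)
        {
            if (connected.size() == arriving.size())
                throw RegistrationTimeout("timed out waiting for workers");
            connected.push_back(arriving[connected.size()]);
        }
        ok = true;
    }
    catch (const RegistrationTimeout &)
    {
        // Fall through so the partial list is still published.
        // This sketch reports failure via the return value; the real code may rethrow.
    }

    publishWorkerPodNames(wu, connected);   // whatever connected, even on timeout
    if (ok)
        wu.k8sReadyRecorded = true;         // only on full success
    return ok;
}
```

Calling `connectWorkers(wu, {"w1", "w2"}, 3)` on a fresh `FakeWorkunit` returns `false`, yet `wu` still holds the manager pod name and the two workers that did connect, while `k8sReadyRecorded` stays `false`.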

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • downloads.apache.org
    • Triggering command: /usr/bin/curl curl --fail --retry 3 -L REDACTED --create-dirs --output apr-1.7.6.tar.bz2.7502.part -H User-Agent: vcpkg/2025-09-03-4580816534ed8fd9634ac83d46471440edd82dfe (curl) (dns block)
    • Triggering command: /usr/bin/curl curl --fail --retry 3 -L REDACTED --create-dirs --output apr-1.7.6.tar.bz2.8220.part -H User-Agent: vcpkg/2025-09-03-4580816534ed8fd9634ac83d46471440edd82dfe (curl) (dns block)




Copilot AI changed the title from "[WIP] Update pod name publishing to include early thor manager info" to "Record Thor manager pod info early; preserve connected worker pod names on registration timeout" on Mar 26, 2026
Copilot AI requested a review from jakesmith March 26, 2026 13:43