Commit d02f41b

[feat] Support cross-job actor discovery via explicit namespace
When multiple Ray Jobs share the same Ray cluster, Named Actors are isolated by namespace. Without an explicit namespace, a TQ Controller created by one job is invisible to workers in another job.

This commit adds namespace="transfer_queue" to both:
- ray.get_actor() in _init_from_existing()
- TransferQueueController.options() in init()

This ensures that the TQ Controller is always registered and discovered in the fixed "transfer_queue" namespace, enabling cross-job TQ sharing (e.g., a teacher server job creates TQ, and a trainer job connects to it).

This change is backward-compatible: single-job usage is unaffected since the namespace is consistent between creation and discovery.
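
The lookup rule described above can be illustrated with a small toy model. This is only a sketch and not Ray's implementation: ToyCluster, new_job_namespace(), register() and can_discover() are hypothetical names, and the model assumes that a job which sets no namespace gets its own unique anonymous one. It shows why discovery fails across jobs without a shared namespace and succeeds once both sides pin "transfer_queue".

```python
# Toy model of namespace-scoped named-actor lookup. NOT Ray's implementation;
# ToyCluster and its methods are hypothetical names used only for illustration.
import uuid


class ToyCluster:
    """Registry keyed by (namespace, name), shared by every job on the cluster."""

    def __init__(self):
        self._actors = {}

    def new_job_namespace(self):
        # Assumption: a job that sets no namespace gets a unique anonymous one.
        return f"anon-{uuid.uuid4()}"

    def register(self, name, handle, job_namespace, namespace=None):
        # An explicit namespace overrides the job's default namespace.
        self._actors[(namespace or job_namespace, name)] = handle

    def get_actor(self, name, job_namespace, namespace=None):
        key = (namespace or job_namespace, name)
        if key not in self._actors:
            # Mirrors the ValueError that _init_from_existing() catches.
            raise ValueError(f"Failed to look up actor {name!r} in namespace {key[0]!r}")
        return self._actors[key]


def can_discover(cluster, name, creator_ns, worker_ns, namespace=None):
    """Register from one job, look up from another, report whether it worked."""
    cluster.register(name, object(), creator_ns, namespace=namespace)
    try:
        cluster.get_actor(name, worker_ns, namespace=namespace)
        return True
    except ValueError:
        return False


cluster = ToyCluster()
teacher_ns = cluster.new_job_namespace()
trainer_ns = cluster.new_job_namespace()

# Before this commit: each job falls back to its own anonymous namespace.
before = can_discover(cluster, "TransferQueueController", teacher_ns, trainer_ns)
# After this commit: both sides pin the same fixed namespace.
after = can_discover(cluster, "TransferQueueController", teacher_ns, trainer_ns,
                     namespace="transfer_queue")
print(before, after)  # False True
```

The single-job case behaves the same with or without the explicit namespace, because creation and lookup then share one job namespace either way.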
1 parent 9aefd26 commit d02f41b

1 file changed

Lines changed: 2 additions & 2 deletions

transfer_queue/interface.py

@@ -93,7 +93,7 @@ def _init_from_existing() -> bool:
     global _TQ_CONTROLLER
     try:
         if _TQ_CONTROLLER is None:
-            _TQ_CONTROLLER = ray.get_actor("TransferQueueController")
+            _TQ_CONTROLLER = ray.get_actor("TransferQueueController", namespace="transfer_queue")
 
     except ValueError:
         logger.info("Called _init_from_existing() but TransferQueueController has not been initialized yet.")
@@ -174,7 +174,7 @@ def init(conf: DictConfig | None = None) -> DictConfig | None:
 
     try:
         global _TQ_CONTROLLER
-        _TQ_CONTROLLER = TransferQueueController.options(name="TransferQueueController").remote(  # type: ignore[attr-defined]
+        _TQ_CONTROLLER = TransferQueueController.options(name="TransferQueueController", namespace="transfer_queue").remote(  # type: ignore[attr-defined]
             sampler=sampler, polling_mode=final_conf.controller.polling_mode
         )
         logger.info("TransferQueueController has been created.")
