LLM Foraging Workspace (ROS 2 + Gazebo Classic)

This repository contains an active ROS 2 + Gazebo Classic reimplementation of CPFA-style multi-robot foraging, migrated from the original ARGoS/C++ codebase.

Current focus:

reliable multi-robot spawning
food detection, pickup, and deposit pipeline
CPFA controller integration (site fidelity, pheromones, uninformed search)
Gazebo visualization plugins (pheromone trails and robot status LEDs)

The project is under active development. Use this README as the source of truth for how to run and extend the code.

1) Repository Layout

Workspace root:

src/llm_foraging: ROS 2 package (nodes, launch files, config, plugins, interfaces)

Inside src/llm_foraging:

llm_foraging/: Python nodes and CPFA logic
launch/: main launch entry points
config/: simulation and controller parameters
plugins/: Gazebo plugins (C++)
worlds/: Gazebo world files
msg/, srv/: custom ROS interfaces

2) Prerequisites

Recommended environment:

Ubuntu 22.04
ROS 2 Humble
Gazebo Classic 11 (gazebo_ros)

Install commonly required packages (example):

sudo apt update
sudo apt install -y \
  ros-humble-gazebo-ros-pkgs \
  ros-humble-turtlebot3 \
  ros-humble-turtlebot3-gazebo \
  python3-colcon-common-extensions

Notes:

This repo uses Gazebo Classic plugins (gazebo_ros_state, custom .so plugins).
package.xml is the authoritative dependency list.

Python version: ROS 2 Humble's rclpy C extension is built for Python 3.10 only. Do NOT run anything under a conda env (3.11+) or the project's .venv: any rclpy import, including the one in experiment_recorder_node.py, will crash with ModuleNotFoundError: No module named 'rclpy._rclpy_pybind11'. Before running launch/batch/sweep commands, make sure which python3 resolves to the system interpreter (/usr/bin/python3.10):

# Strip venv and conda from PATH for this shell
export PATH=$(echo "$PATH" | tr ":" "\n" | grep -v "/\.venv/" | grep -v "/anaconda3/" | paste -sd:)
unset VIRTUAL_ENV CONDA_DEFAULT_ENV CONDA_PREFIX CONDA_SHLVL
source /opt/ros/humble/setup.bash
source install/setup.bash
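
The same check can be scripted as a pre-flight guard. This is a minimal sketch: the helper name check_ros_python is hypothetical, and the path fragments it looks for are the ones stripped from PATH above. It only inspects the interpreter and never imports rclpy.

```python
import sys


def check_ros_python(version_info=sys.version_info, executable=sys.executable):
    """Return a list of problems that would break rclpy on ROS 2 Humble."""
    problems = []
    # Humble's rclpy C extension targets Python 3.10 (see the note above).
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} in use; expected 3.10"
        )
    # Interpreters living in a venv or conda tree are the usual culprits.
    if "/.venv/" in executable or "/anaconda3/" in executable:
        problems.append(f"interpreter comes from an isolated env: {executable}")
    return problems


if __name__ == "__main__":
    for problem in check_ros_python():
        print("WARNING:", problem)
```

An empty list means the interpreter looks compatible; anything else should be fixed before launching.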

3) Build

From workspace root:

source /opt/ros/humble/setup.bash
colcon build --packages-select llm_foraging --symlink-install
source install/setup.bash

Why --symlink-install:

faster Python iteration
launch and executable updates are picked up without a full reinstall in many cases

4) Main Run Modes

A) CPFA simulation

ros2 launch llm_foraging cpfa_simulation.launch.py

Starts:

Gazebo world + robots
food_manager_node
food_gazebo_visualizer
pheromone_manager_node
one cpfa_controller per robot namespace

B) Important launch arguments

Used by CPFA and full launches:

foraging_config:=<path> (default: config/foraging_sim.yaml)
robot_config:=<path> (default: config/robots.yaml)
environment_config:=<path> (default: config/foraging_sim.yaml)
max_parallel_spawns:=auto|all|N
use_sim_time:=true|false
headless:=true|false (default false) — skip launching gzclient; recommended for batch/sweep runs

Example:

ros2 launch llm_foraging cpfa_simulation.launch.py max_parallel_spawns:=all headless:=true

C) Experiment Recording (Always-On by Default in CPFA Launch)

cpfa_simulation.launch.py now supports per-run experiment recording:

creates run folder: log/experiments/<run_id>/
writes manifest.yaml with launch/config factors and git hash (best effort)
records rosbag core topics into bag/core*
runs a passive recorder node writing live artifacts to artifacts/
writes summary.json on shutdown and appends log/experiments/index.csv

Launch arguments:

record_experiment:=true|false (default true)
experiment_root_dir:=<path> (default log/experiments)
experiment_tag:=<label> (default empty)
bag_profile:=core (v1 uses core only)

Artifacts written per run:

manifest.yaml
bag/core*
artifacts/food_stats.csv
artifacts/llm_decisions.jsonl
artifacts/cpfa_state.jsonl
summary.json
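
To confirm that a run folder is complete, the list above can be checked mechanically. This is a minimal sketch: missing_artifacts is a hypothetical helper, and whether llm_decisions.jsonl is written in pure CPFA runs is not stated here, so treat that entry as an assumption.

```python
from pathlib import Path

# File names taken from the artifact list above.
EXPECTED_FILES = [
    "manifest.yaml",
    "artifacts/food_stats.csv",
    "artifacts/llm_decisions.jsonl",
    "artifacts/cpfa_state.jsonl",
    "summary.json",
]


def missing_artifacts(run_dir):
    """Return the expected per-run artifacts that are absent from run_dir."""
    run_dir = Path(run_dir)
    missing = [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]
    # The bag is listed as bag/core*, so accept any entry matching that pattern.
    if not any(run_dir.glob("bag/core*")):
        missing.append("bag/core*")
    return missing
```

Calling missing_artifacts("log/experiments/<run_id>") returns an empty list for a complete run.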

D) Batch Runs (Multiple Trials)

Use the helper script to run repeated CPFA launches with fixed wall-clock duration per run and collect one consolidated CSV:

python3 scripts/run_batch.py --runs 10 --run-seconds 1200 --tag-prefix llm_hybrid

Common options:

--experiment-root log/experiments
--foraging-config <path>
--robot-config <path>
--environment-config <path>
--max-parallel-spawns auto|all|N
--extra-launch-arg key:=value (repeatable)
--extra-result-col key=value (repeatable) — injects a column into every row of batch_results.csv; used by run_sweep.py to record factor values
--run-indices N[,M,...] — run only these specific replicate indices (1..10) using their corresponding TRIAL_SEEDS entries, instead of the full 1..N range. Useful for surgically re-running a handful of failed replicates with the same seeds as the original.

Example with custom configs:

python3 scripts/run_batch.py \
  --runs 10 \
  --run-seconds 900 \
  --tag-prefix gpt5mini_team6 \
  --foraging-config src/llm_foraging/config/foraging_sim.yaml \
  --robot-config src/llm_foraging/config/robots.yaml

Batch outputs:

per-run outputs stay in log/experiments/<run_id>/
batch aggregate files are written to:
- log/experiments/batches/<batch_id>/batch_results.csv
- log/experiments/batches/<batch_id>/batch_meta.json

CSV writes are incremental — each row is appended and flushed right after its run finishes, so partial progress survives a mid-batch crash.
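
The append-and-flush pattern looks roughly like this. It is a sketch of the idea, not the code in run_batch.py: append_row is a hypothetical helper, the column names in the usage line are placeholders, and the os.fsync call is an assumption (the text above only states that rows are flushed).

```python
import csv
import os


def append_row(csv_path, row, fieldnames):
    """Append one result row and push it to disk before returning."""
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
        handle.flush()             # drain Python's buffer
        os.fsync(handle.fileno())  # ask the OS to commit to storage
```

For example, append_row("batch_results.csv", {"run_index": 1, "status": "ok"}, ["run_index", "status"]) writes the header once and then one line per finished run.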

E) Multi-factor sweeps (36-cell factor grid)

scripts/run_sweep.py orchestrates a full sweep across (team size × arena size × food distribution) in parallel across env-isolated workers. Each worker slot gets its own ROS_DOMAIN_ID (base 51) and GAZEBO_MASTER_URI port (base 11345) so multiple Gazebo instances can run on one machine.
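
The per-worker isolation can be pictured as follows. This is a sketch under one assumption: each worker slot adds its 0-based index to both base values. The exact offset scheme in run_sweep.py is not documented here, and worker_env is a hypothetical name.

```python
import os

BASE_DOMAIN_ID = 51       # from the text above
BASE_GAZEBO_PORT = 11345  # from the text above


def worker_env(slot, base_env=None):
    """Build an isolated environment for worker `slot` (0-based).

    Assumption: each slot simply adds its index to both bases.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_DOMAIN_ID"] = str(BASE_DOMAIN_ID + slot)
    env["GAZEBO_MASTER_URI"] = f"http://localhost:{BASE_GAZEBO_PORT + slot}"
    return env
```

The result can be passed as env= to subprocess.Popen so that each worker's ROS graph and Gazebo master stay separate.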

Default grid (matches the current paper design):

team sizes: 4, 6, 8, 10
arena sizes: 6, 8, 10 (food counts 64, 128, 256 respectively — coupled)
distributions: powerlaw, clustered, random
36 cells × 10 replicates × 1200 s per run ÷ 3 parallel workers ≈ 40 hours
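
The grid and the runtime estimate can be reproduced in a few lines. This is a sketch: the function names are hypothetical, and the zero-padded cell id pattern is inferred from ids such as team04_arena06_powerlaw used with --cells.

```python
from itertools import product

TEAM_SIZES = [4, 6, 8, 10]
ARENA_FOOD = {6: 64, 8: 128, 10: 256}  # arena size -> coupled food count
DISTRIBUTIONS = ["powerlaw", "clustered", "random"]


def build_cells():
    """Enumerate every (team, arena, distribution) cell of the default grid."""
    cells = []
    for team, arena, dist in product(TEAM_SIZES, ARENA_FOOD, DISTRIBUTIONS):
        cells.append({
            # Assumed id pattern, inferred from ids such as team04_arena06_powerlaw.
            "cell_id": f"team{team:02d}_arena{arena:02d}_{dist}",
            "team_size": team,
            "arena_size": arena,
            "food_count": ARENA_FOOD[arena],
            "distribution": dist,
        })
    return cells


def estimated_hours(n_cells, runs_per_cell, run_seconds, workers):
    """Wall-clock estimate that ignores startup and teardown overhead."""
    return n_cells * runs_per_cell * run_seconds / workers / 3600


print(len(build_cells()), estimated_hours(36, 10, 1200, 3))  # 36 40.0
```

The estimate is a lower bound, since Gazebo startup and shutdown between runs are not counted.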

# Full production sweep — launch inside tmux for SSH-safe 40 h run
tmux new -s sweep
python3 scripts/run_sweep.py \
  --sweep-tag llm_hybrid_v2_fixed \
  --runs-per-cell 10 --run-seconds 1200 --max-concurrency 3
# Detach: Ctrl-b d. Reattach: tmux attach -t sweep

Useful options:

--cells <list> — restrict to specific cell ids.
- Bare form: --cells team04_arena06_powerlaw,team04_arena08_clustered
- Per-run-index form (semicolon separator): --cells 'team04_arena06_powerlaw:8;team04_arena08_powerlaw:6,7'
- When any :idx appears, the per-cell indices are passed through to run_batch.py --run-indices, so only those specific replicates run.
--force — bypass the resume logic that would otherwise skip cells with complete row counts.
--dry-run — generate per-cell yaml configs and print the planned commands without launching Gazebo. Always run this once before a 40 h sweep.
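
The two --cells forms described above can be parsed as sketched below. This illustrates the syntax only and is not the parser in run_sweep.py; parse_cells is a hypothetical name.

```python
def parse_cells(spec):
    """Parse a --cells value into {cell_id: run indices or None}.

    None means "run every replicate of this cell".
    """
    # Per-run-index form uses ';' between cells; the bare form uses ','.
    separator = ";" if ":" in spec else ","
    selection = {}
    for item in spec.split(separator):
        item = item.strip()
        if not item:
            continue
        cell_id, _, indices = item.partition(":")
        selection[cell_id] = (
            [int(i) for i in indices.split(",")] if indices else None
        )
    return selection
```

The semicolon is needed in the per-run-index form because commas already separate the indices within a cell.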

Sweep output layout:

log/sweeps/<stamp>_<tag>/
├── sweep_manifest.yaml             # factor grid, git hash, CLI args
├── sweep_results.csv               # ONE row per run — main analysis file
├── sweep_summary.json              # per-cell aggregates
└── cells/<cell_id>/
    ├── configs/{foraging_sim,robots}.yaml   # per-cell generated config
    ├── experiments/<run_id>/
    │   ├── summary.json
    │   ├── manifest.yaml
    │   ├── artifacts/food_stats.csv
    │   ├── artifacts/llm_decisions.jsonl
    │   └── bag/core/core_0.db3               # rosbag (bulk of disk use)
    └── experiments/batches/<batch_id>/batch_results.csv

The aggregator in scripts/sweep/aggregate.py polls every ~5 s and appends new rows from each cell's batch_results.csv into the sweep-level sweep_results.csv, so progress is visible live.

F) Merging a rerun back into a base sweep

scripts/merge_reruns.py combines a base sweep with a rerun sweep: base rows whose (cell_id, run_index) key appears in the rerun are dropped, rerun rows take their place, and a new merged sweep_results.csv + sweep_summary.json is written to a fresh output directory. Inputs are read-only.

python3 scripts/merge_reruns.py \
  --base   log/sweeps/<base_stamp> \
  --rerun  log/sweeps/<rerun_stamp> \
  --output log/sweeps/<final_name>

This lets you replace a few contaminated runs and still produce a clean published dataset without re-running the entire grid.
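
The replacement rule reduces to a keyed filter. This is a minimal sketch over rows already loaded as dicts (for example with csv.DictReader); merge_rows is a hypothetical name, and the order of the merged rows is an assumption.

```python
def merge_rows(base_rows, rerun_rows):
    """Replace base rows whose (cell_id, run_index) key reappears in the rerun."""
    def key(row):
        # run_index is compared as text so CSV strings and ints match.
        return (row["cell_id"], str(row["run_index"]))

    rerun_keys = {key(row) for row in rerun_rows}
    kept = [row for row in base_rows if key(row) not in rerun_keys]
    return kept + list(rerun_rows)
```

Neither input list is modified, which mirrors the read-only treatment of the input sweeps.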

G) Disk management

Each run's bag/core/core_0.db3 is ~400–1000 MB (99 % of disk usage). Bags are only needed for time-series replay / trajectory analysis; the summary.json, artifacts/, and the roll-up sweep_results.csv contain everything needed for standard aggregate analysis. After extracting final stats, it is safe to bulk-delete bags to reclaim space:

find log/sweeps/<sweep_dir> -type d -name bag -exec rm -rf {} +
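
Before deleting, it can help to measure how much space the bags actually take. This is a sketch: bag_usage is a hypothetical helper that treats any file below a directory named bag as bag data.

```python
from pathlib import Path


def bag_usage(sweep_dir):
    """Return (bag_bytes, total_bytes) for all regular files under sweep_dir."""
    bag_bytes = total_bytes = 0
    for path in Path(sweep_dir).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total_bytes += size
        # A file counts as bag data when any parent directory is named "bag".
        if "bag" in path.relative_to(sweep_dir).parts[:-1]:
            bag_bytes += size
    return bag_bytes, total_bytes
```

Dividing the first value by the second gives the fraction of the sweep directory that the find command above would reclaim.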

5) Configuration Model

Primary config file:

src/llm_foraging/config/foraging_sim.yaml

Key sections:

environment: nest + wall geometry for spawner
food_manager: food distribution and services
cpfa_controller: CPFA and motion/avoidance parameters
pheromone_manager: pheromone lifecycle and publication
food_gazebo_visualizer: food model spawn/delete/follow behavior

Robot team layout:

src/llm_foraging/config/robots.yaml
Grid format: rows, cols, spacing, center, namespace/name prefixes
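
To illustrate how a grid layout maps to spawn positions, here is a sketch that assumes the grid is centered on center, with columns along x and rows along y. The spawner's actual axis convention and key names are not documented here, and grid_positions is a hypothetical helper.

```python
def grid_positions(rows, cols, spacing, center=(0.0, 0.0)):
    """Return (x, y) spawn positions for a rows x cols grid centred on `center`."""
    cx, cy = center
    # Offset the first cell so the whole grid is symmetric about the centre.
    x0 = cx - (cols - 1) * spacing / 2.0
    y0 = cy - (rows - 1) * spacing / 2.0
    return [
        (x0 + c * spacing, y0 + r * spacing)
        for r in range(rows)
        for c in range(cols)
    ]
```

A 2 x 3 grid yields six positions, one per robot, in row-major order.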

LLM Policy Mode (Optional)

cpfa_controller now supports an explicit policy switch:

decision_policy_mode: 'cpfa' keeps pure CPFA behavior (baseline-safe default)
decision_policy_mode: 'llm_hybrid' enables low-frequency LLM decisions

LLM is additionally gated by:

llm_enabled: true|false

Recommended baseline-vs-LLM comparison workflow:

Keep all non-LLM parameters identical.
Run baseline with:
- decision_policy_mode: 'cpfa'
- llm_enabled: false
Run LLM hybrid with:
- decision_policy_mode: 'llm_hybrid'
- llm_enabled: true
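
One way to enforce the first step is to diff the two parameter sets before launching. This is a minimal sketch, assuming both configs have already been loaded into flat dicts of cpfa_controller parameters; the loading step and the helper name are hypothetical.

```python
# The only parameters that should differ between the two arms.
ALLOWED_DIFFERENCES = {"decision_policy_mode", "llm_enabled"}


def differing_keys(baseline, candidate):
    """Return the sorted parameter names whose values differ between two dicts."""
    names = set(baseline) | set(candidate)
    return sorted(n for n in names if baseline.get(n) != candidate.get(n))


def is_fair_comparison(baseline, candidate):
    """True when the two configs differ only in the LLM switch parameters."""
    return set(differing_keys(baseline, candidate)) <= ALLOWED_DIFFERENCES
```

If is_fair_comparison returns False, differing_keys shows which extra parameter drifted between the two runs.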

Required environment variable for LLM mode (one per provider):

OpenAI: OPENAI_API_KEY (or set llm_api_key_env to another variable name)
Anthropic Claude: ANTHROPIC_API_KEY (or set llm_api_key_env accordingly)

Credential loading order in LLM mode:

secret*.yml / secrets*.yml file in config/ (auto-discovered, or set explicitly with llm_secrets_file)
llm_api_key parameter (if set)
environment variable from llm_api_key_env (default OPENAI_API_KEY)

Expected secrets file keys:

provider (optional: openai|anthropic|claude|auto)
api_base
model (optional)
api_key or api_keys (list/dict, indexed by llm_secrets_key_index)
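
The lookup order can be sketched as a first-match-wins chain. This is an illustration, not the controller's code. It assumes that the first source providing a key wins and that a dict-valued api_keys is indexed by position in insertion order. resolve_api_key and its argument names are hypothetical.

```python
import os


def resolve_api_key(secrets=None, llm_api_key="", llm_api_key_env="OPENAI_API_KEY",
                    key_index=0, environ=None):
    """Return (key, source) using the documented lookup order, or (None, None)."""
    environ = os.environ if environ is None else environ
    secrets = secrets or {}
    # 1) secrets file: a single api_key, or api_keys indexed by key_index
    if secrets.get("api_key"):
        return secrets["api_key"], "secrets_file"
    keys = secrets.get("api_keys")
    if isinstance(keys, dict):
        keys = list(keys.values())  # assumption: index by insertion order
    if keys and 0 <= key_index < len(keys):
        return keys[key_index], "secrets_file"
    # 2) explicit llm_api_key parameter
    if llm_api_key:
        return llm_api_key, "parameter"
    # 3) environment variable named by llm_api_key_env
    if environ.get(llm_api_key_env):
        return environ[llm_api_key_env], "environment"
    return None, None
```

Returning the source alongside the key makes it easy to log where a credential came from without logging the key itself.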

LLM behavior:

Decision events only (not per-tick motion control):
- post-deposit departure strategy
- nest-arrival departure strategy (return-to-nest without food)
- starvation-triggered search reevaluation (timer is time since last nest visit)
Exact-action LLM mode for starvation:
- both starvation actions are always available (RETURN_TO_NEST_FOR_INFO, CONTINUE_SEARCH)
- prompt instructs RETURN_TO_NEST_FOR_INFO as a desperation / last-resort choice
strict JSON response validation with action whitelist
deterministic fallback to CPFA if timeout, parse failure, invalid action, or stale response
optional per-robot text transcripts (request/response/error) in llm_conversation_log_dir
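
The validation-then-fallback behavior can be sketched as below. The JSON field name action and the helper name are assumptions; only the two starvation action names come from the list above. Returning None stands for "fall back to CPFA".

```python
import json

STARVATION_ACTIONS = {"RETURN_TO_NEST_FOR_INFO", "CONTINUE_SEARCH"}


def parse_decision(raw_text, allowed_actions):
    """Return a whitelisted action, or None to signal the CPFA fallback."""
    try:
        payload = json.loads(raw_text)
    except (TypeError, ValueError):
        return None  # parse failure
    if not isinstance(payload, dict):
        return None  # valid JSON, but not an object
    action = payload.get("action")  # "action" is an assumed field name
    if not isinstance(action, str) or action not in allowed_actions:
        return None  # missing or non-whitelisted action
    return action
```

Timeouts and stale responses would be handled by the caller in the same way, by treating them as None.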

Useful LLM parameters in cpfa_controller.ros__parameters:

llm_timeout_sec
llm_nest_arrival_enabled
llm_starvation_threshold_sec
llm_starvation_query_interval_sec
llm_secrets_file
llm_secrets_key_index
llm_model
llm_reasoning_effort (low|medium|high or empty for provider default)
llm_provider (openai|anthropic|auto)
llm_anthropic_version (default 2023-06-01)
llm_log_decisions
llm_decision_topic (default /cpfa/llm_decision)
llm_conversation_log_enabled
llm_conversation_log_dir

6) Motion Architecture (Critical)

There are two kinematic paths in this codebase:

Gazebo model plugin path

plugin: plugins/kinematic_drive_plugin.cpp
receives cmd_vel, updates pose in Gazebo, publishes odom

Controller-side SetEntityState path

cpfa_controller can directly call /gazebo/set_entity_state

Current recommended CPFA setup:

use Gazebo kinematic plugin in launch (use_kinematic_plugin:=true in CPFA launch)
keep cpfa_controller.use_kinematic_motion: false

This avoids double-driving robots and reduces service-call bottlenecks.

Crisp constant-rate turning (CPFA controller):

drive_constant_angular_turning: true enables fixed-magnitude angular turns for navigation and avoidance yaw alignment.
drive_yaw_deadband, avoidance_turn_tolerance, and survey_turn_tolerance control how close to the target heading a turn stops (the default tuning in this repo targets 0.10 rad).
Set drive_constant_angular_turning: false to restore legacy proportional turn behavior (with clamped angular speed).
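
The difference between the two turn modes can be sketched as follows. The numeric defaults for turn_speed, kp, and max_speed are placeholders, and applying the deadband in both modes is an assumption. Only the 0.10 rad margin comes from the text above.

```python
import math


def angular_command(yaw_error, constant_turning=True, deadband=0.10,
                    turn_speed=0.5, kp=1.5, max_speed=1.0):
    """Return an angular velocity (rad/s) for a heading error in radians."""
    # Wrap the error into [-pi, pi] so the robot turns the short way round.
    error = math.atan2(math.sin(yaw_error), math.cos(yaw_error))
    if abs(error) <= deadband:
        return 0.0  # close enough to the target heading: stop turning
    if constant_turning:
        return math.copysign(turn_speed, error)  # fixed magnitude
    # Legacy-style proportional turn with clamped angular speed.
    return max(-max_speed, min(max_speed, kp * error))
```

With constant turning the command has the same magnitude for any error outside the deadband, so turns stop abruptly at the margin instead of slowing down as the error shrinks.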

7) Key Topics and Services

Useful topics:

/food/list (llm_foraging/msg/FoodList)
/food/stats (llm_foraging/msg/FoodStats)
/food/spawn_ready (std_msgs/Bool)
/cpfa/state (llm_foraging/msg/CPFAState)
/pheromone/list (llm_foraging/msg/PheromoneList)
/gazebo/model_states (gazebo_msgs/ModelStates)

Food services:

/food/check_proximity (CheckFood)
/food/pickup (PickupFood)
/food/deposit (DepositFood)
/food/reset (Trigger)

Pheromone services:

/pheromone/deposit (DepositPheromone)
/pheromone/get (GetPheromone)
/pheromone/reset (Trigger)

Gazebo services commonly used:

/spawn_entity
/delete_entity
/gazebo/set_entity_state

8) Development Workflow

A) Add/modify Python nodes

Edit under src/llm_foraging/llm_foraging/
Ensure executable entry exists in setup.py (console_scripts)
Ensure install rule exists in CMakeLists.txt install(PROGRAMS ...)
Rebuild package and re-source workspace

B) Add/modify interfaces

Edit msg/ or srv/
Update rosidl_generate_interfaces in CMakeLists.txt
Rebuild package

C) Add/modify Gazebo plugins

Edit/create C++ plugin in plugins/
Register plugin target in CMakeLists.txt
Add plugin to world/model SDF as needed
Rebuild package and relaunch Gazebo

D) Recommended iteration loop

colcon build --packages-select llm_foraging --symlink-install
source install/setup.bash
ros2 launch llm_foraging cpfa_simulation.launch.py

9) Troubleshooting

Executable not found in launch

Rebuild package and source workspace again:
- colcon build --packages-select llm_foraging --symlink-install
- source install/setup.bash
Check:
- ros2 pkg executables llm_foraging

`ModuleNotFoundError: No module named 'rclpy._rclpy_pybind11'`

Your shell is running Python 3.11 or newer (conda base / venv), but ROS Humble's rclpy is built for Python 3.10 only.
Fix: clean the PATH (see "Python version" note in §2 Prerequisites) so which python3 resolves to /usr/bin/python3.10, then re-source /opt/ros/humble/setup.bash and install/setup.bash.
Symptom in a sweep: rows appear in sweep_results.csv with summary_found=False and empty metric columns — the experiment_recorder_node.py crashed on import and never wrote summary.json.
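
Affected runs can be listed straight from the CSV and turned into a --cells argument for a targeted rerun. This is a sketch: it assumes sweep_results.csv has cell_id and run_index columns next to summary_found, and cells_to_rerun is a hypothetical helper.

```python
import csv
from collections import defaultdict


def cells_to_rerun(csv_path):
    """Build a --cells string from rows where summary_found is False."""
    failed = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            found = (row.get("summary_found") or "").strip().lower()
            if found == "false":
                failed[row["cell_id"]].append(int(row["run_index"]))
    # Per-run-index form: cells joined by ';', indices joined by ','.
    return ";".join(
        f"{cell}:{','.join(str(i) for i in sorted(indices))}"
        for cell, indices in sorted(failed.items())
    )
```

The returned string uses the per-run-index form accepted by --cells, so it can be passed to run_sweep.py once the Python environment is fixed.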

`spawn_entity` not available / nothing spawns

Ensure gzserver is running and did not crash.
If Gazebo reports Address already in use, terminate stale Gazebo processes and relaunch.

`spawn_entity.py: specified file /tmp/tmp..._tb*.sdf does not exist`

Historical bug — an older version of run_batch.py ran an mtime-scoped /tmp SDF cleanup between runs. /tmp is shared across parallel workers, so that cleanup could delete a sibling worker's in-flight SDF during spawn and the whole run would start with an empty world.
Fixed by making _cleanup_temp_sdfs a no-op. Don't re-add a cleanup that isn't scoped to files owned by the current worker. /tmp litter (~14 MB per sweep) gets cleaned on reboot.

Robots not moving at startup

Confirm /food/spawn_ready has published true (controllers can wait for food spawning to complete before moving).
Confirm controllers started for all namespaces from robots.yaml.

Pheromone trails not visible

Verify /pheromone/list has active pheromones.
Verify world contains pheromone_trail_visualizer plugin (worlds/multi_cpfa_world.world).

Sweep hangs after Ctrl-C

A single SIGINT to run_sweep.py triggers a graceful shutdown: it sets the shutdown flag and forwards SIGINT to the active workers. However, each worker's run_batch.py catches the signal at the current replicate's proc.wait(), kills that one replicate, and then moves on to the next replicate instead of exiting.
Send a second SIGINT to trigger the orchestrator's hard-exit path, then kill remaining run_batch.py process groups manually:
```
for pid in $(pgrep -f "run_batch.py.*<sweep_tag>"); do
  kill -KILL -"$pid"
done
```

10) Contributing Guidelines

Recommended:

create feature branches for major behavior changes
keep parameter changes centralized in foraging_sim.yaml
document behavioral changes in commit messages and PR descriptions
avoid mixing unrelated refactors with controller behavior changes

License:

Apache 2.0 (see src/llm_foraging/package.xml)
