This repository contains an active ROS 2 + Gazebo Classic reimplementation of CPFA-style multi-robot foraging, migrated from the original ARGoS/C++ codebase.
Current focus:
- reliable multi-robot spawning
- food detection, pickup, and deposit pipeline
- CPFA controller integration (site fidelity, pheromones, uninformed search)
- Gazebo visualization plugins (pheromone trails and robot status LEDs)
The project is under active development. Use this README as the source of truth for how to run and extend the code.
Workspace root:
src/llm_foraging: ROS 2 package (nodes, launch files, config, plugins, interfaces)
Inside src/llm_foraging:
- llm_foraging/: Python nodes and CPFA logic
- launch/: main launch entry points
- config/: simulation and controller parameters
- plugins/: Gazebo plugins (C++)
- worlds/: Gazebo world files
- msg/, srv/: custom ROS interfaces
Recommended environment:
- Ubuntu 22.04
- ROS 2 Humble
- Gazebo Classic 11 (gazebo_ros)
Install commonly required packages (example):
sudo apt update
sudo apt install -y \
ros-humble-gazebo-ros-pkgs \
ros-humble-turtlebot3 \
ros-humble-turtlebot3-gazebo \
python3-colcon-common-extensions
Notes:
- This repo uses Gazebo Classic plugins (gazebo_ros_state, custom .so plugins).
- package.xml is the authoritative dependency list.
Python version: ROS 2 Humble's rclpy C extension is built for Python 3.10 only. Do NOT run anything under a conda env (3.11+) or the project's .venv; experiment_recorder_node.py and any rclpy import will crash with ModuleNotFoundError: No module named 'rclpy._rclpy_pybind11'. Before running launch/batch/sweep commands, make sure which python3 resolves to /usr/bin/python3.10:
# Strip venv and conda from PATH for this shell
export PATH=$(echo "$PATH" | tr ":" "\n" | grep -v "/\.venv/" | grep -v "/anaconda3/" | paste -sd:)
unset VIRTUAL_ENV CONDA_DEFAULT_ENV CONDA_PREFIX CONDA_SHLVL
source /opt/ros/humble/setup.bash
source install/setup.bash
From workspace root:
source /opt/ros/humble/setup.bash
colcon build --packages-select llm_foraging --symlink-install
source install/setup.bash
Why --symlink-install:
- faster Python iteration
- launch/executable updates reflected without full reinstall in many cases
ros2 launch llm_foraging cpfa_simulation.launch.py
Starts:
- Gazebo world + robots
- food_manager_node
- food_gazebo_visualizer
- pheromone_manager_node
- one cpfa_controller per robot namespace
Used by CPFA and full launches:
- foraging_config:=<path> (default: config/foraging_sim.yaml)
- robot_config:=<path> (default: config/robots.yaml)
- environment_config:=<path> (default: config/foraging_sim.yaml)
- max_parallel_spawns:=auto|all|N
- use_sim_time:=true|false
- headless:=true|false (default false): skip launching gzclient; recommended for batch/sweep runs
Example:
ros2 launch llm_foraging cpfa_simulation.launch.py max_parallel_spawns:=all headless:=true
cpfa_simulation.launch.py now supports per-run experiment recording:
- creates run folder: log/experiments/<run_id>/
- writes manifest.yaml with launch/config factors and git hash (best effort)
- records rosbag core topics into bag/core*
- runs a passive recorder node writing live artifacts to artifacts/
- writes summary.json on shutdown and appends to log/experiments/index.csv
Launch arguments:
- record_experiment:=true|false (default true)
- experiment_root_dir:=<path> (default log/experiments)
- experiment_tag:=<label> (default empty)
- bag_profile:=core (v1 uses core only)
Artifacts written per run:
- manifest.yaml
- bag/core*
- artifacts/food_stats.csv
- artifacts/llm_decisions.jsonl
- artifacts/cpfa_state.jsonl
- summary.json
Use the helper script to run repeated CPFA launches with fixed wall-clock duration per run and collect one consolidated CSV:
python3 scripts/run_batch.py --runs 10 --run-seconds 1200 --tag-prefix llm_hybrid
Common options:
- --experiment-root log/experiments
- --foraging-config <path>
- --robot-config <path>
- --environment-config <path>
- --max-parallel-spawns auto|all|N
- --extra-launch-arg key:=value (repeatable)
- --extra-result-col key=value (repeatable): injects a column into every row of batch_results.csv; used by run_sweep.py to record factor values
- --run-indices N[,M,...]: run only these specific replicate indices (1..10) using their corresponding TRIAL_SEEDS entries, instead of the full 1..N range. Useful for surgically re-running a handful of failed replicates with the same seeds as the original.
Example with custom configs:
python3 scripts/run_batch.py \
--runs 10 \
--run-seconds 900 \
--tag-prefix gpt5mini_team6 \
--foraging-config src/llm_foraging/config/foraging_sim.yaml \
--robot-config src/llm_foraging/config/robots.yaml
Batch outputs:
- per-run outputs stay in log/experiments/<run_id>/
- batch aggregate files are written to:
  - log/experiments/batches/<batch_id>/batch_results.csv
  - log/experiments/batches/<batch_id>/batch_meta.json
CSV writes are incremental — each row is appended and flushed right after its run finishes, so partial progress survives a mid-batch crash.
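The append-and-flush pattern described above can be sketched as follows. This is a minimal illustration, not the code in run_batch.py; the helper name append_row and the column names in the usage note are hypothetical.

```python
import csv
import os


def append_row(csv_path, row, fieldnames):
    """Append one result row and push it to disk before returning.

    Hypothetical helper: opens in append mode, writes the header only
    when the file is new or empty, then flushes and fsyncs so the row
    survives a crash of the calling process.
    """
    needs_header = (not os.path.exists(csv_path)
                    or os.path.getsize(csv_path) == 0)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
        f.flush()              # Python buffer -> OS
        os.fsync(f.fileno())   # OS cache -> disk
```

Calling append_row(path, {"run_index": 1, "food_collected": 5}, ["run_index", "food_collected"]) once per finished run leaves a readable CSV on disk after every run, whatever happens to the next one.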
scripts/run_sweep.py orchestrates a full sweep across
(team size × arena size × food distribution) in parallel across env-isolated
workers. Each worker slot gets its own ROS_DOMAIN_ID (base 51) and
GAZEBO_MASTER_URI port (base 11345) so multiple Gazebo instances can run
on one machine.
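A minimal sketch of per-slot isolation, assuming that slot k simply adds k to each base value. The real mapping in run_sweep.py may differ, and worker_env is a hypothetical name.

```python
import os

ROS_DOMAIN_ID_BASE = 51   # base value stated above
GAZEBO_PORT_BASE = 11345  # base value stated above


def worker_env(slot, base_env=None):
    """Return a copy of the environment isolated for one worker slot.

    Assumption: slot 0 uses the base values and each later slot adds
    its index to both the domain id and the Gazebo master port.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_DOMAIN_ID"] = str(ROS_DOMAIN_ID_BASE + slot)
    env["GAZEBO_MASTER_URI"] = f"http://localhost:{GAZEBO_PORT_BASE + slot}"
    return env
```

The returned dict can be passed as env= to subprocess.Popen, so each worker's ROS 2 nodes and gzserver only see their own domain and master port.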
Default grid (matches the current paper design):
- team sizes: 4, 6, 8, 10
- arena sizes: 6, 8, 10 (food counts 64, 128, 256 respectively — coupled)
- distributions: powerlaw, clustered, random
- 36 cells × 10 replicates × 1200 s per run ÷ 3 parallel workers ≈ 40 hours
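The cell count and the rough wall-clock estimate can be reproduced with a short sketch. The cell id pattern is inferred from the ids used elsewhere in this README (for example team04_arena06_powerlaw), and the estimate ignores launch and teardown overhead.

```python
from itertools import product

TEAM_SIZES = [4, 6, 8, 10]
ARENA_FOOD = {6: 64, 8: 128, 10: 256}  # arena size -> coupled food count
DISTRIBUTIONS = ["powerlaw", "clustered", "random"]


def cell_ids():
    """Enumerate one id per (team size, arena size, distribution) cell."""
    return [
        f"team{team:02d}_arena{arena:02d}_{dist}"
        for team, arena, dist in product(TEAM_SIZES, ARENA_FOOD, DISTRIBUTIONS)
    ]


def sweep_hours(n_cells, runs_per_cell, run_seconds, workers):
    """Lower-bound wall-clock hours, assuming perfectly packed workers."""
    return n_cells * runs_per_cell * run_seconds / workers / 3600


cells = cell_ids()
hours = sweep_hours(len(cells), runs_per_cell=10, run_seconds=1200, workers=3)
```

With the default grid this yields 36 cells and 40.0 hours, matching the figure above.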
# Full production sweep — launch inside tmux for SSH-safe 40 h run
tmux new -s sweep
python3 scripts/run_sweep.py \
--sweep-tag llm_hybrid_v2_fixed \
--runs-per-cell 10 --run-seconds 1200 --max-concurrency 3
# Detach: Ctrl-b d. Reattach: tmux attach -t sweep
Useful options:
- --cells <list>: restrict to specific cell ids.
  - Bare form: --cells team04_arena06_powerlaw,team04_arena08_clustered
  - Per-run-index form (semicolon separator): --cells 'team04_arena06_powerlaw:8;team04_arena08_powerlaw:6,7'
  - When any :idx appears, the per-cell indices are passed through to run_batch.py --run-indices, so only those specific replicates run.
- --force: bypass the resume logic that would otherwise skip cells with complete row counts.
- --dry-run: generate per-cell yaml configs and print the planned commands without launching Gazebo. Always run this once before a 40 h sweep.
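The two --cells forms can be parsed along these lines. This is a sketch of the syntax as described above, not the parser in run_sweep.py; parse_cells is a hypothetical name, and mapping a bare cell id to None (meaning "all replicates") is an assumption.

```python
def parse_cells(spec):
    """Parse a --cells value into {cell_id: [run indices] or None}.

    Bare form: comma-separated cell ids.
    Per-run-index form: semicolon-separated 'cell_id:i,j' entries,
    selected whenever a ':' appears anywhere in the value.
    """
    separator = ";" if ":" in spec else ","
    entries = [e.strip() for e in spec.split(separator) if e.strip()]
    result = {}
    for entry in entries:
        cell_id, has_indices, index_part = entry.partition(":")
        if has_indices:
            result[cell_id] = [int(i) for i in index_part.split(",") if i.strip()]
        else:
            result[cell_id] = None  # assumption: None means every replicate
    return result
```

The semicolon separator matters because commas are already taken by the index list inside each entry.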
Sweep output layout:
log/sweeps/<stamp>_<tag>/
├── sweep_manifest.yaml # factor grid, git hash, CLI args
├── sweep_results.csv # ONE row per run — main analysis file
├── sweep_summary.json # per-cell aggregates
└── cells/<cell_id>/
├── configs/{foraging_sim,robots}.yaml # per-cell generated config
├── experiments/<run_id>/
│ ├── summary.json
│ ├── manifest.yaml
│ ├── artifacts/food_stats.csv
│ ├── artifacts/llm_decisions.jsonl
│ └── bag/core/core_0.db3 # rosbag (bulk of disk use)
└── experiments/batches/<batch_id>/batch_results.csv
The aggregator in scripts/sweep/aggregate.py polls every ~5 s and appends
new rows from each cell's batch_results.csv into the sweep-level
sweep_results.csv, so progress is visible live.
scripts/merge_reruns.py combines a base sweep with a rerun sweep: base rows
whose (cell_id, run_index) key appears in the rerun are dropped, rerun rows
take their place, and a new merged sweep_results.csv + sweep_summary.json
is written to a fresh output directory. Inputs are read-only.
python3 scripts/merge_reruns.py \
--base log/sweeps/<base_stamp> \
--rerun log/sweeps/<rerun_stamp> \
--output log/sweeps/<final_name>
This is how targeted replacement of a few contaminated runs produces a clean published dataset without re-running the entire grid.
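The replacement rule itself is small enough to sketch. The function below operates on rows already loaded as dicts (for example via csv.DictReader); merge_rows is a hypothetical name, and placing rerun rows after the kept base rows is an assumption about output order.

```python
def merge_rows(base_rows, rerun_rows):
    """Drop base rows superseded by a rerun, then add the rerun rows.

    Rows are matched on the (cell_id, run_index) key. Neither input
    list is modified.
    """
    rerun_keys = {(r["cell_id"], r["run_index"]) for r in rerun_rows}
    kept = [
        r for r in base_rows
        if (r["cell_id"], r["run_index"]) not in rerun_keys
    ]
    return kept + list(rerun_rows)
```

Because matching is by key rather than by position, a rerun sweep that covers only a handful of replicates leaves every other base row untouched.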
Each run's bag/core/core_0.db3 is ~400–1000 MB (99 % of disk usage). Bags
are only needed for time-series replay / trajectory analysis; the
summary.json, artifacts/, and the roll-up sweep_results.csv contain
everything needed for standard aggregate analysis. After extracting final
stats, it is safe to bulk-delete bags to reclaim space:
find log/sweeps/<sweep_dir> -type d -name bag -exec rm -rf {} +
Primary config file:
src/llm_foraging/config/foraging_sim.yaml
Key sections:
- environment: nest + wall geometry for spawner
- food_manager: food distribution and services
- cpfa_controller: CPFA and motion/avoidance parameters
- pheromone_manager: pheromone lifecycle and publication
- food_gazebo_visualizer: food model spawn/delete/follow behavior
Robot team layout:
- src/llm_foraging/config/robots.yaml
- Grid format: rows, cols, spacing, center, namespace/name prefixes
cpfa_controller now supports an explicit policy switch:
- decision_policy_mode: 'cpfa' keeps pure CPFA behavior (baseline-safe default)
- decision_policy_mode: 'llm_hybrid' enables low-frequency LLM decisions
LLM is additionally gated by:
llm_enabled: true|false
Recommended baseline-vs-LLM comparison workflow:
- Keep all non-LLM parameters identical.
- Run baseline with:
  - decision_policy_mode: 'cpfa'
  - llm_enabled: false
- Run LLM hybrid with:
  - decision_policy_mode: 'llm_hybrid'
  - llm_enabled: true
Required environment variable for LLM mode:
- OpenAI: OPENAI_API_KEY (or set llm_api_key_env to another variable name)
- Anthropic Claude: ANTHROPIC_API_KEY (or set llm_api_key_env accordingly)
Credential loading order in LLM mode:
- secret*.yml / secrets*.yml file in config/ (auto-discovered, or set explicitly with llm_secrets_file)
- llm_api_key parameter (if set)
- environment variable from llm_api_key_env (default OPENAI_API_KEY)
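The loading order above amounts to "first non-empty source wins". A minimal sketch, assuming the key from the secrets file has already been extracted into a string; resolve_api_key and the source labels it returns are hypothetical, not names from the controller.

```python
import os


def resolve_api_key(secrets_file_key, llm_api_key,
                    llm_api_key_env="OPENAI_API_KEY", environ=None):
    """Return (source, key) for the first non-empty credential source.

    Order follows the list above: secrets file, then the llm_api_key
    parameter, then the environment variable named by llm_api_key_env.
    Returns (None, None) when no source provides a key.
    """
    environ = os.environ if environ is None else environ
    candidates = [
        ("secrets_file", secrets_file_key),
        ("llm_api_key", llm_api_key),
        (f"env:{llm_api_key_env}", environ.get(llm_api_key_env)),
    ]
    for source, value in candidates:
        if value:
            return source, value
    return None, None
```

Returning the source alongside the key makes it easy to log which credential path was taken without logging the key itself.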
Expected secrets file keys:
- provider (optional: openai|anthropic|claude|auto)
- api_base
- model (optional)
- api_key or api_keys (list/dict, indexed by llm_secrets_key_index)
LLM behavior:
- Decision events only (not per-tick motion control):
  - post-deposit departure strategy
  - nest-arrival departure strategy (return-to-nest without food)
  - starvation-triggered search reevaluation (timer is time since last nest visit)
- Exact-action LLM mode for starvation:
  - both starvation actions are always available (RETURN_TO_NEST_FOR_INFO, CONTINUE_SEARCH)
  - prompt instructs RETURN_TO_NEST_FOR_INFO as a desperation / last-resort choice
- strict JSON response validation with action whitelist
- deterministic fallback to CPFA if timeout, parse failure, invalid action, or stale response
- optional per-robot text transcripts (request/response/error) in llm_conversation_log_dir
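Whitelist validation with a deterministic fallback can be sketched as below. The "action" field name, the choose_action helper, and the use of None to represent a timed-out or stale response are illustrative assumptions; the controller's actual response schema is not documented here.

```python
import json

# The two starvation actions named above.
STARVATION_ACTIONS = frozenset({"RETURN_TO_NEST_FOR_INFO", "CONTINUE_SEARCH"})


def choose_action(raw_response, allowed_actions, cpfa_action):
    """Return (action, source), falling back to the CPFA choice on any doubt.

    raw_response is the LLM reply text, or None for a timeout or stale
    reply. Anything that is not a JSON object carrying a whitelisted
    string under the (assumed) "action" key yields the fallback.
    """
    try:
        payload = json.loads(raw_response)
    except (TypeError, ValueError):  # None, or text that is not valid JSON
        return cpfa_action, "cpfa_fallback"
    if not isinstance(payload, dict):
        return cpfa_action, "cpfa_fallback"
    action = payload.get("action")
    if not isinstance(action, str) or action not in allowed_actions:
        return cpfa_action, "cpfa_fallback"
    return action, "llm"
```

Every failure path returns the same CPFA action, so a run with an unreachable LLM behaves like the pure CPFA baseline at each decision event.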
Useful LLM parameters in cpfa_controller.ros__parameters:
- llm_timeout_sec
- llm_nest_arrival_enabled
- llm_starvation_threshold_sec
- llm_starvation_query_interval_sec
- llm_secrets_file
- llm_secrets_key_index
- llm_model
- llm_reasoning_effort (low|medium|high or empty for provider default)
- llm_provider (openai|anthropic|auto)
- llm_anthropic_version (default 2023-06-01)
- llm_log_decisions
- llm_decision_topic (default /cpfa/llm_decision)
- llm_conversation_log_enabled
- llm_conversation_log_dir
There are two kinematic paths in this codebase:
- Gazebo model plugin path
  - plugin: plugins/kinematic_drive_plugin.cpp
  - receives cmd_vel, updates pose in Gazebo, publishes odom
- Controller-side SetEntityState path
  - cpfa_controller can directly call /gazebo/set_entity_state
Current recommended CPFA setup:
- use Gazebo kinematic plugin in launch (use_kinematic_plugin:=true in CPFA launch)
- keep cpfa_controller.use_kinematic_motion: false
This avoids double-driving robots and reduces service-call bottlenecks.
Crisp constant-rate turning (CPFA controller):
- drive_constant_angular_turning: true enables fixed-magnitude angular turns for navigation and avoidance yaw alignment.
- drive_yaw_deadband, avoidance_turn_tolerance, and survey_turn_tolerance control the stop-near-target margin (the default tuning in this repo is 0.10 rad).
- Set drive_constant_angular_turning: false to restore legacy proportional turn behavior (with clamped angular speed).
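The difference between the two turning modes can be illustrated with a small sketch. It is not the controller's code: angular_command, turn_speed, and kp are hypothetical names, and applying the same deadband in both modes is an assumption.

```python
import math


def angular_command(yaw_error, turn_speed, deadband=0.10,
                    constant_rate=True, kp=1.0):
    """Return an angular velocity command (rad/s) for a yaw error (rad).

    constant_rate=True: fixed-magnitude turn toward the target, zero
    inside the deadband. constant_rate=False: proportional turn with
    the magnitude clamped to turn_speed.
    """
    # Wrap the error into [-pi, pi] so the robot turns the short way.
    err = math.atan2(math.sin(yaw_error), math.cos(yaw_error))
    if abs(err) <= deadband:
        return 0.0
    if constant_rate:
        return math.copysign(turn_speed, err)
    return max(-turn_speed, min(turn_speed, kp * err))
```

In constant-rate mode the command jumps between zero and plus or minus turn_speed, which is why the deadband matters: a margin that is too small invites oscillation around the target heading.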
Useful topics:
- /food/list (llm_foraging/msg/FoodList)
- /food/stats (llm_foraging/msg/FoodStats)
- /food/spawn_ready (std_msgs/Bool)
- /cpfa/state (llm_foraging/msg/CPFAState)
- /pheromone/list (llm_foraging/msg/PheromoneList)
- /gazebo/model_states (gazebo_msgs/ModelStates)
Food services:
- /food/check_proximity (CheckFood)
- /food/pickup (PickupFood)
- /food/deposit (DepositFood)
- /food/reset (Trigger)
Pheromone services:
- /pheromone/deposit (DepositPheromone)
- /pheromone/get (GetPheromone)
- /pheromone/reset (Trigger)
Gazebo services commonly used:
- /spawn_entity
- /delete_entity
- /gazebo/set_entity_state
- Edit under src/llm_foraging/llm_foraging/
- Ensure executable entry exists in setup.py (console_scripts)
- Ensure install rule exists in CMakeLists.txt install(PROGRAMS ...)
- Rebuild package and re-source workspace

- Edit msg/ or srv/
- Update rosidl_generate_interfaces in CMakeLists.txt
- Rebuild package

- Edit/create C++ plugin in plugins/
- Register plugin target in CMakeLists.txt
- Add plugin to world/model SDF as needed
- Rebuild package and relaunch Gazebo
colcon build --packages-select llm_foraging --symlink-install
source install/setup.bash
ros2 launch llm_foraging cpfa_simulation.launch.py
- Rebuild package and source workspace again:
  - colcon build --packages-select llm_foraging --symlink-install
  - source install/setup.bash
- Check: ros2 pkg executables llm_foraging
- Your shell is running Python 3.11 (conda base / venv) but ROS Humble's rclpy is built for Python 3.10 only.
- Fix: clean the PATH (see "Python version" note in §2 Prerequisites) so which python3 resolves to /usr/bin/python3.10, then re-source /opt/ros/humble/setup.bash and install/setup.bash.
- Symptom in a sweep: rows appear in sweep_results.csv with summary_found=False and empty metric columns; the experiment_recorder_node.py crashed on import and never wrote summary.json.
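A quick preflight check along these lines can catch the wrong interpreter before a long run. It is a sketch only: interpreter_problems is a hypothetical helper, and treating VIRTUAL_ENV and CONDA_PREFIX as the tell-tale variables follows the PATH cleanup commands earlier in this README.

```python
import os
import sys


def interpreter_problems(version_info=None, environ=None):
    """Return a list of reasons this interpreter may fail to import rclpy.

    An empty list means Python 3.10 with no active venv or conda env.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 10):
        problems.append(f"Python {major}.{minor} in use, expected 3.10")
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        if environ.get(var):
            problems.append(f"{var} is set ({environ[var]})")
    return problems
```

Running it with python3 from the same shell that will launch the sweep, and refusing to start when the list is non-empty, avoids rows with empty metric columns.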
- Ensure gzserver is running and did not crash.
- If Gazebo reports Address already in use, terminate stale Gazebo processes and relaunch.
- Historical bug: an older version of run_batch.py ran an mtime-scoped /tmp SDF cleanup between runs. /tmp is shared across parallel workers, so that cleanup could delete a sibling worker's in-flight SDF during spawn and the whole run would start with an empty world.
- Fixed by making _cleanup_temp_sdfs a no-op. Don't re-add a cleanup that isn't scoped to files owned by the current worker. /tmp litter (~14 MB per sweep) gets cleaned on reboot.
- Confirm /food/spawn_ready is published true (controllers can wait for food spawn completion).
- Confirm controllers started for all namespaces from robots.yaml.
- Verify /pheromone/list has active pheromones.
- Verify world contains pheromone_trail_visualizer plugin (worlds/multi_cpfa_world.world).
- A single SIGINT on run_sweep.py triggers graceful shutdown (sets the flag, forwards SIGINT to active workers), but each worker's run_batch.py catches the signal at the current replicate's proc.wait(), kills that one replicate, and then loops to the next replicate instead of exiting.
- Send a second SIGINT to trigger the orchestrator's hard-exit path, then kill remaining run_batch.py process groups manually:
  for pid in $(pgrep -f "run_batch.py.*<sweep_tag>"); do
    kill -KILL -"$pid"
  done
Recommended:
- create feature branches for major behavior changes
- keep parameter changes centralized in foraging_sim.yaml
- document behavioral changes in commit messages and PR descriptions
- avoid mixing unrelated refactors with controller behavior changes
License:
- Apache 2.0 (see src/llm_foraging/package.xml)