Zhourobotics/LLM-Foraging

LLM Foraging Workspace (ROS 2 + Gazebo Classic)

This repository contains an active ROS 2 + Gazebo Classic reimplementation of CPFA-style multi-robot foraging, migrated from the original ARGoS/C++ codebase.

Current focus:

  • reliable multi-robot spawning
  • food detection, pickup, and deposit pipeline
  • CPFA controller integration (site fidelity, pheromones, uninformed search)
  • Gazebo visualization plugins (pheromone trails and robot status LEDs)

The project is under active development. Use this README as the source of truth for how to run and extend the code.

1) Repository Layout

Workspace root:

  • src/llm_foraging: ROS 2 package (nodes, launch files, config, plugins, interfaces)

Inside src/llm_foraging:

  • llm_foraging/: Python nodes and CPFA logic
  • launch/: main launch entry points
  • config/: simulation and controller parameters
  • plugins/: Gazebo plugins (C++)
  • worlds/: Gazebo world files
  • msg/, srv/: custom ROS interfaces

2) Prerequisites

Recommended environment:

  • Ubuntu 22.04
  • ROS 2 Humble
  • Gazebo Classic 11 (gazebo_ros)

Install commonly required packages (example):

sudo apt update
sudo apt install -y \
  ros-humble-gazebo-ros-pkgs \
  ros-humble-turtlebot3 \
  ros-humble-turtlebot3-gazebo \
  python3-colcon-common-extensions

Notes:

  • This repo uses Gazebo Classic plugins (gazebo_ros_state, custom .so plugins).
  • package.xml is the authoritative dependency list.

Python version: ROS 2 Humble's rclpy C extension is built for Python 3.10 only. Do NOT run anything under a conda env (3.11+) or the project's .venv: experiment_recorder_node.py, like any other rclpy import, will crash with ModuleNotFoundError: No module named 'rclpy._rclpy_pybind11'. Before running launch/batch/sweep commands, make sure python3 is the system interpreter (which python3 should print /usr/bin/python3, and python3 --version should report 3.10.x):

# Strip venv and conda from PATH for this shell
export PATH=$(echo "$PATH" | tr ":" "\n" | grep -v "/\.venv/" | grep -v "/anaconda3/" | paste -sd:)
unset VIRTUAL_ENV CONDA_DEFAULT_ENV CONDA_PREFIX CONDA_SHLVL
source /opt/ros/humble/setup.bash
source install/setup.bash
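
A quick pre-flight check can catch the wrong interpreter before a long run starts. The following is a minimal sketch, not code from this repository: the helper name interpreter_problems is hypothetical, and it only inspects the interpreter version and the VIRTUAL_ENV / CONDA_PREFIX variables mentioned above.

```python
import os
import sys


def interpreter_problems(version_info=None, environ=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of the repository.
    """
    version = tuple(version_info or sys.version_info)[:2]
    env = os.environ if environ is None else environ
    problems = []
    # ROS 2 Humble's rclpy extension is built for Python 3.10 only.
    if version != (3, 10):
        problems.append(
            f"Python {version[0]}.{version[1]} is active; rclpy needs 3.10"
        )
    # An active venv or conda env usually shadows the system interpreter.
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        if env.get(var):
            problems.append(f"{var} is set ({env[var]}); deactivate it first")
    return problems


if __name__ == "__main__":
    for line in interpreter_problems():
        print("WARNING:", line)
```

Running it with the same python3 that will launch the simulation prints nothing when the environment looks clean.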

3) Build

From workspace root:

source /opt/ros/humble/setup.bash
colcon build --packages-select llm_foraging --symlink-install
source install/setup.bash

Why --symlink-install:

  • faster Python iteration
  • launch/executable updates reflected without full reinstall in many cases

4) Main Run Modes

A) CPFA simulation

ros2 launch llm_foraging cpfa_simulation.launch.py

Starts:

  • Gazebo world + robots
  • food_manager_node
  • food_gazebo_visualizer
  • pheromone_manager_node
  • one cpfa_controller per robot namespace

B) Important launch arguments

Used by CPFA and full launches:

  • foraging_config:=<path> (default: config/foraging_sim.yaml)
  • robot_config:=<path> (default: config/robots.yaml)
  • environment_config:=<path> (default: config/foraging_sim.yaml)
  • max_parallel_spawns:=auto|all|N
  • use_sim_time:=true|false
  • headless:=true|false (default false) — skip launching gzclient; recommended for batch/sweep runs

Example:

ros2 launch llm_foraging cpfa_simulation.launch.py max_parallel_spawns:=all headless:=true

C) Experiment Recording (Always-On by Default in CPFA Launch)

cpfa_simulation.launch.py now supports per-run experiment recording:

  • creates run folder: log/experiments/<run_id>/
  • writes manifest.yaml with launch/config factors and git hash (best effort)
  • records rosbag core topics into bag/core*
  • runs a passive recorder node writing live artifacts to artifacts/
  • writes summary.json on shutdown and appends log/experiments/index.csv

Launch arguments:

  • record_experiment:=true|false (default true)
  • experiment_root_dir:=<path> (default log/experiments)
  • experiment_tag:=<label> (default empty)
  • bag_profile:=core (v1 uses core only)

Artifacts written per run:

  • manifest.yaml
  • bag/core*
  • artifacts/food_stats.csv
  • artifacts/llm_decisions.jsonl
  • artifacts/cpfa_state.jsonl
  • summary.json
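
To confirm that a run folder is complete before analysis, a small check along these lines may help. It is a sketch under assumptions: missing_artifacts is a hypothetical helper, and it assumes every file listed above is written in every run, which may not hold in all modes (for example, llm_decisions.jsonl in a pure CPFA run).

```python
from pathlib import Path

# Relative paths taken from the artifact list above.
EXPECTED_FILES = (
    "manifest.yaml",
    "summary.json",
    "artifacts/food_stats.csv",
    "artifacts/llm_decisions.jsonl",
    "artifacts/cpfa_state.jsonl",
)


def missing_artifacts(run_dir):
    """Return the expected artifacts that are absent from run_dir."""
    run_dir = Path(run_dir)
    missing = [rel for rel in EXPECTED_FILES if not (run_dir / rel).is_file()]
    # The rosbag output is matched by prefix (bag/core*), as in the list above.
    if not any(run_dir.glob("bag/core*")):
        missing.append("bag/core*")
    return missing
```

For example, missing_artifacts("log/experiments/<run_id>") returns an empty list for a complete run.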

D) Batch Runs (Multiple Trials)

Use the helper script to run repeated CPFA launches with fixed wall-clock duration per run and collect one consolidated CSV:

python3 scripts/run_batch.py --runs 10 --run-seconds 1200 --tag-prefix llm_hybrid

Common options:

  • --experiment-root log/experiments
  • --foraging-config <path>
  • --robot-config <path>
  • --environment-config <path>
  • --max-parallel-spawns auto|all|N
  • --extra-launch-arg key:=value (repeatable)
  • --extra-result-col key=value (repeatable) — injects a column into every row of batch_results.csv; used by run_sweep.py to record factor values
  • --run-indices N[,M,...] — run only these specific replicate indices (1..10) using their corresponding TRIAL_SEEDS entries, instead of the full 1..N range. Useful for surgically re-running a handful of failed replicates with the same seeds as the original.
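
The --run-indices behavior can be pictured with the sketch below. It is illustrative only: the function names are hypothetical, the seed values are placeholders rather than the repository's TRIAL_SEEDS, and the 1-based mapping (index i uses the i-th seed) is an assumption based on the "1..10" range above.

```python
def parse_run_indices(text, max_index=10):
    """Parse 'N[,M,...]' into a sorted list of unique 1-based indices."""
    indices = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        value = int(token)  # raises ValueError on non-numeric input
        if not 1 <= value <= max_index:
            raise ValueError(f"run index {value} is outside 1..{max_index}")
        indices.add(value)
    return sorted(indices)


def seeds_for(indices, trial_seeds):
    """Map each 1-based replicate index to its seed (assumed: seeds[i - 1])."""
    return {i: trial_seeds[i - 1] for i in indices}


# Placeholder seeds for illustration; the real values live in the repo.
EXAMPLE_SEEDS = list(range(101, 111))

if __name__ == "__main__":
    print(seeds_for(parse_run_indices("6,7"), EXAMPLE_SEEDS))
```

Because each index keeps its own seed, re-running replicates 6 and 7 reproduces the conditions of the original replicates 6 and 7.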

Example with custom configs:

python3 scripts/run_batch.py \
  --runs 10 \
  --run-seconds 900 \
  --tag-prefix gpt5mini_team6 \
  --foraging-config src/llm_foraging/config/foraging_sim.yaml \
  --robot-config src/llm_foraging/config/robots.yaml

Batch outputs:

  • per-run outputs stay in log/experiments/<run_id>/
  • batch aggregate files are written to:
    • log/experiments/batches/<batch_id>/batch_results.csv
    • log/experiments/batches/<batch_id>/batch_meta.json

CSV writes are incremental — each row is appended and flushed right after its run finishes, so partial progress survives a mid-batch crash.
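
The append-and-flush pattern described here looks roughly like the following. This is a generic sketch, not the code in run_batch.py; the function name append_row and the example column names are hypothetical.

```python
import csv
import os


def append_row(csv_path, row, fieldnames):
    """Append one result row and force it to disk before returning."""
    # Write the header only when the file is new or still empty.
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to persist it to disk
```

Calling append_row once per finished run means a crash loses at most the run that was in progress.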

E) Multi-factor sweeps (36-cell factor grid)

scripts/run_sweep.py orchestrates a full sweep across (team size × arena size × food distribution) in parallel across env-isolated workers. Each worker slot gets its own ROS_DOMAIN_ID (base 51) and GAZEBO_MASTER_URI port (base 11345) so multiple Gazebo instances can run on one machine.
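
As a rough illustration of that isolation, the sketch below builds a per-slot environment. The base values (51 and 11345) come from the text above; the helper name worker_env, the "base + slot index" offset rule, and the localhost host are assumptions.

```python
import os

ROS_DOMAIN_BASE = 51      # from the README
GAZEBO_PORT_BASE = 11345  # from the README


def worker_env(slot, base_env=None):
    """Return a copy of the environment isolated for worker slot `slot`.

    Assumption: each slot offsets both bases by its 0-based index.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_DOMAIN_ID"] = str(ROS_DOMAIN_BASE + slot)
    env["GAZEBO_MASTER_URI"] = f"http://localhost:{GAZEBO_PORT_BASE + slot}"
    return env
```

A dictionary like this can be passed as env= to subprocess.Popen so that each launch only discovers its own ROS 2 nodes and its own gzserver.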

Default grid (matches the current paper design):

  • team sizes: 4, 6, 8, 10
  • arena sizes: 6, 8, 10 (food counts 64, 128, 256 respectively — coupled)
  • distributions: powerlaw, clustered, random
  • 36 cells × 10 replicates × 1200 s per run ÷ 3 parallel workers ≈ 40 hours

# Full production sweep: launch inside tmux for an SSH-safe 40 h run
tmux new -s sweep
python3 scripts/run_sweep.py \
  --sweep-tag llm_hybrid_v2_fixed \
  --runs-per-cell 10 --run-seconds 1200 --max-concurrency 3
# Detach: Ctrl-b d. Reattach: tmux attach -t sweep
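
The default grid and its wall-clock estimate can be reproduced with a few lines. This is a sketch: the cell id format is inferred from the examples in this README (such as team04_arena06_powerlaw), the function names are hypothetical, and the estimate ignores Gazebo startup and teardown time.

```python
from itertools import product

TEAM_SIZES = (4, 6, 8, 10)
ARENA_FOOD = {6: 64, 8: 128, 10: 256}  # arena size -> coupled food count
DISTRIBUTIONS = ("powerlaw", "clustered", "random")


def cell_ids():
    """Enumerate every (team, arena, distribution) cell id in the grid."""
    return [
        f"team{team:02d}_arena{arena:02d}_{dist}"
        for team, arena, dist in product(TEAM_SIZES, ARENA_FOOD, DISTRIBUTIONS)
    ]


def sweep_hours(cells, runs_per_cell=10, run_seconds=1200, workers=3):
    """Lower-bound wall-clock hours, assuming perfectly balanced workers."""
    return cells * runs_per_cell * run_seconds / workers / 3600


if __name__ == "__main__":
    ids = cell_ids()
    print(len(ids), "cells,", sweep_hours(len(ids)), "hours")
```

With the defaults this gives 4 × 3 × 3 = 36 cells and 36 × 10 × 1200 s ÷ 3 = 144,000 s, which is 40 hours.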

Useful options:

  • --cells <list> — restrict to specific cell ids.
    • Bare form: --cells team04_arena06_powerlaw,team04_arena08_clustered
    • Per-run-index form (semicolon separator): --cells 'team04_arena06_powerlaw:8;team04_arena08_powerlaw:6,7'
    • When any :idx appears, the per-cell indices are passed through to run_batch.py --run-indices, so only those specific replicates run.
  • --force — bypass the resume logic that would otherwise skip cells with complete row counts.
  • --dry-run — generate per-cell yaml configs and print the planned commands without launching Gazebo. Always run this once before a 40 h sweep.
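
The two --cells forms can be parsed as sketched below. This is not the parser in run_sweep.py; parse_cells is a hypothetical name, and the rule "use ';' as the separator whenever any ':' appears, otherwise ','" is an assumption drawn from the two examples above.

```python
def parse_cells(spec):
    """Return {cell_id: [run indices]} or {cell_id: None} for whole cells."""
    spec = spec.strip()
    if not spec:
        return {}
    # Per-run-index form separates cells with ';' because ',' lists indices.
    separator = ";" if ":" in spec else ","
    selection = {}
    for entry in spec.split(separator):
        entry = entry.strip()
        if not entry:
            continue
        cell_id, _, index_text = entry.partition(":")
        if index_text:
            indices = [int(tok) for tok in index_text.split(",") if tok.strip()]
        else:
            indices = None  # no ':idx' means "run the whole cell"
        selection[cell_id.strip()] = indices
    return selection
```

For a cell with indices [6, 7], the matching run_batch.py argument would be --run-indices 6,7.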

Sweep output layout:

log/sweeps/<stamp>_<tag>/
├── sweep_manifest.yaml             # factor grid, git hash, CLI args
├── sweep_results.csv               # ONE row per run — main analysis file
├── sweep_summary.json              # per-cell aggregates
└── cells/<cell_id>/
    ├── configs/{foraging_sim,robots}.yaml   # per-cell generated config
    ├── experiments/<run_id>/
    │   ├── summary.json
    │   ├── manifest.yaml
    │   ├── artifacts/food_stats.csv
    │   ├── artifacts/llm_decisions.jsonl
    │   └── bag/core/core_0.db3               # rosbag (bulk of disk use)
    └── experiments/batches/<batch_id>/batch_results.csv

The aggregator in scripts/sweep/aggregate.py polls every ~5 s and appends new rows from each cell's batch_results.csv into the sweep-level sweep_results.csv, so progress is visible live.

F) Merging a rerun back into a base sweep

scripts/merge_reruns.py combines a base sweep with a rerun sweep: base rows whose (cell_id, run_index) key appears in the rerun are dropped, rerun rows take their place, and a new merged sweep_results.csv + sweep_summary.json is written to a fresh output directory. Inputs are read-only.

python3 scripts/merge_reruns.py \
  --base   log/sweeps/<base_stamp> \
  --rerun  log/sweeps/<rerun_stamp> \
  --output log/sweeps/<final_name>

This lets you replace a few contaminated runs and still produce a clean published dataset without re-running the entire grid.
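
The replacement rule can be expressed in a few lines. The sketch below operates on rows already loaded as dictionaries (for example via csv.DictReader); merge_rows is a hypothetical name, and the final sort order is an assumption rather than documented behavior of merge_reruns.py.

```python
def merge_rows(base_rows, rerun_rows):
    """Drop base rows superseded by a rerun, then add the rerun rows."""

    def key(row):
        # Compare run_index as text so "8" (from CSV) and 8 match.
        return (row["cell_id"], str(row["run_index"]))

    rerun_keys = {key(row) for row in rerun_rows}
    kept = [row for row in base_rows if key(row) not in rerun_keys]
    merged = kept + list(rerun_rows)
    # Assumed ordering: by cell, then by replicate index.
    merged.sort(key=lambda row: (row["cell_id"], int(row["run_index"])))
    return merged
```

Neither input list is modified, which mirrors the read-only treatment of the base and rerun directories.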

G) Disk management

Each run's bag/core/core_0.db3 is ~400–1000 MB (99 % of disk usage). Bags are only needed for time-series replay / trajectory analysis; the summary.json, artifacts/, and the roll-up sweep_results.csv contain everything needed for standard aggregate analysis. After extracting final stats, it is safe to bulk-delete bags to reclaim space:

find log/sweeps/<sweep_dir> -type d -name bag -exec rm -rf {} +
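
Before deleting, it can be useful to see how much space the bags actually occupy. The following read-only sketch reports that without removing anything; bag_usage is a hypothetical helper, not a script shipped with this repository.

```python
from pathlib import Path


def bag_usage(sweep_dir):
    """Return (list of bag directories, total bytes stored inside them)."""
    bag_dirs = []
    total_bytes = 0
    for candidate in Path(sweep_dir).rglob("bag"):
        if not candidate.is_dir():
            continue
        bag_dirs.append(candidate)
        # Sum every regular file below this bag directory.
        total_bytes += sum(
            f.stat().st_size for f in candidate.rglob("*") if f.is_file()
        )
    return bag_dirs, total_bytes
```

For example, dirs, size = bag_usage("log/sweeps/<sweep_dir>") followed by print(len(dirs), size / 1e9) shows the number of bag folders and their combined size in GB.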

5) Configuration Model

Primary config file:

  • src/llm_foraging/config/foraging_sim.yaml

Key sections:

  • environment: nest + wall geometry for spawner
  • food_manager: food distribution and services
  • cpfa_controller: CPFA and motion/avoidance parameters
  • pheromone_manager: pheromone lifecycle and publication
  • food_gazebo_visualizer: food model spawn/delete/follow behavior

Robot team layout:

  • src/llm_foraging/config/robots.yaml
  • Grid format: rows, cols, spacing, center, namespace/name prefixes
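
To make the grid format concrete, the sketch below turns rows, cols, spacing, and center into spawn positions. It is an illustration under assumptions: the spawner's real ordering and centering convention are not documented here, so the row-major order and the "grid centered on center" rule are guesses, and grid_positions is a hypothetical name.

```python
def grid_positions(rows, cols, spacing, center=(0.0, 0.0)):
    """Return (x, y) positions for a rows x cols grid centered on `center`.

    Assumed convention: row-major order, columns along x, rows along y.
    """
    cx, cy = center
    x0 = cx - (cols - 1) * spacing / 2.0
    y0 = cy - (rows - 1) * spacing / 2.0
    return [
        (x0 + c * spacing, y0 + r * spacing)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    for i, (x, y) in enumerate(grid_positions(2, 3, 0.5)):
        print(f"robot {i}: x={x:+.2f} y={y:+.2f}")
```

A 2 × 3 grid yields six robots, which is one way a six-robot team could be laid out.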

LLM Policy Mode (Optional)

cpfa_controller now supports an explicit policy switch:

  • decision_policy_mode: 'cpfa' keeps pure CPFA behavior (baseline-safe default)
  • decision_policy_mode: 'llm_hybrid' enables low-frequency LLM decisions

LLM is additionally gated by:

  • llm_enabled: true|false

Recommended baseline-vs-LLM comparison workflow:

  1. Keep all non-LLM parameters identical.
  2. Run baseline with:
    • decision_policy_mode: 'cpfa'
    • llm_enabled: false
  3. Run LLM hybrid with:
    • decision_policy_mode: 'llm_hybrid'
    • llm_enabled: true

Required environment variable for LLM mode:

  • OpenAI: OPENAI_API_KEY (or set llm_api_key_env to another variable name)
  • Anthropic Claude: ANTHROPIC_API_KEY (or set llm_api_key_env accordingly)

Credential loading order in LLM mode:

  1. secret*.yml / secrets*.yml file in config/ (auto-discovered, or set explicitly with llm_secrets_file)
  2. llm_api_key parameter (if set)
  3. environment variable from llm_api_key_env (default OPENAI_API_KEY)
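
The loading order above amounts to a first-match-wins lookup. The sketch below shows that logic on already-parsed inputs; resolve_api_key is a hypothetical function, it handles only the list form of api_keys, and it does not reproduce the controller's actual file discovery.

```python
import os


def resolve_api_key(secrets=None, llm_api_key="",
                    llm_api_key_env="OPENAI_API_KEY",
                    key_index=0, environ=None):
    """Return (key, source) following: secrets file, parameter, env var."""
    env = os.environ if environ is None else environ
    # 1. Secrets file (passed here as an already-parsed dict).
    if secrets:
        if secrets.get("api_key"):
            return secrets["api_key"], "secrets_file"
        keys = secrets.get("api_keys")
        if isinstance(keys, list) and keys:
            return keys[key_index], "secrets_file"
    # 2. Explicit llm_api_key parameter.
    if llm_api_key:
        return llm_api_key, "parameter"
    # 3. Environment variable named by llm_api_key_env.
    value = env.get(llm_api_key_env, "")
    if value:
        return value, "environment"
    return None, "none"
```

Returning the source alongside the key makes it easy to log which credential path was used without logging the key itself.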

Expected secrets file keys:

  • provider (optional: openai|anthropic|claude|auto)
  • api_base
  • model (optional)
  • api_key or api_keys (list/dict, indexed by llm_secrets_key_index)

LLM behavior:

  • Decision events only (not per-tick motion control):
    • post-deposit departure strategy
    • nest-arrival departure strategy (return-to-nest without food)
    • starvation-triggered search reevaluation (timer is time since last nest visit)
  • Exact-action LLM mode for starvation:
    • both starvation actions are always available (RETURN_TO_NEST_FOR_INFO, CONTINUE_SEARCH)
    • prompt instructs RETURN_TO_NEST_FOR_INFO as a desperation / last-resort choice
  • strict JSON response validation with action whitelist
  • deterministic fallback to CPFA if timeout, parse failure, invalid action, or stale response
  • optional per-robot text transcripts (request/response/error) in llm_conversation_log_dir
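
Whitelist validation with a deterministic fallback can be sketched as follows. The two starvation action names come from the list above; everything else is assumed for illustration, including the function name choose_action, the "action" JSON key, and the reason strings. The controller's real response schema may differ.

```python
import json

STARVATION_ACTIONS = frozenset({"RETURN_TO_NEST_FOR_INFO", "CONTINUE_SEARCH"})


def choose_action(raw_response, allowed, cpfa_fallback):
    """Return (action, reason); fall back to CPFA on any invalid response."""
    if raw_response is None:
        # No reply arrived in time (for example, a timeout).
        return cpfa_fallback, "no_response"
    try:
        payload = json.loads(raw_response)
    except (TypeError, ValueError):
        return cpfa_fallback, "parse_failure"
    if not isinstance(payload, dict):
        return cpfa_fallback, "parse_failure"
    action = payload.get("action")  # assumed key name
    if not isinstance(action, str) or action not in allowed:
        return cpfa_fallback, "invalid_action"
    return action, "llm"
```

Because every failure path returns the CPFA choice, the robot's behavior stays well defined even when the LLM is slow or returns malformed output.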

Useful LLM parameters in cpfa_controller.ros__parameters:

  • llm_timeout_sec
  • llm_nest_arrival_enabled
  • llm_starvation_threshold_sec
  • llm_starvation_query_interval_sec
  • llm_secrets_file
  • llm_secrets_key_index
  • llm_model
  • llm_reasoning_effort (low|medium|high or empty for provider default)
  • llm_provider (openai|anthropic|auto)
  • llm_anthropic_version (default 2023-06-01)
  • llm_log_decisions
  • llm_decision_topic (default /cpfa/llm_decision)
  • llm_conversation_log_enabled
  • llm_conversation_log_dir

6) Motion Architecture (Critical)

There are two kinematic paths in this codebase:

  1. Gazebo model plugin path
    • plugin: plugins/kinematic_drive_plugin.cpp
    • receives cmd_vel, updates pose in Gazebo, publishes odom
  2. Controller-side SetEntityState path
    • cpfa_controller can directly call /gazebo/set_entity_state

Current recommended CPFA setup:

  • use Gazebo kinematic plugin in launch (use_kinematic_plugin:=true in CPFA launch)
  • keep cpfa_controller.use_kinematic_motion: false

This avoids double-driving robots and reduces service-call bottlenecks.

Crisp constant-rate turning (CPFA controller):

  • drive_constant_angular_turning: true enables fixed-magnitude angular turns for navigation and avoidance yaw alignment.
  • drive_yaw_deadband, avoidance_turn_tolerance, and survey_turn_tolerance set how close to the target heading a turn stops (this repo's default tuning is 0.10 rad).
  • Set drive_constant_angular_turning: false to restore legacy proportional turn behavior (with clamped angular speed).
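
The difference between the two turning modes can be illustrated with a small function. This is a sketch, not the controller's code: angular_command, turn_rate, and gain are hypothetical names, and applying the deadband in both modes is an assumption.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def angular_command(yaw, target_yaw, turn_rate,
                    constant=True, deadband=0.10, gain=1.0):
    """Return an angular velocity (rad/s) that turns yaw toward target_yaw."""
    error = wrap_angle(target_yaw - yaw)
    if abs(error) <= deadband:
        return 0.0  # close enough: stop turning
    if constant:
        # Constant-rate mode: fixed magnitude, sign follows the error.
        return math.copysign(turn_rate, error)
    # Legacy-style mode: proportional to the error, clamped to +/- turn_rate.
    return max(-turn_rate, min(turn_rate, gain * error))
```

In constant mode the command jumps between ±turn_rate and zero, which gives crisp turns; in proportional mode it shrinks as the heading error shrinks.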

7) Key Topics and Services

Useful topics:

  • /food/list (llm_foraging/msg/FoodList)
  • /food/stats (llm_foraging/msg/FoodStats)
  • /food/spawn_ready (std_msgs/Bool)
  • /cpfa/state (llm_foraging/msg/CPFAState)
  • /pheromone/list (llm_foraging/msg/PheromoneList)
  • /gazebo/model_states (gazebo_msgs/ModelStates)

Food services:

  • /food/check_proximity (CheckFood)
  • /food/pickup (PickupFood)
  • /food/deposit (DepositFood)
  • /food/reset (Trigger)

Pheromone services:

  • /pheromone/deposit (DepositPheromone)
  • /pheromone/get (GetPheromone)
  • /pheromone/reset (Trigger)

Gazebo services commonly used:

  • /spawn_entity
  • /delete_entity
  • /gazebo/set_entity_state

8) Development Workflow

A) Add/modify Python nodes

  1. Edit under src/llm_foraging/llm_foraging/
  2. Ensure executable entry exists in setup.py (console_scripts)
  3. Ensure install rule exists in CMakeLists.txt install(PROGRAMS ...)
  4. Rebuild package and re-source workspace

B) Add/modify interfaces

  1. Edit msg/ or srv/
  2. Update rosidl_generate_interfaces in CMakeLists.txt
  3. Rebuild package

C) Add/modify Gazebo plugins

  1. Edit/create C++ plugin in plugins/
  2. Register plugin target in CMakeLists.txt
  3. Add plugin to world/model SDF as needed
  4. Rebuild package and relaunch Gazebo

D) Recommended iteration loop

colcon build --packages-select llm_foraging --symlink-install
source install/setup.bash
ros2 launch llm_foraging cpfa_simulation.launch.py

9) Troubleshooting

Executable not found in launch

  • Rebuild package and source workspace again:
    • colcon build --packages-select llm_foraging --symlink-install
    • source install/setup.bash
  • Check:
    • ros2 pkg executables llm_foraging

ModuleNotFoundError: No module named 'rclpy._rclpy_pybind11'

  • Your shell is running Python 3.11 (conda base / venv) but ROS Humble's rclpy is built for Python 3.10 only.
  • Fix: clean the PATH (see "Python version" note in §2 Prerequisites) so python3 is the system Python 3.10 (/usr/bin/python3 on Ubuntu 22.04), then re-source /opt/ros/humble/setup.bash and install/setup.bash.
  • Symptom in a sweep: rows appear in sweep_results.csv with summary_found=False and empty metric columns — the experiment_recorder_node.py crashed on import and never wrote summary.json.
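
Affected replicates can be listed straight from sweep_results.csv and turned into a per-run-index --cells value for a targeted rerun. The sketch below assumes the CSV has columns named cell_id, run_index, and summary_found, as the text in this README suggests; failed_cells_spec is a hypothetical helper.

```python
import csv
from collections import defaultdict


def failed_cells_spec(csv_path):
    """Build a 'cell:idx,idx;cell:idx' string for rows lacking a summary."""
    failed = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            found = row.get("summary_found", "").strip().lower()
            if found != "true":
                failed[row["cell_id"]].append(int(row["run_index"]))
    parts = []
    for cell_id in sorted(failed):
        indices = ",".join(str(i) for i in sorted(failed[cell_id]))
        parts.append(f"{cell_id}:{indices}")
    return ";".join(parts)
```

The returned string can be passed, quoted, as the value of --cells once the Python environment has been fixed.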

spawn_entity not available / nothing spawns

  • Ensure gzserver is running and did not crash.
  • If Gazebo reports Address already in use, terminate stale Gazebo processes and relaunch.

spawn_entity.py: specified file /tmp/tmp..._tb*.sdf does not exist

  • Historical bug — an older version of run_batch.py ran an mtime-scoped /tmp SDF cleanup between runs. /tmp is shared across parallel workers, so that cleanup could delete a sibling worker's in-flight SDF during spawn and the whole run would start with an empty world.
  • Fixed by making _cleanup_temp_sdfs a no-op. Don't re-add a cleanup that isn't scoped to files owned by the current worker. /tmp litter (~14 MB per sweep) gets cleaned on reboot.

Robots not moving at startup

  • Confirm /food/spawn_ready is published true (controllers can wait for food spawn completion).
  • Confirm controllers started for all namespaces from robots.yaml.

Pheromone trails not visible

  • Verify /pheromone/list has active pheromones.
  • Verify world contains pheromone_trail_visualizer plugin (worlds/multi_cpfa_world.world).

Sweep hangs after Ctrl-C

  • A single SIGINT to run_sweep.py triggers a graceful shutdown: it sets the shutdown flag and forwards SIGINT to the active workers. Each worker's run_batch.py, however, catches the signal at the current replicate's proc.wait(), kills that one replicate, and then continues to the next replicate instead of exiting.
  • Send a second SIGINT to trigger the orchestrator's hard-exit path, then kill remaining run_batch.py process groups manually:
    for pid in $(pgrep -f "run_batch.py.*<sweep_tag>"); do
      kill -KILL -"$pid"
    done

10) Contributing Guidelines

Recommended:

  • create feature branches for major behavior changes
  • keep parameter changes in foraging_sim.yaml centralized
  • document behavioral changes in commit messages and PR descriptions
  • avoid mixing unrelated refactors with controller behavior changes

License:

  • Apache 2.0 (see src/llm_foraging/package.xml)

About

Git repository containing code for paper LLM-Foraging submitted to DARS 2026
