MemCoE: Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

[Figure: MemCoE method overview]

This repository contains the official implementation of MemCoE, a cognition-inspired framework that optimizes both how to memorize and what to memorize through a two-stage approach: Memory Guideline Induction (MGI) and Guideline-Aligned Memory Policy Optimization (GMPO).



Overview

MemCoE introduces a two-stage optimization paradigm for evolving memory systems:

  1. Memory Guideline Induction (MGI): Automatically induces high-level memory guidelines through textual gradient optimization, determining what information is worth memorizing.

  2. Guideline-Aligned Memory Policy Optimization (GMPO): Trains a memory policy model aligned with the induced guidelines, optimizing how to effectively store and retrieve memories.


Installation

Create a new conda environment and install dependencies:

conda create -n memcoe python=3.10
conda activate memcoe
pip install -r requirements.txt

Datasets

Download the following datasets and place them in the MGI_module/datasets/ directory:

| Dataset | Source |
|---|---|
| PersonaMem | HuggingFace |
| PrefEval | HuggingFace |
| PersonaBench | GitHub |

mkdir -p MGI_module/datasets
# Download and extract datasets to MGI_module/datasets/

Stage 1: Memory Guideline Induction (MGI)

Model Deployment

MGI supports both local model deployment via vLLM and API-based models (e.g., GPT-4o-mini).

Option A: Local Model with vLLM

Deploy a local model (e.g., Qwen2.5-7B-Instruct) using vLLM:

CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve Qwen2.5-7B-Instruct \
    --tensor-parallel-size 4 \
    --port 6025 \
    --host 0.0.0.0 \
    --served-model-name Qwen2.5-7B-Instruct

Verify the deployment:

cd MGI_module/
python async_vllm.py --model Qwen2.5-7B-Instruct
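
The server started above exposes vLLM's OpenAI-compatible HTTP API, so the deployment can also be checked independently of the repository's script by listing the served models. The following is a minimal sketch, assuming the server is reachable on localhost at the port used above; the helper names `models_url`, `served_model_ids`, and `is_served` are illustrative and not part of this repository.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 6025) -> str:
    """Build the URL of the OpenAI-compatible model listing endpoint."""
    return f"http://{host}:{port}/v1/models"


def served_model_ids(payload: dict) -> list[str]:
    """Extract the model ids from a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def is_served(name: str, host: str = "localhost", port: int = 6025,
              timeout: float = 5.0) -> bool:
    """Return True if `name` is listed by the running server."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
        payload = json.load(resp)
    return name in served_model_ids(payload)


# Example (requires the server to be running):
#   is_served("Qwen2.5-7B-Instruct")
```

The name to look for is the value passed to `--served-model-name`, which is also the value the later commands pass as `--model`.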

Option B: API-based Models

Configure your API credentials in async_llm.py:

API_KEY = 'your-api-key'
BASE_URL = 'your-base-url'
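
To keep real credentials out of version control, the two constants could be filled from environment variables instead of being hardcoded. This is an optional sketch, not the repository's behavior: the variable names `MEMCOE_API_KEY` and `MEMCOE_BASE_URL` and the helper `load_api_config` are hypothetical.

```python
import os


def load_api_config(env=None):
    """Read API credentials from environment variables.

    MEMCOE_API_KEY and MEMCOE_BASE_URL are hypothetical names chosen for
    this sketch; the repository itself uses the constants in async_llm.py.
    """
    env = os.environ if env is None else env
    api_key = env.get("MEMCOE_API_KEY", "")
    base_url = env.get("MEMCOE_BASE_URL", "")
    if not api_key or not base_url:
        # Fail early rather than sending requests with empty credentials.
        raise RuntimeError("Set MEMCOE_API_KEY and MEMCOE_BASE_URL first")
    return api_key, base_url
```

With this helper in place, the assignment in async_llm.py would become `API_KEY, BASE_URL = load_api_config()`.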

Training

Run the textual gradient optimization to induce memory guidelines:

cd MGI_module/textgrad/
python train_textgrad.py \
    --model Qwen2.5-7B-Instruct \
    --num_chunks 8 \
    --batch_size 10 \
    --num_feedback 5

Output: Results are saved in the textgrad/ directory after each optimization step.

Note: All MGI prompt templates are defined in meta_prompt.py.
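
For readers unfamiliar with textual gradient optimization, the general idea can be sketched as a loop in which natural-language feedback takes the place of a numeric gradient. The code below is a conceptual illustration only and does not reproduce train_textgrad.py: `induce_guideline`, `toy_critique`, and `toy_revise` are hypothetical names, and the two toy functions stand in for what would be LLM calls.

```python
from typing import Callable, Sequence


def induce_guideline(
    guideline: str,
    batches: Sequence[Sequence[str]],
    critique: Callable[[str, Sequence[str]], str],
    revise: Callable[[str, str], str],
) -> list[str]:
    """Refine a guideline text over successive batches.

    `critique` plays the role of the textual gradient: it returns
    natural-language feedback on how the guideline handled a batch.
    `revise` applies that feedback to produce the next guideline.
    Returns every intermediate guideline, starting with the initial one.
    """
    history = [guideline]
    for batch in batches:
        feedback = critique(guideline, batch)
        if feedback:  # empty feedback means nothing to change
            guideline = revise(guideline, feedback)
        history.append(guideline)
    return history


def toy_critique(guideline: str, batch: Sequence[str]) -> str:
    """Toy stand-in for an LLM critic: list topics the guideline omits."""
    return ", ".join(topic for topic in batch if topic not in guideline)


def toy_revise(guideline: str, feedback: str) -> str:
    """Toy stand-in for an LLM rewriter: append the missing topics."""
    return f"{guideline}; also keep {feedback}"
```

Calling `induce_guideline("keep preferences", [["preferences"], ["allergies"]], toy_critique, toy_revise)` leaves the guideline unchanged after the first batch and extends it after the second.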

Inference

After training, update the TEMPLATE_EVOLVE prompt template in MGI_module/processors/memory.py with the optimized guidelines, then run inference:

cd MGI_module/

# PersonaMem (32k and 128k context)
python run_inference.py --dataset personamem --mode memory --context longcontext --num_chunks 8 --size 32k --model Qwen2.5-7B-Instruct --output_dir results
python run_inference.py --dataset personamem --mode memory --context longcontext --num_chunks 8 --size 128k --model Qwen2.5-7B-Instruct --output_dir results

# PrefEval (explicit and implicit)
python run_inference.py --dataset prefeval --mode memory --num_chunks 8 --pref_form explicit --model Qwen2.5-7B-Instruct --output_dir results
python run_inference.py --dataset prefeval --mode memory --num_chunks 8 --pref_form implicit --model Qwen2.5-7B-Instruct --output_dir results

# PersonaBench (varying noise levels)
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.0 --model Qwen2.5-7B-Instruct --output_dir results
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.3 --model Qwen2.5-7B-Instruct --output_dir results
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.5 --model Qwen2.5-7B-Instruct --output_dir results
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.7 --model Qwen2.5-7B-Instruct --output_dir results

Output: Results are saved in the results/ directory.
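
The eight commands above differ only in the dataset-specific flags, so the full grid can be generated rather than typed out. The sketch below uses only flags that appear in those commands; the names `COMMON`, `GRID`, and `inference_commands` are illustrative, and the flag order differs slightly from the commands above, which is assumed not to matter to the script's argument parser.

```python
import shlex

# Flags shared by every run, copied from the commands above.
COMMON = ["--mode", "memory", "--num_chunks", "8"]

# Dataset-specific flag variants, copied from the commands above.
GRID = {
    "personamem": [["--context", "longcontext", "--size", s] for s in ("32k", "128k")],
    "prefeval": [["--pref_form", f] for f in ("explicit", "implicit")],
    "personabench": [["--noise", n] for n in ("0.0", "0.3", "0.5", "0.7")],
}


def inference_commands(model: str, output_dir: str = "results") -> list[list[str]]:
    """Return one argument list per dataset variant for run_inference.py."""
    commands = []
    for dataset, variants in GRID.items():
        for extra in variants:
            commands.append(
                ["python", "run_inference.py", "--dataset", dataset,
                 *COMMON, *extra, "--model", model, "--output_dir", output_dir]
            )
    return commands


# Example: print the commands for review before running them.
#   for cmd in inference_commands("Qwen2.5-7B-Instruct"):
#       print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True, cwd="MGI_module")` to execute the runs in sequence. Passing a different model name reuses the same grid for another served model.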


Stage 2: Guideline-Aligned Memory Policy Optimization (GMPO)

GMPO Training

Train the memory policy model:

bash run_memory_7B.sh

Note: Modify run_memory_7B.sh to adjust hyperparameters and resource configurations for your hardware setup.

Output: Model checkpoints are saved in memory_agent/7B/global_step_xxx/.
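
Since each checkpoint directory name ends in its step number, the most recent one can be located programmatically. A small sketch, assuming the `xxx` suffix is a plain integer; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "global_step_200".
STEP_DIR = re.compile(r"global_step_(\d+)$")


def latest_checkpoint(root) -> Optional[Path]:
    """Return the global_step_<N> directory with the largest N, or None."""
    best = None
    best_step = -1
    for child in Path(root).iterdir():
        match = STEP_DIR.match(child.name)
        # Compare numerically so that step 100 ranks above step 20.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best, best_step = child, int(match.group(1))
    return best
```

For example, `latest_checkpoint("memory_agent/7B")` returns the path to use wherever a later step refers to `global_step_xxx`.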

GMPO Inference

1. Convert Checkpoint for vLLM

In scripts/merger.sh, set CKPT=memory_agent/7B/global_step_xxx, replacing xxx with the actual checkpoint step number. Then run the merger script to convert the checkpoint:

bash scripts/merger.sh

2. Deploy the Trained Model

cd MGI_module/

CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve memory_agent/7B/global_step_xxx/huggingface \
    --tensor-parallel-size 4 \
    --port 6025 \
    --host 0.0.0.0 \
    --served-model-name MemCoE-7B

3. Run Inference

# PersonaMem (32k and 128k context)
python run_inference.py --dataset personamem --mode memory --context longcontext --num_chunks 8 --size 32k --model MemCoE-7B --output_dir results
python run_inference.py --dataset personamem --mode memory --context longcontext --num_chunks 8 --size 128k --model MemCoE-7B --output_dir results

# PrefEval (explicit and implicit)
python run_inference.py --dataset prefeval --mode memory --num_chunks 8 --pref_form explicit --model MemCoE-7B --output_dir results
python run_inference.py --dataset prefeval --mode memory --num_chunks 8 --pref_form implicit --model MemCoE-7B --output_dir results

# PersonaBench (varying noise levels)
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.0 --model MemCoE-7B --output_dir results
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.3 --model MemCoE-7B --output_dir results
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.5 --model MemCoE-7B --output_dir results
python run_inference.py --dataset personabench --mode memory --num_chunks 8 --noise 0.7 --model MemCoE-7B --output_dir results

Acknowledgements

We thank the following projects for their excellent work and open-source contributions:

  • verl - Volcano Engine Reinforcement Learning for LLMs
  • MemAgent - Memory Agent Framework