Feature: add LoRA fine-tuning support for pi0 policies

## Motivation
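Full fine-tuning of pi0 models is expensive because every weight needs gradients and optimizer state. LoRA trains only small low-rank adapter matrices on top of frozen weights, which would enable task-specific adaptation with far less VRAM.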

## Proposal
Add a `LoRAPolicy` wrapper that freezes the base pi0 weights, applies LoRA adapters to the attention layers, and keeps the action head fully trainable.
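
A rough PyTorch sketch of what the wrapper could look like, to make the proposal concrete. Everything beyond the `LoRAPolicy` name is an assumption for illustration: `ToyPolicy` stands in for a real pi0 policy, and the module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `action_head`) and the `rank`/`alpha` defaults are hypothetical and would need to match the actual model.

```python
import math

import torch
from torch import nn


class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        # A gets a random init and B starts at zero, so the wrapped layer
        # initially computes exactly what the base layer computes.
        self.lora_a = nn.Parameter(base.weight.new_empty(rank, base.in_features))
        self.lora_b = nn.Parameter(base.weight.new_zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


class LoRAPolicy(nn.Module):
    """Hypothetical wrapper: LoRA on attention projections, action head left trainable."""

    def __init__(self, policy: nn.Module,
                 attn_keywords=("q_proj", "k_proj", "v_proj", "o_proj"),  # assumed names
                 head_keyword: str = "action_head",                       # assumed name
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.policy = policy
        # 1) Freeze every existing parameter.
        for p in policy.parameters():
            p.requires_grad_(False)
        # 2) Swap matching nn.Linear layers for LoRA-wrapped versions.
        for parent in list(policy.modules()):
            for child_name, child in list(parent.named_children()):
                if isinstance(child, nn.Linear) and any(k in child_name for k in attn_keywords):
                    setattr(parent, child_name, LoRALinear(child, rank, alpha))
        # 3) Re-enable gradients for the action head.
        for name, p in policy.named_parameters():
            if head_keyword in name:
                p.requires_grad_(True)

    def forward(self, *args, **kwargs):
        return self.policy(*args, **kwargs)


class ToyPolicy(nn.Module):
    """Stand-in for a pi0 policy: one attention block, an MLP, and an action head."""

    def __init__(self, dim: int = 16, action_dim: int = 4):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)
        self.mlp = nn.Linear(dim, dim)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1]), dim=-1)
        h = self.o_proj(attn @ v)
        return self.action_head(torch.relu(self.mlp(h)))


torch.manual_seed(0)
base_policy = ToyPolicy()
x = torch.randn(2, 5, 16)  # (batch, sequence, feature)
reference = base_policy(x).detach()

policy = LoRAPolicy(base_policy, rank=4)
trainable = [n for n, p in policy.named_parameters() if p.requires_grad]
# Only the trainable subset is handed to the optimizer.
optimizer = torch.optim.AdamW([p for p in policy.parameters() if p.requires_grad], lr=1e-4)
```

In this sketch `trainable` contains only the `lora_a`/`lora_b` tensors and the `action_head` parameters. Because `lora_b` starts at zero, the wrapped policy reproduces the base policy's outputs before any training step. Matching layers by name substring is just the simplest option for a sketch; a real implementation might accept an explicit list of module paths instead.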