Fix/learning pipeline by Grigory163 · Pull Request #1 · 0xCyberstan/Royale-RL-A-Reinforcement-Learning-Agent

Grigory163 · 2026-03-21T12:16:54Z

No description provided.

…, and live bot
- agent.py: Pre-norm DT with sinusoidal PE, return conditioning, causal mask, PER replay buffer, cosine LR scheduler, gradient clipping, match stats
- runbot.py: 7-component reward function (tower damage, destruction bonus, elixir efficiency, tempo, defense), TensorBoard logging, terminal rewards
- config.py: Hog 2.6 deck config, evolved card mappings, video parser settings
- scaler.py: Robust game area detection with fallback for portrait mode
- New: video_finder.py, video_parser.py, card_trainer.py, dataset_builder.py, katacr_converter.py, replay_trainer.py for offline training pipeline
- Updated anchors for Russian locale (battle_anchor, game_anchor)
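As a rough illustration of three pieces named for agent.py (return conditioning, the causal mask, and sinusoidal PE), here is a minimal pure-Python sketch. It is not taken from the PR: the function names are hypothetical, returns-to-go are assumed to be undiscounted, and the real code presumably operates on tensors rather than lists.

```python
import math


def returns_to_go(rewards):
    """Suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ... (assumed undiscounted)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sinusoidal_pe(n_positions, d_model):
    """Standard sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))
    print(causal_mask(3))
```

The returns-to-go value at each step is what the model is conditioned on, which is why a context window that mixes two games would feed it a target return belonging to the wrong episode.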

1. Epsilon: 1.0 → 0.4 start (pretrained weights exist, no need for pure random)
   Decay 0.995 → 0.98 (reach 0.05 in ~60 games instead of ~600)
2. Cross-episode sampling: ReplayBuffer.sample() now respects episode boundaries
   Previously could sample context windows spanning end of one game + start of next, corrupting returns-to-go and confusing the model
3. Action validation: learn_from_game() now filters out:
   - "do nothing" actions (not real decisions)
   - Steps where hand has <2 valid cards (vision glitches)
4. Training safeguards:
   - Min buffer size = max(context_len*3, 200) before training
   - Epochs scale with buffer size to prevent overfitting on tiny data
5. Checkpoint persistence: save/load now preserves epsilon, total_games, wins, recent_results, optimizer state; no more resetting to epsilon=1.0 on restart
6. Early playability check in decide_action(): skip if no card is affordable
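The epsilon schedule, the minimum buffer size, and the episode-boundary rule can be sketched as below. This is an illustration under stated assumptions, not the PR's code: the function names are hypothetical, epsilon is assumed to decay multiplicatively once per game, and the buffer is assumed to store steps in order with an episode id per step. Under that per-game assumption the arithmetic gives about 598 games for the old schedule and about 103 for the new one, so the ~60 figure in the commit message may reflect a different decay cadence.

```python
import random


def games_to_reach(start, decay, floor):
    """Count multiplicative decay steps until epsilon is at or below the floor."""
    games = 0
    eps = start
    while eps > floor:
        eps *= decay
        games += 1
    return games


def min_buffer_size(context_len):
    """Threshold from the commit message: max(context_len*3, 200)."""
    return max(context_len * 3, 200)


def sample_window(episode_ids, context_len, rng=random):
    """Pick a [start, end) window of context_len steps inside a single episode.

    episode_ids[i] is the episode of step i; steps are assumed to be stored in
    order, so equal first and last ids mean the window never crosses a game.
    Returns None when no episode is long enough.
    """
    last_start = len(episode_ids) - context_len
    valid = [
        s for s in range(last_start + 1)
        if episode_ids[s] == episode_ids[s + context_len - 1]
    ]
    if not valid:
        return None
    start = rng.choice(valid)
    return start, start + context_len


if __name__ == "__main__":
    print(games_to_reach(1.0, 0.995, 0.05))
    print(games_to_reach(0.4, 0.98, 0.05))
    print(sample_window([0, 0, 0, 1, 1, 1, 1], 3))
```

Filtering start positions up front, rather than sampling and retrying, keeps the draw uniform over valid windows; whether the actual ReplayBuffer.sample() pads short episodes instead is not stated in the PR.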

Architecture overview, learning pipeline, setup guide, and roadmap.

Grigory163 added 3 commits March 21, 2026 15:59

Add project documentation (PROJECT_NOTES.md)

299b9f7

Grigory163 force-pushed the fix/learning-pipeline branch from 580eeca to 299b9f7 Compare March 21, 2026 12:17

Grigory163 wants to merge 3 commits into 0xCyberstan:main from Grigory163:fix/learning-pipeline
