Fix/learning pipeline #1

Open
Grigory163 wants to merge 3 commits into 0xCyberstan:main from Grigory163:fix/learning-pipeline
@Grigory163
No description provided.

…, and live bot

- agent.py: Pre-norm Decision Transformer (DT) with sinusoidal positional encoding, return conditioning, causal mask,
  prioritized experience replay (PER) buffer, cosine LR scheduler, gradient clipping, match stats
- runbot.py: 7-component reward function (including tower damage, destruction bonus,
  elixir efficiency, tempo, defense), TensorBoard logging, terminal rewards
- config.py: Hog 2.6 deck config, evolved card mappings, video parser settings
- scaler.py: Robust game area detection with fallback for portrait mode
- New: video_finder.py, video_parser.py, card_trainer.py, dataset_builder.py,
  katacr_converter.py, replay_trainer.py for offline training pipeline
- Updated anchors for Russian locale (battle_anchor, game_anchor)
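
Return conditioning in a Decision Transformer usually means feeding the model the return-to-go at each step, that is, the sum of rewards from that step to the end of the episode. The sketch below shows that computation in plain Python. The function name `returns_to_go` and the optional `gamma` discount are illustrative assumptions, not names taken from agent.py.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go per step: rtg[t] = rewards[t] + gamma * rtg[t + 1].

    Hypothetical helper for illustration; gamma=1.0 gives the undiscounted sum.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the already-accumulated future return.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Rewards for one toy episode of four steps.
episode_rewards = [0.0, 1.0, 0.0, 2.0]
print(returns_to_go(episode_rewards))  # [3.0, 3.0, 2.0, 2.0]
```

Because the value at each step depends on every later reward in the same game, the computation has to be done per episode and never across the boundary between two games.
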
1. Epsilon: start 1.0 → 0.4 (pretrained weights exist, so pure random exploration is unnecessary)
   Decay 0.995 → 0.98 (reaches 0.05 in ~100 games instead of ~600, assuming one multiplicative decay step per game)

2. Cross-episode sampling: ReplayBuffer.sample() now respects episode boundaries
   Previously it could sample context windows spanning the end of one game and the start of the next,
   corrupting returns-to-go and confusing the model

3. Action validation: learn_from_game() now filters out:
   - "do nothing" actions (not real decisions)
   - Steps where hand has <2 valid cards (vision glitches)

4. Training safeguards:
   - Min buffer size = max(context_len*3, 200) before training
   - Epochs scale with buffer size to prevent overfitting on tiny data

5. Checkpoint persistence: save/load now preserves epsilon, total_games, wins,
   recent_results, optimizer state — no more resetting to epsilon=1.0 on restart

6. Early playability check in decide_action() — skip if no card is affordable
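
The episode-boundary fix in item 2 can be sketched roughly as follows: enumerate only those window starts that fit entirely inside a single game, then sample from that set. This is a minimal illustration under stated assumptions, not the code in agent.py. The names `valid_window_starts` and `sample_windows` are hypothetical, sampling here is uniform rather than prioritized, and episodes shorter than the context length are simply skipped instead of padded.

```python
import random
from typing import List, Sequence, Tuple


def valid_window_starts(episode_lengths: Sequence[int], context_len: int) -> List[Tuple[int, int]]:
    """List (episode_index, start_step) pairs whose window stays inside one episode."""
    starts = []
    for ep_idx, length in enumerate(episode_lengths):
        # Episodes shorter than context_len yield an empty range and are skipped.
        for start in range(length - context_len + 1):
            starts.append((ep_idx, start))
    return starts


def sample_windows(episodes: Sequence[Sequence], context_len: int, batch_size: int,
                   rng: random.Random) -> List[list]:
    """Sample context windows uniformly from the valid (episode, start) pairs."""
    starts = valid_window_starts([len(ep) for ep in episodes], context_len)
    if not starts:
        raise ValueError("no episode is long enough for the requested context length")
    picks = [rng.choice(starts) for _ in range(batch_size)]
    return [list(episodes[e][s:s + context_len]) for e, s in picks]


# Two toy games; each step is tagged with the game it came from.
episodes = [[("game0", t) for t in range(5)], [("game1", t) for t in range(6)]]
batch = sample_windows(episodes, context_len=4, batch_size=8, rng=random.Random(0))
# Every window holds steps from exactly one game.
assert all(len({game for game, _ in window}) == 1 for window in batch)
```

Indexing windows by (episode, start) rather than by a flat position in the buffer is what rules out a window that begins in one game and ends in the next.
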
Architecture overview, learning pipeline, setup guide, and roadmap.
@Grigory163 force-pushed the fix/learning-pipeline branch from 580eeca to 299b9f7 on March 21, 2026, 12:17