Dev by shuvoxcd01 · Pull Request #39 · shuvoxcd01/GridMind

shuvoxcd01 · 2026-05-12T06:47:14Z

No description provided.


* Dev (#27) * Add optional preprocessor * Update .gitignore * Add EpsilonRandomizedPolicyWrapper * Update .gitignore * Update .gitignore * Update .gitignore * Remove unused imports * Add base class for NN-based algorithms * Add base class for NN-based policy * Add neuroevolution * Update base learning algorithm * Add preprocessing step * Update base class type * Update base class * Enhance functionalities - Add preprocessor - Record action prob * Include additional info in trajectory * Add wrapper for frozenlake * Add utility for evolutionary RL * Add algorithms * Add performance evaluator for evolutionary RL * Update algorithm * Update episode collector * Update multiprocessing * Rename directory * Change directory * Fix imports * Change directory * Fix imports * Fix episode return * Update neuroevolution algorithm * Add neuroevolution example * ppo_off_policy_WIP * Update .gitignore * Update .gitignore * Format files * Format files * Updated algorithms * Change value estimator class * Add parent tracker * Track generation * Add QAssistedNeuroEvolution * Add DeepQLearningWithExperienceReplay * Add performance evaluator for evolutionary rl * Add additional properties * Use truncation selection from Selection module * Minor refactor * Add Selection module * Add async tensorboard logger * Add deep q learning * Annotate _train as abstractmethod * Add simple replay buffer * Update algorithm listing * Add save_policy parameter to train * Make num_actions parameter optional * Update ActorCriticPolicy policy call with appropriate argument * Delegate env to parent class * Rename parameter * Disable flatenning * Add tests * Remove unused imports * Change base class type * Add policy for Atari * Add q-network for Atari * Update algorithm * Change type-hint for q_network parameter * Add research algorithm * Add examples for atari * Reformat * Update example * Update algorithm * Fix channel and batch dim * Add mechanism to pop elements * Reformat * Add cuda support * Rename 
directory * Add save and load mechanisms * Add autofire wrapper * Add flag for q-derived policy evaluation * Add idle_truncation_wrapper * save_networks * Reformat * Periodically save q and agent network * Add q vs agent network comparison * Add saving and loading mechanism * Reformat files * Add mechanism to load and save q network * Reformat files * Reformat * Add target network * Separate target and online network * Add option for training with num_steps along with num_episodes * Add empty _train_steps method * Add optional info * Add base class for evolutionary RL algorithms * Refactor * Refactor * Add extra dim for scalar observations * Refactor * Add numpy array as input * Rename parameter * Refactor * Reformat * Reformat * Fix device placement * Add option to add network graph to tensorboard * Add graph to tensorbaord * Rename parameter * Use encoding * Add embedding based feature extractor * Add q_network with embedding layer * Add examples * Add preprocessing * Reformat * Add dqn taxi example * Update example * Dynamically adjust mutation std * Use embedding instead of one hot * Update in and out features number * Introduce embedding * Update gitignore * Update algorithm - Update mutation rate - Add global generation counter - Implement get_policy * Refactor * Update .gitignore * Refactor * Add taxi_q_network * Chage method name * Change method name * Update algorithm * Add method to get action probabilities * Refactor * Add comparisons * Add metrics * Add info * Update examples * Add wrapper env for taxi * Add soft update of target network * Add default option for soft update * Add max_grad_norm parameter and gradient clipping to prevent exploding gradients * Add customizable loss function to DeepQLearningWithExperienceReplay - Defaults to huber loss * Refactor QAssistedNeuroEvolution class to improve code readability and maintainability by reorganizing imports, adding optional selection functions, and enhancing formatting consistency. 
* Refactor code for improved readability and consistency by removing unnecessary blank lines and enhancing formatting in multiple files. * Add example usage for LunarLander with Deep Q-Learning agent * Add configuration files and performance comparison script for LunarLander with Q-assisted neuroevolution * Refactor QAssistedNeuroEvolution class to enhance mutation parameter handling and improve logging of algorithm parameters during execution. * Enhance NeuroEvolution class with generation tracking and best agent retrieval * Enhance NeuroEvolution class with generation tracking and best agent retrieval * Fix summary directory handling in configuration loading * Add configuration files for Lunar Lander environment * Refactor configuration loading in Q-assisted neuroevolution script to use ConfigLoader class * Remove unused imports in Q-assisted neuroevolution performance comparison script * Fix configuration file processing to ensure only .ini files are processed * Fix configuration loading to use dynamic generation count * Add configuration files for Lunar Lander environment with updated parameters * Fix import path for ConfigLoader in Q-assisted neuroevolution performance comparison script * Refactor configuration loading and environment setup in Q-assisted neuroevolution script * Add basic configuration file for Taxi environment setup * Refactor training method names from _train to train for consistency across algorithms * Remove unnecessary blank lines in BaseEvoRLAlgorithm class for improved readability * Add __all__ export for algorithm classes in __init__.py * Fix import error handling for SAVE_DATA_DIR in deep_q_learning.py * Refactor constructor calls in MonteCarloEveryVisitPrediction, MonteCarloEveryVisitPredictionIncremental, NStepTDPrediction, and TD0Prediction to include env parameter for consistency * List algorithms in __init__.py * Add feature construction classes to __init__.py for easy access * Add CITATION.cff and pre-commit hooks for version 
synchronization - Add CITATION.cff file for proper project citation - Add pre-commit configuration with automated version sync - Add script to keep CITATION.cff version in sync with pyproject.toml - Include standard pre-commit hooks for code quality checks * Refactor BaseLearningAlgorithm constructor to handle env initialization and set env_name conditionally * Update optional dependencies for rl-worlds to version 0.0.3.post1 * Move episode collector and trajectory imports to the correct utility module * Add DeterministicLookupPolicy class for action selection based on a lookup table * Add CITATION.cff to .gitignore * Refactor ActorCriticPolicy to accept observation shape and number of actions; add AtariPolicy and AtaricActorCriticPolicy implementations. * Fix training function reference in BaseEvoRLAlgorithm to use the abstract _train method * Implement QLearningExperienceReplay class with experience replay functionality * Refactor NeuroAgent to use 'policy' instead of 'network' for clarity; update related references in NeuroEvolution and QAssistedNeuroEvolution classes. * Refactor update_citation_version.py to use double quotes for string literals and remove unnecessary whitespace * Reformat * Reformat * Refactor to replace QAssistedNeuroEvolution with DeepQAssistedNeuroEvolution in multiple files for consistency and clarity. * Add BaseQAssistedNeuroEvolution class and related components - Introduced `BaseQAssistedNeuroEvolution` class for neuroevolution with Q-learning assistance. - Implemented population management, fitness evaluation, and Q-learning integration. - Added configuration loading for various parameters including population and Q-learning settings. - Updated `NeuroAgent` to accept a more general policy type. - Enhanced `QLearningExperienceReplay` to support new Q-learning features and improved training steps. - Refactored code for better readability and maintainability. 
* Refactor code for improved readability and consistency across multiple files * Refactor code for consistency and readability; update formatting in various files * Add get_all_action_probabilities method to policy classes; rename get_action_probs to get_action_prob * Add @torch.no_grad() decorator to mutation methods and update action probability retrieval * Add behavior_score property to NeuroAgent and update get_metadata method * Add KNNNeighborRetriever class for novelty search in evolutionary algorithms * Add ParetoSelector class for non-dominated sorting and Pareto-front selection * Add NoveltyUtils class for probability distribution analysis and distance calculations * Add assign_novelty_scores method to NeuroEvolutionUtil for novelty score calculation * Add novelty search functionality to BaseQAssistedNeuroEvolution class * Add use_novelty_search parameter to QAssistedNeuroEvolution class * Optimize Q-value conversion to use numpy for improved efficiency * Refactor action probability calculations to use numpy for improved performance * Fix initialization of QTableDerivedEpsilonGreedyPolicy to correctly set num_actions * Add video recording and evaluation parameters to neuroevolution classes * Fix QLearning constructor to include summary_dir and write_summary parameters * Add summary_dir and write_summary parameters to various learning algorithms * Update default_save_dir logic to handle missing SAVE_DATA_DIR * Set epsilon value in QLearning constructor and update policy with epsilon * Enhance action selection in QLearning and policy classes to support action masks * chore: apply pre-commit auto-fixes and add additional hooks * chore: apply formatting fixes to example files * Fix CITATION.cff to correctly identify "Das" as family name (#17) * Initial plan * Fix CITATION.cff author name format with Das as family name Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] 
<198982749+Copilot@users.noreply.github.com> Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com> * chore: update version and dependencies in pyproject.toml; fix citation version in CITATION.cff * Fix QNetworkDerivedEpsilonGreedyPolicy missing num_actions parameter to parent constructor (#18) * Initial plan * Fix QNetworkDerivedEpsilonGreedyPolicy to pass num_actions to parent constructor Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com> --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> * Fix Q-learning update rule to account for termination in environment step * Bump version to 0.0.7.1 in pyproject.toml --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>


Copilot

Pull request overview

This PR introduces several new utilities and algorithm enhancements across GridMind, including observation preprocessing (normalization + discretization), prioritized replay, expanded Q-derived policy interfaces (Q value access + action masking), and updates to PPO and evolutionary RL training hooks.

Changes:

Added new utilities: GridDiscretizationWrapper, MinMaxNormalizer, and a PrioritizedReplayBuffer implementation.
Extended Q-derived soft policy interface with get_q_value(s) / get_q_values() and updated Q-table / Q-network epsilon-greedy policies accordingly.
Updated/added algorithms: PPO updated to use GAE-style advantages/returns; added tabular Q(λ) with eligibility traces; refactored evo-RL training hook to call _train.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 9 comments.


File	Description
src/gridmind/wrappers/env_wrappers/grid_discretization_wrapper.py	New wrapper that discretizes Box observations into a single Discrete state index.
src/gridmind/utils/algorithm_util/prioritized_replay_buffer.py	New prioritized replay buffer (sum-tree) built on top of the existing replay buffer.
src/gridmind/policies/soft/q_derived/q_table_derived_epsilon_greedy_policy.py	Adds public Q-value accessors (`get_q_value(s)`) and masking support.
src/gridmind/policies/soft/q_derived/q_network_derived_epsilon_greedy_policy.py	Adds masked greedy action selection and public Q-value accessors for Q-networks.
src/gridmind/policies/soft/q_derived/base_q_derived_soft_policy.py	Extends the abstract interface to require Q-value accessor methods.
src/gridmind/policies/parameterized/actor_critic_policy.py	Adds `get_all_action_probabilities()` helper.
src/gridmind/feature_construction/normalizer.py	New Min-Max normalizer feature constructor.
src/gridmind/feature_construction/__init__.py	Exports `MinMaxNormalizer` from the feature construction package.
src/gridmind/algorithms/tabular/temporal_difference/control/q_learning.py	Adjusts TD target to avoid bootstrapping across terminal transitions.
src/gridmind/algorithms/tabular/temporal_difference/control/q_learning_with_eligibility_trace.py	Adds a new Q-learning with eligibility traces control algorithm.
src/gridmind/algorithms/function_approximation/ppo/ppo.py	Refactors PPO update to use advantages/returns (GAE-style) and adjusts hyperparameters.
src/gridmind/algorithms/function_approximation/ppo/one_step_ppo.py	Adds a one-step TD-error PPO variant (new file).
src/gridmind/algorithms/evolutionary_rl/neuroevolution/neuroevolution.py	Refactors evo agent field naming (network→policy) and training method name.
src/gridmind/algorithms/evolutionary_rl/base_evo_rl_algorithm.py	Adds abstract `_train()` and fixes `train()` to call it via `_training_wrapper`.
pyproject.toml	Bumps version, Python requirement, and build-system/dependencies.
example_usage/control/mountain_car/one_step_actor_critic_example.py	Updates example to normalize observations before tile coding + multi-hot encoding.
CITATION.cff	Updates version metadata and formatting.
.gitignore	Adds additional ignored paths (tooling/research/notes).
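
The q_learning.py entry above mentions a TD target that avoids bootstrapping across terminal transitions. A minimal sketch of that idea follows, assuming a tabular Q-table stored as nested lists. The function name `q_learning_update` and the parameters `step_size` and `discount` are hypothetical and are not taken from GridMind.

```python
def q_learning_update(q_table, state, action, reward, next_state, terminated,
                      step_size=0.1, discount=0.99):
    """Apply one tabular Q-learning update and return the TD target used."""
    # After a terminal transition there is no future return, so the
    # bootstrap term is zero. Truncation (e.g. a time limit) is a separate
    # signal and is not handled in this sketch.
    bootstrap = 0.0 if terminated else max(q_table[next_state])
    td_target = reward + discount * bootstrap
    q_table[state][action] += step_size * (td_target - q_table[state][action])
    return td_target


# Three states, two actions; state 2 has non-zero action values.
q_table = [[0.0, 0.0], [0.0, 0.0], [5.0, 7.0]]
target_terminal = q_learning_update(q_table, 1, 0, 1.0, 2, terminated=True)
target_ongoing = q_learning_update(q_table, 0, 1, 1.0, 2, terminated=False)
```

With `terminated=True` the target reduces to the immediate reward, even though the next state's stored action values are non-zero.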


+        # Get the index where this will be stored
+        data_idx = (
+            len(self.buffer)
+            if len(self.buffer) < self.capacity
+            else self.tree.write_index
+        )
+
+        # Use max priority for new experiences if TD-error not provided
+        # This ensures new experiences get sampled at least once
+        if td_error is None:
+            priority = self.max_priority
+        else:
+            priority = self._get_priority(td_error)
+            # Update max priority if this is larger
+            self.max_priority = max(self.max_priority, priority)
+
+        # Store in parent's buffer
+        super().store(state, action, reward, next_state, terminated, truncated)
+
+        # Add priority to tree
+        self.tree.add(priority, data_idx)
+
+        # Update write index for tree
+        self.tree.write_index = (self.tree.write_index + 1) % self.capacity
+
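
The hunk above writes a priority into a sum tree at the slot where the transition is stored. As a rough illustration of why a sum tree suits proportional sampling, here is a small self-contained sketch. The class `TinySumTree` and its methods `update`, `total`, and `find` are hypothetical and do not mirror the `SumTree` API in this PR.

```python
class TinySumTree:
    """Binary tree in array form; each internal node holds the sum of its children."""

    def __init__(self, capacity):
        self.capacity = capacity
        # capacity - 1 internal nodes followed by capacity leaves.
        self.nodes = [0.0] * (2 * capacity - 1)

    def update(self, data_idx, priority):
        """Set the priority of one leaf and propagate the change to the root."""
        idx = data_idx + self.capacity - 1
        change = priority - self.nodes[idx]
        self.nodes[idx] = priority
        while idx > 0:
            idx = (idx - 1) // 2
            self.nodes[idx] += change

    def total(self):
        """Sum of all leaf priorities."""
        return self.nodes[0]

    def find(self, cumulative):
        """Return the data index whose priority interval contains `cumulative`."""
        idx = 0
        while idx < self.capacity - 1:  # stop once a leaf is reached
            left = 2 * idx + 1
            if cumulative <= self.nodes[left]:
                idx = left
            else:
                cumulative -= self.nodes[left]
                idx = left + 1
        return idx - (self.capacity - 1)


tree = TinySumTree(capacity=4)
for data_idx, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(data_idx, priority)
picked = [tree.find(c) for c in (0.5, 1.5, 3.5, 9.0)]
```

Drawing `cumulative` uniformly from `[0, tree.total()]` selects each index with probability proportional to its priority, and both `update` and `find` walk a single root-to-leaf path.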


    def spawn_individual(self):
        network = DiscreteActionMLPPolicy(
            observation_shape=self.observation_shape,
            num_actions=self.num_actions,
            num_hidden_layers=2,
        )
-        spawned_individual = NeuroAgent(network=network)
+        spawned_individual = NeuroAgent(policy=network)

        return spawned_individual


+        env=eval_env, epoch_eval_interval=100
+    )
+    policy = ActorCriticPolicy(observation_shape=env.observation_space.shape, num_actions=env.action_space.n)
+    algorithm = PPO(env=env, policy=policy)


+import torch.nn as nn
+
+torch.autograd.set_detect_anomaly(True)
+
+logging.basicConfig(level=logging.DEBUG)


+            returns = torch.tensor(
+                [a + v for a, v in zip(advantages, values)],
+                dtype=torch.float32,
+                device=device,
+            )
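
The hunk above forms `returns` by adding each advantage to the matching value estimate. A minimal sketch of how GAE-style advantages could be computed before that step is shown below, in plain Python. The function name `gae_advantages` and the parameters `last_value`, `discount`, and `lam` are assumptions for illustration and are not taken from ppo.py.

```python
def gae_advantages(rewards, values, dones, last_value, discount=0.99, lam=0.95):
    """Compute generalized advantage estimates by scanning the rollout backwards."""
    advantages = []
    gae = 0.0
    next_value = last_value
    for reward, value, done in zip(reversed(rewards), reversed(values), reversed(dones)):
        mask = 0.0 if done else 1.0  # stop bootstrapping at episode ends
        delta = reward + discount * next_value * mask - value
        gae = delta + discount * lam * mask * gae
        advantages.append(gae)
        next_value = value
    advantages.reverse()
    return advantages


rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
dones = [False, False, True]
# With discount = lam = 1 the advantages reduce to Monte Carlo return minus value.
advantages = gae_advantages(rewards, values, dones, last_value=0.0, discount=1.0, lam=1.0)
returns = [a + v for a, v in zip(advantages, values)]
```

In this toy rollout the reconstructed returns equal the undiscounted rewards-to-go, which is a quick sanity check for the advantages-plus-values identity.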


+        # Validate that low < high
+        if np.any(self.low >= self.high):
+            raise ValueError(
+                f"All elements of low must be strictly less than high. "
+                f"Got low={self.low}, high={self.high}"
+            )
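
The check above rejects bounds where any `low` element is not strictly below `high`, which also guards against division by zero when scaling. A minimal sketch of min-max scaling with the same validation follows. The function name `min_max_normalize` is hypothetical and is not the `MinMaxNormalizer` interface from this PR.

```python
import numpy as np


def min_max_normalize(observation, low, high):
    """Scale each observation dimension linearly from [low, high] to [0, 1]."""
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # A zero or negative range would make the division below meaningless.
    if np.any(low >= high):
        raise ValueError(
            f"All elements of low must be strictly less than high. "
            f"Got low={low}, high={high}"
        )
    observation = np.asarray(observation, dtype=np.float64)
    return (observation - low) / (high - low)


# Bounds chosen to resemble a two-dimensional position/velocity observation.
normalized = min_max_normalize([-0.5, 0.0], low=[-1.2, -0.07], high=[0.6, 0.07])
```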


+        self.obs_low = env.observation_space.low
+        self.obs_high = env.observation_space.high
+        self.obs_shape = env.observation_space.shape
+        self.clip = clip
+
+        # Handle bins_per_dim parameter
+        if isinstance(bins_per_dim, int):
+            # Use same bins for all dimensions
+            if bins_per_dim <= 0:
+                raise ValueError(f"bins_per_dim must be positive, got {bins_per_dim}")
+            self.bins_per_dim = np.full(self.obs_shape[0], bins_per_dim, dtype=np.int32)
+        else:
+            # Use specific bins per dimension
+            bins_array = np.array(bins_per_dim, dtype=np.int32)
+            if len(bins_array) != self.obs_shape[0]:
+                raise ValueError(
+                    f"bins_per_dim length ({len(bins_array)}) must match "
+                    f"observation dimensionality ({self.obs_shape[0]})"
+                )
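
The hunk above validates `bins_per_dim` for a wrapper that maps a Box observation to a single Discrete state index. One way such a mapping could work is sketched below, assuming finite bounds and row-major flattening. The function name `discretize` is hypothetical, and the wrapper's actual binning and clipping rules may differ.

```python
import numpy as np


def discretize(observation, low, high, bins_per_dim):
    """Map a continuous observation to one flat integer index."""
    observation = np.asarray(observation, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    bins = np.asarray(bins_per_dim, dtype=np.int64)
    # Position within each dimension's range, clipped to [0, 1].
    fraction = np.clip((observation - low) / (high - low), 0.0, 1.0)
    # Scale to bin indices; the upper bound itself falls into the last bin.
    indices = np.minimum((fraction * bins).astype(np.int64), bins - 1)
    # Combine per-dimension indices into a single row-major index.
    return int(np.ravel_multi_index(tuple(indices), tuple(bins)))


state_mid = discretize([0.5, 0.9], low=[0.0, 0.0], high=[1.0, 1.0], bins_per_dim=[3, 4])
state_max = discretize([1.0, 1.0], low=[0.0, 0.0], high=[1.0, 1.0], bins_per_dim=[3, 4])
```

With 3 and 4 bins the flat index ranges over 12 states, so the resulting space could be represented as `Discrete(12)`.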


+cff-version: 0.0.7
 message: "If you use this software, please cite it as below."
 type: software
 title: "GridMind: A Reinforcement Learning Algorithms Library"
-version: 0.0.6
+version: 0.0.7


@@ -1,12 +1,12 @@
 [build-system]
-requires = ["setuptools >= 61.0"]
+requires = ["setuptools >= 78.1.1", "pip >= 25.3"]



shuvoxcd01 added 30 commits April 20, 2025 15:06

Update episode collector

3f018ef

Update multiprocessing

af6d555

Rename directory

9dec3e2

Change directory

99c6d58

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind into dev

bd4f877

Merge dev into research-dev

9e93981

Fix imports

13ca329

Change directory

ae8aaa6

Fix imports

c59ec2b

Fix episode return

a409fb3

Update neuroevolution algorithm

4febf6f

Add neuroevolution example

ff5a5c3

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind into dev

5bf9d45

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind-Research into research-dev

6423b35

ppo_off_policy_WIP

125667c

Update .gitignore

f3cf64c

Update .gitignore

a95ecae

Format files

13609a4

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind into dev

45fa0b4

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind-Research into research-dev

c013a78

Format files

58e5ed5

Updated algorithms

95c155b

Change value estimator class

d2196de

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind into dev

b624132

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind-Research into research-dev

e03b6b3

Add parent tracker

5865aae

Track generation

7ccd61c

Add QAssistedNeuroEvolution

98c8d90

Add DeepQLearningWithExperienceReplay

0bebc0f

Add performance evaluator for evolutionary rl

48bcbae

shuvoxcd01 and others added 13 commits December 31, 2025 22:07

feat: add .mcp.json to .gitignore

a330369

fix: update urllib3 version in dependencies

aa889ef

feat: update .gitignore to include research files and Obsidian notes vault

8d73935

feat: enhance _get_greedy_action method to support action masking

c1dd3dd

feat: implement Prioritized Experience Replay Buffer with SumTree structure

589dfc7

fix: correct training function reference in BaseEvoRLAlgorithm

68339a3

refactor: update NeuroEvolution to use policy instead of network for agent interactions

0d9e007

feat: implement Q-Learning with Eligibility Trace and enhance Q-value retrieval methods in policy classes

942988a

Add OneStep PPO algorithm with training and evaluation setup

a049671

refactor: streamline PPO class parameters and training logic for improved clarity and performance

e1fc6e0

feat: implement get_all_action_probabilities method to return action probabilities from the actor network

11a8f6f

Merge branch 'dev' of https://github.com/shuvoxcd01/GridMind into public_dev

7cc290b

shuvoxcd01 self-assigned this May 12, 2026

shuvoxcd01 requested a review from Copilot May 12, 2026 06:48

Copilot started reviewing on behalf of shuvoxcd01 May 12, 2026 06:49

Copilot AI reviewed May 12, 2026


shuvoxcd01 added 13 commits May 12, 2026 17:14

Merge branch 'main' of https://github.com/shuvoxcd01/GridMind into dev

938b6d5

refactor: simplify method implementations and improve code clarity across multiple files

a776ac2

refactor: optimize SumTree methods and enhance PrioritizedReplayBuffer structure for improved performance

0d03c13

refactor: rename get_continuous_observation to discretize_observation for clarity

3aa75db

refactor: update logging and summary handling across multiple algorithms

2baf831

refactor: add write_summary=False to algorithm initializations in test files

7853e32

refactor: remove main execution blocks from multiple files for cleaner module structure

e60cf62

refactor: enhance action probability handling and improve numerical stability across algorithms

9b47e17

feat: add QNetworkToStateValueEstimatorWrapper for state value estimation

64f7a43

refactor: add getter and setter for policy in NeuroAgent class

f74d19b

fix: update NeuroAgent instantiation to use 'network' parameter instead of 'policy'

e35741a

feat: add get_all_action_probabilities method to StochasticStartGreedyPolicy and RandomPolicy for action probability retrieval

f7549e1

fix: set write_summary parameter to False in various algorithms

8849d51

shuvoxcd01 wants to merge 295 commits into main from dev

3 participants
