Dev #39
Open
shuvoxcd01 wants to merge 295 commits into
…into research-dev
…into research-dev
…into research-dev
…agent interactions
* Dev (#27)
* Add optional preprocessor
* Update .gitignore
* Add EpsilonRandomizedPolicyWrapper
* Update .gitignore
* Update .gitignore
* Update .gitignore
* Remove unused imports
* Add base class for NN-based algorithms
* Add base class for NN-based policy
* Add neuroevolution
* Update base learning algorithm
* Add preprocessing step
* Update base class type
* Update base class
* Enhance functionalities
  - Add preprocessor
  - Record action prob
* Include additional info in trajectory
* Add wrapper for frozenlake
* Add utility for evolutionary RL
* Add algorithms
* Add performance evaluator for evolutionary RL
* Update algorithm
* Update episode collector
* Update multiprocessing
* Rename directory
* Change directory
* Fix imports
* Change directory
* Fix imports
* Fix episode return
* Update neuroevolution algorithm
* Add neuroevolution example
* ppo_off_policy_WIP
* Update .gitignore
* Update .gitignore
* Format files
* Format files
* Updated algorithms
* Change value estimator class
* Add parent tracker
* Track generation
* Add QAssistedNeuroEvolution
* Add DeepQLearningWithExperienceReplay
* Add performance evaluator for evolutionary rl
* Add additional properties
* Use truncation selection from Selection module
* Minor refactor
* Add Selection module
* Add async tensorboard logger
* Add deep q learning
* Annotate _train as abstractmethod
* Add simple replay buffer
* Update algorithm listing
* Add save_policy parameter to train
* Make num_actions parameter optional
* Update ActorCriticPolicy policy call with appropriate argument
* Delegate env to parent class
* Rename parameter
* Disable flattening
* Add tests
* Remove unused imports
* Change base class type
* Add policy for Atari
* Add q-network for Atari
* Update algorithm
* Change type-hint for q_network parameter
* Add research algorithm
* Add examples for atari
* Reformat
* Update example
* Update algorithm
* Fix channel and batch dim
* Add mechanism to pop elements
* Reformat
* Add cuda support
* Rename directory
* Add save and load mechanisms
* Add autofire wrapper
* Add flag for q-derived policy evaluation
* Add idle_truncation_wrapper
* save_networks
* Reformat
* Periodically save q and agent network
* Add q vs agent network comparison
* Add saving and loading mechanism
* Reformat files
* Add mechanism to load and save q network
* Reformat files
* Reformat
* Add target network
* Separate target and online network
* Add option for training with num_steps along with num_episodes
* Add empty _train_steps method
* Add optional info
* Add base class for evolutionary RL algorithms
* Refactor
* Refactor
* Add extra dim for scalar observations
* Refactor
* Add numpy array as input
* Rename parameter
* Refactor
* Reformat
* Reformat
* Fix device placement
* Add option to add network graph to tensorboard
* Add graph to tensorboard
* Rename parameter
* Use encoding
* Add embedding based feature extractor
* Add q_network with embedding layer
* Add examples
* Add preprocessing
* Reformat
* Add dqn taxi example
* Update example
* Dynamically adjust mutation std
* Use embedding instead of one hot
* Update in and out features number
* Introduce embedding
* Update gitignore
* Update algorithm
  - Update mutation rate
  - Add global generation counter
  - Implement get_policy
* Refactor
* Update .gitignore
* Refactor
* Add taxi_q_network
* Change method name
* Change method name
* Update algorithm
* Add method to get action probabilities
* Refactor
* Add comparisons
* Add metrics
* Add info
* Update examples
* Add wrapper env for taxi
* Add soft update of target network
* Add default option for soft update
* Add max_grad_norm parameter and gradient clipping to prevent exploding gradients
* Add customizable loss function to DeepQLearningWithExperienceReplay
  - Defaults to huber loss
* Refactor QAssistedNeuroEvolution class to improve code readability and maintainability by reorganizing imports, adding optional selection functions, and enhancing formatting consistency.
* Refactor code for improved readability and consistency by removing unnecessary blank lines and enhancing formatting in multiple files.
* Add example usage for LunarLander with Deep Q-Learning agent
* Add configuration files and performance comparison script for LunarLander with Q-assisted neuroevolution
* Refactor QAssistedNeuroEvolution class to enhance mutation parameter handling and improve logging of algorithm parameters during execution.
* Enhance NeuroEvolution class with generation tracking and best agent retrieval
* Enhance NeuroEvolution class with generation tracking and best agent retrieval
* Fix summary directory handling in configuration loading
* Add configuration files for Lunar Lander environment
* Refactor configuration loading in Q-assisted neuroevolution script to use ConfigLoader class
* Remove unused imports in Q-assisted neuroevolution performance comparison script
* Fix configuration file processing to ensure only .ini files are processed
* Fix configuration loading to use dynamic generation count
* Add configuration files for Lunar Lander environment with updated parameters
* Fix import path for ConfigLoader in Q-assisted neuroevolution performance comparison script
* Refactor configuration loading and environment setup in Q-assisted neuroevolution script
* Add basic configuration file for Taxi environment setup
* Refactor training method names from _train to train for consistency across algorithms
* Remove unnecessary blank lines in BaseEvoRLAlgorithm class for improved readability
* Add __all__ export for algorithm classes in __init__.py
* Fix import error handling for SAVE_DATA_DIR in deep_q_learning.py
* Refactor constructor calls in MonteCarloEveryVisitPrediction, MonteCarloEveryVisitPredictionIncremental, NStepTDPrediction, and TD0Prediction to include env parameter for consistency
* List algorithms in __init__.py
* Add feature construction classes to __init__.py for easy access
* Add CITATION.cff and pre-commit hooks for version synchronization
  - Add CITATION.cff file for proper project citation
  - Add pre-commit configuration with automated version sync
  - Add script to keep CITATION.cff version in sync with pyproject.toml
  - Include standard pre-commit hooks for code quality checks
* Refactor BaseLearningAlgorithm constructor to handle env initialization and set env_name conditionally
* Update optional dependencies for rl-worlds to version 0.0.3.post1
* Move episode collector and trajectory imports to the correct utility module
* Add DeterministicLookupPolicy class for action selection based on a lookup table
* Add CITATION.cff to .gitignore
* Refactor ActorCriticPolicy to accept observation shape and number of actions; add AtariPolicy and AtaricActorCriticPolicy implementations.
* Fix training function reference in BaseEvoRLAlgorithm to use the abstract _train method
* Implement QLearningExperienceReplay class with experience replay functionality
* Refactor NeuroAgent to use 'policy' instead of 'network' for clarity; update related references in NeuroEvolution and QAssistedNeuroEvolution classes.
* Refactor update_citation_version.py to use double quotes for string literals and remove unnecessary whitespace
* Reformat
* Reformat
* Refactor to replace QAssistedNeuroEvolution with DeepQAssistedNeuroEvolution in multiple files for consistency and clarity.
* Add BaseQAssistedNeuroEvolution class and related components
  - Introduced `BaseQAssistedNeuroEvolution` class for neuroevolution with Q-learning assistance.
  - Implemented population management, fitness evaluation, and Q-learning integration.
  - Added configuration loading for various parameters including population and Q-learning settings.
  - Updated `NeuroAgent` to accept a more general policy type.
  - Enhanced `QLearningExperienceReplay` to support new Q-learning features and improved training steps.
  - Refactored code for better readability and maintainability.
* Refactor code for improved readability and consistency across multiple files
* Refactor code for consistency and readability; update formatting in various files
* Add get_all_action_probabilities method to policy classes; rename get_action_probs to get_action_prob
* Add @torch.no_grad() decorator to mutation methods and update action probability retrieval
* Add behavior_score property to NeuroAgent and update get_metadata method
* Add KNNNeighborRetriever class for novelty search in evolutionary algorithms
* Add ParetoSelector class for non-dominated sorting and Pareto-front selection
* Add NoveltyUtils class for probability distribution analysis and distance calculations
* Add assign_novelty_scores method to NeuroEvolutionUtil for novelty score calculation
* Add novelty search functionality to BaseQAssistedNeuroEvolution class
* Add use_novelty_search parameter to QAssistedNeuroEvolution class
* Optimize Q-value conversion to use numpy for improved efficiency
* Refactor action probability calculations to use numpy for improved performance
* Fix initialization of QTableDerivedEpsilonGreedyPolicy to correctly set num_actions
* Add video recording and evaluation parameters to neuroevolution classes
* Fix QLearning constructor to include summary_dir and write_summary parameters
* Add summary_dir and write_summary parameters to various learning algorithms
* Update default_save_dir logic to handle missing SAVE_DATA_DIR
* Set epsilon value in QLearning constructor and update policy with epsilon
* Enhance action selection in QLearning and policy classes to support action masks
* chore: apply pre-commit auto-fixes and add additional hooks
* chore: apply formatting fixes to example files
* Fix CITATION.cff to correctly identify "Das" as family name (#17)
* Initial plan
* Fix CITATION.cff author name format with Das as family name

Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com>

* chore: update version and dependencies in pyproject.toml; fix citation version in CITATION.cff
* Fix QNetworkDerivedEpsilonGreedyPolicy missing num_actions parameter to parent constructor (#18)
* Initial plan
* Fix QNetworkDerivedEpsilonGreedyPolicy to pass num_actions to parent constructor

Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: shuvoxcd01 <16299215+shuvoxcd01@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

* Fix Q-learning update rule to account for termination in environment step
* Bump version to 0.0.7.1 in pyproject.toml

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
… retrieval methods in policy classes
…oved clarity and performance
…probabilities from the actor network
Pull request overview
This PR introduces several new utilities and algorithm enhancements across GridMind, including observation preprocessing (normalization + discretization), prioritized replay, expanded Q-derived policy interfaces (Q value access + action masking), and updates to PPO and evolutionary RL training hooks.
Changes:
- Added new utilities: GridDiscretizationWrapper, MinMaxNormalizer, and a PrioritizedReplayBuffer implementation.
- Extended the Q-derived soft policy interface with get_q_value(s)/get_q_values() and updated the Q-table / Q-network epsilon-greedy policies accordingly.
- Updated/added algorithms: PPO updated to use GAE-style advantages/returns; added tabular Q(λ) with eligibility traces; refactored the evo-RL training hook to call _train.
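The tabular Q(λ) with eligibility traces listed above is not shown on this page, so the following is only a minimal sketch of one common variant, Watkins's Q(λ) with accumulating traces. It assumes a tabular setting with hashable states and integer actions; `watkins_q_lambda_step` and its parameter names are hypothetical and are not the API of `q_learning_with_eligibility_trace.py`.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

Key = Tuple[Hashable, int]


def watkins_q_lambda_step(q: DefaultDict[Key, float], traces: DefaultDict[Key, float],
                          state: Hashable, action: int, reward: float,
                          next_state: Hashable, next_action: int, terminated: bool,
                          num_actions: int, alpha: float = 0.1, gamma: float = 0.99,
                          lam: float = 0.9) -> float:
    """Apply one Watkins's Q(lambda) update in place and return the TD error.

    next_action is the action the behaviour (e.g. epsilon-greedy) policy will
    actually take in next_state. Hypothetical helper, not GridMind's API.
    """
    # Greedy action in next_state; prefer next_action when it ties for the max.
    next_qs = [q[(next_state, a)] for a in range(num_actions)]
    best = max(next_qs)
    greedy_action = next_action if next_qs[next_action] == best else next_qs.index(best)

    # TD target: do not bootstrap from a terminal next state.
    target = reward if terminated else reward + gamma * next_qs[greedy_action]
    delta = target - q[(state, action)]

    # Accumulating trace for the pair just visited.
    traces[(state, action)] += 1.0

    # Every pair with a non-zero trace shares in the TD error.
    for key, trace in traces.items():
        q[key] += alpha * delta * trace

    if terminated or next_action != greedy_action:
        # End of episode or exploratory action: Watkins's variant cuts all traces.
        traces.clear()
    else:
        # Greedy continuation: decay every trace by gamma * lambda.
        for key in traces:
            traces[key] *= gamma * lam
    return delta
```

Cutting the traces after an exploratory action is what keeps Watkins's variant consistent with the greedy target policy; other trace schemes (replacing traces, or Peng's Q(λ)) would change only the last block.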
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/gridmind/wrappers/env_wrappers/grid_discretization_wrapper.py | New wrapper that discretizes Box observations into a single Discrete state index. |
| src/gridmind/utils/algorithm_util/prioritized_replay_buffer.py | New prioritized replay buffer (sum-tree) built on top of the existing replay buffer. |
| src/gridmind/policies/soft/q_derived/q_table_derived_epsilon_greedy_policy.py | Adds public Q-value accessors (get_q_value(s)) and masking support. |
| src/gridmind/policies/soft/q_derived/q_network_derived_epsilon_greedy_policy.py | Adds masked greedy action selection and public Q-value accessors for Q-networks. |
| src/gridmind/policies/soft/q_derived/base_q_derived_soft_policy.py | Extends the abstract interface to require Q-value accessor methods. |
| src/gridmind/policies/parameterized/actor_critic_policy.py | Adds get_all_action_probabilities() helper. |
| src/gridmind/feature_construction/normalizer.py | New Min-Max normalizer feature constructor. |
| `src/gridmind/feature_construction/__init__.py` | Exports MinMaxNormalizer from the feature construction package. |
| src/gridmind/algorithms/tabular/temporal_difference/control/q_learning.py | Adjusts TD target to avoid bootstrapping across terminal transitions. |
| src/gridmind/algorithms/tabular/temporal_difference/control/q_learning_with_eligibility_trace.py | Adds a new Q-learning with eligibility traces control algorithm. |
| src/gridmind/algorithms/function_approximation/ppo/ppo.py | Refactors PPO update to use advantages/returns (GAE-style) and adjusts hyperparameters. |
| src/gridmind/algorithms/function_approximation/ppo/one_step_ppo.py | Adds a one-step TD-error PPO variant (new file). |
| src/gridmind/algorithms/evolutionary_rl/neuroevolution/neuroevolution.py | Refactors evo agent field naming (network→policy) and training method name. |
| src/gridmind/algorithms/evolutionary_rl/base_evo_rl_algorithm.py | Adds abstract _train() and fixes train() to call it via _training_wrapper. |
| pyproject.toml | Bumps version, Python requirement, and build-system/dependencies. |
| example_usage/control/mountain_car/one_step_actor_critic_example.py | Updates example to normalize observations before tile coding + multi-hot encoding. |
| CITATION.cff | Updates version metadata and formatting. |
| .gitignore | Adds additional ignored paths (tooling/research/notes). |
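The Q-derived policy rows in the table describe masked greedy action selection. As a minimal sketch, assuming Q-values arrive as a plain sequence and the mask marks legal actions with `True`, masking can be folded into both epsilon-greedy sampling and the matching action-probability vector. The helpers `masked_epsilon_greedy` and `masked_action_probabilities` are hypothetical and do not reproduce GridMind's method signatures.

```python
import random
from typing import List, Optional, Sequence


def _legal_actions(num_actions: int, action_mask: Optional[Sequence[bool]]) -> List[int]:
    """Indices of allowed actions; every action is allowed when no mask is given."""
    if action_mask is None:
        return list(range(num_actions))
    if len(action_mask) != num_actions:
        raise ValueError("action_mask must have one entry per action")
    legal = [a for a, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("action_mask must allow at least one action")
    return legal


def masked_epsilon_greedy(q_values: Sequence[float], epsilon: float,
                          action_mask: Optional[Sequence[bool]] = None,
                          rng: Optional[random.Random] = None) -> int:
    """Sample an action epsilon-greedily, never returning a masked-out action."""
    rng = rng or random.Random()
    legal = _legal_actions(len(q_values), action_mask)
    if rng.random() < epsilon:
        return rng.choice(legal)  # explore uniformly over legal actions only
    return max(legal, key=lambda a: q_values[a])  # exploit: argmax over legal actions


def masked_action_probabilities(q_values: Sequence[float], epsilon: float,
                                action_mask: Optional[Sequence[bool]] = None) -> List[float]:
    """Probability of each action under masked_epsilon_greedy (0.0 for masked actions)."""
    legal = _legal_actions(len(q_values), action_mask)
    greedy = max(legal, key=lambda a: q_values[a])
    probs = [0.0] * len(q_values)
    for a in legal:
        probs[a] = epsilon / len(legal)
    probs[greedy] += 1.0 - epsilon
    return probs
```

Restricting both the exploratory draw and the argmax to legal actions keeps the probabilities from the second function consistent with what the first one actually samples, even when a masked action has the highest Q-value.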
Comment on lines +235 to +259

```python
# Get the index where this will be stored
data_idx = (
    len(self.buffer)
    if len(self.buffer) < self.capacity
    else self.tree.write_index
)

# Use max priority for new experiences if TD-error not provided
# This ensures new experiences get sampled at least once
if td_error is None:
    priority = self.max_priority
else:
    priority = self._get_priority(td_error)
    # Update max priority if this is larger
    self.max_priority = max(self.max_priority, priority)

# Store in parent's buffer
super().store(state, action, reward, next_state, terminated, truncated)

# Add priority to tree
self.tree.add(priority, data_idx)

# Update write index for tree
self.tree.write_index = (self.tree.write_index + 1) % self.capacity
```
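The `store` method above writes each new priority into a sum tree next to the parent buffer. For readers unfamiliar with the structure, here is a minimal, self-contained sum tree for proportional prioritized sampling. The class `SumTree` and its methods are illustrative assumptions: in this sketch `add` chooses the slot and advances `write_index` itself, whereas the PR's `self.tree.add(priority, data_idx)` takes the index explicitly and the caller advances `write_index`, so the two are not interchangeable.

```python
import random
from typing import List


class SumTree:
    """Binary tree whose internal nodes hold the sum of their children's priorities."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Heap layout: internal nodes at [0, capacity - 1), leaves at [capacity - 1, 2 * capacity - 1).
        self.nodes = [0.0] * (2 * capacity - 1)
        self.write_index = 0  # next slot to overwrite (circular)

    @property
    def total(self) -> float:
        return self.nodes[0]

    def update(self, data_idx: int, priority: float) -> None:
        """Set one slot's priority and propagate the difference up to the root."""
        node = data_idx + self.capacity - 1
        change = priority - self.nodes[node]
        self.nodes[node] = priority
        while node > 0:
            node = (node - 1) // 2
            self.nodes[node] += change

    def add(self, priority: float) -> int:
        """Write a priority at the circular write position and return the slot used."""
        data_idx = self.write_index
        self.update(data_idx, priority)
        self.write_index = (self.write_index + 1) % self.capacity
        return data_idx

    def find(self, value: float) -> int:
        """Return the slot whose cumulative-priority interval contains value (0 <= value <= total).

        Floating-point round-off at the very top of the range is not handled here.
        """
        node = 0
        while node < self.capacity - 1:  # descend until a leaf is reached
            left = 2 * node + 1
            if value <= self.nodes[left]:
                node = left
            else:
                value -= self.nodes[left]
                node = left + 1
        return node - (self.capacity - 1)

    def sample(self, batch_size: int, rng: random.Random) -> List[int]:
        """Stratified proportional sampling: one draw per equal-width segment of the total."""
        if self.total <= 0:
            raise ValueError("cannot sample from an empty tree")
        segment = self.total / batch_size
        return [self.find(rng.uniform(i * segment, (i + 1) * segment)) for i in range(batch_size)]
```

A slot with priority p is drawn with probability p / total. New transitions would be written with the current maximum priority, as the `store` method above does when no TD error is supplied, and re-prioritized via `update` once their TD error is known.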
Comment on lines 92 to 100

```diff
 def spawn_individual(self):
     network = DiscreteActionMLPPolicy(
         observation_shape=self.observation_shape,
         num_actions=self.num_actions,
         num_hidden_layers=2,
     )
-    spawned_individual = NeuroAgent(network=network)
+    spawned_individual = NeuroAgent(policy=network)

     return spawned_individual
```
```python
    env=eval_env, epoch_eval_interval=100
)
policy = ActorCriticPolicy(observation_shape=env.observation_space.shape, num_actions=env.action_space.n)
algorithm = PPO(env=env, policy=policy)
```
Comment on lines +13 to +17

```python
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)

logging.basicConfig(level=logging.DEBUG)
```
Comment on lines +181 to +185

```python
returns = torch.tensor(
    [a + v for a, v in zip(advantages, values)],
    dtype=torch.float32,
    device=device,
)
```
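The returns above are formed as advantage plus value estimate, which is how GAE-based PPO implementations commonly derive value-function targets. How `advantages` is computed lies outside this excerpt; below is a minimal sketch of the standard backward GAE recursion, assuming a single rollout stored as Python lists with `terminated` flags and a bootstrap value for the state after the final step. `compute_gae` and its parameters are hypothetical names, not taken from `ppo.py`.

```python
from typing import List, Sequence, Tuple


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                terminated: Sequence[bool], last_value: float,
                gamma: float = 0.99, lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    values[t] is V(s_t); last_value is V(s_T) for the state after the final step
    and is ignored if that step terminated. Truncated (time-limit) episode ends
    inside the rollout are not handled separately in this sketch.
    """
    if not (len(rewards) == len(values) == len(terminated)):
        raise ValueError("rewards, values and terminated must have equal length")
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if terminated[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value targets: R_t = A_t + V(s_t), matching the snippet above.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `lam=0` the advantages reduce to one-step TD errors; with `gamma=1, lam=1` they become reward-to-go minus the baseline. If `advantages` and `values` are already tensors of the same shape, `advantages + values` gives the same result as the list comprehension in the snippet without a Python-level loop.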
Comment on lines +118 to +123

```python
# Validate that low < high
if np.any(self.low >= self.high):
    raise ValueError(
        f"All elements of low must be strictly less than high. "
        f"Got low={self.low}, high={self.high}"
    )
```
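The check above rejects `low >= high`, but a Box space with infinite bounds passes `low < high` (since `-inf < inf`) and would still yield `nan` after scaling, because `inf / inf` is `nan`. A minimal min-max normalizer sketch that also rejects non-finite bounds follows; `SimpleMinMaxNormalizer` is a hypothetical name and does not mirror GridMind's `MinMaxNormalizer` interface.

```python
import numpy as np


class SimpleMinMaxNormalizer:
    """Rescale each observation dimension from [low, high] to [0, 1] (illustrative sketch)."""

    def __init__(self, low, high, clip: bool = True):
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)
        if self.low.shape != self.high.shape:
            raise ValueError("low and high must have the same shape")
        # Infinite bounds (common in Box spaces) would make the scale undefined.
        if not (np.all(np.isfinite(self.low)) and np.all(np.isfinite(self.high))):
            raise ValueError("min-max normalization requires finite bounds")
        # Same guard as the reviewed code: high - low must be strictly positive.
        if np.any(self.low >= self.high):
            raise ValueError(
                f"All elements of low must be strictly less than high. "
                f"Got low={self.low}, high={self.high}"
            )
        self.clip = clip

    def __call__(self, obs) -> np.ndarray:
        scaled = (np.asarray(obs, dtype=np.float64) - self.low) / (self.high - self.low)
        # Optionally clip so out-of-range observations stay in [0, 1].
        return np.clip(scaled, 0.0, 1.0) if self.clip else scaled
```

For example, with Gymnasium's MountainCar bounds (position in [-1.2, 0.6], velocity in [-0.07, 0.07]), `SimpleMinMaxNormalizer([-1.2, -0.07], [0.6, 0.07])` maps `[-0.3, 0.0]` to approximately `[0.5, 0.5]`, which is the kind of rescaling the mountain car example applies before tile coding.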
Comment on lines +123 to +141

```python
self.obs_low = env.observation_space.low
self.obs_high = env.observation_space.high
self.obs_shape = env.observation_space.shape
self.clip = clip

# Handle bins_per_dim parameter
if isinstance(bins_per_dim, int):
    # Use same bins for all dimensions
    if bins_per_dim <= 0:
        raise ValueError(f"bins_per_dim must be positive, got {bins_per_dim}")
    self.bins_per_dim = np.full(self.obs_shape[0], bins_per_dim, dtype=np.int32)
else:
    # Use specific bins per dimension
    bins_array = np.array(bins_per_dim, dtype=np.int32)
    if len(bins_array) != self.obs_shape[0]:
        raise ValueError(
            f"bins_per_dim length ({len(bins_array)}) must match "
            f"observation dimensionality ({self.obs_shape[0]})"
        )
```
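The constructor above only validates `bins_per_dim`; the mapping from a Box observation to a single Discrete index is not part of this excerpt. One common approach is to bin each dimension independently and then combine the per-dimension bin indices in row-major (mixed-radix) order. A minimal sketch, assuming finite bounds with `low < high` and clipping of out-of-range values; the function `discretize` is hypothetical and is not the wrapper's actual method.

```python
import numpy as np


def discretize(obs, low, high, bins_per_dim) -> int:
    """Map a 1-D continuous observation to one integer in [0, prod(bins_per_dim)).

    bins_per_dim may be an int (same bin count for every dimension) or a sequence.
    """
    obs = np.asarray(obs, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Broadcast an int to one bin count per dimension, like np.full in the constructor.
    bins = np.broadcast_to(np.asarray(bins_per_dim, dtype=np.int64), obs.shape)

    # Scale each dimension to [0, 1], clipping values outside the bounds.
    frac = np.clip((obs - low) / (high - low), 0.0, 1.0)
    # Bin index per dimension; frac == 1.0 would give `bins`, so cap at bins - 1.
    idx = np.minimum((frac * bins).astype(np.int64), bins - 1)
    # Combine per-dimension indices in row-major (C) order.
    return int(np.ravel_multi_index(tuple(idx), tuple(int(b) for b in bins)))
```

The resulting index ranges over `prod(bins_per_dim)` values, which is the size a `Discrete` observation space for the wrapped environment would need; note that this count grows multiplicatively with the number of observation dimensions.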
Comment on lines +1 to +5

```diff
 cff-version: 0.0.7
 message: "If you use this software, please cite it as below."
 type: software
 title: "GridMind: A Reinforcement Learning Algorithms Library"
-version: 0.0.6
+version: 0.0.7
```
```diff
@@ -1,12 +1,12 @@
 [build-system]
-requires = ["setuptools >= 61.0"]
+requires = ["setuptools >= 78.1.1", "pip >= 25.3"]
```
…ross multiple files
…r structure for improved performance
…r module structure
…tability across algorithms
…yPolicy and RandomPolicy for action probability retrieval
No description provided.