Skip to content

Mayank-Kumar-Maurya/CliffWalking-Reinforcement-Q-Learning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

🤖 CliffWalking-Reinforcement-Q-Learning

An implementation of the Q-Learning algorithm to solve the classic CliffWalking reinforcement learning environment.

📖 Overview

This repository presents a practical demonstration of the Q-Learning algorithm, a fundamental model-free reinforcement learning technique. The goal is to train an autonomous agent to navigate the "CliffWalking" environment, a classic grid-world problem from OpenAI Gym. The agent must learn an optimal policy to reach a designated goal state while avoiding "cliffs" that reset its progress. This project serves as an excellent educational resource for understanding the core mechanics of Q-Learning, including state-action value estimation, exploration-exploitation trade-off using an epsilon-greedy policy, and policy convergence.

✨ Features

  • Q-Learning Algorithm Implementation: A clear and concise implementation of the Q-Learning algorithm.
  • CliffWalking Environment: Utilizes the popular CliffWalking-v1 environment from OpenAI Gym for a standardized problem setup.
  • Epsilon-Greedy Policy: Incorporates an epsilon-greedy strategy to balance exploration of new actions and exploitation of known optimal actions.
  • Hyperparameter Tuning: Demonstrates configurable learning rate (alpha), discount factor (gamma), and exploration rate (epsilon).
  • Policy Visualization: Tracks and visualizes the agent's performance (e.g., total rewards per episode) during training.
  • Optimal Policy Derivation: Learns and displays the optimal policy discovered by the agent to navigate the cliff-walking grid safely.

🛠️ Tech Stack

  • Runtime: Python
  • Libraries: NumPy
  • Development Tools: Jupyter Notebook

🚀 Quick Start

Follow these steps to set up the project and run the Q-Learning simulation.

Prerequisites

  • Python 3.7+
  • pip (Python package installer)

Installation

  1. Clone the repository

    git clone https://github.com/Mayank-Kumar-Maurya/CliffWalking-Reinforcement-Q-Learning.git
    cd CliffWalking-Reinforcement-Q-Learning
  2. Install dependencies It's highly recommended to use a virtual environment to manage dependencies.

    # Create a virtual environment
    python -m venv venv
    
    # Activate the virtual environment
    # On macOS/Linux:
    source venv/bin/activate
    # On Windows:
    .\venv\Scripts\activate
    
    # Install the required packages
    pip install numpy gym jupyter

Usage

  1. Start Jupyter Notebook After installing dependencies and activating the virtual environment, start the Jupyter Notebook server:

    jupyter notebook
  2. Open and Run the Notebook Your web browser should open a new tab with the Jupyter interface. Navigate to and open Q_Learning.ipynb. You can then execute the cells sequentially to observe the Q-Learning agent's training process, policy convergence, and results.

📁 Project Structure

CliffWalking-Reinforcement-Q-Learning/
├── Q_Learning.ipynb  # Main Jupyter Notebook containing the Q-Learning implementation
└── README.md         # Project README file

⚙️ Configuration

The Q-Learning algorithm's behavior is governed by several hyperparameters, which are defined and can be adjusted within the Q_Learning.ipynb notebook:

  • alpha (Learning Rate): Determines how much new information overrides old information. (e.g., 0.1)
  • gamma (Discount Factor): Controls the importance of future rewards. (e.g., 0.99)
  • epsilon (Exploration Rate): The probability of taking a random action instead of the greedy action. This value typically decays over time. (e.g., initial 1.0, decay factor 0.999)
  • n_episodes: The total number of episodes for which the agent will be trained. (e.g., 5000)

Experimenting with these parameters can significantly impact the agent's learning speed and the quality of the learned policy.

📚 Algorithm

Q-Learning Explained

Q-Learning is an off-policy temporal difference control algorithm that learns the optimal Q-value function. The Q-value, Q(s, a), represents the expected maximum future reward achievable by taking action a in state s and then following the optimal policy thereafter.

The update rule for Q-Learning is:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]

Where:

  • s: Current state
  • a: Current action
  • r: Immediate reward received after taking action a in state s
  • s': Next state
  • a': Possible actions in the next state s'
  • α: Learning rate (alpha)
  • γ: Discount factor (gamma)
  • max_a' Q(s', a'): The maximum Q-value for the next state s' over all possible actions a'

Epsilon-Greedy Strategy

To balance exploration and exploitation, this implementation uses an epsilon-greedy policy. At each step, the agent chooses:

  • A random action with probability epsilon (exploration).
  • The action with the highest Q-value for the current state with probability (1 - epsilon) (exploitation).

The epsilon value typically starts high and decays over time, allowing the agent to explore more initially and then exploit its learned knowledge as training progresses.

🤝 Contributing

Contributions are welcome! If you have suggestions for improvements, bug fixes, or new features, please feel free to:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature).
  3. Make your changes.
  4. Commit your changes (git commit -m 'Add new feature').
  5. Push to the branch (git push origin feature/your-feature).
  6. Open a Pull Request.

🙏 Acknowledgments

  • OpenAI Gym: For providing the CliffWalking-v1 environment, a valuable tool for reinforcement learning research and education.
  • NumPy: Essential libraries for numerical operations in Python.

📞 Support & Contact


⭐ Star this repo if you find it helpful!

Made with ❤️ by Mayank Kumar Maurya

About

An implementation of the Q-Learning algorithm to solve the classic CliffWalking reinforcement learning environment.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors