177 changes: 177 additions & 0 deletions machine_learning/q_learning.py
"""
Q-Learning is a widely-used model-free algorithm in reinforcement learning that

Check failure on line 2 in machine_learning/q_learning.py

View workflow job for this annotation

GitHub Actions / ruff

Ruff (W291)

machine_learning/q_learning.py:2:80: W291 Trailing whitespace
learns the optimal action-value function Q(s, a), which tells an agent the expected

Check failure on line 3 in machine_learning/q_learning.py

View workflow job for this annotation

GitHub Actions / ruff

Ruff (W291)

machine_learning/q_learning.py:3:84: W291 Trailing whitespace
utility of taking action a in state s and then following the optimal policy after.
It is able to find the best policy for any given finite Markov decision process (MDP)

Check failure on line 5 in machine_learning/q_learning.py

View workflow job for this annotation

GitHub Actions / ruff

Ruff (W291)

machine_learning/q_learning.py:5:86: W291 Trailing whitespace
without requiring a model of the environment.

See: [https://en.wikipedia.org/wiki/Q-learning](https://en.wikipedia.org/wiki/Q-learning)
"""

import random
from collections import defaultdict

An error occurred while parsing the file: machine_learning/q_learning.py

Traceback (most recent call last):
  File "/opt/render/project/src/algorithms_keeper/parser/python_parser.py", line 146, in parse
    reports = lint_file(
              ^^^^^^^^^^
libcst._exceptions.ParserSyntaxError: Syntax Error @ 16:1.
parser error: error at 15:11: expected one of !=, %, &, (, *, **, +, ,, -, ., /, //, :, ;, <, <<, <=, =, ==, >, >=, >>, @, NEWLINE, [, ^, and, if, in, is, not, or, |

type State = tuple[int, int]
                           ^

Author:

The issue came from the line `type State = tuple[int, int]`, which uses the PEP 695 type-alias syntax introduced in Python 3.12. It works fine locally on Python 3.12 or newer, but the CI/CD environment (Render + the Ruff parser) runs an older Python version, probably 3.11 or below. Older interpreters do not recognize the `type` keyword as an alias declaration, so the parser raises a syntax error (the ParserSyntaxError @ line 16 above). To make it compatible across environments, I tried replacing it with the old typing-style alias:

from typing import Tuple
State = Tuple[int, int]

but the Ruff check fails with that style as well, so I had to revert.

It still works with Python 3.12+.
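For reference, there is a third option the thread does not try: a plain assignment-style alias using the builtin generic syntax (PEP 585, Python 3.9+). It parses on older interpreters and avoids the deprecated `typing.Tuple`. This is only a sketch of that compatibility trade-off, not the code in this PR; `manhattan_distance` is an illustrative helper:

```python
# Pre-PEP 695 implicit alias: a plain assignment parses on Python 3.9+
# and does not need the deprecated typing.Tuple.
State = tuple[int, int]


def manhattan_distance(a: State, b: State) -> int:
    # Illustrative helper (not part of the PR) showing the alias
    # used in an annotation.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


print(manhattan_distance((0, 0), (3, 3)))  # 6
```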

Author:

@cclauss Hey, can you please review this and let me know if any changes are needed?


# Hyperparameters for Q-Learning
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.97
EPSILON = 0.2
EPSILON_DECAY = 0.995
EPSILON_MIN = 0.01
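Since `EPSILON_DECAY` is applied once per episode in the training loop below, the exploration rate follows a geometric schedule with a floor. A quick sketch of the effective epsilon after n episodes (the function name `epsilon_after` is illustrative, not part of the PR):

```python
EPSILON_START = 0.2
EPSILON_DECAY = 0.995
EPSILON_MIN = 0.01


def epsilon_after(n: int) -> float:
    # Geometric decay with a floor: max(eps0 * decay**n, eps_min).
    return max(EPSILON_START * EPSILON_DECAY**n, EPSILON_MIN)


print(epsilon_after(0))                  # 0.2
print(round(epsilon_after(200), 3))      # roughly 0.073 after 200 episodes
print(epsilon_after(10_000))             # clamped to the 0.01 floor
```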

# Global Q-table to store state-action values
q_table = defaultdict(lambda: defaultdict(float))

# Environment variables for simple grid world
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
current_state = (0, 0)


def get_q_value(state: tuple[int, int], action: int) -> float:
    """
    Get Q-value for a given state-action pair.

    >>> get_q_value((5, 5), 0)  # an untouched pair defaults to 0.0
    0.0
    """
    return q_table[state][action]


def get_best_action(state: tuple[int, int], available_actions: list[int]) -> int:
    """
    Get the action with maximum Q-value in the given state,
    breaking ties randomly.

    >>> q_table[(0, 0)][1] = 0.7
    >>> q_table[(0, 0)][2] = 0.7
    >>> q_table[(0, 0)][3] = 0.5
    >>> get_best_action((0, 0), [1, 2, 3]) in [1, 2]
    True
    """
    if not available_actions:
        raise ValueError("No available actions provided")
    max_q = max(q_table[state][a] for a in available_actions)
    best = [a for a in available_actions if q_table[state][a] == max_q]
    return random.choice(best)


def choose_action(
    state: tuple[int, int],
    available_actions: list[int],
    epsilon: float | None = None,
) -> int:
    """
    Choose an action using the epsilon-greedy policy. If epsilon is None,
    the module-level EPSILON is used. (The explicit parameter makes the
    doctest deterministic; assigning EPSILON inside a doctest would not
    affect the module global the function reads.)

    >>> q_table[(0, 0)][1] = 1.0
    >>> q_table[(0, 0)][2] = 0.5
    >>> choose_action((0, 0), [1, 2], epsilon=0.0)
    1
    """
    if epsilon is None:
        epsilon = EPSILON
    if not available_actions:
        raise ValueError("No available actions provided")
    if random.random() < epsilon:
        return random.choice(available_actions)
    return get_best_action(state, available_actions)
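The exploration/exploitation trade-off can also be checked empirically in isolation. This standalone sketch (helper names are illustrative, not part of the PR) verifies that with epsilon = 0.1 the greedy action is chosen about 95% of the time: 90% greedy picks plus half of the 10% random picks.

```python
import random


def epsilon_greedy(
    q_values: dict[int, float], epsilon: float, rng: random.Random
) -> int:
    # Explore with probability epsilon, otherwise exploit the best action.
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    best_q = max(q_values.values())
    return rng.choice([a for a, q in q_values.items() if q == best_q])


rng = random.Random(42)
picks = [epsilon_greedy({0: 1.0, 1: 0.0}, 0.1, rng) for _ in range(10_000)]
print(picks.count(0) / len(picks))  # close to 0.95
```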


def update(
    state: tuple[int, int],
    action: int,
    reward: float,
    next_state: tuple[int, int],
    next_available_actions: list[int],
    done: bool = False,
) -> None:
    """
    Perform a Q-value update for a transition using the Q-learning rule,
    with the module-level LEARNING_RATE (alpha) and DISCOUNT_FACTOR (gamma):

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

    >>> update((3, 3), 1, 1.0, (3, 3), [1, 2], done=True)
    >>> get_q_value((3, 3), 1)  # (1 - 0.1) * 0.0 + 0.1 * 1.0
    0.1
    """
    max_q_next = (
        0.0
        if done or not next_available_actions
        else max(get_q_value(next_state, a) for a in next_available_actions)
    )
    old_q = get_q_value(state, action)
    new_q = (1 - LEARNING_RATE) * old_q + LEARNING_RATE * (
        reward + DISCOUNT_FACTOR * max_q_next
    )
    q_table[state][action] = new_q
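As a sanity check on the update rule above, one step can be computed by hand with assumed values (alpha = 0.5, gamma = 0.9; all names here are local to the example, independent of the module's globals):

```python
alpha = 0.5       # learning rate
gamma = 0.9       # discount factor
old_q = 0.0       # current Q(s, a)
reward = 1.0      # immediate reward r
max_q_next = 2.0  # max_a' Q(s', a')

# Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
new_q = (1 - alpha) * old_q + alpha * (reward + gamma * max_q_next)
print(new_q)  # 0.5 * (1.0 + 1.8) = 1.4
```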


def get_policy() -> dict[tuple[int, int], int]:
    """
    Extract a deterministic policy from the Q-table.

    >>> q_table[(1, 2)][1] = 2.0
    >>> q_table[(1, 2)][2] = 1.0
    >>> get_policy()[(1, 2)]
    1
    """
    policy = {}
    for s, a_dict in q_table.items():
        if a_dict:
            policy[s] = max(a_dict, key=a_dict.get)
    return policy


def reset_env() -> tuple[int, int]:
    """
    Reset the environment to the initial state.

    >>> reset_env()
    (0, 0)
    """
    global current_state
    current_state = (0, 0)
    return current_state


def get_available_actions_env() -> list[int]:
    """
    Get the available actions (0=up, 1=right, 2=down, 3=left) in the
    current environment state.

    >>> get_available_actions_env()
    [0, 1, 2, 3]
    """
    return [0, 1, 2, 3]


def step_env(action: int) -> tuple[tuple[int, int], float, bool]:
    """
    Take a step in the environment with the given action and
    return (next_state, reward, done).

    >>> _ = reset_env()
    >>> step_env(1)  # move right from (0, 0)
    ((0, 1), -1.0, False)
    """
    global current_state
    x, y = current_state
    if action == 0:  # up
        x = max(0, x - 1)
    elif action == 1:  # right
        y = min(SIZE - 1, y + 1)
    elif action == 2:  # down
        x = min(SIZE - 1, x + 1)
    elif action == 3:  # left
        y = max(0, y - 1)
    next_state = (x, y)
    reward = 10.0 if next_state == GOAL else -1.0
    done = next_state == GOAL
    current_state = next_state
    return next_state, reward, done


def run_q_learning() -> None:
    """
    Run Q-Learning on the simple grid-world environment and print the
    learned policy. No doctest: the output depends on random exploration.
    """
    global EPSILON
    episodes = 200
    for _ in range(episodes):  # episode index itself is unused (Ruff B007)
        state = reset_env()
        done = False
        while not done:
            actions = get_available_actions_env()
            action = choose_action(state, actions)
            next_state, reward, done = step_env(action)
            next_actions = get_available_actions_env()
            update(state, action, reward, next_state, next_actions, done)
            state = next_state
        EPSILON = max(EPSILON * EPSILON_DECAY, EPSILON_MIN)
    policy = get_policy()
    print("Learned Policy (state: action):")
    for s, a in sorted(policy.items()):
        print(f"{s}: {a}")


if __name__ == "__main__":
    import doctest

    doctest.testmod()
    run_q_learning()