Thursday 28 September 2023

Model Based Reinforcement Learning example

 Model Based Reinforcement Learning example:

Source: https://www.google.com/url?sa=i&url=https%3A%2F%2Fmedium.com%2Fanalytics-vidhya%2Fmodel-based-offline-reinforcement-learning-morel-f5cd991d9fd5&psig=AOvVaw17zqlCJPf9ASKzbC2CkiVg&ust=1675731090336000&source=images&cd=vfe&ved=0CBEQjhxqFwoTCOCm6PTW__wCFQAAAAAdAAAAABAE


Example: Robot in a Maze


- Imagine a robot in a maze trying to find a treasure.


Experience:

- The robot explores the maze, moving around and gathering experience.

- It remembers which actions it took, where it went, and the rewards it received.


Model:

- The robot builds a "map" or model of the maze.

- This model includes information about where walls are, possible paths, and what might happen at each location.

- The model helps the robot understand the maze better.


Value Function:

- The robot keeps track of values for different states in the maze.

- These values represent how good it is to be in a particular state.

- For example, finding the treasure has a high value.


Policy:

- The robot uses its value function and model to create a "policy."

- A policy is like a set of rules that tell the robot which actions to take in different situations.

- It helps the robot decide where to go to maximize its rewards.


Tables:

- The robot maintains tables to store information.

- One table keeps track of its experiences.


Experience Table:

| State | Action | Next State | Reward |

|------- |-------- |------------|--------|

| Start | Move Up | Wall       | -1     |

| ...   | ...    | ...        | ...    |

| Treasure | Grab  | Exit       | +100   |


- Another table stores the value function, showing how good each state is.


Value Function Table:

| State | Value |

|------- |------- |

| Start | 0    |

| ...   | ...          |

| Treasure | 100 |


How It Works:


1. The robot starts in the maze, taking actions and learning from rewards.


2. It uses the experiences to update its model of the maze.


3. It calculates values for each state using its value function.


4. With the model and values, it creates a policy for making decisions.


5. The robot follows the policy to find the treasure efficiently.


In this example, model-based reinforcement learning helps the robot build a model of the maze, use it to make decisions, and find the treasure while keeping things simple.


No comments:

Post a Comment