# mujoco-maze

[![Actions Status](https://github.com/kngwyu/mujoco-maze/workflows/CI/badge.svg)](https://github.com/kngwyu/mujoco-maze/actions)
[![PyPI version](https://img.shields.io/pypi/v/mujoco-maze?style=flat-square)](https://pypi.org/project/mujoco-maze/)
[![Black](https://img.shields.io/badge/code%20style-black-000.svg)](https://github.com/psf/black)

Some maze environments for reinforcement learning (RL) based on [mujoco-py]
and [OpenAI Gym][gym].

This project is based on code from [rllab] and
[tensorflow/models][models].

Note that [d4rl][d4rl] and [dm_control][dm_control] provide similar maze
environments, which are also worth checking out.
But if you want a more customizable or minimal one, this project is recommended.

## Usage

Importing `mujoco_maze` registers its environments, so the environments listed
below become available via `gym.make`. E.g.,

```python
import gym
import mujoco_maze  # noqa

env = gym.make("Ant4Rooms-v0")
```
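
The object returned by `gym.make` follows the standard Gym `reset`/`step` loop. Below is a minimal sketch of that loop; it uses a hypothetical stand-in class with the same interface so the snippet runs without MuJoCo installed, whereas with a real install `env` would come from `gym.make("Ant4Rooms-v0")` as above:

```python
import random


class StandInEnv:
    """Stand-in with the Gym interface; replace with gym.make(...) in practice."""

    def reset(self):
        # Return an initial observation (here, a dummy 2D position).
        return [0.0, 0.0]

    def step(self, action):
        # Return (observation, reward, done, info) like a Gym environment.
        obs = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
        reward, done, info = -1e-4, False, {}
        return obs, reward, done, info


env = StandInEnv()
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    # A real Gym env would use env.action_space.sample() here.
    action = [random.uniform(-1.0, 1.0)]
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        obs = env.reset()
```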

## Environments

- PointUMaze/AntUMaze

![PointUMaze](./screenshots/PointUMaze.png)

- PointUMaze-v0/AntUMaze-v0 (Distance-based Reward)
- PointUMaze-v1/AntUMaze-v1 (Goal-based Reward, i.e., 1.0 or -ε)

- Point4Rooms/Ant4Rooms

![Point4Rooms](./screenshots/Point4Rooms.png)

- Point4Rooms-v0/Ant4Rooms-v0 (Distance-based Reward)
- Point4Rooms-v1/Ant4Rooms-v1 (Goal-based Reward)
- Point4Rooms-v2/Ant4Rooms-v2 (Multiple Goals (0.5 pt or 1.0 pt))

- PointPush/AntPush

![PointPush](./screenshots/AntPush.png)

- PointPush-v0/AntPush-v0 (Distance-based Reward)
- PointPush-v1/AntPush-v1 (Goal-based Reward)

- PointFall/AntFall

![PointFall](./screenshots/AntFall.png)

- PointFall-v0/AntFall-v0 (Distance-based Reward)
- PointFall-v1/AntFall-v1 (Goal-based Reward)

- PointBilliard

![PointBilliard](./screenshots/PointBilliard.png)

- PointBilliard-v0 (Distance-based Reward)
- PointBilliard-v1 (Goal-based Reward)
- PointBilliard-v2 (Multiple Goals (0.5 pt or 1.0 pt))
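
The three reward variants above can be summarized in a short sketch. The exact thresholds and constants live in `maze_task.py`; the goal radius and ε below are illustrative assumptions, not the library's actual values:

```python
import math

# Illustrative constants -- the real values are defined in maze_task.py.
GOAL_RADIUS = 0.6  # assumed goal radius
EPSILON = 1e-4     # assumed small penalty for goal-based tasks


def distance_based_reward(pos, goal):
    # v0-style: dense reward, the negative distance to the goal.
    return -math.dist(pos, goal)


def goal_based_reward(pos, goal):
    # v1-style: 1.0 on reaching the goal, otherwise a small penalty.
    reached = math.dist(pos, goal) <= GOAL_RADIUS
    return 1.0 if reached else -EPSILON


def multi_goal_reward(pos, goals_with_scales):
    # v2-style: each goal carries its own point value (0.5 or 1.0);
    # reaching a goal yields that goal's value.
    for goal, scale in goals_with_scales:
        if math.dist(pos, goal) <= GOAL_RADIUS:
            return scale
    return -EPSILON
```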

## Customize Environments

You can define your own task by using the components in `maze_task.py`, like:

```python
import gym
import numpy as np

from mujoco_maze.maze_env_utils import MazeCell
from mujoco_maze.maze_task import MazeGoal, MazeTask
from mujoco_maze.point import PointEnv


class GoalRewardEMaze(MazeTask):
    REWARD_THRESHOLD: float = 0.9
    PENALTY: float = -0.0001

    def __init__(self, scale):
        super().__init__(scale)
        self.goals = [MazeGoal(np.array([0.0, 4.0]) * scale)]

    def reward(self, obs):
        return 1.0 if self.termination(obs) else self.PENALTY

    @staticmethod
    def create_maze():
        E, B, R = MazeCell.EMPTY, MazeCell.BLOCK, MazeCell.ROBOT
        return [
            [B, B, B, B, B],
            [B, R, E, E, B],
            [B, B, B, E, B],
            [B, E, E, E, B],
            [B, B, B, E, B],
            [B, E, E, E, B],
            [B, B, B, B, B],
        ]


gym.envs.register(
    id="PointEMaze-v0",
    entry_point="mujoco_maze.maze_env:MazeEnv",
    kwargs=dict(
        model_cls=PointEnv,
        maze_task=GoalRewardEMaze,
        maze_size_scaling=GoalRewardEMaze.MAZE_SIZE_SCALING.point,
        inner_reward_scaling=GoalRewardEMaze.INNER_REWARD_SCALING,
    ),
)
```
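
When writing `create_maze` for a custom task, it is easy to produce a ragged grid or forget the robot's start cell. Below is a small hypothetical helper (not part of mujoco-maze) that sanity-checks a layout before registration; stand-in string markers replace `MazeCell` so the sketch runs without MuJoCo installed:

```python
# Stand-in cell markers; in mujoco-maze these would be MazeCell members.
E, B, R = "EMPTY", "BLOCK", "ROBOT"


def check_maze(maze):
    """Hypothetical sanity check: rectangular grid, walled border, one robot."""
    width = len(maze[0])
    assert all(len(row) == width for row in maze), "grid must be rectangular"
    assert all(cell == B for cell in maze[0] + maze[-1]), "top/bottom must be walls"
    assert all(row[0] == B and row[-1] == B for row in maze), "sides must be walls"
    robots = sum(row.count(R) for row in maze)
    assert robots == 1, "exactly one robot start cell"
    return True


# The E-maze layout from the example above passes the check.
e_maze = [
    [B, B, B, B, B],
    [B, R, E, E, B],
    [B, B, B, E, B],
    [B, E, E, E, B],
    [B, B, B, E, B],
    [B, E, E, E, B],
    [B, B, B, B, B],
]
```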

You can also customize the robot models; see `point.py` for an example.

## Warning

This project contains some other environments (e.g., reacher and swimmer),
but any environment not listed in this README is a work in progress and
not well tested.

## License

This project is licensed under the Apache License, Version 2.0
([LICENSE-APACHE](LICENSE) or http://www.apache.org/licenses/LICENSE-2.0).

[d4rl]: https://github.com/rail-berkeley/d4rl
[dm_control]: https://github.com/deepmind/dm_control
[gym]: https://github.com/openai/gym
[models]: https://github.com/tensorflow/models/tree/master/research/efficient-hrl
[mujoco-py]: https://github.com/openai/mujoco-py
[rllab]: https://github.com/rll/rllab