# mujoco-maze
[![Actions Status](https://github.com/kngwyu/mujoco-maze/workflows/CI/badge.svg)](https://github.com/kngwyu/mujoco-maze/actions)
[![PyPI version](https://img.shields.io/pypi/v/mujoco-maze?style=flat-square)](https://pypi.org/project/mujoco-maze/)
[![Black](https://img.shields.io/badge/code%20style-black-000.svg)](https://github.com/psf/black)
Some maze environments for reinforcement learning (RL) based on [mujoco-py]
and [openai gym][gym].
This project is based on code from [rllab] and
[tensorflow/models][models].
Note that [d4rl] and [dm_control] have similar maze
environments, which are also worth checking out.
However, if you want a more customizable or minimal one, I recommend this project.
## Usage
Importing `mujoco_maze` registers the environments, after which you can
load them via `gym.make`.
All available environments are listed in the [Environments](#environments) section.
For example:
```python
import gym
import mujoco_maze # noqa
env = gym.make("Ant4Rooms-v0")
```
## Environments
- PointUMaze/AntUMaze/SwimmerUMaze

  ![PointUMaze](./screenshots/PointUMaze.png)
  - PointUMaze-v0/AntUMaze-v0/SwimmerUMaze-v0 (Distance-based Reward)
  - PointUMaze-v1/AntUMaze-v1/SwimmerUMaze-v1 (Goal-based Reward, i.e., 1.0 or -ε)

- PointSquareRoom/AntSquareRoom/SwimmerSquareRoom

  ![SwimmerSquareRoom](./screenshots/SwimmerSquareRoom.png)
  - PointSquareRoom-v0/AntSquareRoom-v0/SwimmerSquareRoom-v0 (Distance-based Reward)
  - PointSquareRoom-v1/AntSquareRoom-v1/SwimmerSquareRoom-v1 (Goal-based Reward)
  - PointSquareRoom-v2/AntSquareRoom-v2/SwimmerSquareRoom-v2 (No Reward)

- Point4Rooms/Ant4Rooms/Swimmer4Rooms

  ![Point4Rooms](./screenshots/Point4Rooms.png)
  - Point4Rooms-v0/Ant4Rooms-v0/Swimmer4Rooms-v0 (Distance-based Reward)
  - Point4Rooms-v1/Ant4Rooms-v1/Swimmer4Rooms-v1 (Goal-based Reward)
  - Point4Rooms-v2/Ant4Rooms-v2/Swimmer4Rooms-v2 (Multiple Goals (0.5 pt or 1.0 pt))

- PointCorridor/AntCorridor/SwimmerCorridor

  ![PointCorridor](./screenshots/PointCorridor.png)
  - PointCorridor-v0/AntCorridor-v0/SwimmerCorridor-v0 (Distance-based Reward)
  - PointCorridor-v1/AntCorridor-v1/SwimmerCorridor-v1 (Goal-based Reward)
  - PointCorridor-v2/AntCorridor-v2/SwimmerCorridor-v2 (No Reward)

- PointPush/AntPush

  ![PointPush](./screenshots/AntPush.png)
  - PointPush-v0/AntPush-v0 (Distance-based Reward)
  - PointPush-v1/AntPush-v1 (Goal-based Reward)

- PointFall/AntFall

  ![PointFall](./screenshots/AntFall.png)
  - PointFall-v0/AntFall-v0 (Distance-based Reward)
  - PointFall-v1/AntFall-v1 (Goal-based Reward)

- PointBilliard

  ![PointBilliard](./screenshots/PointBilliard.png)
  - PointBilliard-v0 (Distance-based Reward)
  - PointBilliard-v1 (Goal-based Reward)
  - PointBilliard-v2 (Multiple Goals (0.5 pt or 1.0 pt))
  - PointBilliard-v3 (Two Goals (0.5 pt or 1.0 pt))
  - PointBilliard-v4 (No Reward)
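
All of the IDs above follow the same `<Robot><Maze>-v<N>` pattern. A standalone sketch of how they compose (the `env_id` helper is illustrative only, not part of the package):

```python
# Illustrative only: mujoco-maze does not ship this helper.
def env_id(robot: str, maze: str, version: int) -> str:
    """Compose an environment ID in the "<Robot><Maze>-v<N>" style used above."""
    return f"{robot}{maze}-v{version}"

ids = [
    env_id(robot, "4Rooms", v)
    for robot in ("Point", "Ant", "Swimmer")
    for v in (0, 1, 2)
]
print(ids)  # ['Point4Rooms-v0', 'Point4Rooms-v1', 'Point4Rooms-v2', 'Ant4Rooms-v0', ...]
```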
## Customize Environments
You can define your own task using the components in `maze_task.py`,
like so:
```python
import gym
import numpy as np
from mujoco_maze.maze_env_utils import MazeCell
from mujoco_maze.maze_task import MazeGoal, MazeTask
from mujoco_maze.point import PointEnv


class GoalRewardEMaze(MazeTask):
    REWARD_THRESHOLD: float = 0.9
    PENALTY: float = -0.0001

    def __init__(self, scale):
        super().__init__(scale)
        self.goals = [MazeGoal(np.array([0.0, 4.0]) * scale)]

    def reward(self, obs):
        return 1.0 if self.termination(obs) else self.PENALTY

    @staticmethod
    def create_maze():
        E, B, R = MazeCell.EMPTY, MazeCell.BLOCK, MazeCell.ROBOT
        return [
            [B, B, B, B, B],
            [B, R, E, E, B],
            [B, B, B, E, B],
            [B, E, E, E, B],
            [B, B, B, E, B],
            [B, E, E, E, B],
            [B, B, B, B, B],
        ]


gym.envs.register(
    id="PointEMaze-v0",
    entry_point="mujoco_maze.maze_env:MazeEnv",
    kwargs=dict(
        model_cls=PointEnv,
        maze_task=GoalRewardEMaze,
        maze_size_scaling=GoalRewardEMaze.MAZE_SIZE_SCALING.point,
        inner_reward_scaling=GoalRewardEMaze.INNER_REWARD_SCALING,
    ),
)
```
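
As a quick standalone sanity check of the grid returned by `create_maze` (no MuJoCo required), you can render the same layout as ASCII art, with `#` for blocks and `R` for the robot start:

```python
# Standalone sketch: mirror the E-maze grid above, one character per cell.
E, B, R = " ", "#", "R"  # empty, block, robot start
grid = [
    [B, B, B, B, B],
    [B, R, E, E, B],
    [B, B, B, E, B],
    [B, E, E, E, B],
    [B, B, B, E, B],
    [B, E, E, E, B],
    [B, B, B, B, B],
]
layout = "\n".join("".join(row) for row in grid)
print(layout)
```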
You can also customize the robot models; see `point.py` for an example.
## Warning
Reacher environments are not tested.

## [Experimental] Web-based visualizer
By passing a port number, e.g., `gym.make("PointEMaze-v0", websock_port=7777)`,
you can use a web-based visualizer when calling `env.render()`.
![WebBasedVis](./screenshots/WebVis.png)
This feature is experimental and can produce zombie processes.
## License
This project is licensed under the Apache License, Version 2.0
([LICENSE](LICENSE) or http://www.apache.org/licenses/LICENSE-2.0).
[d4rl]: https://github.com/rail-berkeley/d4rl
[dm_control]: https://github.com/deepmind/dm_control
[gym]: https://github.com/openai/gym
[models]: https://github.com/tensorflow/models/tree/master/research/efficient-hrl
[mujoco-py]: https://github.com/openai/mujoco-py
[rllab]: https://github.com/rll/rllab