Environments in Reinforcement Learning

There are many libraries for reinforcement learning. The most popular is Gym by OpenAI, which provides single-agent environments for RL; for multi-agent settings there are companion libraries such as PettingZoo.

Gym#

When writing code based on reinforcement learning models, an important step is implementing the interaction with the environment. Gym is an open-source library from OpenAI for developing and comparing reinforcement learning algorithms. Its key characteristic is that it makes no assumptions about the agent and is compatible with any numerical computation library, such as TensorFlow or Theano, so users can build Gym environments suited to their own models.

Spaces#

When constructing reinforcement learning models in practice, many parameters are needed to characterize the environment, and these parameters differ in data type, value range, default value, and so on. To process them in a uniform way, Gym classifies them with the classes in its spaces module.
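
As a quick illustration, here is a minimal sketch (assuming the classic gym package is installed) of what these space objects provide: each one knows its data type and valid range and can draw random samples.

```python
import numpy as np
from gym import spaces

# Every Space subclass knows its data type and valid range,
# and can draw random samples from that range.
action_space = spaces.Discrete(2)                       # the integers {0, 1}
obs_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))  # vectors in [-1, 1]^3

print(action_space.sample())                              # e.g. 1
print(obs_space.sample())                                 # e.g. [ 0.23 -0.77  0.41]
print(obs_space.contains(np.zeros(3, dtype=np.float32)))  # membership test -> True
```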

CartPole Example#

A classic CartPole problem written with Gym is shown below. It describes a scenario where a cart moves left and right along a track, and the goal is to keep the pole mounted on the cart from falling over.

```python
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()  # start a new episode
    for t in range(100):
        env.render()           # draw the current frame
        print(observation)
        action = env.action_space.sample()  # choose a random action
        # Classic Gym API (pre-0.26): step() returns four values.
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()
```

Output:

```
[-0.061586   -0.75893141  0.05793238  1.15547541]
[-0.07676463 -0.95475889  0.08104189  1.46574644]
[-0.0958598  -1.15077434  0.11035682  1.78260485]
[-0.11887529 -0.95705275  0.14600892  1.5261692 ]
[-0.13801635 -0.7639636   0.1765323   1.28239155]
[-0.15329562 -0.57147373  0.20218013  1.04977545]
Episode finished after 14 timesteps
[-0.02786724  0.00361763 -0.03938967 -0.01611184]
[-0.02779488 -0.19091794 -0.03971191  0.26388759]
[-0.03161324  0.00474768 -0.03443415 -0.04105167]
```

Application of Spaces#

In the example above, we kept sampling random actions from the environment's action_space. But what exactly are these actions? Every environment comes with spaces of the appropriate type that describe the format of valid actions and observations: action_space and observation_space. For example:

```python
import gym

env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)
```

Discrete represents a fixed range of non-negative integers (unless shifted with start), so in this case the valid actions are 0 or 1. The specific usage of this class is as follows:

```python
class Discrete(Space[int]):
    r"""A discrete space in :math:`\{ 0, 1, \\dots, n-1 \}`.
    A start value can be optionally specified to shift the range
    to :math:`\{ a, a+1, \\dots, a+n-1 \}`.
    Example::
        >>> Discrete(2)            # {0, 1}
        >>> Discrete(3, start=-1)  # {-1, 0, 1}
    """
```

Box describes an n-dimensional box in $\mathbb{R}^n$, whose upper and lower bounds can either be specified or left unbounded. The specific usage is as follows:

```python
class Box(Space[np.ndarray]):
    """
    A (possibly unbounded) box in R^n. Specifically, a Box represents the
    Cartesian product of n closed intervals. Each interval has the form of one
    of [a, b], (-oo, b], [a, oo), or (-oo, oo).
    There are two common use cases:
    * Identical bound for each dimension::
        >>> Box(low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
        Box(3, 4)
    * Independent bound for each dimension::
        >>> Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32)
        Box(2,)
    """
    def __init__(
        self,
        low: Union[SupportsFloat, np.ndarray],
        high: Union[SupportsFloat, np.ndarray],
        shape: Optional[Sequence[int]] = None,
        dtype: Type = np.float32,
        seed: Optional[int] = None,
    )
```
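
A short sketch of the two construction styles from the docstring (assuming numpy and the classic gym package):

```python
import numpy as np
from gym import spaces

# Identical bounds for every dimension: a 3x4 box with entries in [-1, 2]
box_a = spaces.Box(low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)

# Independent bounds per dimension: first entry in [-1, 2], second in [-2, 4]
box_b = spaces.Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32)

print(box_a.sample().shape)   # (3, 4)
print(box_b.low, box_b.high)  # [-1. -2.] [2. 4.]
```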

Summary: Box and Discrete are the two most commonly used classes when building custom environments. Beyond them, the spaces module contains several other classes, which are covered in the next section.

Other Types of Spaces#

In addition to Box and Discrete, spaces provides several other data structures. The full list of exports is as follows:

```python
__all__ = [
    "Space",
    "Box",
    "Discrete",
    "MultiDiscrete",
    "MultiBinary",
    "Tuple",
    "Dict",
    "flatdim",
    "flatten_space",
    "flatten",
    "unflatten",
]
```
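
The last four entries are utility functions rather than spaces; they convert structured samples to and from flat arrays. A minimal sketch of how they fit together (assuming the classic gym package, where flatten one-hot encodes Discrete sub-spaces):

```python
from gym import spaces
from gym.spaces import flatdim, flatten, unflatten

space = spaces.Dict({
    "position": spaces.Discrete(2),                      # flattens to a one-hot of size 2
    "velocity": spaces.Box(low=-1, high=1, shape=(3,)),  # flattens to 3 floats
})

print(flatdim(space))              # total flat dimension: 2 + 3 = 5
sample = space.sample()            # a dictionary keyed by sub-space name
flat = flatten(space, sample)      # -> a flat 1-D array of length 5
restored = unflatten(space, flat)  # -> back to the dictionary form
```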

Dict is a dictionary-like space whose values are themselves spaces, and it can be nested. The specific usage is as follows:

```python
class Dict(Space[TypingDict[str, Space]], Mapping):
    """
    A dictionary of simpler spaces.
    Example usage:
    self.observation_space = spaces.Dict({"position": spaces.Discrete(2), "velocity": spaces.Discrete(3)})
    Example usage [nested]:
    self.nested_observation_space = spaces.Dict({
        'sensors':  spaces.Dict({
            'position': spaces.Box(low=-100, high=100, shape=(3,)),
            'velocity': spaces.Box(low=-1, high=1, shape=(3,)),
            'front_cam': spaces.Tuple((
                spaces.Box(low=0, high=1, shape=(10, 10, 3)),
                spaces.Box(low=0, high=1, shape=(10, 10, 3))
            )),
            'rear_cam': spaces.Box(low=0, high=1, shape=(10, 10, 3)),
        }),
        'ext_controller': spaces.MultiDiscrete((5, 2, 2)),
        'inner_state':spaces.Dict({
            'charge': spaces.Discrete(100),
            'system_checks': spaces.MultiBinary(10),
            'job_status': spaces.Dict({
                'task': spaces.Discrete(5),
                'progress': spaces.Box(low=0, high=100, shape=()),
            })
        })
    })
    """
```

MultiBinary is an n-dimensional space whose elements are only 0s and 1s. Its specific usage is as follows:

```python
class MultiBinary(Space[np.ndarray]):
    """
    An n-shape binary space.
    The argument to MultiBinary defines n, which could be a number or a `list` of numbers.
    Example Usage:
    >> self.observation_space = spaces.MultiBinary(5)
    >> self.observation_space.sample()
        array([0, 1, 0, 1, 0], dtype=int8)
    >> self.observation_space = spaces.MultiBinary([3, 2])
    >> self.observation_space.sample()
        array([[0, 0],
               [0, 1],
               [1, 1]], dtype=int8)
    """
```

MultiDiscrete is similar to MultiBinary, but each component may take more than two integer values. The specific usage is as follows:

```python
class MultiDiscrete(Space[np.ndarray]):
    """
    - The multi-discrete action space consists of a series of discrete action spaces with different number of actions in each
    - It is useful to represent game controllers or keyboards where each key can be represented as a discrete action space
    - It is parametrized by passing an array of positive integers specifying number of actions for each discrete action space
    Note: Some environment wrappers assume a value of 0 always represents the NOOP action.
    e.g. Nintendo Game Controller
    - Can be conceptualized as 3 discrete action spaces:
        1) Arrow Keys: Discrete 5  - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4]  - params: min: 0, max: 4
        2) Button A:   Discrete 2  - NOOP[0], Pressed[1] - params: min: 0, max: 1
        3) Button B:   Discrete 2  - NOOP[0], Pressed[1] - params: min: 0, max: 1
    - Can be initialized as
        MultiDiscrete([ 5, 2, 2 ])
    """
```

Tuple is similar to Dict, but it indexes its sub-spaces by position rather than by name. The specific usage is as follows:

```python
class Tuple(Space[tuple], Sequence):
    """
    A tuple (i.e., product) of simpler spaces
    Example usage:
    self.observation_space = spaces.Tuple((spaces.Discrete(2), spaces.Discrete(3)))
    """
```
