Basic Usage

Shimmy provides API compatibility tools to adapt popular external reinforcement learning environments to work with Gymnasium and PettingZoo.

Single-agent

Single-agent Gymnasium environments can be loaded via gym.make():

import gymnasium as gym
env = gym.make("dm_control/acrobot-swingup_sparse-v0")

Run the environment:

observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one has ended
    if terminated or truncated:
        observation, info = env.reset()
env.close()

Multi-agent

Multi-agent PettingZoo environments can be loaded via Shimmy Compatibility wrappers.

AEC Environments

Load the environment:

from shimmy import OpenSpielCompatibilityV0
env = OpenSpielCompatibilityV0(game_name="backgammon", render_mode="human")

Run the environment:

env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample(info["action_mask"])  # this is where you would insert your policy
    env.step(action)
    env.render()
env.close()
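
The loop above passes info["action_mask"] to sample() so that only legal moves are drawn. As a rough illustration of what masked sampling means, the sketch below picks uniformly among the actions whose mask entry is non-zero. The helper sample_masked is hypothetical and written only for this explanation; it is not part of Shimmy, PettingZoo, or Gymnasium, which handle the mask inside the action space itself.

```python
import random


def sample_masked(mask, rng=random):
    """Hypothetical helper: draw a uniformly random legal action index.

    `mask` is a sequence of 0/1 flags, one per discrete action,
    where 1 marks an action that is currently legal.
    """
    legal = [action for action, allowed in enumerate(mask) if allowed]
    if not legal:
        # Assumption made for this sketch: an all-zero mask is an error.
        raise ValueError("action mask allows no actions")
    return rng.choice(legal)


# Only action 1 is legal here, so it is always the one selected.
only_choice = sample_masked([0, 1, 0, 0])
```

A real policy would replace the uniform draw with its own choice, but it should still restrict itself to the actions the mask marks as legal.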

Parallel Environments

Load the environment:

from shimmy import MeltingPotCompatibilityV0
env = MeltingPotCompatibilityV0(substrate_name="prisoners_dilemma_in_the_matrix__arena")

Run the environment:

observations, infos = env.reset()
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()

Conversion

Environments loaded as ParallelEnv can be converted to AECEnv using parallel_to_aec.

Environments loaded as AECEnv can be converted to ParallelEnv using aec_to_parallel.

  • Note: the aec_to_parallel conversion makes the following assumptions about the underlying environment:

    1. The environment steps in a cycle, i.e. it steps through every live agent in order.

    2. The environment does not update the observations of the agents except at the end of a cycle.
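
To make the two assumptions concrete, here is a minimal, library-free sketch of the idea behind an AEC-to-parallel conversion. ToyAECEnv and ToyAECToParallel are hypothetical stand-ins invented for this illustration; they are not PettingZoo's classes and do not reproduce its implementation. In practice you would call PettingZoo's own conversion functions on a real environment.

```python
class ToyAECEnv:
    """Hypothetical turn-based environment: agents act one at a time in a fixed cycle."""

    def __init__(self, agents):
        self.agents = list(agents)
        self._turn = 0
        # Each agent's "observation" is simply the running sum of its own actions.
        self.totals = {agent: 0 for agent in self.agents}

    @property
    def agent_selection(self):
        # The agent whose turn it is to act.
        return self.agents[self._turn]

    def step(self, action):
        self.totals[self.agent_selection] += action
        self._turn = (self._turn + 1) % len(self.agents)


class ToyAECToParallel:
    """Hypothetical wrapper: all agents act at once through a dict of actions."""

    def __init__(self, aec_env):
        self.aec_env = aec_env

    def step(self, actions):
        # Assumption 1: one parallel step equals one full cycle over the agents, in order.
        for _ in range(len(self.aec_env.agents)):
            agent = self.aec_env.agent_selection
            self.aec_env.step(actions[agent])
        # Assumption 2: observations are read only after the whole cycle has finished.
        return dict(self.aec_env.totals)


parallel_env = ToyAECToParallel(ToyAECEnv(["player_0", "player_1"]))
first_obs = parallel_env.step({"player_0": 1, "player_1": 2})
second_obs = parallel_env.step({"player_0": 3, "player_1": 4})
```

If an environment changed turn order mid-cycle or updated observations between individual turns, a wrapper built this way would hide that behaviour from the agents, which is why the assumptions matter.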

For more information, see PettingZoo Wrappers.