Mirror of https://github.com/Azure/MachineLearningNotebooks.git, synced 2025-12-20 09:37:04 -05:00
Merge pull request #1078 from Azure/release_update/Release-59
update samples from Release-59 as a part of SDK release
@@ -35,6 +35,7 @@ Using these samples, you will learn how to do the following.

| [cartpole_sc.ipynb](cartpole-on-single-compute/cartpole_sc.ipynb) | Notebook to train a Cartpole playing agent on an Azure Machine Learning Compute Cluster (single node) |
| [pong_rllib.ipynb](atari-on-distributed-compute/pong_rllib.ipynb) | Notebook for distributed training of Pong agent using RLlib on multiple compute targets |
| [minecraft.ipynb](minecraft-on-distributed-compute/minecraft.ipynb) | Notebook to train an agent to navigate through a lava maze in the Minecraft game |
| [particle.ipynb](multiagent-particle-envs/particle.ipynb) | Notebook to train policies in a multiagent cooperative navigation scenario based on OpenAI's Particle environments |

## Prerequisites
@@ -0,0 +1,60 @@
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu18.04

# Install some basic utilities
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    sudo \
    cpio \
    git \
    bzip2 \
    libx11-6 \
    tmux \
    htop \
    gcc \
    xvfb \
    python-opengl \
    x11-xserver-utils \
    ffmpeg \
    mesa-utils \
    nano \
    vim \
    rsync \
    && rm -rf /var/lib/apt/lists/*

# Install python 3.7
RUN conda install python==3.7

# Create a working directory
RUN mkdir /app
WORKDIR /app

# Install required pip packages
RUN pip install --upgrade pip setuptools && pip install --upgrade \
    pandas \
    matplotlib \
    psutil \
    numpy \
    scipy \
    gym \
    azureml-defaults \
    tensorboardX \
    tensorflow==1.15 \
    tensorflow-probability==0.8.0 \
    onnxruntime \
    tf2onnx \
    cloudpickle==1.2.0 \
    tabulate \
    dm_tree \
    lz4 \
    opencv-python \
    ray==0.8.3 \
    ray[rllib]==0.8.3 \
    ray[tune]==0.8.3

# Install particle
RUN git clone https://github.com/openai/multiagent-particle-envs.git
COPY patch_files/* multiagent-particle-envs/multiagent/
RUN cd multiagent-particle-envs && \
    pip install -e . && \
    pip install --upgrade pyglet==1.3.2
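A quick way to sanity-check an image built from the Dockerfile above is an import smoke test run inside the container. This is a minimal sketch, not part of the original change; it assumes only the packages pinned above (ray 0.8.3, TensorFlow 1.15, cloudpickle 1.2.0, and the editable `multiagent` install).

```python
# Smoke test for the training image; run inside a container built from the
# Dockerfile above. Only packages pinned there are assumed to be present.
import cloudpickle
import gym
import ray
import tensorflow as tf

assert ray.__version__ == "0.8.3", ray.__version__
assert tf.__version__.startswith("1.15"), tf.__version__
assert cloudpickle.__version__ == "1.2.0", cloudpickle.__version__

# The patched Particle environments were installed with `pip install -e .`,
# so the `multiagent` package should be importable as well.
import multiagent  # noqa: F401

print("image OK:", gym.__version__, ray.__version__, tf.__version__)
```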
@@ -0,0 +1,70 @@
# MIT License

# Copyright (c) 2018 OpenAI

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import numpy as np
import gym


class MultiDiscrete(gym.Space):
    """
    - The multi-discrete action space consists of a series of discrete action spaces with different
      parameters
    - It can be adapted to both a Discrete action space or a continuous (Box) action space
    - It is useful to represent game controllers or keyboards where each key can be represented as
      a discrete action space
    - It is parametrized by passing an array of arrays containing [min, max] for each discrete action
      space where the discrete action space can take any integers from `min` to `max` (both inclusive)
    Note: A value of 0 always needs to represent the NOOP action.
    e.g. Nintendo Game Controller
    - Can be conceptualized as 3 discrete action spaces:
        1) Arrow Keys: Discrete 5 - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4] - params: min: 0, max: 4
        2) Button A: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
        3) Button B: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
    - Can be initialized as
        MultiDiscrete([ [0,4], [0,1], [0,1] ])
    """
    def __init__(self, array_of_param_array):
        self.low = np.array([x[0] for x in array_of_param_array])
        self.high = np.array([x[1] for x in array_of_param_array])
        self.num_discrete_space = self.low.shape[0]

    def sample(self):
        """ Returns an array with one sample from each discrete action space """
        # For each row: round(random .* (max - min) + min, 0)
        # random_array = prng.np_random.rand(self.num_discrete_space)
        random_array = np.random.RandomState().rand(self.num_discrete_space)
        return [int(x) for x in np.floor(np.multiply((self.high - self.low + 1.), random_array) + self.low)]

    def contains(self, x):
        return len(x) == self.num_discrete_space \
            and (np.array(x) >= self.low).all() \
            and (np.array(x) <= self.high).all()

    @property
    def shape(self):
        return self.num_discrete_space

    def __repr__(self):
        return "MultiDiscrete" + str(self.num_discrete_space)

    def __eq__(self, other):
        return np.array_equal(self.low, other.low) and np.array_equal(self.high, other.high)
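A short usage sketch for the `MultiDiscrete` space above, following the game-controller layout from its docstring; it assumes it runs in the same module as the class and uses nothing beyond what the class defines.

```python
# Example: arrow keys (0-4), button A (0-1), button B (0-1), as in the docstring.
space = MultiDiscrete([[0, 4], [0, 1], [0, 1]])

action = space.sample()                # e.g. [3, 0, 1]
assert space.contains(action)          # each component lies in its [min, max]
assert not space.contains([5, 0, 0])   # 5 exceeds the arrow-key range

print(space, "sampled", action, "with", space.shape, "sub-spaces")
```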
@@ -0,0 +1,413 @@
# MIT License

# Copyright (c) 2018 OpenAI

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

"""
2D rendering framework
"""
from __future__ import division
import os
import six
import sys
from gym import error
import math
import numpy as np
import pyglet

from pyglet.gl import glEnable, glHint, glLineWidth, glBlendFunc, glClearColor, glPushMatrix, \
    glTranslatef, glRotatef, glScalef, glPopMatrix, glColor4f, glBegin, glVertex3f, glEnd, glLineStipple, \
    glDisable, glVertex2f, GL_BLEND, GL_LINE_SMOOTH, GL_LINE_SMOOTH_HINT, GL_NICEST, GL_SRC_ALPHA, \
    GL_ONE_MINUS_SRC_ALPHA, GL_LINE_STIPPLE, GL_POINTS, GL_QUADS, GL_TRIANGLES, GL_POLYGON, GL_LINE_LOOP, \
    GL_LINE_STRIP, GL_LINES


if "Apple" in sys.version:
    if 'DYLD_FALLBACK_LIBRARY_PATH' in os.environ:
        os.environ['DYLD_FALLBACK_LIBRARY_PATH'] += ':/usr/lib'
        # (JDS 2016/04/15): avoid bug on Anaconda 2.3.0 / Yosemite


RAD2DEG = 57.29577951308232


def get_display(spec):
    """Convert a display specification (such as :0) into an actual Display
    object.

    Pyglet only supports multiple Displays on Linux.
    """
    if spec is None:
        return None
    elif isinstance(spec, six.string_types):
        return pyglet.canvas.Display(spec)
    else:
        raise error.Error('Invalid display specification: {}. (Must be a string like :0 or None.)'.format(spec))


class Viewer(object):
    def __init__(self, width, height, display=None):
        display = get_display(display)

        self.width = width
        self.height = height

        self.window = pyglet.window.Window(width=width, height=height, display=display)
        self.window.on_close = self.window_closed_by_user
        self.geoms = []
        self.onetime_geoms = []
        self.transform = Transform()

        glEnable(GL_BLEND)
        # glEnable(GL_MULTISAMPLE)
        glEnable(GL_LINE_SMOOTH)
        # glHint(GL_LINE_SMOOTH_HINT, GL_DONT_CARE)
        glHint(GL_LINE_SMOOTH_HINT, GL_NICEST)
        glLineWidth(2.0)
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)

    def close(self):
        self.window.close()

    def window_closed_by_user(self):
        self.close()

    def set_bounds(self, left, right, bottom, top):
        assert right > left and top > bottom
        scalex = self.width / (right - left)
        scaley = self.height / (top - bottom)
        self.transform = Transform(
            translation=(-left * scalex, -bottom * scaley),
            scale=(scalex, scaley))

    def add_geom(self, geom):
        self.geoms.append(geom)

    def add_onetime(self, geom):
        self.onetime_geoms.append(geom)

    def render(self, return_rgb_array=False):
        glClearColor(1, 1, 1, 1)
        self.window.clear()
        self.window.switch_to()
        self.window.dispatch_events()
        self.transform.enable()
        for geom in self.geoms:
            geom.render()
        for geom in self.onetime_geoms:
            geom.render()
        self.transform.disable()
        arr = None
        if return_rgb_array:
            buffer = pyglet.image.get_buffer_manager().get_color_buffer()
            image_data = buffer.get_image_data()
            arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
            # In https://github.com/openai/gym-http-api/issues/2, we
            # discovered that someone using Xmonad on Arch was having
            # a window of size 598 x 398, though a 600 x 400 window
            # was requested. (Guess Xmonad was preserving a pixel for
            # the boundary.) So we use the buffer height/width rather
            # than the requested one.
            arr = arr.reshape(buffer.height, buffer.width, 4)
            arr = arr[::-1, :, 0:3]
        self.window.flip()
        self.onetime_geoms = []
        return arr

    # Convenience
    def draw_circle(self, radius=10, res=30, filled=True, **attrs):
        geom = make_circle(radius=radius, res=res, filled=filled)
        _add_attrs(geom, attrs)
        self.add_onetime(geom)
        return geom

    def draw_polygon(self, v, filled=True, **attrs):
        geom = make_polygon(v=v, filled=filled)
        _add_attrs(geom, attrs)
        self.add_onetime(geom)
        return geom

    def draw_polyline(self, v, **attrs):
        geom = make_polyline(v=v)
        _add_attrs(geom, attrs)
        self.add_onetime(geom)
        return geom

    def draw_line(self, start, end, **attrs):
        geom = Line(start, end)
        _add_attrs(geom, attrs)
        self.add_onetime(geom)
        return geom

    def get_array(self):
        self.window.flip()
        image_data = pyglet.image.get_buffer_manager().get_color_buffer().get_image_data()
        self.window.flip()
        arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
        arr = arr.reshape(self.height, self.width, 4)
        return arr[::-1, :, 0:3]


def _add_attrs(geom, attrs):
    if "color" in attrs:
        geom.set_color(*attrs["color"])
    if "linewidth" in attrs:
        geom.set_linewidth(attrs["linewidth"])


class Geom(object):
    def __init__(self):
        self._color = Color((0, 0, 0, 1.0))
        self.attrs = [self._color]

    def render(self):
        for attr in reversed(self.attrs):
            attr.enable()
        self.render1()
        for attr in self.attrs:
            attr.disable()

    def render1(self):
        raise NotImplementedError

    def add_attr(self, attr):
        self.attrs.append(attr)

    def set_color(self, r, g, b, alpha=1):
        self._color.vec4 = (r, g, b, alpha)


class Attr(object):
    def enable(self):
        raise NotImplementedError

    def disable(self):
        pass


class Transform(Attr):
    def __init__(self, translation=(0.0, 0.0), rotation=0.0, scale=(1, 1)):
        self.set_translation(*translation)
        self.set_rotation(rotation)
        self.set_scale(*scale)

    def enable(self):
        glPushMatrix()
        glTranslatef(self.translation[0], self.translation[1], 0)  # translate to GL loc point
        glRotatef(RAD2DEG * self.rotation, 0, 0, 1.0)
        glScalef(self.scale[0], self.scale[1], 1)

    def disable(self):
        glPopMatrix()

    def set_translation(self, newx, newy):
        self.translation = (float(newx), float(newy))

    def set_rotation(self, new):
        self.rotation = float(new)

    def set_scale(self, newx, newy):
        self.scale = (float(newx), float(newy))


class Color(Attr):
    def __init__(self, vec4):
        self.vec4 = vec4

    def enable(self):
        glColor4f(*self.vec4)


class LineStyle(Attr):
    def __init__(self, style):
        self.style = style

    def enable(self):
        glEnable(GL_LINE_STIPPLE)
        glLineStipple(1, self.style)

    def disable(self):
        glDisable(GL_LINE_STIPPLE)


class LineWidth(Attr):
    def __init__(self, stroke):
        self.stroke = stroke

    def enable(self):
        glLineWidth(self.stroke)


class Point(Geom):
    def __init__(self):
        Geom.__init__(self)

    def render1(self):
        glBegin(GL_POINTS)  # draw point
        glVertex3f(0.0, 0.0, 0.0)
        glEnd()


class FilledPolygon(Geom):
    def __init__(self, v):
        Geom.__init__(self)
        self.v = v

    def render1(self):
        if len(self.v) == 4:
            glBegin(GL_QUADS)
        elif len(self.v) > 4:
            glBegin(GL_POLYGON)
        else:
            glBegin(GL_TRIANGLES)
        for p in self.v:
            glVertex3f(p[0], p[1], 0)  # draw each vertex
        glEnd()

        color = (
            self._color.vec4[0] * 0.5,
            self._color.vec4[1] * 0.5,
            self._color.vec4[2] * 0.5,
            self._color.vec4[3] * 0.5)
        glColor4f(*color)
        glBegin(GL_LINE_LOOP)
        for p in self.v:
            glVertex3f(p[0], p[1], 0)  # draw each vertex
        glEnd()


def make_circle(radius=10, res=30, filled=True):
    points = []
    for i in range(res):
        ang = 2 * math.pi * i / res
        points.append((math.cos(ang) * radius, math.sin(ang) * radius))
    if filled:
        return FilledPolygon(points)
    else:
        return PolyLine(points, True)


def make_polygon(v, filled=True):
    if filled:
        return FilledPolygon(v)
    else:
        return PolyLine(v, True)


def make_polyline(v):
    return PolyLine(v, False)


def make_capsule(length, width):
    l, r, t, b = 0, length, width / 2, -width / 2
    box = make_polygon([(l, b), (l, t), (r, t), (r, b)])
    circ0 = make_circle(width / 2)
    circ1 = make_circle(width / 2)
    circ1.add_attr(Transform(translation=(length, 0)))
    geom = Compound([box, circ0, circ1])
    return geom


class Compound(Geom):
    def __init__(self, gs):
        Geom.__init__(self)
        self.gs = gs
        for g in self.gs:
            g.attrs = [a for a in g.attrs if not isinstance(a, Color)]

    def render1(self):
        for g in self.gs:
            g.render()


class PolyLine(Geom):
    def __init__(self, v, close):
        Geom.__init__(self)
        self.v = v
        self.close = close
        self.linewidth = LineWidth(1)
        self.add_attr(self.linewidth)

    def render1(self):
        glBegin(GL_LINE_LOOP if self.close else GL_LINE_STRIP)
        for p in self.v:
            glVertex3f(p[0], p[1], 0)  # draw each vertex
        glEnd()

    def set_linewidth(self, x):
        self.linewidth.stroke = x


class Line(Geom):
    def __init__(self, start=(0.0, 0.0), end=(0.0, 0.0)):
        Geom.__init__(self)
        self.start = start
        self.end = end
        self.linewidth = LineWidth(1)
        self.add_attr(self.linewidth)

    def render1(self):
        glBegin(GL_LINES)
        glVertex2f(*self.start)
        glVertex2f(*self.end)
        glEnd()


class Image(Geom):
    def __init__(self, fname, width, height):
        Geom.__init__(self)
        self.width = width
        self.height = height
        img = pyglet.image.load(fname)
        self.img = img
        self.flip = False

    def render1(self):
        self.img.blit(-self.width / 2, -self.height / 2, width=self.width, height=self.height)


class SimpleImageViewer(object):
    def __init__(self, display=None):
        self.window = None
        self.isopen = False
        self.display = display

    def imshow(self, arr):
        if self.window is None:
            height, width, channels = arr.shape
            self.window = pyglet.window.Window(width=width, height=height, display=self.display)
            self.width = width
            self.height = height
            self.isopen = True
        assert arr.shape == (self.height, self.width, 3), "You passed in an image with the wrong shape"
        image = pyglet.image.ImageData(self.width, self.height, 'RGB', arr.tobytes(), pitch=self.width * -3)
        self.window.clear()
        self.window.switch_to()
        self.window.dispatch_events()
        image.blit(0, 0)
        self.window.flip()

    def close(self):
        if self.isopen:
            self.window.close()
            self.isopen = False

    def __del__(self):
        self.close()
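A minimal sketch of the `Viewer` API above, assuming a display is available (for example by launching Python under `xvfb-run`, as the notebook below does) and that it runs in the same module as these classes.

```python
# Draw one persistent circle and a one-time line, then grab an RGB frame.
viewer = Viewer(width=300, height=300)
viewer.set_bounds(-1, 1, -1, 1)              # map world coords [-1, 1]^2 to the window

circle = make_circle(radius=0.1)             # persistent geometry
circle.set_color(0.35, 0.35, 0.85)
circle.add_attr(Transform(translation=(0.5, 0.0)))
viewer.add_geom(circle)

viewer.draw_line((-1, 0), (1, 0))            # one-time geometry, cleared each frame
frame = viewer.render(return_rgb_array=True)
print(frame.shape)                           # (height, width, 3) uint8 RGB array
viewer.close()
```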
@@ -0,0 +1,124 @@
import argparse
import re
import os

import ray
from ray.tune import run_experiments
from ray.tune.registry import register_trainable, register_env, get_trainable_cls
import ray.rllib.contrib.maddpg.maddpg as maddpg

from rllib_multiagent_particle_env import env_creator
from util import parse_args


def setup_ray():
    ray.init(address='auto')

    register_env('particle', env_creator)


def gen_policy(args, env, id):
    use_local_critic = [
        args.adv_policy == 'ddpg' if id < args.num_adversaries else
        args.good_policy == 'ddpg' for id in range(env.num_agents)
    ]
    return (
        None,
        env.observation_space_dict[id],
        env.action_space_dict[id],
        {
            'agent_id': id,
            'use_local_critic': use_local_critic[id],
            'obs_space_dict': env.observation_space_dict,
            'act_space_dict': env.action_space_dict,
        }
    )


def gen_policies(args, env_config):
    env = env_creator(env_config)
    return {'policy_%d' % i: gen_policy(args, env, i) for i in range(len(env.observation_space_dict))}


def to_multiagent_config(policies):
    policy_ids = list(policies.keys())
    return {
        'policies': policies,
        'policy_mapping_fn': lambda index: policy_ids[index]
    }


def train(args, env_config):
    def stop(trial_id, result):
        max_train_time = int(os.environ.get('AML_MAX_TRAIN_TIME_SECONDS', 2 * 60 * 60))

        return result['episode_reward_mean'] >= args.final_reward \
            or result['time_total_s'] >= max_train_time

    run_experiments({
        'MADDPG_RLLib': {
            'run': 'contrib/MADDPG',
            'env': 'particle',
            'stop': stop,
            # Uncomment to enable more frequent checkpoints:
            # 'checkpoint_freq': args.checkpoint_freq,
            'checkpoint_at_end': True,
            'local_dir': args.local_dir,
            'restore': args.restore,
            'config': {
                # === Log ===
                'log_level': 'ERROR',

                # === Environment ===
                'env_config': env_config,
                'num_envs_per_worker': args.num_envs_per_worker,
                'horizon': args.max_episode_len,

                # === Policy Config ===
                # --- Model ---
                'good_policy': args.good_policy,
                'adv_policy': args.adv_policy,
                'actor_hiddens': [args.num_units] * 2,
                'actor_hidden_activation': 'relu',
                'critic_hiddens': [args.num_units] * 2,
                'critic_hidden_activation': 'relu',
                'n_step': args.n_step,
                'gamma': args.gamma,

                # --- Exploration ---
                'tau': 0.01,

                # --- Replay buffer ---
                'buffer_size': int(1e6),

                # --- Optimization ---
                'actor_lr': args.lr,
                'critic_lr': args.lr,
                'learning_starts': args.train_batch_size * args.max_episode_len,
                'sample_batch_size': args.sample_batch_size,
                'train_batch_size': args.train_batch_size,
                'batch_mode': 'truncate_episodes',

                # --- Parallelism ---
                'num_workers': args.num_workers,
                'num_gpus': args.num_gpus,
                'num_gpus_per_worker': 0,

                # === Multi-agent setting ===
                'multiagent': to_multiagent_config(gen_policies(args, env_config)),
            },
        },
    }, verbose=1)


if __name__ == '__main__':
    args = parse_args()
    setup_ray()

    env_config = {
        'scenario_name': args.scenario,
        'horizon': args.max_episode_len,
        'video_frequency': args.checkpoint_freq,
    }

    train(args, env_config)
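To make the `multiagent` wiring above concrete, here is a small illustrative sketch of the structure that `gen_policies` plus `to_multiagent_config` hand to RLlib. The spaces are stand-ins (plain Gym `Box`/`Discrete` objects), not the real Particle spaces, and the 3-agent count is arbitrary.

```python
import numpy as np
from gym.spaces import Box, Discrete

# Stand-in spaces for a hypothetical 3-agent environment.
obs_space = Box(low=-np.inf, high=np.inf, shape=(18,))
act_space = Discrete(5)

policies = {
    'policy_%d' % i: (None, obs_space, act_space, {'agent_id': i})
    for i in range(3)
}

multiagent_config = {
    'policies': policies,
    # Agent index i is trained and evaluated with 'policy_i' (one policy per agent).
    'policy_mapping_fn': lambda index: list(policies.keys())[index],
}

print(multiagent_config['policy_mapping_fn'](1))  # -> 'policy_1'
```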
@@ -0,0 +1,113 @@
# Some code taken from: https://github.com/wsjeon/maddpg-rllib/

import imp
import os

import gym
from gym import wrappers
from ray import rllib

from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios


CUSTOM_SCENARIOS = ['simple_switch']


class ParticleEnvRenderWrapper(gym.Wrapper):
    def __init__(self, env, horizon):
        super().__init__(env)
        self.horizon = horizon

    def reset(self):
        self._num_steps = 0

        return self.env.reset()

    def render(self, mode):
        if mode == 'human':
            self.env.render(mode=mode)
        else:
            return self.env.render(mode=mode)[0]

    def step(self, actions):
        obs_list, rew_list, done_list, info_list = self.env.step(actions)

        self._num_steps += 1
        done = (all(done_list) or self._num_steps >= self.horizon)

        # Gym monitor expects reward to be an int. This is only used for its
        # stats reporter, which we're not interested in. To make video recording
        # work, we package the rewards in the info object and extract it below.
        return obs_list, 0, done, [rew_list, done_list, info_list]


class RLlibMultiAgentParticleEnv(rllib.MultiAgentEnv):
    def __init__(self, scenario_name, horizon, monitor_enabled=False, video_frequency=500):
        self._env = _make_env(scenario_name, horizon, monitor_enabled, video_frequency)
        self.num_agents = self._env.n
        self.agent_ids = list(range(self.num_agents))

        self.observation_space_dict = self._make_dict(self._env.observation_space)
        self.action_space_dict = self._make_dict(self._env.action_space)

    def reset(self):
        obs_dict = self._make_dict(self._env.reset())
        return obs_dict

    def step(self, action_dict):
        actions = list(action_dict.values())
        obs_list, _, _, infos = self._env.step(actions)
        rew_list, done_list, _ = infos

        obs_dict = self._make_dict(obs_list)
        rew_dict = self._make_dict(rew_list)
        done_dict = self._make_dict(done_list)
        done_dict['__all__'] = all(done_list)
        info_dict = self._make_dict([{'done': done} for done in done_list])

        return obs_dict, rew_dict, done_dict, info_dict

    def render(self, mode='human'):
        self._env.render(mode=mode)

    def _make_dict(self, values):
        return dict(zip(self.agent_ids, values))


def _video_callable(video_frequency):
    def should_record_video(episode_id):
        if episode_id % video_frequency == 0:
            return True
        return False

    return should_record_video


def _make_env(scenario_name, horizon, monitor_enabled, video_frequency):
    if scenario_name in CUSTOM_SCENARIOS:
        # Scenario file must exist locally
        file_path = os.path.join(os.path.dirname(__file__), scenario_name + '.py')
        scenario = imp.load_source('', file_path).Scenario()
    else:
        scenario = scenarios.load(scenario_name + '.py').Scenario()

    world = scenario.make_world()

    env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)
    env.metadata['video.frames_per_second'] = 8

    env = ParticleEnvRenderWrapper(env, horizon)

    if not monitor_enabled:
        return env

    return wrappers.Monitor(env, './logs/videos', resume=True, video_callable=_video_callable(video_frequency))


def env_creator(config):
    monitor_enabled = False
    if hasattr(config, 'worker_index') and hasattr(config, 'vector_index'):
        monitor_enabled = (config.worker_index == 1 and config.vector_index == 0)

    return RLlibMultiAgentParticleEnv(**config, monitor_enabled=monitor_enabled)
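A small local sketch of exercising the factory above outside RLlib. It assumes the patched `multiagent` package is installed (as in the Dockerfile) and passes a plain dict as the config, so no Gym monitor is attached (a dict has no `worker_index`/`vector_index` attributes); it only inspects the per-agent space dictionaries that `gen_policy` in the training script consumes.

```python
# Build the wrapped Particle environment directly and inspect its interfaces.
env = env_creator({
    'scenario_name': 'simple_spread',
    'horizon': 25,
    'video_frequency': 200,
})

obs = env.reset()                              # {agent_id: observation, ...}
print(env.num_agents, sorted(obs.keys()))
for agent_id in env.agent_ids:
    print(agent_id, env.observation_space_dict[agent_id], env.action_space_dict[agent_id])
```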
@@ -0,0 +1,358 @@
import numpy as np
import random

from multiagent.core import World, Agent, Landmark
from multiagent.scenario import BaseScenario


class SwitchWorld(World):
    """ Extended World with hills and switches """
    def __init__(self, hills, switches):
        super().__init__()
        # add hills and switches
        self.hills = hills
        self.switches = switches
        self.landmarks.extend(self.hills)
        self.landmarks.extend(self.switches)

    def step(self):

        super().step()

        # if all hills are activated, reset the switches and hills
        if all([hill.active for hill in self.hills]):
            self.reset_hills()
            self.reset_switches()
        else:
            # Update switches
            for switch in self.switches:
                switch.step(self)
            # Update hills
            for hill in self.hills:
                hill.step(self)

    def reset_hills(self):
        possible_hill_positions = [np.array([-0.8, 0]), np.array([0, 0.8]), np.array([0.8, 0]), np.array([0, -0.8])]
        hill_positions = random.sample(possible_hill_positions, k=len(self.hills))
        for i, hill in enumerate(self.hills):
            hill.state.p_pos = hill_positions[i]
            hill.deactivate()

    def reset_switches(self):
        possible_switch_positions = [
            np.array([-0.8, -0.8]),
            np.array([-0.8, 0.8]),
            np.array([0.8, -0.8]),
            np.array([0.8, 0.8])]
        switch_positions = random.sample(possible_switch_positions, k=len(self.switches))
        for i, switch in enumerate(self.switches):
            switch.state.p_pos = switch_positions[i]
            switch.deactivate()


class Scenario(BaseScenario):
    def make_world(self):

        # main configurations
        num_agents = 2
        num_hills = 2
        num_switches = 1
        self.max_episode_length = 100

        # create hills (on edges)
        possible_hill_positions = [np.array([-0.8, 0]), np.array([0, 0.8]), np.array([0.8, 0]), np.array([0, -0.8])]
        hill_positions = random.sample(possible_hill_positions, k=num_hills)
        hills = [Hill(hill_positions[i]) for i in range(num_hills)]
        # create switches (in corners)
        possible_switch_positions = [
            np.array([-0.8, -0.8]),
            np.array([-0.8, 0.8]),
            np.array([0.8, -0.8]),
            np.array([0.8, 0.8])]
        switch_positions = random.sample(possible_switch_positions, k=num_switches)
        switches = [Switch(switch_positions[i]) for i in range(num_switches)]

        # make world and set basic properties
        world = SwitchWorld(hills, switches)
        world.dim_c = 2
        world.collaborative = True

        # add agents
        world.agents = [Agent() for i in range(num_agents)]
        for i, agent in enumerate(world.agents):
            agent.name = 'agent %d' % i
            agent.collide = True
            agent.silent = True
            agent.size = 0.1
            agent.accel = 5.0
            agent.max_speed = 5.0
            if i == 0:
                agent.color = np.array([0.35, 0.35, 0.85])
            else:
                agent.color = np.array([0.35, 0.85, 0.85])

        # make initial conditions
        self.reset_world(world)

        return world

    def reset_world(self, world):
        # set random initial states
        for agent in world.agents:
            agent.state.p_pos = np.array([random.uniform(-1, +1) for _ in range(world.dim_p)])
            agent.state.p_vel = np.zeros(world.dim_p)
            agent.state.c = np.zeros(world.dim_c)
        # set hills randomly
        world.reset_hills()
        # set switches randomly
        world.reset_switches()

    def is_collision(self, agent1, agent2):
        delta_pos = agent1.state.p_pos - agent2.state.p_pos
        dist = np.sqrt(np.sum(np.square(delta_pos)))
        dist_min = agent1.size + agent2.size
        return True if dist < dist_min else False

    def reward(self, agent, world):
        # Agents are rewarded based on number of landmarks activated
        rew = 0
        if all([h.active for h in world.hills]):
            rew += 100
        else:
            # give bonus each time a hill is activated
            for hill in world.hills:
                if hill.activated_just_now:
                    rew += 50
        # penalise timesteps where nothing is happening
        if rew == 0:
            rew -= 0.1
        # add collision penalty
        if agent.collide:
            for a in world.agents:
                # note: this also counts collision with "itself", so gives -1 at every timestep
                # would be good to tune the reward function and use (not a == agent) here
                if self.is_collision(a, agent):
                    rew -= 1
        return rew

    def observation(self, agent, world):
        # get positions of all entities in this agent's reference frame
        entity_pos = []
        for entity in world.landmarks:  # world.entities:
            entity_pos.append(entity.state.p_pos - agent.state.p_pos)
        # entity colors
        entity_color = []
        for entity in world.landmarks:  # world.entities:
            entity_color.append(entity.color)
        # communication of all other agents
        comm = []
        other_pos = []
        for other in world.agents:
            if other is agent:
                continue
            comm.append(other.state.c)
            other_pos.append(other.state.p_pos - agent.state.p_pos)
        return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + comm)


class Hill(Landmark):
    """
    A hill that can be captured by an agent.
    To be captured, a team must occupy a hill for a fixed amount of time.
    """

    def __init__(self,
                 pos=None,
                 size=0.08,
                 capture_time=2
                 ):

        # Initialize Landmark super class
        super().__init__()
        self.movable = False
        self.collide = False
        self.state.p_pos = pos
        self.size = size

        # Set static configurations
        self.capture_time = capture_time

        # Initialize all hills to be inactive
        self.active = False
        self.color = np.array([0.5, 0.5, 0.5])
        self.capture_timer = 0

        self.activated_just_now = False

    def activate(self):
        self.active = True
        self.color = np.array([0.1, 0.1, 0.9])

    def deactivate(self):
        self.active = False
        self.color = np.array([0.5, 0.5, 0.5])

    def _is_occupied(self, agents):
        # a hill is occupied if an agent stands on it
        for agent in agents:
            dist = np.sqrt(np.sum(np.square(agent.state.p_pos - self.state.p_pos)))
            if dist < agent.size + self.size:
                return True
        return False

    def step(self, world):

        self.activated_just_now = False

        # If hill isn't activated yet, check if an agent activates it
        # if (not self.active) and (world.switch.is_active()):
        if (not self.active):

            # Check if an agent is on the hill and all switches are active
            if (self._is_occupied(world.agents)) and all([switch.active for switch in world.switches]):
                self.capture_timer += 1

                # activate hill (this is irreversible)
                if self.capture_timer > self.capture_time:
                    self.activate()
                    self.activated_just_now = True

            # Reset capture timer if hill is not occupied
            else:
                self.capture_timer = 0


class Switch(Landmark):
    """
    A switch that can be activated by an agent.
    The agent has to stay on the switch for it to be active.
    """

    def __init__(self,
                 pos=None,
                 size=0.03,
                 ):

        # Initialize Landmark super class
        super().__init__()
        self.movable = False
        self.collide = False
        self.state.p_pos = pos
        self.size = size

        # Initialize all switches to be inactive
        self.active = False
        self.color = np.array([0.8, 0.05, 0.3])
        self.capture_timer = 0

    def activate(self):
        self.active = True
        self.color = np.array([0.1, 0.9, 0.4])

    def deactivate(self):
        self.active = False
        self.color = np.array([0.8, 0.05, 0.3])

    def _is_occupied(self, agents):
        # a switch is active if an agent stands on it
        for agent in agents:
            dist = np.sqrt(np.sum(np.square(agent.state.p_pos - self.state.p_pos)))
            if dist < agent.size + self.size:
                return True
        return False

    def step(self, world):
        # check if an agent is on the switch and activate/deactivate accordingly
        if self._is_occupied(world.agents):
            self.activate()
        else:
            self.deactivate()


class SwitchExpertPolicy():
    """
    Hand-coded expert policy for the simple switch environment.
    Types of possible experts:
    - always go to the switch
    - always go to the hills
    """
    def __init__(self, dim_c, agent, world, expert_type=None, discrete_action_input=True):

        self.dim_c = dim_c
        self.discrete_action_input = discrete_action_input
        # the agent we control and world we're in
        self.agent = agent
        self.world = world

        if expert_type is None:
            self.expert_type = random.choice(['switch', 'hill'])
        else:
            self.expert_type = expert_type
        if self.expert_type == 'switch':
            self.target_switch = self.select_inital_target_switch()
        elif self.expert_type == 'hill':
            self.target_hill = self.select_inital_target_hill()
        else:
            raise NotImplementedError

        self.step_count = 0

    def select_inital_target_switch(self):
        return random.choice(self.world.switches)

    def select_inital_target_hill(self):
        return random.choice(self.world.hills)

    def action(self):

        # select a target!
        if self.expert_type == 'switch':
            # if agent is not already on a switch, choose target switch
            if not any([switch._is_occupied([self.agent]) for switch in self.world.switches]):
                # select a target switch if there's an inactive one
                inactive_switches = [switch for switch in self.world.switches if not switch.active]
                if len(inactive_switches) > 0 and (self.target_switch not in inactive_switches):
                    self.target_switch = random.choice(inactive_switches)
            target = self.target_switch.state.p_pos
        elif self.expert_type == 'hill':
            # select a target hill if we haven't done so yet, or the current target switch is inactive
            inactive_hills = [hill for hill in self.world.hills if not hill.active]
            if len(inactive_hills) > 0 and (self.target_hill not in inactive_hills):
                self.target_hill = random.choice(inactive_hills)
            target = self.target_hill.state.p_pos

        self.step_count += 1

        impulse = np.clip(target - self.agent.state.p_pos, -self.agent.u_range, self.agent.u_range)

        if self.discrete_action_input:
            u_idx = np.argmax(np.abs(impulse))
            if u_idx == 0 and impulse[u_idx] < 0:
                u = 1
            elif u_idx == 0 and impulse[u_idx] > 0:
                u = 2
            elif u_idx == 1 and impulse[u_idx] < 0:
                u = 3
            elif u_idx == 1 and impulse[u_idx] > 0:
                u = 4
            else:
                u = 0
        else:
            u = np.zeros(5)
            if (impulse[0] == impulse[1] == 0) \
                    or (self.step_count < self.burn_in) \
                    or (self.burn_step != 0 and self.step_count % self.burn_step != 0):
                u[0] = 0.1
            else:
                pass
                # u: noop (?), right, left, down, up
                if impulse[0] > 0:  # x-direction (- left/right + )
                    u[1] = impulse[0]  # right
                elif impulse[0] < 0:
                    u[2] = -impulse[0]
                if impulse[1] > 0:  # y-direction (- up/down + )
                    u[3] = impulse[1]
                elif impulse[1] < 0:
                    u[4] = -impulse[1]

        return u
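A brief sketch of driving the scenario above directly (no RLlib), assuming the patched `multiagent` package is installed; it builds the world, queries the reward and observation functions, and asks the hand-coded expert for one action.

```python
# Build the simple_switch world and inspect its reward/observation functions.
scenario = Scenario()
world = scenario.make_world()

for agent in world.agents:
    obs = scenario.observation(agent, world)
    rew = scenario.reward(agent, world)
    print(agent.name, obs.shape, rew)

# Hand-coded expert that always heads for a switch; with the default
# discrete_action_input=True it returns an action index (0 = noop,
# 1-4 = moves along the +/- x and y axes).
expert = SwitchExpertPolicy(world.dim_c, world.agents[0], world, expert_type='switch')
print('expert action:', expert.action())
```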
@@ -0,0 +1,82 @@
import argparse
import os
import re

from rllib_multiagent_particle_env import CUSTOM_SCENARIOS


def parse_args():
    parser = argparse.ArgumentParser('MADDPG with OpenAI MPE')

    # Environment
    parser.add_argument('--scenario', type=str, default='simple',
                        choices=['simple', 'simple_speaker_listener',
                                 'simple_crypto', 'simple_push',
                                 'simple_tag', 'simple_spread', 'simple_adversary'
                                 ] + CUSTOM_SCENARIOS,
                        help='name of the scenario script')
    parser.add_argument('--max-episode-len', type=int, default=25,
                        help='maximum episode length')
    parser.add_argument('--num-episodes', type=int, default=60000,
                        help='number of episodes')
    parser.add_argument('--num-adversaries', type=int, default=0,
                        help='number of adversaries')
    parser.add_argument('--good-policy', type=str, default='maddpg',
                        help='policy for good agents')
    parser.add_argument('--adv-policy', type=str, default='maddpg',
                        help='policy of adversaries')

    # Core training parameters
    parser.add_argument('--lr', type=float, default=1e-2,
                        help='learning rate for Adam optimizer')
    parser.add_argument('--gamma', type=float, default=0.95,
                        help='discount factor')
    # NOTE: 1 iteration = sample_batch_size * num_workers timesteps * num_envs_per_worker
    parser.add_argument('--sample-batch-size', type=int, default=25,
                        help='number of data points sampled /update /worker')
    parser.add_argument('--train-batch-size', type=int, default=1024,
                        help='number of data points /update')
    parser.add_argument('--n-step', type=int, default=1,
                        help='length of multistep value backup')
    parser.add_argument('--num-units', type=int, default=64,
                        help='number of units in the mlp')
    parser.add_argument('--final-reward', type=int, default=-400,
                        help='final reward after which to stop training')

    # Checkpoint
    parser.add_argument('--checkpoint-freq', type=int, default=200,
                        help='save model once every time this many iterations are completed')
    parser.add_argument('--local-dir', type=str, default='./logs',
                        help='path to save checkpoints')
    parser.add_argument('--restore', type=str, default=None,
                        help='directory in which training state and model are loaded')

    # Parallelism
    parser.add_argument('--num-workers', type=int, default=1)
    parser.add_argument('--num-envs-per-worker', type=int, default=4)
    parser.add_argument('--num-gpus', type=int, default=0)

    return parser.parse_args()


def find_final_checkpoint(start_dir):
    def find(pattern, path):
        result = []
        for root, _, files in os.walk(path):
            for name in files:
                if pattern.match(name):
                    result.append(os.path.join(root, name))
        return result

    cp_pattern = re.compile('.*checkpoint-\\d+$')
    checkpoint_files = find(cp_pattern, start_dir)

    checkpoint_numbers = []
    for file in checkpoint_files:
        checkpoint_numbers.append(int(file.split('-')[-1]))

    final_checkpoint_number = max(checkpoint_numbers)

    return next(
        checkpoint_file for checkpoint_file in checkpoint_files
        if checkpoint_file.endswith(str(final_checkpoint_number)))
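The helper above locates the last Ray Tune checkpoint of a finished run so it can be fed back through `--restore`. A minimal sketch, assuming checkpoints were written under the default `./logs` directory by the training script (the `train.py` name in the comment is illustrative).

```python
# Find the newest Tune checkpoint under the default --local-dir and resume from it.
latest_checkpoint = find_final_checkpoint('./logs')
print('restoring from', latest_checkpoint)

# Illustrative CLI invocation of the training script with that checkpoint:
#   python train.py --scenario simple_spread --restore "<path printed above>"
```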
Binary image file added (not shown, ~350 KiB).
@@ -0,0 +1,526 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Reinforcement Learning in Azure Machine Learning - Training multiple agents on collaborative ParticleEnv tasks\n",
|
||||||
|
"\n",
|
||||||
|
"This tutorial will show you how to train policies in a multi-agent scenario.\n",
|
||||||
|
"We use OpenAI Gym's [Particle environments](https://github.com/openai/multiagent-particle-envs),\n",
|
||||||
|
"which model agents and landmarks in a two-dimensional world. Particle comes with\n",
|
||||||
|
"several predefined scenarios, both competitive and collaborative, and with or without communication.\n",
|
||||||
|
"\n",
|
||||||
|
"For this tutorial, we pick a cooperative navigation scenario where N agents are in a world with N\n",
|
||||||
|
"landmarks. The agents' goal is to cover all the landmarks without collisions,\n",
|
||||||
|
"so agents must learn to avoid each other (social distancing!). The video below shows training\n",
|
||||||
|
"results for N=3 agents/landmarks:\n",
|
||||||
|
"\n",
|
||||||
|
"<table style=\"width:50%\">\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th style=\"text-align: center;\">\n",
|
||||||
|
" <img src=\"./images/particle_simple_spread.gif\" alt=\"Particle video\" align=\"middle\" margin-left=\"auto\" margin-right=\"auto\"/>\n",
|
||||||
|
" </th>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr style=\"text-align: center;\">\n",
|
||||||
|
" <th>Fig 1. Video of 3 agents covering 3 landmarks in a multiagent Particle scenario.</th>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
"</table>\n",
|
||||||
|
"\n",
|
||||||
|
"The tutorial will cover the following steps:\n",
|
||||||
|
"- Initializing Azure Machine Learning resources for training\n",
|
||||||
|
"- Training policies in a multi-agent environment with Azure Machine Learning service\n",
|
||||||
|
"- Monitoring training progress\n",
|
||||||
|
"\n",
|
||||||
|
"## Prerequisites\n",
|
||||||
|
"\n",
|
||||||
|
"The user should have completed the Azure Machine Learning introductory tutorial. You will need to make sure that you have a valid subscription id, a resource group and a workspace. For detailed instructions see [Tutorial: Get started creating your first ML experiment](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup).\n",
|
||||||
|
"\n",
|
||||||
|
"Please ensure that you have a current version of IPython (>= 7.15) installed.\n",
|
||||||
|
"\n",
|
||||||
|
"While this is a standalone notebook, we highly recommend going over the introductory notebooks for RL first.\n",
|
||||||
|
"- Getting started:\n",
|
||||||
|
" - [RL using a compute instance with Azure Machine Learning](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/cartpole-on-compute-instance/cartpole_ci.ipynb)\n",
|
||||||
|
" - [RL using Azure Machine Learning compute](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/cartpole-on-single-compute/cartpole_sc.ipynb)\n",
|
||||||
|
"- [Scaling RL training runs with Azure Machine Learning](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/atari-on-distributed-compute/pong_rllib.ipynb)\n",
|
||||||
|
"\n",
|
||||||
|
"Advanced users might also be interested in [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/minecraft-on-distributed-compute/minecraft.ipynb) demonstrating how to train a Minecraft RL agent in Azure Machine Learning.\n",
|
||||||
|
"\n",
|
||||||
|
"## Initialize resources\n",
|
||||||
|
"\n",
|
||||||
|
"All required Azure Machine Learning service resources for this tutorial can be set up from Jupyter. This includes:\n",
|
||||||
|
"\n",
|
||||||
|
"- Connecting to your existing Azure Machine Learning workspace.\n",
|
||||||
|
"- Creating an experiment to track runs.\n",
|
||||||
|
"- Creating remote compute targets for [Ray](https://docs.ray.io/en/latest/index.html).\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"### Azure Machine Learning SDK\n",
|
||||||
|
"\n",
|
||||||
|
"Display the Azure Machine Learning SDK version."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import azureml.core\n",
|
||||||
|
"print('Azure Machine Learning SDK Version: ', azureml.core.VERSION)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Connect to workspace\n",
|
||||||
|
"\n",
|
||||||
|
"Get a reference to an existing Azure Machine Learning workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.location, ws.resource_group, sep=' | ')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create an experiment\n",
|
||||||
|
"\n",
|
||||||
|
"Create an experiment to track the runs in your workspace. A\n",
|
||||||
|
"workspace can have multiple experiments and each experiment\n",
|
||||||
|
"can be used to track multiple runs (see [documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py)\n",
|
||||||
|
"for details)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Experiment\n",
|
||||||
|
"\n",
|
||||||
|
"exp = Experiment(workspace=ws, name='particle-multiagent')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create or attach an existing compute resource\n",
|
||||||
|
"\n",
|
||||||
|
"A compute target is a designated compute resource where you run your training script. For more information, see [What are compute targets in Azure Machine Learning service?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target).\n",
|
||||||
|
"\n",
|
||||||
|
"#### CPU target for Ray head\n",
|
||||||
|
"\n",
|
||||||
|
"In the experiment setup for this tutorial, the Ray head node will\n",
|
||||||
|
"run on a CPU node (D3 type). A maximum cluster size of 1 node is\n",
|
||||||
|
"therefore sufficient. If you wish to run multiple experiments in\n",
|
||||||
|
"parallel using the same CPU cluster, you may elect to increase this\n",
|
||||||
|
"number. The cluster will automatically scale down to 0 nodes when\n",
|
||||||
|
"no training jobs are scheduled (see min_nodes).\n",
|
||||||
|
"\n",
|
||||||
|
"The code below creates a compute cluster of D3 type nodes.\n",
|
||||||
|
"If the cluster with the specified name is already in your workspace\n",
|
||||||
|
"the code will skip the creation process.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note: Creation of a compute resource can take several minutes**"
|
||||||
|
]
|
||||||
|
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"\n",
"cpu_cluster_name = 'cpu-cl-d3'\n",
"\n",
"if cpu_cluster_name in ws.compute_targets:\n",
"    cpu_cluster = ws.compute_targets[cpu_cluster_name]\n",
"    if cpu_cluster and type(cpu_cluster) is AmlCompute:\n",
"        if cpu_cluster.provisioning_state == 'Succeeded':\n",
"            print('Found existing compute target for {}. Using it.'.format(cpu_cluster_name))\n",
"        else:\n",
"            raise Exception('Found existing compute target for {} '.format(cpu_cluster_name)\n",
"                            + 'but it is in state {}'.format(cpu_cluster.provisioning_state))\n",
"else:\n",
"    print('Creating a new compute target for {}...'.format(cpu_cluster_name))\n",
"    provisioning_config = AmlCompute.provisioning_configuration(\n",
"        vm_size='STANDARD_D3',\n",
"        min_nodes=0,\n",
"        max_nodes=1)\n",
"\n",
"    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, provisioning_config)\n",
"    cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"    print('Cluster created.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the policies\n",
"\n",
"### Training environment\n",
"\n",
"This tutorial uses a custom Docker image\n",
"with the necessary software installed. The [Environment](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments)\n",
"class stores the configuration for the training environment. The\n",
"Docker image is set via `env.docker.base_image`.\n",
"`user_managed_dependencies` is set so that\n",
"the preinstalled Python packages in the image are preserved.\n",
"\n",
"Note that capturing videos of the training runs requires a display, so we set `interpreter_path` such that the Python process is started via **xvfb-run**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from azureml.core import Environment\n",
"\n",
"cpu_particle_env = Environment(name='particle-cpu')\n",
"\n",
"cpu_particle_env.docker.enabled = True\n",
"cpu_particle_env.docker.base_image = 'akdmsft/particle-cpu'\n",
"cpu_particle_env.python.interpreter_path = 'xvfb-run -s \"-screen 0 640x480x16 -ac +extension GLX +render\" python'\n",
"\n",
"max_train_time = os.environ.get('AML_MAX_TRAIN_TIME_SECONDS', 2 * 60 * 60)\n",
"cpu_particle_env.environment_variables['AML_MAX_TRAIN_TIME_SECONDS'] = str(max_train_time)\n",
"cpu_particle_env.python.user_managed_dependencies = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training script\n",
"\n",
"This tutorial uses the multiagent algorithm [Multi-Agent Deep Deterministic Policy Gradient (MADDPG)](https://docs.ray.io/en/latest/rllib-algorithms.html?highlight=maddpg#multi-agent-deep-deterministic-policy-gradient-contrib-maddpg).\n",
"For training policies in a multiagent scenario, Ray's RLlib also\n",
"requires the `multiagent` configuration section to be specified. You\n",
"can find more information in the [common parameters](https://docs.ray.io/en/latest/rllib-training.html?highlight=multiagent#common-parameters)\n",
"documentation.\n",
"\n",
"To monitor and understand the training progress, one\n",
"of the training environments is wrapped in a [Gym monitor](https://github.com/openai/gym/blob/master/gym/wrappers/monitor.py),\n",
"which periodically captures videos (by default, every 200 training\n",
"iterations).\n",
"\n",
"The stopping criteria are set such that the training run is\n",
"terminated after either a mean reward of -400 is observed or\n",
"training has run for more than 2 hours.\n",
"\n",
"### Submitting a training run\n",
"\n",
"Below, you create the training run using a `ReinforcementLearningEstimator`\n",
"object, which contains all the configuration parameters for this experiment:\n",
"\n",
"- `source_directory`: Contains the training script and helper files to be\n",
"  copied onto the node.\n",
"- `entry_script`: The training script, described in more detail above.\n",
"- `script_params`: The command-line arguments to pass to the entry script.\n",
"- `compute_target`: The compute target for training script execution.\n",
"- `environment`: The Azure Machine Learning environment definition for the node running the training.\n",
"- `max_run_duration_seconds`: The time after which to abort the run if it is still running.\n",
"\n",
"For more details, please take a look at the [online documentation](https://docs.microsoft.com/en-us/python/api/azureml-contrib-reinforcementlearning/?view=azure-ml-py)\n",
"for Azure Machine Learning service's reinforcement learning offering.\n",
"\n",
"Note that you can use the same notebook and scripts to experiment with\n",
"different Particle environments. You can find a list of supported\n",
"environments [here](https://github.com/openai/multiagent-particle-envs/tree/master#list-of-environments).\n",
"Simply change the `--scenario` parameter to a supported scenario.\n",
"\n",
"To get the best training results, you can also adjust the\n",
"`--final-reward` parameter, which determines when to stop training. A higher\n",
"target reward means a longer running time but better results. By default,\n",
"the final reward is -400, which should show good progress after\n",
"about one hour of run time.\n",
"\n",
"For this notebook, we use a single D3 node, giving us a total of 4 CPUs and\n",
"0 GPUs. One CPU is used by the MADDPG trainer, and an additional CPU is\n",
"consumed by the RLlib rollout worker. The other two CPUs are not used;\n",
"however, smaller node types run out of memory for this task.\n",
"\n",
"Lastly, the `RunDetails` widget displays information about the submitted RL\n",
"experiment, including a link to the Azure portal with more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.train.rl import ReinforcementLearningEstimator\n",
"from azureml.widgets import RunDetails\n",
"\n",
"estimator = ReinforcementLearningEstimator(\n",
"    source_directory='files',\n",
"    entry_script='particle_train.py',\n",
"    script_params={\n",
"        '--scenario': 'simple_spread',\n",
"        '--final-reward': -400\n",
"    },\n",
"    compute_target=cpu_cluster,\n",
"    environment=cpu_particle_env,\n",
"    max_run_duration_seconds=3 * 60 * 60\n",
")\n",
"\n",
"train_run = exp.submit(config=estimator)\n",
"\n",
"RunDetails(train_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you wish to cancel the run before it completes, uncomment and execute:\n",
"#train_run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Monitoring training progress\n",
"\n",
"### View the Tensorboard\n",
"\n",
"The Tensorboard can be displayed via the Azure Machine Learning\n",
"service's [Tensorboard API](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-tensorboard).\n",
"When running locally, please make sure to follow the instructions\n",
"in the linked article and install the required packages. Running this cell will output a URL for the Tensorboard.\n",
"\n",
"Note that the training script sets the log directory via the\n",
"`local_dir` parameter when starting RLlib. `./logs` automatically\n",
"appears in the downloadable files for a run. Since the script is\n",
"executed as part of the Ray head node's run, we need to get a reference to that child run,\n",
"as shown below.\n",
"\n",
"The Tensorboard API will continuously stream logs from the run.\n",
"\n",
"**Note: It may take a couple of minutes after the run enters the \"Running\"\n",
"state before Tensorboard files are available; the board will refresh automatically.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"from azureml.tensorboard import Tensorboard\n",
"\n",
"head_run = None\n",
"\n",
"timeout = 60\n",
"while timeout > 0 and head_run is None:\n",
"    timeout -= 1\n",
"\n",
"    try:\n",
"        head_run = next(r for r in train_run.get_children() if r.id.endswith('head'))\n",
"    except StopIteration:\n",
"        time.sleep(1)\n",
"\n",
"tb = Tensorboard([head_run])\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View training videos\n",
"\n",
"As mentioned above, we record videos of the agents interacting with the\n",
"Particle world. These videos are often a crucial indicator of training\n",
"success. The code below downloads the latest video as it becomes available\n",
"and displays it inline.\n",
"\n",
"Over time, the agents learn to cooperate and avoid collisions while\n",
"traveling to all landmarks.\n",
"\n",
"**Note: It can take several minutes for a video to appear after the run\n",
"has started.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tempfile\n",
"from azureml.core import Dataset\n",
"from azureml.data.dataset_error_handling import DatasetValidationError\n",
"\n",
"from IPython.display import clear_output\n",
"from IPython.core.display import display, Video\n",
"\n",
"datastore = ws.get_default_datastore()\n",
"path_prefix = './tmp_videos'\n",
"\n",
"def download_latest_training_video(run, video_checkpoint_counter):\n",
"    run_artifacts_path = os.path.join('azureml', run.id)\n",
"\n",
"    try:\n",
"        run_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(run_artifacts_path, '**')))\n",
"    except DatasetValidationError as e:\n",
"        # This happens at the start of the run when there is no data available\n",
"        # in the run's artifacts\n",
"        return None, video_checkpoint_counter\n",
"\n",
"    video_files = [file for file in run_artifacts_ds.to_path() if file.endswith('.mp4')]\n",
"    if len(video_files) == video_checkpoint_counter:\n",
"        return None, video_checkpoint_counter\n",
"\n",
"    iteration_numbers = [int(vf[vf.rindex('video') + len('video') : vf.index('.mp4')]) for vf in video_files]\n",
"    latest_video = next(vf for vf in video_files if vf.endswith('{num}.mp4'.format(num=max(iteration_numbers))))\n",
"    latest_video = os.path.join(run_artifacts_path, os.path.normpath(latest_video[1:]))\n",
"\n",
"    datastore.download(\n",
"        target_path=path_prefix,\n",
"        prefix=latest_video.replace('\\\\', '/'),\n",
"        show_progress=False)\n",
"\n",
"    return os.path.join(path_prefix, latest_video), len(video_files)\n",
"\n",
"\n",
"def render_video(vf):\n",
"    clear_output(wait=True)\n",
"    display(Video(data=vf, embed=True, html_attributes='loop autoplay width=50%'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"terminal_statuses = ['Canceled', 'Completed', 'Failed']\n",
"video_checkpoint_counter = 0\n",
"\n",
"while head_run.get_status() not in terminal_statuses:\n",
"    video_file, video_checkpoint_counter = download_latest_training_video(head_run, video_checkpoint_counter)\n",
"    if video_file is not None:\n",
"        render_video(video_file)\n",
"\n",
"        print('Displaying video number {}'.format(video_checkpoint_counter))\n",
"        shutil.rmtree(path_prefix)\n",
"\n",
"    # Interrupting the kernel can take up to 15 seconds\n",
"    # depending on when time.sleep started\n",
"    time.sleep(15)\n",
"\n",
"train_run.wait_for_completion()\n",
"print('The training run has reached a terminal status.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"\n",
"Below are code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To stop the Tensorboard, uncomment and run:\n",
"#tb.stop()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To delete the CPU compute target, uncomment and run:\n",
"#cpu_cluster.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"We would love to hear your feedback! Please let us know what you think of Reinforcement Learning in Azure Machine Learning and what features you are looking forward to."
]
}
],
"metadata": {
"authors": [
{
"name": "andress"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,9 @@
name: particle
dependencies:
  - pip:
    - azureml-sdk
    - azureml-contrib-reinforcementlearning
    - azureml-widgets
    - tensorboard
    - azureml-tensorboard
    - ipython
@@ -58,7 +58,7 @@
 "metadata": {},
 "source": [
 "### Get Azure Machine Learning workspace\n",
-"Get a reference to an existing Azure Machine Learning workspace. Please make sure to change `STANDARD_NC6` and `STANDARD_D2_V2` to [the ones available in your region](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=virtual-machines).\n"
+"Get a reference to an existing Azure Machine Learning workspace.\n"
 ]
 },
 {
index.md
@@ -130,6 +130,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
 | [cartpole_ci](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/reinforcement-learning/cartpole-on-compute-instance/cartpole_ci.ipynb) | | | | | | |
 | [cartpole_sc](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/reinforcement-learning/cartpole-on-single-compute/cartpole_sc.ipynb) | | | | | | |
 | [minecraft](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/reinforcement-learning/minecraft-on-distributed-compute/minecraft.ipynb) | | | | | | |
+| [particle](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/reinforcement-learning/multiagent-particle-envs/particle.ipynb) | | | | | | |
 | [devenv_setup](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/reinforcement-learning/setup/devenv_setup.ipynb) | | | | | | |
 | [Logging APIs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb) | Logging APIs and analyzing results | None | None | None | None | None |
 | [distributed-cntk-with-custom-docker](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb) | | | | | | |