{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/reinforcement-learning/multiagent-particle-envs/particle.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reinforcement Learning in Azure Machine Learning - Training multiple agents on collaborative ParticleEnv tasks\n", "\n", "This tutorial will show you how to train policies in a multi-agent scenario.\n", "We use OpenAI Gym's [Particle environments](https://github.com/openai/multiagent-particle-envs),\n", "which model agents and landmarks in a two-dimensional world. Particle comes with\n", "several predefined scenarios, both competitive and collaborative, and with or without communication.\n", "\n", "For this tutorial, we pick a cooperative navigation scenario where N agents are in a world with N\n", "landmarks. The agents' goal is to cover all the landmarks without collisions,\n", "so agents must learn to avoid each other (social distancing!). The video below shows training\n", "results for N=3 agents/landmarks:\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " \"Particle\n", "
Fig 1. Video of 3 agents covering 3 landmarks in a multiagent Particle scenario.
\n", "\n", "The tutorial will cover the following steps:\n", "- Initializing Azure Machine Learning resources for training\n", "- Training policies in a multi-agent environment with Azure Machine Learning service\n", "- Monitoring training progress\n", "\n", "## Prerequisites\n", "\n", "The user should have completed the Azure Machine Learning introductory tutorial. You will need to make sure that you have a valid subscription id, a resource group and a workspace. For detailed instructions see [Tutorial: Get started creating your first ML experiment](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup).\n", "\n", "Please ensure that you have a current version of IPython (>= 7.15) installed.\n", "\n", "While this is a standalone notebook, we highly recommend going over the introductory notebooks for RL first.\n", "- Getting started:\n", " - [RL using a compute instance with Azure Machine Learning](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/cartpole-on-compute-instance/cartpole_ci.ipynb)\n", " - [RL using Azure Machine Learning compute](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/cartpole-on-single-compute/cartpole_sc.ipynb)\n", "- [Scaling RL training runs with Azure Machine Learning](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/atari-on-distributed-compute/pong_rllib.ipynb)\n", "\n", "Advanced users might also be interested in [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/reinforcement-learning/minecraft-on-distributed-compute/minecraft.ipynb) demonstrating how to train a Minecraft RL agent in Azure Machine Learning.\n", "\n", "## Initialize resources\n", "\n", "All required Azure Machine Learning service resources for this tutorial can be set up from Jupyter. This includes:\n", "\n", "- Connecting to your existing Azure Machine Learning workspace.\n", "- Creating an experiment to track runs.\n", "- Creating remote compute targets for [Ray](https://docs.ray.io/en/latest/index.html).\n", "\n", "\n", "### Azure Machine Learning SDK\n", "\n", "Display the Azure Machine Learning SDK version." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import azureml.core\n", "print('Azure Machine Learning SDK Version: ', azureml.core.VERSION)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to workspace\n", "\n", "Get a reference to an existing Azure Machine Learning workspace." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Workspace\n", "\n", "ws = Workspace.from_config()\n", "print(ws.name, ws.location, ws.resource_group, sep=' | ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an experiment\n", "\n", "Create an experiment to track the runs in your workspace. A\n", "workspace can have multiple experiments and each experiment\n", "can be used to track multiple runs (see [documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py)\n", "for details)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Experiment\n", "\n", "exp = Experiment(workspace=ws, name='particle-multiagent')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create or attach an existing compute resource\n", "\n", "A compute target is a designated compute resource where you run your training script. For more information, see [What are compute targets in Azure Machine Learning service?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target).\n", "\n", "#### CPU target for Ray head\n", "\n", "In the experiment setup for this tutorial, the Ray head node will\n", "run on a CPU node (D3 type). A maximum cluster size of 1 node is\n", "therefore sufficient. If you wish to run multiple experiments in\n", "parallel using the same CPU cluster, you may elect to increase this\n", "number. The cluster will automatically scale down to 0 nodes when\n", "no training jobs are scheduled (see `min_nodes`).\n", "\n", "The code below creates a compute cluster of D3 type nodes.\n", "If a cluster with the specified name is already in your workspace,\n", "the code will skip the creation process.\n", "\n", "**Note: Creation of a compute resource can take several minutes.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.compute import AmlCompute, ComputeTarget\n", "\n", "cpu_cluster_name = 'cpu-cl-d3'\n", "\n", "if cpu_cluster_name in ws.compute_targets:\n", "    cpu_cluster = ws.compute_targets[cpu_cluster_name]\n", "    if cpu_cluster and isinstance(cpu_cluster, AmlCompute):\n", "        if cpu_cluster.provisioning_state == 'Succeeded':\n", "            print('Found existing compute target for {}. Using it.'.format(cpu_cluster_name))\n", "        else:\n", "            raise Exception('Found existing compute target for {} '.format(cpu_cluster_name)\n", "                            + 'but it is in state {}'.format(cpu_cluster.provisioning_state))\n", "else:\n", "    print('Creating a new compute target for {}...'.format(cpu_cluster_name))\n", "    provisioning_config = AmlCompute.provisioning_configuration(\n", "        vm_size='STANDARD_D3',\n", "        min_nodes=0,\n", "        max_nodes=1)\n", "\n", "    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, provisioning_config)\n", "    cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", "\n", "    print('Cluster created.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training the policies\n", "\n", "### Training environment\n", "\n", "This tutorial uses a custom docker image\n", "with the necessary software installed. The [Environment](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments)\n", "class stores the configuration for the training environment. The\n", "docker image is set via `env.docker.base_image`.\n", "`user_managed_dependencies` is set so that\n", "the preinstalled Python packages in the image are preserved.\n", "\n", "Note that capturing videos of the training runs requires a display, so we set the `interpreter_path` such that the Python process is started via **xvfb-run**."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from azureml.core import Environment\n", "\n", "cpu_particle_env = Environment(name='particle-cpu')\n", "\n", "cpu_particle_env.docker.enabled = True\n", "cpu_particle_env.docker.base_image = 'akdmsft/particle-cpu'\n", "cpu_particle_env.python.interpreter_path = 'xvfb-run -s \"-screen 0 640x480x16 -ac +extension GLX +render\" python'\n", "\n", "max_train_time = os.environ.get('AML_MAX_TRAIN_TIME_SECONDS', 2 * 60 * 60)\n", "cpu_particle_env.environment_variables['AML_MAX_TRAIN_TIME_SECONDS'] = str(max_train_time)\n", "cpu_particle_env.python.user_managed_dependencies = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training script\n", "\n", "This tutorial uses the multiagent algorithm [Multi-Agent Deep Deterministic Policy Gradient (MADDPG)](https://docs.ray.io/en/latest/rllib-algorithms.html?highlight=maddpg#multi-agent-deep-deterministic-policy-gradient-contrib-maddpg).\n", "For training policies in a multiagent scenario, Ray's RLlib also\n", "requires the `multiagent` configuration section to be specified. You\n", "can find more information in the [common parameters](https://docs.ray.io/en/latest/rllib-training.html?highlight=multiagent#common-parameters)\n", "documentation.\n", "\n", "For monitoring and understanding the training progress, one\n", "of the training environments is wrapped in a [Gym monitor](https://github.com/openai/gym/blob/master/gym/wrappers/monitor.py),\n", "which periodically captures videos, by default every 200 training\n", "iterations.\n", "\n", "The stopping criteria are set such that the training run is\n", "terminated either after a mean reward of -400 is observed or after\n", "training has run for over 2 hours.\n",
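"\n", "For illustration, the snippet below sketches what this setup might look like\n", "inside the training script: per-agent MADDPG policies in the `multiagent`\n", "section, plus the stopping criteria just described. This is a minimal sketch,\n", "not the actual contents of `particle_train.py`; the environment name,\n", "observation/action spaces, and policy ids are assumptions.\n", "\n", "```python\n", "import ray\n", "from ray import tune\n", "from gym import spaces\n", "\n", "ray.init()  # the real script would attach to the Ray cluster started for the run\n", "\n", "# Assumed spaces; the real script takes these from the Particle env.\n", "obs_space = spaces.Box(low=-1e6, high=1e6, shape=(18,))\n", "act_space = spaces.Discrete(5)\n", "\n", "config = {\n", "    'env': 'particle_simple_spread',  # assumed registered env name\n", "    'multiagent': {\n", "        # One MADDPG policy per agent; None selects the trainer's\n", "        # default policy class.\n", "        'policies': {\n", "            'agent_{}'.format(i): (None, obs_space, act_space, {'agent_id': i})\n", "            for i in range(3)\n", "        },\n", "        # Route agent ids like 'agent_0' to the policy of the same name.\n", "        'policy_mapping_fn': lambda agent_id: agent_id,\n", "    },\n", "}\n", "\n", "# Stop on whichever criterion is met first.\n", "stop = {\n", "    'episode_reward_mean': -400,\n", "    'time_total_s': 2 * 60 * 60,\n", "}\n", "\n", "tune.run('contrib/MADDPG', config=config, stop=stop, local_dir='./logs')\n", "```\n",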
"\n", "### Submitting a training run\n", "\n", "Below, you create the training run using a `ReinforcementLearningEstimator`\n", "object, which contains all the configuration parameters for this experiment:\n", "\n", "- `source_directory`: Contains the training script and helper files to be\n", "  copied onto the node.\n", "- `entry_script`: The training script, described in more detail above.\n", "- `script_params`: The command line arguments to pass to the entry script.\n", "- `compute_target`: The compute target for training script execution.\n", "- `environment`: The Azure Machine Learning environment definition for the node running the training.\n", "- `max_run_duration_seconds`: The time after which to abort the run if it is still running.\n", "\n", "For more details, please take a look at the [online documentation](https://docs.microsoft.com/en-us/python/api/azureml-contrib-reinforcementlearning/?view=azure-ml-py)\n", "for Azure Machine Learning service's reinforcement learning offering.\n", "\n", "Note that you can use the same notebook and scripts to experiment with\n", "different Particle environments. You can find a list of supported\n", "environments [here](https://github.com/openai/multiagent-particle-envs/tree/master#list-of-environments).\n", "Simply change the `--scenario` parameter to a supported scenario.\n", "\n", "To get the best training results, you can also adjust the\n", "`--final-reward` parameter to determine when to stop training. A greater\n", "final reward means a longer running time but improved results. By default,\n", "the final reward will be -400, which should show good progress after\n", "about one hour of run time.\n", "\n", "For this notebook, we use a single D3 node, giving us a total of 4 CPUs and\n", "no GPUs. One CPU is used by the MADDPG trainer, and an additional CPU is\n", "consumed by the RLlib rollout worker. The other 2 CPUs are not used, though\n", "smaller node types will run out of memory for this task.\n", "\n", "Lastly, the RunDetails widget displays information about the submitted RL\n", "experiment, including a link to the Azure portal with more details." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.contrib.train.rl import ReinforcementLearningEstimator\n", "from azureml.widgets import RunDetails\n", "\n", "estimator = ReinforcementLearningEstimator(\n", "    source_directory='files',\n", "    entry_script='particle_train.py',\n", "    script_params={\n", "        '--scenario': 'simple_spread',\n", "        '--final-reward': -400\n", "    },\n", "    compute_target=cpu_cluster,\n", "    environment=cpu_particle_env,\n", "    max_run_duration_seconds=3 * 60 * 60\n", ")\n", "\n", "train_run = exp.submit(config=estimator)\n", "\n", "RunDetails(train_run).show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you wish to cancel the run before it completes, uncomment and execute:\n", "#train_run.cancel()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Monitoring training progress\n", "\n", "### View the Tensorboard\n", "\n", "The Tensorboard can be displayed via the Azure Machine Learning\n", "service's [Tensorboard API](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-tensorboard).\n", "When running locally, please make sure to follow the instructions\n", "in the link and install the required packages. Running this cell will output a URL for the Tensorboard.\n", "\n", "Note that the training script sets the log directory via the `local_dir`\n", "parameter when starting RLlib. `./logs` will automatically\n", "appear in the downloadable files for a run. Since the script is\n", "executed on the Ray head node's run, we need to get a reference to that run,\n", "as shown below.\n", "\n", "The Tensorboard API will continuously stream logs from the run.\n", "\n", "**Note: It may take a couple of minutes after the run is in the \"Running\"\n", "state before Tensorboard files are available. The board will refresh automatically.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "from azureml.tensorboard import Tensorboard\n", "\n", "head_run = None\n", "\n", "# Poll for up to ~60 seconds until the Ray head run appears among the children.\n", "timeout = 60\n", "while timeout > 0 and head_run is None:\n", "    timeout -= 1\n", "\n", "    try:\n", "        head_run = next(r for r in train_run.get_children() if r.id.endswith('head'))\n", "    except StopIteration:\n", "        time.sleep(1)\n", "\n", "tb = Tensorboard([head_run])\n", "tb.start()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View training videos\n", "\n", "As mentioned above, we record videos of the agents interacting with the\n", "Particle world. These videos are often a crucial indicator for training\n", "success. 
The code below downloads the latest video as it becomes available\n", "and displays it inline.\n", "\n", "Over time, the agents learn to cooperate and avoid collisions while\n", "traveling to all landmarks.\n", "\n", "**Note: It can take several minutes for a video to appear after the run\n", "has started.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Dataset\n", "from azureml.data.dataset_error_handling import DatasetValidationError\n", "\n", "from IPython.display import clear_output, display, Video\n", "\n", "datastore = ws.get_default_datastore()\n", "path_prefix = './tmp_videos'\n", "\n", "def download_latest_training_video(run, video_checkpoint_counter):\n", "    run_artifacts_path = os.path.join('azureml', run.id)\n", "\n", "    try:\n", "        run_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(run_artifacts_path, '**')))\n", "    except DatasetValidationError:\n", "        # This happens at the start of the run when there is no data available\n", "        # in the run's artifacts\n", "        return None, video_checkpoint_counter\n", "\n", "    video_files = [f for f in run_artifacts_ds.to_path() if f.endswith('.mp4')]\n", "    if len(video_files) == video_checkpoint_counter:\n", "        return None, video_checkpoint_counter\n", "\n", "    # Video files are named '...videoN.mp4'; pick the one with the highest N.\n", "    iteration_numbers = [int(vf[vf.rindex('video') + len('video') : vf.index('.mp4')]) for vf in video_files]\n", "    latest_video = next(vf for vf in video_files if vf.endswith('{num}.mp4'.format(num=max(iteration_numbers))))\n", "    latest_video = os.path.join(run_artifacts_path, os.path.normpath(latest_video[1:]))\n", "\n", "    datastore.download(\n", "        target_path=path_prefix,\n", "        prefix=latest_video.replace('\\\\', '/'),\n", "        show_progress=False)\n", "\n", "    return os.path.join(path_prefix, latest_video), len(video_files)\n", "\n", "\n", "def render_video(vf):\n", "    clear_output(wait=True)\n", "    display(Video(data=vf, embed=True, html_attributes='loop autoplay width=50%'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "\n", "terminal_statuses = ['Canceled', 'Completed', 'Failed']\n", "video_checkpoint_counter = 0\n", "\n", "while head_run.get_status() not in terminal_statuses:\n", "    video_file, video_checkpoint_counter = download_latest_training_video(head_run, video_checkpoint_counter)\n", "    if video_file is not None:\n", "        render_video(video_file)\n", "\n", "        print('Displaying video number {}'.format(video_checkpoint_counter))\n", "        shutil.rmtree(path_prefix)\n", "\n", "    # Interrupting the kernel can take up to 15 seconds\n", "    # depending on when time.sleep started\n", "    time.sleep(15)\n", "\n", "train_run.wait_for_completion()\n", "print('The training run has reached a terminal status.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning up\n", "\n", "For your convenience, the code snippets below clean up any resources created\n", "as part of this tutorial that you don't wish to retain."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# to stop the Tensorboard, uncomment and run\n", "#tb.stop()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# to delete the cpu compute target, uncomment and run\n", "#cpu_cluster.delete()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps\n", "\n", "We would love to hear your feedback! Please let us know what you think of Reinforcement Learning in Azure Machine Learning and what features you are looking forward to." ] } ], "metadata": { "authors": [ { "name": "andress" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" }, "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License." }, "nbformat": 4, "nbformat_minor": 2 }