{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Reinforcement Learning Sample - Pong problem\n",
"Azure ML Reinforcement Learning (Azure ML RL) is a managed service for running distributed RL (reinforcement learning) simulation and training using the Ray framework.\n",
"This example uses Ray RLlib to train a Pong playing agent on a multi-node cluster.\n",
"\n",
"## Pong problem\n",
"[Pong](https://en.wikipedia.org/wiki/Pong) is a two-dimensional sports game that simulates table tennis. The player controls an in-game paddle by moving it vertically across the left or right side of the screen. They can compete against another player controlling a second paddle on the opposing side. Players use the paddles to hit a ball back and forth."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table style=\"width:50%\">\n",
" <tr>\n",
" <th style=\"text-align: center;\"><img src=\"./images/pong.gif\" alt=\"Pong image\" align=\"middle\" margin-left=\"auto\" margin-right=\"auto\"/></th>\n",
" </tr>\n",
" <tr style=\"text-align: center;\">\n",
" <th>Fig 1. Pong game animation (from <a href=\"https://towardsdatascience.com/intro-to-reinforcement-learning-pong-92a94aa0f84d\">towardsdatascience.com</a>).</th>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal here is to train an agent to win an episode of Pong game against opponent with the score of at least 18 points. An episode in Pong runs until one of the players reaches a score of 21. Episodes are a terminology that is used across all the [OpenAI gym](https://gym.openai.com/envs/Pong-v0/) environments that contains a strictly defined task.\n",
|
|
"\n",
|
|
"Training a Pong agent is a CPU intensive task and this example demonstrates the use of Azure ML RL service to train an agent faster in a distributed, parallel environment. You'll learn more about using the head and the worker compute targets to train an agent in this notebook below."
|
|
]
|
|
},
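{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, the sketch below creates the same Gym environment locally so you can inspect its observation and action spaces. This assumes the `gym[atari]` package is installed on your local machine; it is not required for the remote training run in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional local sanity check (assumes gym[atari] is installed locally;\n",
"# it is not needed for the remote Azure ML run below)\n",
"# import gym\n",
"#\n",
"# env = gym.make('PongNoFrameskip-v4')\n",
"# print(env.observation_space)  # raw RGB screen frames\n",
"# print(env.action_space)       # discrete paddle actions\n",
"# env.close()"
]
},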
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisite\n",
"\n",
"You should have completed the [Azure ML Reinforcement Learning Sample - Setting Up Development Environment](../setup/devenv_setup.ipynb) to set up a virtual network, which will be used here for the head and worker compute targets. It is also highly recommended that you go through the [Azure ML Reinforcement Learning Sample - Cartpole Problem](../cartpole-on-single-compute/cartpole_cc.ipynb) first to understand the basics of Azure ML RL and the Ray RLlib framework used in this notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Development Environment\n",
"The following subsections show typical steps to set up your development environment. Setup includes:\n",
"\n",
"* Connecting to a workspace to enable communication between your local machine and remote resources\n",
"* Creating an experiment to track all your runs\n",
"* Creating remote head and worker compute targets in a virtual network to use for training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure Machine Learning SDK\n",
"Display the Azure Machine Learning SDK version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"# Azure ML core imports\n",
"import azureml.core\n",
"\n",
"# Check core SDK version number\n",
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Azure ML workspace\n",
"Get a reference to an existing Azure ML workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep=' | ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure ML experiment\n",
"Create an experiment to track the runs in your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.experiment import Experiment\n",
"\n",
"# Experiment name\n",
"experiment_name = 'rllib-pong-multi-node'\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Specify the name of your vnet\n",
"\n",
"The resource group you use must contain a virtual network (vnet). Specify here the name of the vnet you created in the [Azure ML Reinforcement Learning Sample - Setting Up Development Environment](../setup/devenv_setup.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Virtual network name\n",
"vnet_name = 'your_vnet'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create head compute cluster\n",
"\n",
"In this example, we show how to set up separate compute clusters for the Ray head and Ray worker nodes. First we define a GPU cluster for the Ray head node. One CPU of the head node is used by the Ray head process, and the remaining CPUs are used by Ray worker processes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"\n",
"# Choose a name for the Ray head cluster\n",
"head_compute_name = 'head-gpu'\n",
"head_compute_min_nodes = 0\n",
"head_compute_max_nodes = 2\n",
"\n",
"# This example uses a GPU VM. To use a CPU VM instead, set the SKU to STANDARD_D2_V2\n",
"head_vm_size = 'STANDARD_NC6'\n",
"\n",
"if head_compute_name in ws.compute_targets:\n",
"    head_compute_target = ws.compute_targets[head_compute_name]\n",
"    if head_compute_target and type(head_compute_target) is AmlCompute:\n",
"        if head_compute_target.provisioning_state == 'Succeeded':\n",
"            print('found head compute target. just use it', head_compute_name)\n",
"        else:\n",
"            raise Exception('found head compute target but it is in state', head_compute_target.provisioning_state)\n",
"else:\n",
"    print('creating a new head compute target...')\n",
"    provisioning_config = AmlCompute.provisioning_configuration(\n",
"        vm_size=head_vm_size,\n",
"        min_nodes=head_compute_min_nodes,\n",
"        max_nodes=head_compute_max_nodes,\n",
"        vnet_resourcegroup_name=ws.resource_group,\n",
"        vnet_name=vnet_name,\n",
"        subnet_name='default')\n",
"\n",
"    # Create the cluster\n",
"    head_compute_target = ComputeTarget.create(ws, head_compute_name, provisioning_config)\n",
"\n",
"    # Can poll for a minimum number of nodes and for a specific timeout.\n",
"    # If no min node count is provided it will use the scale settings for the cluster\n",
"    head_compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"    # For a more detailed view of current AmlCompute status, use get_status()\n",
"    print(head_compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create worker compute cluster\n",
"\n",
"Now we create a CPU compute cluster for the additional Ray worker nodes. The CPUs in these worker nodes are used by Ray worker processes. Each worker node may run multiple Ray worker processes, depending on its number of CPUs, and Ray can distribute multiple worker tasks across each worker node."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for your Ray worker cluster\n",
"worker_compute_name = 'worker-cpu'\n",
"worker_compute_min_nodes = 0\n",
"worker_compute_max_nodes = 4\n",
"\n",
"# This example uses a CPU VM. To use a GPU VM instead, set the SKU to STANDARD_NC6\n",
"worker_vm_size = 'STANDARD_D2_V2'\n",
"\n",
"# Create the compute target if it hasn't been created already\n",
"if worker_compute_name in ws.compute_targets:\n",
"    worker_compute_target = ws.compute_targets[worker_compute_name]\n",
"    if worker_compute_target and type(worker_compute_target) is AmlCompute:\n",
"        if worker_compute_target.provisioning_state == 'Succeeded':\n",
"            print('found worker compute target. just use it', worker_compute_name)\n",
"        else:\n",
"            raise Exception('found worker compute target but it is in state', worker_compute_target.provisioning_state)\n",
"else:\n",
"    print('creating a new worker compute target...')\n",
"    provisioning_config = AmlCompute.provisioning_configuration(\n",
"        vm_size=worker_vm_size,\n",
"        min_nodes=worker_compute_min_nodes,\n",
"        max_nodes=worker_compute_max_nodes,\n",
"        vnet_resourcegroup_name=ws.resource_group,\n",
"        vnet_name=vnet_name,\n",
"        subnet_name='default')\n",
"\n",
"    # Create the cluster\n",
"    worker_compute_target = ComputeTarget.create(ws, worker_compute_name, provisioning_config)\n",
"\n",
"    # Can poll for a minimum number of nodes and for a specific timeout.\n",
"    # If no min node count is provided it will use the scale settings for the cluster\n",
"    worker_compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"    # For a more detailed view of current AmlCompute status, use get_status()\n",
"    print(worker_compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train Pong Agent Using Azure ML RL\n",
"To facilitate reinforcement learning, the Azure Machine Learning Python SDK provides a high-level abstraction, the _ReinforcementLearningEstimator_ class, which allows users to easily construct RL run configurations for the underlying RL framework. Azure ML RL initially supports the [Ray framework](https://ray.io/) and its highly customizable [RLlib](https://ray.readthedocs.io/en/latest/rllib.html#rllib-scalable-reinforcement-learning). In this section we show how to use _ReinforcementLearningEstimator_ and the Ray/RLlib framework to train a Pong playing agent.\n",
"\n",
"\n",
"### Define worker configuration\n",
"Define a `WorkerConfiguration` using your worker compute target. We also specify the number of nodes in the worker compute target to be used for training, and additional pip packages to install on those nodes as part of setup.\n",
"In this case, we define the pip packages as dependencies for both head and worker nodes. With this setup, the game simulations will run directly on the worker compute nodes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.train.rl import WorkerConfiguration\n",
"\n",
"# Pip packages we will use for both head and worker\n",
"pip_packages=[\"ray[rllib]==0.8.3\"] # Latest version of Ray has fixes for issues related to object transfers\n",
"\n",
"# Specify the Ray worker configuration\n",
"worker_conf = WorkerConfiguration(\n",
"\n",
"    # Azure ML compute cluster to run Ray workers\n",
"    compute_target=worker_compute_target,\n",
"\n",
"    # Number of worker nodes\n",
"    node_count=4,\n",
"\n",
"    # GPU\n",
"    use_gpu=False,\n",
"\n",
"    # PIP packages to use\n",
"    pip_packages=pip_packages\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create reinforcement learning estimator\n",
"\n",
"The `ReinforcementLearningEstimator` is used to submit a job to Azure Machine Learning to start the Ray experiment run. We define here the training script parameters that will be passed to the estimator.\n",
"\n",
"We set `episode_reward_mean` to 18 because we want to stop training as soon as the trained agent reaches an average win margin of at least 18 points over its opponent across all episodes in the training epoch.\n",
"The number of Ray worker processes is defined by the `num_workers` parameter. We set it to 13 because we have 13 CPUs available in our compute targets. Multiple Ray worker processes parallelize agent training and help reach our goal faster.\n",
"\n",
"```\n",
"Number of CPUs in head_compute_target = 6 CPUs in 1 node = 6\n",
"Number of CPUs in worker_compute_target = 2 CPUs in each of 4 nodes = 8\n",
"Number of CPUs available = (Number of CPUs in head_compute_target) + (Number of CPUs in worker_compute_target) - (1 CPU for head node) = 6 + 8 - 1 = 13\n",
"```"
]
},
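{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below reproduces this calculation in code. The per-node CPU counts are assumptions based on the VM sizes chosen above (`STANDARD_NC6` has 6 vCPUs and `STANDARD_D2_V2` has 2 vCPUs); adjust them if you selected different SKUs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the num_workers calculation (CPU counts assume the SKUs chosen above)\n",
"head_node_cpus = 6       # STANDARD_NC6: 6 vCPUs, single head node\n",
"worker_node_cpus = 2     # STANDARD_D2_V2: 2 vCPUs per node\n",
"worker_node_count = 4    # matches node_count in the worker configuration\n",
"\n",
"# One CPU on the head node is reserved for the Ray head process\n",
"num_workers = head_node_cpus + worker_node_cpus * worker_node_count - 1\n",
"print('num_workers =', num_workers)"
]
},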
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray\n",
"\n",
"training_algorithm = \"IMPALA\"\n",
"rl_environment = \"PongNoFrameskip-v4\"\n",
"\n",
"# Training script parameters\n",
"script_params = {\n",
"\n",
"    # Training algorithm, IMPALA in this case\n",
"    \"--run\": training_algorithm,\n",
"\n",
"    # Environment, Pong in this case\n",
"    \"--env\": rl_environment,\n",
"\n",
"    # Add additional single quotes at both ends of string values because the string\n",
"    # parameters contain spaces; the outermost quotes are not passed to the script\n",
"    # as they are not actually part of the string\n",
"    # Number of GPUs\n",
"    # Number of Ray workers\n",
"    \"--config\": '\\'{\"num_gpus\": 1, \"num_workers\": 13}\\'',\n",
"\n",
"    # Target episode reward mean to stop the training\n",
"    # Total training time in seconds\n",
"    \"--stop\": '\\'{\"episode_reward_mean\": 18, \"time_total_s\": 3600}\\'',\n",
"}\n",
"\n",
"# RL estimator\n",
"rl_estimator = ReinforcementLearningEstimator(\n",
"\n",
"    # Location of source files\n",
"    source_directory='files',\n",
"\n",
"    # Python script file\n",
"    entry_script=\"pong_rllib.py\",\n",
"\n",
"    # Parameters to pass to the script file\n",
"    # Defined above.\n",
"    script_params=script_params,\n",
"\n",
"    # The Azure ML compute target set up for Ray head nodes\n",
"    compute_target=head_compute_target,\n",
"\n",
"    # Pip packages\n",
"    pip_packages=pip_packages,\n",
"\n",
"    # GPU usage\n",
"    use_gpu=True,\n",
"\n",
"    # RL framework. Currently must be Ray.\n",
"    rl_framework=Ray(),\n",
"\n",
"    # Ray worker configuration defined above.\n",
"    worker_configuration=worker_conf,\n",
"\n",
"    # How long to wait for the whole cluster to start\n",
"    cluster_coordination_timeout_seconds=3600,\n",
"\n",
"    # Maximum time for the whole Ray job to run\n",
"    # This will cut off the run after an hour\n",
"    max_run_duration_seconds=3600,\n",
"\n",
"    # Allow the Docker container Ray runs in to make full use\n",
"    # of the shared memory available from the host OS.\n",
"    shm_size=24*1024*1024*1024\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training script\n",
"As recommended in the [RLlib](https://ray.readthedocs.io/en/latest/rllib.html) documentation, we use the Ray [Tune](https://ray.readthedocs.io/en/latest/tune.html) API to run the training algorithm. All the RLlib built-in trainers are compatible with the Tune API. Here we use `tune.run()` to execute a built-in training algorithm. For convenience, the part of the entry script where we make this call is shown below.\n",
"\n",
"```python\n",
"tune.run(run_or_experiment=args.run,\n",
"         config={\n",
"             \"env\": args.env,\n",
"             \"num_gpus\": args.config[\"num_gpus\"],\n",
"             \"num_workers\": args.config[\"num_workers\"],\n",
"             \"callbacks\": {\"on_train_result\": callbacks.on_train_result},\n",
"             \"sample_batch_size\": 50,\n",
"             \"train_batch_size\": 1000,\n",
"             \"num_sgd_iter\": 2,\n",
"             \"num_data_loader_buffers\": 2,\n",
"             \"model\": {\"dim\": 42},\n",
"         },\n",
"         stop=args.stop,\n",
"         local_dir='./logs')\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the estimator to start a run\n",
"Now we use the `rl_estimator` configured above to submit a run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(config=rl_estimator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor the run\n",
"\n",
"Azure ML provides a Jupyter widget to show the real-time status of an experiment run. You can use this widget to monitor the status of your runs. The widget also lists the two child runs, one for the head compute target and one for the worker compute target. You can click the link under Status to see the details of a child run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(run).show()"
]
},
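{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer a plain-text view instead of the widget, the optional sketch below lists the child runs and their current status using `Run.get_children()` and `Run.get_status()` from the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: list the head and worker child runs and their current status\n",
"for child in run.get_children():\n",
"    print(child.id, child.get_status())"
]
},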
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait for the run to complete before proceeding. If you want to stop the run instead, you may skip this and move to the next section below.\n",
"\n",
"**Note: the run may take anywhere from 30 to 45 minutes to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the run\n",
"\n",
"To cancel the run, call `run.cancel()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Performance of the agent during training\n",
"\n",
"Let's get the reward metrics of the training run and observe how the agent's rewards improved over the training iterations as it learned to win the Pong game.\n",
"\n",
"Collect the episode reward metrics from the worker run's metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get all child runs\n",
"child_runs = list(run.get_children(_rehydrate_runs=False))\n",
"\n",
"# Get the reward metrics from the worker run\n",
"if child_runs[0].id.endswith(\"_worker\"):\n",
"    episode_reward_mean = child_runs[0].get_metrics(name='episode_reward_mean')\n",
"else:\n",
"    episode_reward_mean = child_runs[1].get_metrics(name='episode_reward_mean')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the reward metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.plot(episode_reward_mean['episode_reward_mean'])\n",
"plt.xlabel('training_iteration')\n",
"plt.ylabel('episode_reward_mean')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that over the course of training across multiple episodes, the agent learns to win the Pong game against its opponent, reaching our target margin of 18 points in episodes played to 21 points.\n",
"**Congratulations! You have trained your Pong agent to win.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To archive the created experiment:\n",
"# exp.archive()\n",
"\n",
"# To delete the compute targets:\n",
"# head_compute_target.delete()\n",
"# worker_compute_target.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"In this example, you learned how to solve distributed RL training problems using head and worker compute targets. This is currently the last introductory tutorial for Azure Machine Learning service's Reinforcement Learning offering. We would love to hear your feedback to build the features you need!"
]
}
],
"metadata": {
"authors": [
{
"name": "vineetg"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 4
}