MachineLearningNotebooks/how-to-use-azureml/reinforcement-learning/atari-on-distributed-compute/pong_rllib.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/reinforcement-learning/atari-on-distributed-compute/pong_rllib.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Reinforcement Learning in Azure Machine Learning - Pong problem\n",
        "Reinforcement Learning in Azure Machine Learning is a managed service for running distributed reinforcement learning training and simulation using the open source Ray framework.\n",
        "This example uses Ray RLlib to train a Pong playing agent on a multi-node cluster.\n",
        "\n",
        "## Pong problem\n",
        "[Pong](https://en.wikipedia.org/wiki/Pong) is a two-dimensional sports game that simulates table tennis. The player controls an in-game paddle by moving it vertically across the left or right side of the screen. They can compete against another player controlling a second paddle on the opposing side. Players use the paddles to hit a ball back and forth."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<table style=\"width:50%\">\n",
        "  <tr>\n",
        "      <th style=\"text-align: center;\"><img src=\"./images/pong.gif\" alt=\"Pong image\" align=\"middle\" margin-left=\"auto\" margin-right=\"auto\"/></th>\n",
        "  </tr>\n",
        "  <tr style=\"text-align: center;\">\n",
        "      <th>Fig 1. Pong game animation (from <a href=\"https://towardsdatascience.com/intro-to-reinforcement-learning-pong-92a94aa0f84d\">towardsdatascience.com</a>).</th>\n",
        "  </tr>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The goal here is to train an agent to win an episode of Pong game against opponent with the score of at least 18 points. An episode in Pong runs until one of the players reaches a score of 21. Episodes are a terminology that is used across all the [OpenAI gym](https://gym.openai.com/envs/Pong-v0/) environments that contains a strictly defined task.\n",
        "\n",
        "Training a Pong agent is a compute-intensive task and this example demonstrates the use of Reinforcement Learning in Azure Machine Learning service to train an agent faster in a distributed, parallel environment. You'll learn more about using the head and the worker compute targets to train an agent in this notebook below."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prerequisite\n",
        "\n",
        "It is highly recommended that the user should go through the [Reinforcement Learning in Azure Machine Learning - Cartpole Problem on Single Compute](../cartpole-on-single-compute/cartpole_sc.ipynb) to understand the basics of Reinforcement Learning in Azure Machine Learning and Ray RLlib used in this notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Set up Development Environment\n",
        "The following subsections show typical steps to setup your development environment. Setup includes:\n",
        "\n",
        "* Connecting to a workspace to enable communication between your local machine and remote resources\n",
        "* Creating an experiment to track all your runs\n",
        "* Setting up a virtual network\n",
        "* Creating remote head and worker compute target on a virtual network to use for training"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Azure Machine Learning SDK\n",
        "Display the Azure Machine Learning SDK version."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646081765827
        }
      },
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "\n",
        "# Azure Machine Learning Core imports\n",
        "import azureml.core\n",
        "\n",
        "# Check core SDK version number\n",
        "print(\"Azure Machine Learning SDK Version: \", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Get Azure Machine Learning workspace\n",
        "Get a reference to an existing Azure Machine Learning workspace."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646081772340
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core import Workspace\n",
        "\n",
        "ws = Workspace.from_config()\n",
        "print(ws.name, ws.location, ws.resource_group, sep = ' | ')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create Azure Machine Learning experiment\n",
        "Create an experiment to track the runs in your workspace."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646081775643
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core.experiment import Experiment\n",
        "\n",
        "# Experiment name\n",
        "experiment_name = 'rllib-pong-multi-node'\n",
        "exp = Experiment(workspace=ws, name=experiment_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create compute targets\n",
        "\n",
        "In this example, we show how to set up separate compute targets for the Ray nodes.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Create head compute target\n",
        "\n",
        "First we define the head cluster with GPU for the Ray head node. One CPU of the head node will be used for the Ray head process and the rest of the CPUs will be used by the Ray worker processes."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646086081229
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core.compute import AmlCompute, ComputeTarget\n",
        "\n",
        "# Choose a name for the Ray cluster\n",
        "compute_name = 'compute-gpu'\n",
        "compute_min_nodes = 0\n",
        "compute_max_nodes = 2\n",
        "\n",
        "# This example uses GPU VM. For using CPU VM, set SKU to STANDARD_D2_V2\n",
        "vm_size = 'STANDARD_NC6'\n",
        "\n",
        "if compute_name in ws.compute_targets:\n",
        "    compute_target = ws.compute_targets[compute_name]\n",
        "    if compute_target and type(compute_target) is AmlCompute:\n",
        "        if compute_target.provisioning_state == 'Succeeded':\n",
        "            print('found compute target. just use it', compute_name)\n",
        "        else: \n",
        "            raise Exception(\n",
        "                'found compute target but it is in state', compute_target.provisioning_state)\n",
        "else:\n",
        "    print('creating a new compute target...')\n",
        "    provisioning_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=vm_size,\n",
        "        min_nodes=compute_min_nodes, \n",
        "        max_nodes=compute_max_nodes,\n",
        "    )\n",
        "\n",
        "    # Create the cluster\n",
        "    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
        "    \n",
        "    # Can poll for a minimum number of nodes and for a specific timeout. \n",
        "    # If no min node count is provided it will use the scale settings for the cluster\n",
        "    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
        "    \n",
        "    # For a more detailed view of current AmlCompute status, use get_status()\n",
        "    print(compute_target.get_status().serialize())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646093795069
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core import Environment\n",
        "import os\n",
        "\n",
        "ray_environment_name = 'pong-cpu'\n",
        "ray_environment_dockerfile_path = os.path.join(os.getcwd(), 'docker', 'Dockerfile-cpu')\n",
        "\n",
        "# Build CPU image\n",
        "ray_cpu_env = Environment. \\\n",
        "    from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path). \\\n",
        "    register(workspace=ws)\n",
        "ray_cpu_build_details = ray_cpu_env.build(workspace=ws)\n",
        "\n",
        "ray_cpu_build_details.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646160884910
        },
        "jupyter": {
          "outputs_hidden": true,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core import Environment\n",
        "\n",
        "ray_environment_name = 'pong-gpu'\n",
        "ray_environment_dockerfile_path = os.path.join(os.getcwd(), 'docker', 'Dockerfile-gpu')\n",
        "\n",
        "# Build GPU image\n",
        "ray_gpu_env = Environment. \\\n",
        "    from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path). \\\n",
        "    register(workspace=ws)\n",
        "ray_gpu_build_details = ray_gpu_env.build(workspace=ws)\n",
        "\n",
        "ray_gpu_build_details.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create reinforcement learning training run\n",
        "\n",
        "The code below submits the training run using a `ScriptRunConfig`. By providing the\n",
        "command to run the training, and a `RunConfig` object configured with your\n",
        "compute target, number of nodes, and environment image to use.\n",
        "\n",
        "We specify `episode_reward_mean` to 18 as we want to stop the training as soon as the trained agent reaches an average win margin of at least 18 point over opponent over all episodes in the training epoch.\n",
        "Number of Ray worker processes are defined by parameter `num_workers`. We set it to 13 as we have 13 CPUs available in our compute targets. Multiple Ray worker processes parallelizes agent training and helps in achieving our goal faster. \n",
        "\n",
        "```\n",
        "Number of CPUs in head_compute_target = 6 CPUs in 1 node = 6\n",
        "Number of CPUs in worker_compute_target = 2 CPUs in each of 4 nodes = 8\n",
        "Number of CPUs available = (Number of CPUs in head_compute_target) + (Number of CPUs in worker_compute_target) - (1 CPU for head node) = 6 + 8 - 1 = 13\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1646162435310
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core import RunConfiguration, ScriptRunConfig, Experiment\n",
        "from azureml.core.runconfig import DockerConfiguration, RunConfiguration\n",
        "\n",
        "experiment_name = 'rllib-pong-multi-node'\n",
        "\n",
        "experiment = Experiment(workspace=ws, name=experiment_name)\n",
        "ray_environment = Environment.get(workspace=ws, name=ray_environment_name)\n",
        "\n",
        "aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
        "aml_run_config_ml.target = compute_target\n",
        "aml_run_config_ml.docker = DockerConfiguration(use_docker=True)\n",
        "aml_run_config_ml.node_count = 2\n",
        "aml_run_config_ml.environment = ray_environment\n",
        "\n",
        "training_algorithm = \"IMPALA\"\n",
        "rl_environment = \"PongNoFrameskip-v4\"\n",
        "script_name='pong_rllib.py'\n",
        "\n",
        "command=[\n",
        "    'python', script_name,\n",
        "    '--run', training_algorithm,\n",
        "    '--env', rl_environment,\n",
        "    '--config', '\\'{\"num_gpus\": 1, \"num_workers\": 11}\\'',\n",
        "    '--stop', '\\'{\"episode_reward_mean\": 18, \"time_total_s\": 3600}\\''\n",
        "]\n",
        "\n",
        "config = ScriptRunConfig(source_directory='./files',\n",
        "                    command=command,\n",
        "                    run_config = aml_run_config_ml\n",
        "                   )\n",
        "training_run = experiment.submit(config)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Training script\n",
        "As recommended in [RLlib](https://ray.readthedocs.io/en/latest/rllib.html) documentations, we use Ray [Tune](https://ray.readthedocs.io/en/latest/tune.html) API to run the training algorithm. All the RLlib built-in trainers are compatible with the Tune API. Here we use tune.run() to execute a built-in training algorithm. For convenience, down below you can see part of the entry script where we make this call.\n",
        "\n",
        "```python\n",
        "    tune.run(\n",
        "        run_or_experiment=args.run,\n",
        "        config={\n",
        "            \"env\": args.env,\n",
        "            \"num_gpus\": args.config[\"num_gpus\"],\n",
        "            \"num_workers\": args.config[\"num_workers\"],\n",
        "            \"callbacks\": {\"on_train_result\": callbacks.on_train_result},\n",
        "            \"sample_batch_size\": 50,\n",
        "            \"train_batch_size\": 1000,\n",
        "            \"num_sgd_iter\": 2,\n",
        "            \"num_data_loader_buffers\": 2,\n",
        "            \"model\": {\"dim\": 42},\n",
        "        },\n",
        "        stop=args.stop,\n",
        "        local_dir='./logs')\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Monitor the run\n",
        "\n",
        "Azure Machine Learning provides a Jupyter widget to show the status of an experiment run. You could use this widget to monitor the status of the runs. The widget shows the list of two child runs, one for head compute target run and one for worker compute target run. You can click on the link under **Status** to see the details of the child run. It will also show the metrics being logged."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "\n",
        "RunDetails(training_run).show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Stop the run\n",
        "\n",
        "To stop the run, call `training_run.cancel()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Uncomment line below to cancel the run\n",
        "# training_run.cancel()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Wait for completion\n",
        "Wait for the run to complete before proceeding. If you want to stop the run, you may skip this and move to next section below. \n",
        "\n",
        "**Note: The run may take anywhere from 30 minutes to 45 minutes to complete.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Performance of the agent during training\n",
        "\n",
        "Let's get the reward metrics for the training run agent and observe how the agent's rewards improved over the training iterations and how the agent learns to win the Pong game. \n",
        "\n",
        "Collect the episode reward metrics from the worker run's metrics. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Get the reward metrics from training_run\n",
        "episode_reward_mean = training_run.get_metrics(name='episode_reward_mean')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Plot the reward metrics. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "plt.plot(episode_reward_mean['episode_reward_mean'])\n",
        "plt.xlabel('training_iteration')\n",
        "plt.ylabel('episode_reward_mean')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We observe that during the training over multiple episodes, the agent learns to win the Pong game against opponent with our target of 18 points in each episode of 21 points.\n",
        "**Congratulations!! You have trained your Pong agent to win a game.**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Cleaning up\n",
        "For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# To archive the created experiment:\n",
        "#experiment.archive()\n",
        "\n",
        "# To delete the compute targets:\n",
        "#head_compute_target.delete()\n",
        "#worker_compute_target.delete()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Next\n",
        "In this example, you learned how to solve distributed reinforcement learning training problems using head and worker compute targets. This was an introductory tutorial on Reinforement Learning in Azure Machine Learning service offering. We would love to hear your feedback to build the features you need!"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "vineetg"
      }
    ],
    "categories": [
      "how-to-use-azureml",
      "reinforcement-learning"
    ],
    "interpreter": {
      "hash": "13382f70c1d0595120591d2e358c8d446daf961bf951d1fba9a32631e205d5ab"
    },
    "kernel_info": {
      "name": "python3-azureml"
    },
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.0"
    },
    "notice": "Copyright (c) Microsoft Corporation. All rights reserved.\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00afLicensed under the MIT License.\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00af ",
    "nteract": {
      "version": "nteract-front-end@1.0.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}