{
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Reinforcement Learning in Azure Machine Learning - Training a Minecraft agent using custom environments\n",
|
|
"\n",
|
|
"This tutorial will show how to set up a more complex reinforcement\n",
|
|
"learning (RL) training scenario. It demonstrates how to train an agent to\n",
|
|
"navigate through a lava maze in the Minecraft game using Azure Machine\n",
|
|
"Learning.\n",
|
|
"\n",
|
|
"**Please note:** This notebook trains an agent on a randomly generated\n",
|
|
"Minecraft level. As a result, on rare occasions, a training run may fail\n",
|
|
"to produce a model that can solve the maze. If this happens, you can\n",
|
|
"re-run the training step as indicated below.\n",
|
|
"\n",
|
|
"**Please note:** This notebook uses 1 NC6 type node and 8 D2 type nodes\n",
|
|
"for up to 5 hours of training, which corresponds to approximately $9.06 (USD)\n",
|
|
"as of May 2020.\n",
|
|
"\n",
|
|
"Minecraft is currently one of the most popular video\n",
|
|
"games and as such has been a study object for RL. [Project \n",
|
|
"Malmo](https://www.microsoft.com/en-us/research/project/project-malmo/) is\n",
|
|
"a platform for artificial intelligence experimentation and research built on\n",
|
|
"top of Minecraft. We will use Minecraft [gym](https://gym.openai.com) environments from Project\n",
|
|
"Malmo's 2019 MineRL competition, which are part of the \n",
|
|
"[MineRL](http://minerl.io/docs/index.html) Python package.\n",
|
|
"\n",
|
|
"Minecraft environments require a display to run, so we will demonstrate\n",
|
|
"how to set up a virtual display within the docker container used for training.\n",
|
|
"Learning will be based on the agent's visual observations. To\n",
|
|
"generate the necessary amount of sample data, we will run several\n",
|
|
"instances of the Minecraft game in parallel. Below, you can see a video of\n",
|
|
"a trained agent navigating a lava maze. Starting from the green position,\n",
|
|
"it moves to the blue position by moving forward, turning left or turning right:\n",
|
|
"\n",
|
|
"<table style=\"width:50%\">\n",
|
|
" <tr>\n",
|
|
" <th style=\"text-align: center;\">\n",
|
|
" <img src=\"./images/lava_maze_minecraft.gif\" alt=\"Minecraft lava maze\" align=\"middle\" margin-left=\"auto\" margin-right=\"auto\"/>\n",
|
|
" </th>\n",
|
|
" </tr>\n",
|
|
" <tr style=\"text-align: center;\">\n",
|
|
" <th>Fig 1. Video of a trained Minecraft agent navigating a lava maze.</th>\n",
|
|
" </tr>\n",
|
|
"</table>\n",
|
|
"\n",
|
|
"The tutorial will cover the following steps:\n",
|
|
"- Initializing Azure Machine Learning resources for training\n",
|
|
"- Training the RL agent with Azure Machine Learning service\n",
|
|
"- Monitoring training progress\n",
|
|
"- Reviewing training results\n",
|
|
"\n",
|
|
"\n",
|
|
"## Prerequisites\n",
|
|
"\n",
|
|
"The user should have completed the Azure Machine Learning introductory tutorial.\n",
|
|
"You will need to make sure that you have a valid subscription id, a resource group and a\n",
|
|
"workspace. For detailed instructions see [Tutorial: Get started creating\n",
|
|
"your first ML experiment.](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup)\n",
|
|
"\n",
|
|
"While this is a standalone notebook, we highly recommend going over the\n",
|
|
"introductory notebooks for RL first.\n",
|
|
"- Getting started:\n",
|
|
" - [RL using a compute instance with Azure Machine Learning service](../cartpole-on-compute-instance/cartpole_ci.ipynb)\n",
|
|
" - [Using Azure Machine Learning compute](../cartpole-on-single-compute/cartpole_sc.ipynb)\n",
|
|
"- [Scaling RL training runs with Azure Machine Learning service](../atari-on-distributed-compute/pong_rllib.ipynb)\n",
|
|
"\n",
|
|
"\n",
|
|
"## Initialize resources\n",
|
|
"\n",
|
|
"All required Azure Machine Learning service resources for this tutorial can be set up from Jupyter.\n",
|
|
"This includes:\n",
|
|
"- Connecting to your existing Azure Machine Learning workspace.\n",
|
|
"- Creating an experiment to track runs.\n",
|
|
"- Setting up a virtual network\n",
|
|
"- Creating remote compute targets for [Ray](https://docs.ray.io/en/latest/index.html).\n",
|
|
"\n",
|
|
"### Azure Machine Learning SDK\n",
|
|
"\n",
|
|
"Display the Azure Machine Learning SDK version."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azureml.core\n",
|
|
"print(\"Azure Machine Learning SDK Version: \", azureml.core.VERSION) "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Connect to workspace\n",
|
|
"\n",
|
|
"Get a reference to an existing Azure Machine Learning workspace."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Workspace\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print(ws.name, ws.location, ws.resource_group, sep=' | ')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create an experiment\n",
|
|
"\n",
|
|
"Create an experiment to track the runs in your workspace. A\n",
|
|
"workspace can have multiple experiments and each experiment\n",
|
|
"can be used to track multiple runs (see [documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py)\n",
|
|
"for details)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"nbpresent": {
|
|
"id": "bc70f780-c240-4779-96f3-bc5ef9a37d59"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Experiment\n",
|
|
"\n",
|
|
"exp = Experiment(workspace=ws, name='minecraft-maze')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create Virtual Network\n",
|
|
"\n",
|
|
"If you are using separate compute targets for the Ray head and worker, a virtual network must be created in the resource group. If you have alraeady created a virtual network in the resource group, you can skip this step.\n",
|
|
"\n",
|
|
"To do this, you first must install the Azure Networking API.\n",
|
|
"\n",
|
|
"`pip install --upgrade azure-mgmt-network`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# If you need to install the Azure Networking SDK, uncomment the following line.\n",
|
|
"#!pip install --upgrade azure-mgmt-network"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azure.mgmt.network import NetworkManagementClient\n",
|
|
"\n",
|
|
"# Virtual network name\n",
|
|
"vnet_name =\"rl_minecraft_vnet\"\n",
|
|
"\n",
|
|
"# Default subnet\n",
|
|
"subnet_name =\"default\"\n",
|
|
"\n",
|
|
"# The Azure subscription you are using\n",
|
|
"subscription_id=ws.subscription_id\n",
|
|
"\n",
|
|
"# The resource group for the reinforcement learning cluster\n",
|
|
"resource_group=ws.resource_group\n",
|
|
"\n",
|
|
"# Azure region of the resource group\n",
|
|
"location=ws.location\n",
|
|
"\n",
|
|
"network_client = NetworkManagementClient(ws._auth_object, subscription_id)\n",
|
|
"\n",
|
|
"async_vnet_creation = network_client.virtual_networks.create_or_update(\n",
|
|
" resource_group,\n",
|
|
" vnet_name,\n",
|
|
" {\n",
|
|
" 'location': location,\n",
|
|
" 'address_space': {\n",
|
|
" 'address_prefixes': ['10.0.0.0/16']\n",
|
|
" }\n",
|
|
" }\n",
|
|
")\n",
|
|
"\n",
|
|
"async_vnet_creation.wait()\n",
|
|
"print(\"Virtual network created successfully: \", async_vnet_creation.result())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Set up Network Security Group on Virtual Network\n",
|
|
"\n",
|
|
"Depending on your Azure setup, you may need to open certain ports to make it possible for Azure to manage the compute targets that you create. The ports that need to be opened are described [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-enable-virtual-network).\n",
|
|
"\n",
|
|
"A common situation is that ports `29876-29877` are closed. The following code will add a security rule to open these ports. Or you can do this manually in the [Azure portal](https://portal.azure.com).\n",
|
|
"\n",
|
|
"You may need to modify the code below to match your scenario."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azure.mgmt.network.models\n",
|
|
"\n",
|
|
"security_group_name = vnet_name + '-' + \"nsg\"\n",
|
|
"security_rule_name = \"AllowAML\"\n",
|
|
"\n",
|
|
"# Create a network security group\n",
|
|
"nsg_params = azure.mgmt.network.models.NetworkSecurityGroup(\n",
|
|
" location=location,\n",
|
|
" security_rules=[\n",
|
|
" azure.mgmt.network.models.SecurityRule(\n",
|
|
" name=security_rule_name,\n",
|
|
" access=azure.mgmt.network.models.SecurityRuleAccess.allow,\n",
|
|
" description='Reinforcement Learning in Azure Machine Learning rule',\n",
|
|
" destination_address_prefix='*',\n",
|
|
" destination_port_range='29876-29877',\n",
|
|
" direction=azure.mgmt.network.models.SecurityRuleDirection.inbound,\n",
|
|
" priority=400,\n",
|
|
" protocol=azure.mgmt.network.models.SecurityRuleProtocol.tcp,\n",
|
|
" source_address_prefix='BatchNodeManagement',\n",
|
|
" source_port_range='*'\n",
|
|
" ),\n",
|
|
" ],\n",
|
|
")\n",
|
|
"\n",
|
|
"async_nsg_creation = network_client.network_security_groups.create_or_update(\n",
|
|
" resource_group,\n",
|
|
" security_group_name,\n",
|
|
" nsg_params,\n",
|
|
")\n",
|
|
"\n",
|
|
"async_nsg_creation.wait() \n",
|
|
"print(\"Network security group created successfully:\", async_nsg_creation.result())\n",
|
|
"\n",
|
|
"network_security_group = network_client.network_security_groups.get(\n",
|
|
" resource_group,\n",
|
|
" security_group_name,\n",
|
|
")\n",
|
|
"\n",
|
|
"# Define a subnet to be created with network security group\n",
|
|
"subnet = azure.mgmt.network.models.Subnet(\n",
|
|
" id='default',\n",
|
|
" address_prefix='10.0.0.0/24',\n",
|
|
" network_security_group=network_security_group\n",
|
|
" )\n",
|
|
" \n",
|
|
"# Create subnet on virtual network\n",
|
|
"async_subnet_creation = network_client.subnets.create_or_update(\n",
|
|
" resource_group_name=resource_group,\n",
|
|
" virtual_network_name=vnet_name,\n",
|
|
" subnet_name=subnet_name,\n",
|
|
" subnet_parameters=subnet\n",
|
|
")\n",
|
|
"\n",
|
|
"async_subnet_creation.wait()\n",
|
|
"print(\"Subnet created successfully:\", async_subnet_creation.result())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Review the virtual network security rules\n",
|
|
"Ensure that the virtual network is configured correctly with required ports open. It is possible that you have configured rules with broader range of ports that allows ports 29876-29877 to be opened. Kindly review your network security group rules. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from files.networkutils import *\n",
|
|
"\n",
|
|
"check_vnet_security_rules(ws._auth_object, ws.subscription_id, ws.resource_group, vnet_name, True)"
|
|
]
|
|
},
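{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to inspect the rules yourself, you can also list them directly with the SDK. A minimal sketch, assuming the `network_client` and `security_group_name` variables created above are still in scope:\n",
"\n",
"```python\n",
"# List every rule in the network security group so you can verify\n",
"# that ports 29876-29877 are allowed inbound.\n",
"nsg = network_client.network_security_groups.get(resource_group, security_group_name)\n",
"for rule in nsg.security_rules:\n",
"    print(rule.name, rule.direction, rule.access, rule.destination_port_range)\n",
"```"
]
},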
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create or attach an existing compute resource\n",
"\n",
"A compute target is a designated compute resource where you\n",
"run your training script. For more information, see [What\n",
"are compute targets in Azure Machine Learning service?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target).\n",
"\n",
"#### GPU target for Ray head\n",
"\n",
"In the experiment setup for this tutorial, the Ray head node\n",
"will run on a GPU-enabled node. A maximum cluster size\n",
"of 1 node is therefore sufficient. If you wish to run\n",
"multiple experiments in parallel using the same GPU\n",
"cluster, you may elect to increase this number. The cluster\n",
"will automatically scale down to 0 nodes when no training jobs\n",
"are scheduled (see `min_nodes`).\n",
"\n",
"The code below creates a compute cluster of GPU-enabled NC6\n",
"nodes. If a cluster with the specified name is already in\n",
"your workspace, the code will skip the creation process.\n",
"\n",
"Note that we must specify a Virtual Network during compute\n",
"creation to allow communication between the cluster running\n",
"the Ray head node and the additional Ray compute nodes. For\n",
"details on how to set up the Virtual Network, please follow the\n",
"instructions in the \"Create Virtual Network\" section above.\n",
"\n",
"**Note: Creation of a compute resource can take several minutes.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"gpu_cluster_name = 'gpu-cl-nc6-vnet'\n",
"\n",
"try:\n",
"    gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
"    print('Found existing compute target')\n",
"except ComputeTargetException:\n",
"    print('Creating a new compute target...')\n",
"    compute_config = AmlCompute.provisioning_configuration(\n",
"        vm_size='Standard_NC6',\n",
"        min_nodes=0,\n",
"        max_nodes=1,\n",
"        vnet_resourcegroup_name=ws.resource_group,\n",
"        vnet_name=vnet_name,\n",
"        subnet_name=subnet_name)\n",
"\n",
"    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"    gpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"    print('Cluster created.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### CPU target for additional Ray nodes\n",
"\n",
"The code below creates a compute cluster of D2 nodes. If a cluster with the specified name is already in your workspace, the code will skip the creation process.\n",
"\n",
"This cluster will be used to start additional Ray nodes,\n",
"increasing the cluster's CPU resources.\n",
"\n",
"**Note: Creation of a compute resource can take several minutes.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cpu_cluster_name = 'cpu-cl-d2-vnet'\n",
"\n",
"try:\n",
"    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
"    print('Found existing compute target')\n",
"except ComputeTargetException:\n",
"    print('Creating a new compute target...')\n",
"    compute_config = AmlCompute.provisioning_configuration(\n",
"        vm_size='STANDARD_D2',\n",
"        min_nodes=0,\n",
"        max_nodes=10,\n",
"        vnet_resourcegroup_name=ws.resource_group,\n",
"        vnet_name=vnet_name,\n",
"        subnet_name=subnet_name)\n",
"\n",
"    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"    cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"    print('Cluster created.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the agent\n",
"\n",
"### Training environments\n",
"\n",
"This tutorial uses custom docker images (CPU and GPU respectively)\n",
"with the necessary software installed. The\n",
"[Environment](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments)\n",
"class stores the configuration for the training environment. The docker\n",
"image is set via `env.docker.base_image`, which can point to any\n",
"publicly available docker image. `user_managed_dependencies`\n",
"is set so that the preinstalled Python packages in the image are preserved.\n",
"\n",
"Note that since Minecraft requires a display to start, we set the `interpreter_path`\n",
"such that the Python process is started via **xvfb-run**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from azureml.core import Environment\n",
"\n",
"max_train_time = os.environ.get(\"AML_MAX_TRAIN_TIME_SECONDS\", 5 * 60 * 60)\n",
"\n",
"def create_env(env_type):\n",
"    env = Environment(name='minecraft-{env_type}'.format(env_type=env_type))\n",
"\n",
"    env.docker.enabled = True\n",
"    env.docker.base_image = 'akdmsft/minecraft-{env_type}'.format(env_type=env_type)\n",
"\n",
"    env.python.interpreter_path = \"xvfb-run -s '-screen 0 640x480x16 -ac +extension GLX +render' python\"\n",
"    env.environment_variables[\"AML_MAX_TRAIN_TIME_SECONDS\"] = str(max_train_time)\n",
"    env.python.user_managed_dependencies = True\n",
"\n",
"    return env\n",
"\n",
"cpu_minecraft_env = create_env('cpu')\n",
"gpu_minecraft_env = create_env('gpu')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training script\n",
"\n",
"As described above, we use the MineRL Python package to launch\n",
"Minecraft game instances. MineRL provides several OpenAI gym\n",
"environments for different scenarios, such as chopping wood.\n",
"Besides predefined environments, MineRL lets its users create\n",
"custom Minecraft environments through\n",
"[minerl.env](http://minerl.io/docs/api/env.html). In the helper\n",
"file **minecraft_environment.py** provided with this tutorial, we use the\n",
"latter option to customize a Minecraft level with a lava maze\n",
"that the agent has to navigate. The agent receives a negative\n",
"reward of -1 for falling into the lava, a negative reward of\n",
"-0.02 for sending a command (i.e. navigating through the maze\n",
"with fewer actions yields a higher total reward) and a positive reward\n",
"of 1 for reaching the goal. To encourage the agent to explore\n",
"the maze, it also receives a positive reward of 0.1 for visiting\n",
"a tile for the first time.\n",
"\n",
"The agent learns purely from visual observations; the image\n",
"is scaled to an 84x84 format and consecutive frames are stacked. For the\n",
"purposes of this example, we use a small action space of size\n",
"three: move forward, turn 90 degrees to the left, and turn 90\n",
"degrees to the right.\n",
"\n",
"The training script itself registers the function that creates training\n",
"environments with the `tune.register_env` function and connects to\n",
"the Ray cluster that the Azure Machine Learning service started on the GPU\n",
"and CPU nodes. Lastly, it starts an RL training run with `tune.run()`.\n",
"\n",
"We recommend setting the `local_dir` parameter to `./logs` as this\n",
"directory will automatically become available as part of the training\n",
"run's files in the Azure Portal. The Tensorboard integration\n",
"(see the \"View the Tensorboard\" section below) also depends on the files'\n",
"availability. For a list of common parameter options, please refer\n",
"to the [Ray documentation](https://docs.ray.io/en/latest/rllib-training.html#common-parameters).\n",
"\n",
"\n",
"```python\n",
"# Taken from minecraft_environment.py and minecraft_train.py\n",
"\n",
"# Define a function to create a MineRL environment\n",
"def create_env(config):\n",
"    mission = config['mission']\n",
"    port = 1000 * config.worker_index + config.vector_index\n",
"    print('*********************************************')\n",
"    print(f'* Worker {config.worker_index} creating from mission: {mission}, port {port}')\n",
"    print('*********************************************')\n",
"\n",
"    if config.worker_index == 0:\n",
"        # The first environment is only used for checking the action and observation space.\n",
"        # By using a dummy environment, there's no need to spin up a Minecraft instance behind it,\n",
"        # saving some CPU resources on the head node.\n",
"        return DummyEnv()\n",
"\n",
"    env = EnvWrapper(mission, port)\n",
"    env = TrackingEnv(env)\n",
"    env = FrameStack(env, 2)\n",
"\n",
"    return env\n",
"\n",
"\n",
"def stop(trial_id, result):\n",
"    return result[\"episode_reward_mean\"] >= 1 \\\n",
"        or result[\"time_total_s\"] > 5 * 60 * 60\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
"    tune.register_env(\"Minecraft\", create_env)\n",
"\n",
"    ray.init(address='auto')\n",
"\n",
"    tune.run(\n",
"        run_or_experiment=\"IMPALA\",\n",
"        config={\n",
"            \"env\": \"Minecraft\",\n",
"            \"env_config\": {\n",
"                \"mission\": \"minecraft_missions/lava_maze-v0.xml\"\n",
"            },\n",
"            \"num_workers\": 10,\n",
"            \"num_cpus_per_worker\": 2,\n",
"            \"rollout_fragment_length\": 50,\n",
"            \"train_batch_size\": 1024,\n",
"            \"replay_buffer_num_slots\": 4000,\n",
"            \"replay_proportion\": 10,\n",
"            \"learner_queue_timeout\": 900,\n",
"            \"num_sgd_iter\": 2,\n",
"            \"num_data_loader_buffers\": 2,\n",
"            \"exploration_config\": {\n",
"                \"type\": \"EpsilonGreedy\",\n",
"                \"initial_epsilon\": 1.0,\n",
"                \"final_epsilon\": 0.02,\n",
"                \"epsilon_timesteps\": 500000\n",
"            },\n",
"            \"callbacks\": {\"on_train_result\": callbacks.on_train_result},\n",
"        },\n",
"        stop=stop,\n",
"        checkpoint_at_end=True,\n",
"        local_dir='./logs'\n",
"    )\n",
"```\n",
"\n",
"### Submitting a training run\n",
"\n",
"Below, you create the training run using a `ReinforcementLearningEstimator`\n",
"object, which contains all the configuration parameters for this experiment:\n",
"- `source_directory`: Contains the training script and helper files to be\n",
"copied onto the node running the Ray head.\n",
"- `entry_script`: The training script, described in more detail above.\n",
"- `compute_target`: The compute target for the Ray head and training\n",
"script execution.\n",
"- `environment`: The Azure Machine Learning environment definition for\n",
"the node running the Ray head.\n",
"- `worker_configuration`: The configuration object for the additional\n",
"Ray nodes to be attached to the Ray cluster:\n",
"  - `compute_target`: The compute target for the additional Ray nodes.\n",
"  - `node_count`: The number of nodes to attach to the Ray cluster.\n",
"  - `environment`: The environment definition for the additional Ray nodes.\n",
"- `max_run_duration_seconds`: The time after which to abort the run if it\n",
"is still running.\n",
"- `shm_size`: The size of the docker container's shared memory block (30 GiB below).\n",
"\n",
"For more details, please take a look at the [online documentation](https://docs.microsoft.com/en-us/python/api/azureml-contrib-reinforcementlearning/?view=azure-ml-py)\n",
"for Azure Machine Learning service's reinforcement learning offering.\n",
"\n",
"We configure 8 extra D2 (worker) nodes for the Ray cluster, giving us a total of\n",
"22 CPUs and 1 GPU (6 CPUs from the NC6 head node plus 2 CPUs from each of the\n",
"8 D2 nodes). The GPU and one CPU are used by the IMPALA learner,\n",
"and each MineRL environment receives 2 CPUs, allowing us to spawn a total\n",
"of 10 rollout workers (see the `num_workers` parameter in the training script).\n",
"\n",
"\n",
"Lastly, the `RunDetails` widget displays information about the submitted\n",
"RL experiment, including a link to the Azure portal with more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.train.rl import ReinforcementLearningEstimator, WorkerConfiguration\n",
"from azureml.widgets import RunDetails\n",
"\n",
"worker_config = WorkerConfiguration(\n",
"    compute_target=cpu_cluster,\n",
"    node_count=8,\n",
"    environment=cpu_minecraft_env)\n",
"\n",
"rl_est = ReinforcementLearningEstimator(\n",
"    source_directory='files',\n",
"    entry_script='minecraft_train.py',\n",
"    compute_target=gpu_cluster,\n",
"    environment=gpu_minecraft_env,\n",
"    worker_configuration=worker_config,\n",
"    max_run_duration_seconds=6 * 60 * 60,\n",
"    shm_size=1024 * 1024 * 1024 * 30)\n",
"\n",
"train_run = exp.submit(rl_est)\n",
"\n",
"RunDetails(train_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you wish to cancel the run before it completes, uncomment and execute:\n",
"#train_run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Monitoring training progress\n",
"\n",
"### View the Tensorboard\n",
"\n",
"The Tensorboard can be displayed via the Azure Machine Learning service's\n",
"[Tensorboard API](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-tensorboard).\n",
"When running locally, please make sure to follow the instructions in the\n",
"link and install the required packages. Running this cell will output a URL\n",
"for the Tensorboard.\n",
"\n",
"Note that the training script sets the log directory when starting RLlib\n",
"via the `local_dir` parameter. `./logs` will automatically appear in\n",
"the downloadable files for a run. Since the training script is executed in the\n",
"Ray head node run, we need to get a reference to that run, as shown below.\n",
"\n",
"The Tensorboard API will continuously stream logs from the run.\n",
"\n",
"**Note: It may take a couple of minutes after the run enters the \"Running\" state\n",
"before Tensorboard files are available; the board will refresh automatically.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"from azureml.tensorboard import Tensorboard\n",
"\n",
"head_run = None\n",
"\n",
"timeout = 60\n",
"while timeout > 0 and head_run is None:\n",
"    timeout -= 1\n",
"\n",
"    try:\n",
"        head_run = next(r for r in train_run.get_children() if r.id.endswith('head'))\n",
"    except StopIteration:\n",
"        time.sleep(1)\n",
"\n",
"tb = Tensorboard([head_run], port=6007)\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Review results\n",
"\n",
"Please ensure that the training run has completed before continuing with this section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_run.wait_for_completion()\n",
"\n",
"print('Training run completed.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Please note:** If the final \"episode_reward_mean\" metric from the training run is negative,\n",
"the produced model does not solve the problem of navigating the maze well. You can view\n",
"the metric on the Tensorboard or in the \"Metrics\" section of the head run in the Azure Machine Learning\n",
"portal. We recommend training a new model by rerunning the notebook starting from \"Submitting a training run\".\n",
"\n",
"\n",
"### Export final model\n",
"\n",
"The key result from the training run is the final checkpoint\n",
"containing the state of the IMPALA trainer (model) upon meeting the\n",
"stopping criteria specified in `minecraft_train.py`.\n",
"\n",
"Azure Machine Learning service offers the [Model.register()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)\n",
"API, which allows you to persist the model files from the\n",
"training run. We identify the directory containing the\n",
"final model written during the training run and register\n",
"it with Azure Machine Learning service. We use a Dataset\n",
"object to filter out the correct files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"import tempfile\n",
"\n",
"from azureml.core import Dataset\n",
"\n",
"path_prefix = os.path.join(tempfile.gettempdir(), 'tmp_training_artifacts')\n",
"\n",
"run_artifacts_path = os.path.join('azureml', head_run.id)\n",
"datastore = ws.get_default_datastore()\n",
"\n",
"run_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(run_artifacts_path, '**')))\n",
"\n",
"cp_pattern = re.compile('.*checkpoint-\\\\d+$')\n",
"\n",
"checkpoint_files = [file for file in run_artifacts_ds.to_path() if cp_pattern.match(file)]\n",
"\n",
"# There should only be one checkpoint with our training settings.\n",
"final_checkpoint = os.path.dirname(os.path.join(run_artifacts_path, os.path.normpath(checkpoint_files[-1][1:])))\n",
"datastore.download(target_path=path_prefix, prefix=final_checkpoint.replace('\\\\', '/'), show_progress=True)\n",
"\n",
"print('Download complete.')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model_name = 'final_model_minecraft_maze'\n",
"\n",
"model = Model.register(\n",
"    workspace=ws,\n",
"    model_path=os.path.join(path_prefix, final_checkpoint),\n",
"    model_name=model_name,\n",
"    description='Model of an agent trained to navigate a lava maze in Minecraft.')"
]
},
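{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once registered, the checkpoint can be retrieved by name from any session with access to the workspace. A minimal sketch using the `Model` API (assuming the registration above succeeded):\n",
"\n",
"```python\n",
"from azureml.core.model import Model\n",
"\n",
"# Fetch the latest version of the registered model and download\n",
"# the checkpoint files locally.\n",
"registered_model = Model(workspace=ws, name=model_name)\n",
"local_path = registered_model.download(target_dir='./downloaded_model', exist_ok=True)\n",
"print('Checkpoint downloaded to:', local_path)\n",
"```"
]
},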
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Models can be used through a variety of APIs. Please see the\n",
"[documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where)\n",
"for more details.\n",
"\n",
"### Test agent performance in a rollout\n",
"\n",
"To observe the trained agent's behavior, it is common practice to\n",
"view it in a rollout. The previous reinforcement learning\n",
"tutorials explain rollouts in more detail.\n",
"\n",
"The provided `minecraft_rollout.py` script loads the final checkpoint\n",
"of the trained agent from the model registered with Azure Machine Learning\n",
"service. It then starts a rollout on 4 different lava maze layouts, which\n",
"are all larger and thus more difficult than the maze the agent was trained\n",
"on. The script further records videos by replaying the agent's decisions\n",
"in [Malmo](https://github.com/microsoft/malmo). Malmo supports multiple\n",
"agents in the same environment, thus allowing us to capture videos that\n",
"depict the agent from another agent's perspective. The provided\n",
"`malmo_video_recorder.py` file and the Malmo GitHub repository have more\n",
"details on the video recording setup.\n",
"\n",
"You can view the rewards for each rollout episode in the logs for the 'head'\n",
"run submitted below. In some episodes, the agent may fail to reach the goal\n",
"due to the higher level of difficulty; in practice, we could continue\n",
"training the agent on harder tasks starting from the final checkpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_params = {\n",
"    '--model_name': model_name\n",
"}\n",
"\n",
"rollout_est = ReinforcementLearningEstimator(\n",
"    source_directory='files',\n",
"    entry_script='minecraft_rollout.py',\n",
"    script_params=script_params,\n",
"    compute_target=gpu_cluster,\n",
"    environment=gpu_minecraft_env,\n",
"    shm_size=1024 * 1024 * 1024 * 30)\n",
"\n",
"rollout_run = exp.submit(rollout_est)\n",
"\n",
"RunDetails(rollout_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View videos captured during rollout\n",
"\n",
"To inspect the agent's performance, you can view the videos captured\n",
"during the rollout episodes. First, ensure that the rollout run has\n",
"completed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rollout_run.wait_for_completion()\n",
"\n",
"head_run_rollout = next(r for r in rollout_run.get_children() if r.id.endswith('head'))\n",
"\n",
"print('Rollout completed.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, you need to download the video files from the rollout run. We use a\n",
"Dataset to filter out the video files, which are in tgz archives."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rollout_run_artifacts_path = os.path.join('azureml', head_run_rollout.id)\n",
"datastore = ws.get_default_datastore()\n",
"\n",
"rollout_run_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(rollout_run_artifacts_path, '**')))\n",
"\n",
"video_archives = [file for file in rollout_run_artifacts_ds.to_path() if file.endswith('.tgz')]\n",
"video_archives = [os.path.join(rollout_run_artifacts_path, os.path.normpath(file[1:])) for file in video_archives]\n",
"\n",
"datastore.download(\n",
"    target_path=path_prefix,\n",
"    prefix=os.path.dirname(video_archives[0]).replace('\\\\', '/'),\n",
"    show_progress=True)\n",
"\n",
"print('Download complete.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, unzip the video files and rename them by the Minecraft mission seed used\n",
"(see `minecraft_rollout.py` for more details on how the seed is used)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tarfile\n",
"import shutil\n",
"\n",
"training_artifacts_dir = './training_artifacts'\n",
"video_dir = os.path.join(training_artifacts_dir, 'videos')\n",
"video_files = []\n",
"\n",
"for tar_file_path in video_archives:\n",
"    seed = tar_file_path[tar_file_path.index('rollout_') + len('rollout_'): tar_file_path.index('.tgz')]\n",
"\n",
"    tar = tarfile.open(os.path.join(path_prefix, tar_file_path).replace('\\\\', '/'), 'r')\n",
"    tar_info = next(t_info for t_info in tar.getmembers() if t_info.name.endswith('mp4'))\n",
"    tar.extract(tar_info, video_dir)\n",
"    tar.close()\n",
"\n",
"    unzipped_folder = os.path.join(video_dir, next(f_ for f_ in os.listdir(video_dir) if not f_.endswith('mp4')))\n",
"    video_file = os.path.join(unzipped_folder, 'video.mp4')\n",
"    final_video_path = os.path.join(video_dir, '{seed}.mp4'.format(seed=seed))\n",
"\n",
"    shutil.move(video_file, final_video_path)\n",
"    video_files.append(final_video_path)\n",
"\n",
"    shutil.rmtree(unzipped_folder)\n",
"\n",
"# Clean up any downloaded 'tmp' files\n",
"shutil.rmtree(path_prefix)\n",
"\n",
"print('Local video files:\\n', video_files)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, run the cell below to display the videos in-line. In some cases,\n",
"the agent may struggle to find the goal since the maze size was increased\n",
"compared to training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.core.display import display, HTML\n",
"\n",
"index = 0\n",
"while index < len(video_files) - 1:\n",
"    display(\n",
"        HTML('\\\n",
"            <video controls alt=\"cannot display video\" autoplay loop width=49%> \\\n",
"                <source src=\"{f1}\" type=\"video/mp4\"> \\\n",
"            </video> \\\n",
"            <video controls alt=\"cannot display video\" autoplay loop width=49%> \\\n",
"                <source src=\"{f2}\" type=\"video/mp4\"> \\\n",
"            </video>'.format(f1=video_files[index], f2=video_files[index + 1]))\n",
"    )\n",
"\n",
"    index += 2\n",
"\n",
"if index < len(video_files):\n",
"    display(\n",
"        HTML('\\\n",
"            <video controls alt=\"cannot display video\" autoplay loop width=49%> \\\n",
"                <source src=\"{f1}\" type=\"video/mp4\"> \\\n",
"            </video>'.format(f1=video_files[index]))\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"\n",
"Below, you can find code snippets for your convenience to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to stop the Tensorboard, uncomment and run\n",
"#tb.stop()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to delete the gpu compute target, uncomment and run\n",
"#gpu_cluster.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to delete the cpu compute target, uncomment and run\n",
"#cpu_cluster.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to delete the registered model, uncomment and run\n",
"#model.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to delete the local video files, uncomment and run\n",
"#shutil.rmtree(training_artifacts_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"This is currently the last introductory tutorial for Azure Machine Learning\n",
"service's Reinforcement\n",
"Learning offering. We would love to hear your feedback to build the features\n",
"you need!\n",
"\n"
]
}
],
"metadata": {
"authors": [
{
"name": "andress"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 4
}