{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Reinforcement Learning in Azure Machine Learning - Training a Minecraft agent using custom environments\n",
|
|
"\n",
|
|
"This tutorial will show how to set up a more complex reinforcement\n",
|
|
"learning (RL) training scenario. It demonstrates how to train an agent to\n",
|
|
"navigate through a lava maze in the Minecraft game using Azure Machine\n",
|
|
"Learning.\n",
|
|
"\n",
|
|
"**Please note:** This notebook trains an agent on a randomly generated\n",
|
|
"Minecraft level. As a result, on rare occasions, a training run may fail\n",
|
|
"to produce a model that can solve the maze. If this happens, you can\n",
|
|
"re-run the training step as indicated below.\n",
|
|
"\n",
|
|
"**Please note:** This notebook uses 1 NC6 type node and 8 D2 type nodes\n",
|
|
"for up to 5 hours of training, which corresponds to approximately $9.06 (USD)\n",
|
|
"as of May 2020.\n",
|
|
"\n",
|
|
"Minecraft is currently one of the most popular video\n",
|
|
"games and as such has been a study object for RL. [Project \n",
|
|
"Malmo](https://www.microsoft.com/en-us/research/project/project-malmo/) is\n",
|
|
"a platform for artificial intelligence experimentation and research built on\n",
|
|
"top of Minecraft. We will use Minecraft [gym](https://gym.openai.com) environments from Project\n",
|
|
"Malmo's 2019 MineRL competition, which are part of the \n",
|
|
"[MineRL](http://minerl.io/docs/index.html) Python package.\n",
|
|
"\n",
|
|
"Minecraft environments require a display to run, so we will demonstrate\n",
|
|
"how to set up a virtual display within the docker container used for training.\n",
|
|
"Learning will be based on the agent's visual observations. To\n",
|
|
"generate the necessary amount of sample data, we will run several\n",
|
|
"instances of the Minecraft game in parallel. Below, you can see a video of\n",
|
|
"a trained agent navigating a lava maze. Starting from the green position,\n",
|
|
"it moves to the blue position by moving forward, turning left or turning right:\n",
|
|
"\n",
|
|
"<table style=\"width:50%\">\n",
|
|
" <tr>\n",
|
|
" <th style=\"text-align: center;\">\n",
|
|
" <img src=\"./images/lava_maze_minecraft.gif\" alt=\"Minecraft lava maze\" align=\"middle\" margin-left=\"auto\" margin-right=\"auto\"/>\n",
|
|
" </th>\n",
|
|
" </tr>\n",
|
|
" <tr style=\"text-align: center;\">\n",
|
|
" <th>Fig 1. Video of a trained Minecraft agent navigating a lava maze.</th>\n",
|
|
" </tr>\n",
|
|
"</table>\n",
|
|
"\n",
|
|
"The tutorial will cover the following steps:\n",
|
|
"- Initializing Azure Machine Learning resources for training\n",
|
|
"- Training the RL agent with Azure Machine Learning service\n",
|
|
"- Monitoring training progress\n",
|
|
"- Reviewing training results\n",
|
|
"\n",
|
|
"\n",
|
|
"## Prerequisites\n",
|
|
"\n",
|
|
"The user should have completed the Azure Machine Learning introductory tutorial.\n",
|
|
"You will need to make sure that you have a valid subscription id, a resource group and a\n",
|
|
"workspace. For detailed instructions see [Tutorial: Get started creating\n",
|
|
"your first ML experiment.](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup)\n",
|
|
"\n",
|
|
"In addition, please follow the instructions in the [Reinforcement Learning in\n",
|
|
"Azure Machine Learning - Setting Up Development Environment](../setup/devenv_setup.ipynb)\n",
|
|
"notebook to correctly set up a Virtual Network which is required for completing \n",
|
|
"this tutorial.\n",
|
|
"\n",
|
|
"While this is a standalone notebook, we highly recommend going over the\n",
|
|
"introductory notebooks for RL first.\n",
|
|
"- Getting started:\n",
|
|
" - [RL using a compute instance with Azure Machine Learning service](../cartpole-on-compute-instance/cartpole_ci.ipynb)\n",
|
|
" - [Using Azure Machine Learning compute](../cartpole-on-single-compute/cartpole_sc.ipynb)\n",
|
|
"- [Scaling RL training runs with Azure Machine Learning service](../atari-on-distributed-compute/pong_rllib.ipynb)\n",
|
|
"\n",
|
|
"\n",
|
|
"## Initialize resources\n",
|
|
"\n",
|
|
"All required Azure Machine Learning service resources for this tutorial can be set up from Jupyter.\n",
|
|
"This includes:\n",
|
|
"- Connecting to your existing Azure Machine Learning workspace.\n",
|
|
"- Creating an experiment to track runs.\n",
|
|
"- Creating remote compute targets for [Ray](https://docs.ray.io/en/latest/index.html).\n",
|
|
"\n",
|
|
"### Azure Machine Learning SDK\n",
|
|
"\n",
|
|
"Display the Azure Machine Learning SDK version."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azureml.core\n",
|
|
"print(\"Azure Machine Learning SDK Version: \", azureml.core.VERSION) "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Connect to workspace\n",
|
|
"\n",
|
|
"Get a reference to an existing Azure Machine Learning workspace."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Workspace\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print(ws.name, ws.location, ws.resource_group, sep=' | ')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create an experiment\n",
|
|
"\n",
|
|
"Create an experiment to track the runs in your workspace. A\n",
|
|
"workspace can have multiple experiments and each experiment\n",
|
|
"can be used to track multiple runs (see [documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py)\n",
|
|
"for details)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"nbpresent": {
|
|
"id": "bc70f780-c240-4779-96f3-bc5ef9a37d59"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Experiment\n",
|
|
"\n",
|
|
"exp = Experiment(workspace=ws, name='minecraft-maze')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create or attach an existing compute resource\n",
|
|
"\n",
|
|
"A compute target is a designated compute resource where you\n",
|
|
"run your training script. For more information, see [What\n",
|
|
"are compute targets in Azure Machine Learning service?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target).\n",
|
|
"\n",
|
|
"#### GPU target for Ray head\n",
|
|
"\n",
|
|
"In the experiment setup for this tutorial, the Ray head node\n",
|
|
"will run on a GPU-enabled node. A maximum cluster size\n",
|
|
"of 1 node is therefore sufficient. If you wish to run\n",
|
|
"multiple experiments in parallel using the same GPU\n",
|
|
"cluster, you may elect to increase this number. The cluster\n",
|
|
"will automatically scale down to 0 nodes when no training jobs\n",
|
|
"are scheduled (see `min_nodes`).\n",
|
|
"\n",
|
|
"The code below creates a compute cluster of GPU-enabled NC6\n",
|
|
"nodes. If the cluster with the specified name is already in\n",
|
|
"your workspace the code will skip the creation process.\n",
|
|
"\n",
|
|
"Note that we must specify a Virtual Network during compute\n",
|
|
"creation to allow communication between the cluster running\n",
|
|
"the Ray head node and the additional Ray compute nodes. For\n",
|
|
"details on how to setup the Virtual Network, please follow the\n",
|
|
"instructions in the \"Prerequisites\" section above.\n",
|
|
"\n",
|
|
"**Note: Creation of a compute resource can take several minutes**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
|
"\n",
|
|
"# please enter the name of your Virtual Network (see Prerequisites -> Workspace setup)\n",
|
|
"vnet_name = 'your_vnet'\n",
|
|
"\n",
|
|
"# name of the Virtual Network subnet ('default' the default name)\n",
|
|
"subnet_name = 'default'\n",
|
|
"\n",
|
|
"gpu_cluster_name = 'gpu-cluster-nc6'\n",
|
|
"\n",
|
|
"try:\n",
|
|
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
|
|
" print('Found existing compute target')\n",
|
|
"except ComputeTargetException:\n",
|
|
" print('Creating a new compute target...')\n",
|
|
" compute_config = AmlCompute.provisioning_configuration(\n",
|
|
" vm_size='Standard_NC6',\n",
|
|
" min_nodes=0,\n",
|
|
" max_nodes=1,\n",
|
|
" vnet_resourcegroup_name=ws.resource_group,\n",
|
|
" vnet_name=vnet_name,\n",
|
|
" subnet_name=subnet_name)\n",
|
|
"\n",
|
|
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
|
|
" gpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
|
"\n",
|
|
" print('Cluster created.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### CPU target for additional Ray nodes\n",
|
|
"\n",
|
|
"The code below creates a compute cluster of D2 nodes. If the cluster with the specified name is already in your workspace the code will skip the creation process.\n",
|
|
"\n",
|
|
"This cluster will be used to start additional Ray nodes\n",
|
|
"increasing the clusters CPU resources.\n",
|
|
"\n",
|
|
"**Note: Creation of a compute resource can take several minutes**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"cpu_cluster_name = 'cpu-cluster-d2'\n",
|
|
"\n",
|
|
"try:\n",
|
|
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
|
|
" print('Found existing compute target')\n",
|
|
"except ComputeTargetException:\n",
|
|
" print('Creating a new compute target...')\n",
|
|
" compute_config = AmlCompute.provisioning_configuration(\n",
|
|
" vm_size='STANDARD_D2',\n",
|
|
" min_nodes=0,\n",
|
|
" max_nodes=10,\n",
|
|
" vnet_resourcegroup_name=ws.resource_group,\n",
|
|
" vnet_name=vnet_name,\n",
|
|
" subnet_name=subnet_name)\n",
|
|
"\n",
|
|
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
|
|
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
|
"\n",
|
|
" print('Cluster created.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Training the agent\n",
|
|
"\n",
|
|
"### Training environments\n",
|
|
"\n",
|
|
"This tutorial uses custom docker images (CPU and GPU respectively)\n",
|
|
"with the necessary software installed. The\n",
|
|
"[Environment](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments)\n",
|
|
"class stores the configuration for the training environment. The docker\n",
|
|
"image is set via `env.docker.base_image` which can point to any\n",
|
|
"publicly available docker image. `user_managed_dependencies`\n",
|
|
"is set so that the preinstalled Python packages in the image are preserved.\n",
|
|
"\n",
|
|
"Note that since Minecraft requires a display to start, we set the `interpreter_path`\n",
|
|
"such that the Python process is started via **xvfb-run**."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"from azureml.core import Environment\n",
|
|
"\n",
|
|
"max_train_time = os.environ.get(\"AML_MAX_TRAIN_TIME_SECONDS\", 5 * 60 * 60)\n",
|
|
"\n",
|
|
"def create_env(env_type):\n",
|
|
" env = Environment(name='minecraft-{env_type}'.format(env_type=env_type))\n",
|
|
"\n",
|
|
" env.docker.enabled = True\n",
|
|
" env.docker.base_image = 'akdmsft/minecraft-{env_type}'.format(env_type=env_type)\n",
|
|
"\n",
|
|
" env.python.interpreter_path = \"xvfb-run -s '-screen 0 640x480x16 -ac +extension GLX +render' python\"\n",
|
|
" env.environment_variables[\"AML_MAX_TRAIN_TIME_SECONDS\"] = str(max_train_time)\n",
|
|
" env.python.user_managed_dependencies = True\n",
|
|
" \n",
|
|
" return env\n",
|
|
" \n",
|
|
"cpu_minecraft_env = create_env('cpu')\n",
|
|
"gpu_minecraft_env = create_env('gpu')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Training script\n",
|
|
"\n",
|
|
"As described above, we use the MineRL Python package to launch\n",
|
|
"Minecraft game instances. MineRL provides several OpenAI gym\n",
|
|
"environments for different scenarios, such as chopping wood.\n",
|
|
"Besides predefined environments, MineRL lets its users create\n",
|
|
"custom Minecraft environments through\n",
|
|
"[minerl.env](http://minerl.io/docs/api/env.html). In the helper\n",
|
|
"file **minecraft_environment.py** provided with this tutorial, we use the\n",
|
|
"latter option to customize a Minecraft level with a lava maze\n",
|
|
"that the agent has to navigate. The agent receives a negative\n",
|
|
"reward of -1 for falling into the lava, a negative reward of\n",
|
|
"-0.02 for sending a command (i.e. navigating through the maze\n",
|
|
"with fewer actions yields a higher total reward) and a positive reward\n",
|
|
"of 1 for reaching the goal. To encourage the agent to explore\n",
|
|
"the maze, it also receives a positive reward of 0.1 for visiting\n",
|
|
"a tile for the first time.\n",
|
|
"\n",
|
|
"The agent learns purely from visual observations and the image\n",
|
|
"is scaled to an 84x84 format, stacking four frames. For the\n",
|
|
"purposes of this example, we use a small action space of size\n",
|
|
"three: move forward, turn 90 degrees to the left, and turn 90\n",
|
|
"degrees to the right.\n",
|
|
"\n",
|
|
"The training script itself registers the function to create training\n",
|
|
"environments with the `tune.register_env` function and connects to\n",
|
|
"the Ray cluster Azure Machine Learning service started on the GPU \n",
|
|
"and CPU nodes. Lastly, it starts a RL training run with `tune.run()`.\n",
|
|
"\n",
|
|
"We recommend setting the `local_dir` parameter to `./logs` as this\n",
|
|
"directory will automatically become available as part of the training\n",
|
|
"run's files in the Azure Portal. The Tensorboard integration\n",
|
|
"(see \"View the Tensorboard\" section below) also depends on the files'\n",
|
|
"availability. For a list of common parameter options, please refer\n",
|
|
"to the [Ray documentation](https://docs.ray.io/en/latest/rllib-training.html#common-parameters).\n",
|
|
"\n",
|
|
"\n",
|
|
"```python\n",
|
|
"# Taken from minecraft_environment.py and minecraft_train.py\n",
|
|
"\n",
|
|
"# Define a function to create a MineRL environment\n",
|
|
"def create_env(config):\n",
|
|
" mission = config['mission']\n",
|
|
" port = 1000 * config.worker_index + config.vector_index\n",
|
|
" print('*********************************************')\n",
|
|
" print(f'* Worker {config.worker_index} creating from mission: {mission}, port {port}')\n",
|
|
" print('*********************************************')\n",
|
|
"\n",
|
|
" if config.worker_index == 0:\n",
|
|
" # The first environment is only used for checking the action and observation space.\n",
|
|
" # By using a dummy environment, there's no need to spin up a Minecraft instance behind it\n",
|
|
" # saving some CPU resources on the head node.\n",
|
|
" return DummyEnv()\n",
|
|
"\n",
|
|
" env = EnvWrapper(mission, port)\n",
|
|
" env = TrackingEnv(env)\n",
|
|
" env = FrameStack(env, 2)\n",
|
|
" \n",
|
|
" return env\n",
|
|
"\n",
|
|
"\n",
|
|
"def stop(trial_id, result):\n",
|
|
" return result[\"episode_reward_mean\"] >= 1 \\\n",
|
|
" or result[\"time_total_s\"] > 5 * 60 * 60\n",
|
|
"\n",
|
|
"\n",
|
|
"if __name__ == '__main__':\n",
|
|
" tune.register_env(\"Minecraft\", create_env)\n",
|
|
"\n",
|
|
" ray.init(address='auto')\n",
|
|
"\n",
|
|
" tune.run(\n",
|
|
" run_or_experiment=\"IMPALA\",\n",
|
|
" config={\n",
|
|
" \"env\": \"Minecraft\",\n",
|
|
" \"env_config\": {\n",
|
|
" \"mission\": \"minecraft_missions/lava_maze-v0.xml\"\n",
|
|
" },\n",
|
|
" \"num_workers\": 10,\n",
|
|
" \"num_cpus_per_worker\": 2,\n",
|
|
" \"rollout_fragment_length\": 50,\n",
|
|
" \"train_batch_size\": 1024,\n",
|
|
" \"replay_buffer_num_slots\": 4000,\n",
|
|
" \"replay_proportion\": 10,\n",
|
|
" \"learner_queue_timeout\": 900,\n",
|
|
" \"num_sgd_iter\": 2,\n",
|
|
" \"num_data_loader_buffers\": 2,\n",
|
|
" \"exploration_config\": {\n",
|
|
" \"type\": \"EpsilonGreedy\",\n",
|
|
" \"initial_epsilon\": 1.0,\n",
|
|
" \"final_epsilon\": 0.02,\n",
|
|
" \"epsilon_timesteps\": 500000\n",
|
|
" },\n",
|
|
" \"callbacks\": {\"on_train_result\": callbacks.on_train_result},\n",
|
|
" },\n",
|
|
" stop=stop,\n",
|
|
" checkpoint_at_end=True,\n",
|
|
" local_dir='./logs'\n",
|
|
" )\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Submitting a training run\n",
|
|
"\n",
|
|
"Below, you create the training run using a `ReinforcementLearningEstimator`\n",
|
|
"object, which contains all the configuration parameters for this experiment:\n",
|
|
"- `source_directory`: Contains the training script and helper files to be\n",
|
|
"copied onto the node running the Ray head.\n",
|
|
"- `entry_script`: The training script, described in more detail above..\n",
|
|
"- `compute_target`: The compute target for the Ray head and training\n",
|
|
"script execution.\n",
|
|
"- `environment`: The Azure machine learning environment definition for\n",
|
|
"the node running the Ray head.\n",
|
|
"- `worker_configuration`: The configuration object for the additional\n",
|
|
"Ray nodes to be attached to the Ray cluster:\n",
|
|
" - `compute_target`: The compute target for the additional Ray nodes.\n",
|
|
" - `node_count`: The number of nodes to attach to the Ray cluster.\n",
|
|
" - `environment`: The environment definition for the additional Ray nodes.\n",
|
|
"- `max_run_duration_seconds`: The time after which to abort the run if it\n",
|
|
"is still running.\n",
|
|
"- `shm_size`: The size of docker container's shared memory block. \n",
|
|
"\n",
|
|
"For more details, please take a look at the [online documentation](https://docs.microsoft.com/en-us/python/api/azureml-contrib-reinforcementlearning/?view=azure-ml-py)\n",
|
|
"for Azure Machine Learning service's reinforcement learning offering.\n",
|
|
"\n",
|
|
"We configure 8 extra D2 (worker) nodes for the Ray cluster, giving us a total of\n",
|
|
"22 CPUs and 1 GPU. The GPU and one CPU are used by the IMPALA learner,\n",
|
|
"and each MineRL environment receives 2 CPUs allowing us to spawn a total\n",
|
|
"of 10 rollout workers (see `num_workers` parameter in the training script).\n",
|
|
"\n",
|
|
"\n",
|
|
"Lastly, the `RunDetails` widget displays information about the submitted\n",
|
|
"RL experiment, including a link to the Azure portal with more details."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.contrib.train.rl import ReinforcementLearningEstimator, WorkerConfiguration\n",
|
|
"from azureml.widgets import RunDetails\n",
|
|
"\n",
|
|
"worker_config = WorkerConfiguration(\n",
|
|
" compute_target=cpu_cluster, \n",
|
|
" node_count=8,\n",
|
|
" environment=cpu_minecraft_env)\n",
|
|
"\n",
|
|
"rl_est = ReinforcementLearningEstimator(\n",
|
|
" source_directory='files',\n",
|
|
" entry_script='minecraft_train.py',\n",
|
|
" compute_target=gpu_cluster,\n",
|
|
" environment=gpu_minecraft_env,\n",
|
|
" worker_configuration=worker_config,\n",
|
|
" max_run_duration_seconds=6 * 60 * 60,\n",
|
|
" shm_size=1024 * 1024 * 1024 * 30)\n",
|
|
"\n",
|
|
"train_run = exp.submit(rl_est)\n",
|
|
"\n",
|
|
"RunDetails(train_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# If you wish to cancel the run before it completes, uncomment and execute:\n",
|
|
"#train_run.cancel()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Monitoring training progress\n",
|
|
"\n",
|
|
"### View the Tensorboard\n",
|
|
"\n",
|
|
"The Tensorboard can be displayed via the Azure Machine Learning service's\n",
|
|
"[Tensorboard API](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-tensorboard).\n",
|
|
"When running locally, please make sure to follow the instructions in the\n",
|
|
"link and install required packages. Running this cell will output a URL\n",
|
|
"for the Tensorboard.\n",
|
|
"\n",
|
|
"Note that the training script sets the log directory when starting RLlib\n",
|
|
"via the `local_dir` parameter. `./logs` will automatically appear in\n",
|
|
"the downloadable files for a run. Since this script is executed on the\n",
|
|
"Ray head node run, we need to get a reference to it as shown below.\n",
|
|
"\n",
|
|
"The Tensorboard API will continuously stream logs from the run.\n",
|
|
"\n",
|
|
"**Note: It may take a couple of minutes after the run is in \"Running\" state\n",
|
|
"before Tensorboard files are available and the board will refresh automatically**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import time\n",
|
|
"from azureml.tensorboard import Tensorboard\n",
|
|
"\n",
|
|
"head_run = None\n",
|
|
"\n",
|
|
"timeout = 60\n",
|
|
"while timeout > 0 and head_run is None:\n",
|
|
" timeout -= 1\n",
|
|
" \n",
|
|
" try:\n",
|
|
" head_run = next(r for r in train_run.get_children() if r.id.endswith('head'))\n",
|
|
" except StopIteration:\n",
|
|
" time.sleep(1)\n",
|
|
"\n",
|
|
"tb = Tensorboard([head_run])\n",
|
|
"tb.start()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Review results\n",
|
|
"\n",
|
|
"Please ensure that the training run has completed before continuing with this section."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"train_run.wait_for_completion()\n",
|
|
"\n",
|
|
"print('Training run completed.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Please note:** If the final \"episode_reward_mean\" metric from the training run is negative,\n",
|
|
"the produced model does not solve the problem of navigating the maze well. You can view\n",
|
|
"the metric on the Tensorboard or in \"Metrics\" section of the head run in the Azure Machine Learning\n",
|
|
"portal. We recommend training a new model by rerunning the notebook starting from \"Submitting a training run\".\n",
|
|
"\n",
|
|
"\n",
|
|
"### Export final model\n",
|
|
"\n",
|
|
"The key result from the training run is the final checkpoint\n",
|
|
"containing the state of the IMPALA trainer (model) upon meeting the\n",
|
|
"stopping criteria specified in `minecraft_train.py`.\n",
|
|
"\n",
|
|
"Azure Machine Learning service offers the [Model.register()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)\n",
|
|
"API which allows you to persist the model files from the\n",
|
|
"training run. We identify the directory containing the\n",
|
|
"final model written during the training run and register\n",
|
|
"it with Azure Machine Learning service. We use a Dataset\n",
|
|
"object to filter out the correct files."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import re\n",
|
|
"import tempfile\n",
|
|
"\n",
|
|
"from azureml.core import Dataset\n",
|
|
"\n",
|
|
"path_prefix = os.path.join(tempfile.gettempdir(), 'tmp_training_artifacts')\n",
|
|
"\n",
|
|
"run_artifacts_path = os.path.join('azureml', head_run.id)\n",
|
|
"datastore = ws.get_default_datastore()\n",
|
|
"\n",
|
|
"run_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(run_artifacts_path, '**')))\n",
|
|
"\n",
|
|
"cp_pattern = re.compile('.*checkpoint-\\\\d+$')\n",
|
|
"\n",
|
|
"checkpoint_files = [file for file in run_artifacts_ds.to_path() if cp_pattern.match(file)]\n",
|
|
"\n",
|
|
"# There should only be one checkpoint with our training settings...\n",
|
|
"final_checkpoint = os.path.dirname(os.path.join(run_artifacts_path, os.path.normpath(checkpoint_files[-1][1:])))\n",
|
|
"datastore.download(target_path=path_prefix, prefix=final_checkpoint.replace('\\\\', '/'), show_progress=True)\n",
|
|
"\n",
|
|
"print('Download complete.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.model import Model\n",
|
|
"\n",
|
|
"model_name = 'final_model_minecraft_maze'\n",
|
|
"\n",
|
|
"model = Model.register(\n",
|
|
" workspace=ws,\n",
|
|
" model_path=os.path.join(path_prefix, final_checkpoint),\n",
|
|
" model_name=model_name,\n",
|
|
" description='Model of an agent trained to navigate a lava maze in Minecraft.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Models can be used through a varity of APIs. Please see the\n",
|
|
"[documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where)\n",
|
|
"for more details.\n",
|
|
"\n",
|
|
"### Test agent performance in a rollout\n",
|
|
"\n",
|
|
"To observe the trained agent's behavior, it is a common practice to\n",
|
|
"view its behavior in a rollout. The previous reinforcement learning\n",
|
|
"tutorials explain rollouts in more detail.\n",
|
|
"\n",
|
|
"The provided `minecraft_rollout.py` script loads the final checkpoint\n",
|
|
"of the trained agent from the model registered with Azure Machine Learning\n",
|
|
"service. It then starts a rollout on 4 different lava maze layouts, that\n",
|
|
"are all larger and thus more difficult than the maze the agent was trained\n",
|
|
"on. The script further records videos by replaying the agent's decisions\n",
|
|
"in [Malmo](https://github.com/microsoft/malmo). Malmo supports multiple\n",
|
|
"agents in the same environment, thus allowing us to capture videos that\n",
|
|
"depict the agent from another agent's perspective. The provided\n",
|
|
"`malmo_video_recorder.py` file and the Malmo Github repository have more\n",
|
|
"details on the video recording setup.\n",
|
|
"\n",
|
|
"You can view the rewards for each rollout episode in the logs for the 'head'\n",
|
|
"run submitted below. In some episodes, the agent may fail to reach the goal\n",
|
|
"due to the higher level of difficulty - in practice, we could continue\n",
|
|
"training the agent on harder tasks starting with the final checkpoint."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"script_params = {\n",
|
|
" '--model_name': model_name\n",
|
|
"}\n",
|
|
"\n",
|
|
"rollout_est = ReinforcementLearningEstimator(\n",
|
|
" source_directory='files',\n",
|
|
" entry_script='minecraft_rollout.py',\n",
|
|
" script_params=script_params,\n",
|
|
" compute_target=gpu_cluster,\n",
|
|
" environment=gpu_minecraft_env,\n",
|
|
" shm_size=1024 * 1024 * 1024 * 30)\n",
|
|
"\n",
|
|
"rollout_run = exp.submit(rollout_est)\n",
|
|
"\n",
|
|
"RunDetails(rollout_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### View videos captured during rollout\n",
|
|
"\n",
|
|
"To inspect the agent's training progress you can view the videos captured\n",
|
|
"during the rollout episodes. First, ensure that the training run has\n",
|
|
"completed."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"rollout_run.wait_for_completion()\n",
|
|
"\n",
|
|
"head_run_rollout = next(r for r in rollout_run.get_children() if r.id.endswith('head'))\n",
|
|
"\n",
|
|
"print('Rollout completed.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Next, you need to download the video files from the training run. We use a\n",
|
|
"Dataset to filter out the video files which are in tgz archives."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"rollout_run_artifacts_path = os.path.join('azureml', head_run_rollout.id)\n",
|
|
"datastore = ws.get_default_datastore()\n",
|
|
"\n",
|
|
"rollout_run_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(rollout_run_artifacts_path, '**')))\n",
|
|
"\n",
|
|
"video_archives = [file for file in rollout_run_artifacts_ds.to_path() if file.endswith('.tgz')]\n",
|
|
"video_archives = [os.path.join(rollout_run_artifacts_path, os.path.normpath(file[1:])) for file in video_archives]\n",
|
|
"\n",
|
|
"datastore.download(\n",
|
|
" target_path=path_prefix,\n",
|
|
" prefix=os.path.dirname(video_archives[0]).replace('\\\\', '/'),\n",
|
|
" show_progress=True)\n",
|
|
"\n",
|
|
"print('Download complete.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Next, unzip the video files and rename them by the Minecraft mission seed used\n",
|
|
"(see `minecraft_rollout.py` for more details on how the seed is used)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import tarfile\n",
|
|
"import shutil\n",
|
|
"\n",
|
|
"training_artifacts_dir = './training_artifacts'\n",
|
|
"video_dir = os.path.join(training_artifacts_dir, 'videos')\n",
|
|
"video_files = []\n",
|
|
"\n",
|
|
"for tar_file_path in video_archives:\n",
|
|
" seed = tar_file_path[tar_file_path.index('rollout_') + len('rollout_'): tar_file_path.index('.tgz')]\n",
|
|
" \n",
|
|
" tar = tarfile.open(os.path.join(path_prefix, tar_file_path).replace('\\\\', '/'), 'r')\n",
|
|
" tar_info = next(t_info for t_info in tar.getmembers() if t_info.name.endswith('mp4'))\n",
|
|
" tar.extract(tar_info, video_dir)\n",
|
|
" tar.close()\n",
|
|
" \n",
|
|
" unzipped_folder = os.path.join(video_dir, next(f_ for f_ in os.listdir(video_dir) if not f_.endswith('mp4'))) \n",
|
|
" video_file = os.path.join(unzipped_folder,'video.mp4')\n",
|
|
" final_video_path = os.path.join(video_dir, '{seed}.mp4'.format(seed=seed))\n",
|
|
" \n",
|
|
" shutil.move(video_file, final_video_path) \n",
|
|
" video_files.append(final_video_path)\n",
|
|
" \n",
|
|
" shutil.rmtree(unzipped_folder)\n",
|
|
"\n",
|
|
"# Clean up any downloaded 'tmp' files\n",
|
|
"shutil.rmtree(path_prefix)\n",
|
|
"\n",
|
|
"print('Local video files:\\n', video_files)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Finally, run the cell below to display the videos in-line. In some cases,\n",
|
|
"the agent may struggle to find the goal since the maze size was increased\n",
|
|
"compared to training."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from IPython.core.display import display, HTML\n",
|
|
"\n",
|
|
"index = 0\n",
|
|
"while index < len(video_files) - 1:\n",
|
|
" display(\n",
|
|
" HTML('\\\n",
|
|
" <video controls alt=\"cannot display video\" autoplay loop width=49%> \\\n",
|
|
" <source src=\"{f1}\" type=\"video/mp4\"> \\\n",
|
|
" </video> \\\n",
|
|
" <video controls alt=\"cannot display video\" autoplay loop width=49%> \\\n",
|
|
" <source src=\"{f2}\" type=\"video/mp4\"> \\\n",
|
|
" </video>'.format(f1=video_files[index], f2=video_files[index + 1]))\n",
|
|
" )\n",
|
|
" \n",
|
|
" index += 2\n",
|
|
"\n",
|
|
"if index < len(video_files):\n",
|
|
" display(\n",
|
|
" HTML('\\\n",
|
|
" <video controls alt=\"cannot display video\" autoplay loop width=49%> \\\n",
|
|
" <source src=\"{f1}\" type=\"video/mp4\"> \\\n",
|
|
" </video>'.format(f1=video_files[index]))\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Cleaning up\n",
|
|
"\n",
|
|
"Below, you can find code snippets for your convenience to clean up any resources created as part of this tutorial you don't wish to retain."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# to stop the Tensorboard, uncomment and run\n",
|
|
"#tb.stop()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# to delete the gpu compute target, uncomment and run\n",
|
|
"#gpu_cluster.delete()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# to delete the cpu compute target, uncomment and run\n",
|
|
"#cpu_cluster.delete()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# to delete the registered model, uncomment and run\n",
|
|
"#model.delete()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# to delete the local video files, uncomment and run\n",
|
|
"#shutil.rmtree(training_artifacts_dir)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Next steps\n",
|
|
"\n",
|
|
"This is currently the last introductory tutorial for Azure Machine Learning\n",
|
|
"service's Reinforcement\n",
|
|
"Learning offering. We would love to hear your feedback to build the features\n",
|
|
"you need!\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "andress"
|
|
}
|
|
],
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.0"
|
|
},
|
|
"notice": "Copyright (c) Microsoft Corporation. All rights reserved.\u00e2\u20ac\u00afLicensed under the MIT License.\u00e2\u20ac\u00af "
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
} |