mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
917 lines
31 KiB
Plaintext
917 lines
31 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Reinforcement Learning in Azure Machine Learning - Cartpole Problem on Single Compute\n",
|
|
"\n",
|
|
"Reinforcement Learning in Azure Machine Learning is a managed service for running reinforcement learning training and simulation. With Reinforcement Learning in Azure Machine Learning, data scientists can start developing reinforcement learning systems on one machine, and scale to compute targets with 100s of nodes if needed.\n",
|
|
"\n",
|
|
"This example shows how to use Reinforcement Learning in Azure Machine Learning to train a Cartpole playing agent on a single compute. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Cartpole problem\n",
|
|
"\n",
|
|
"Cartpole, also known as [Inverted Pendulum](https://en.wikipedia.org/wiki/Inverted_pendulum), is a pendulum with a center of mass above its pivot point. This formation is essentially unstable and will easily fall over but can be kept balanced by applying appropriate horizontal forces to the pivot point.\n",
|
|
"\n",
|
|
"<table style=\"width:50%\">\n",
|
|
" <tr>\n",
|
|
" <th>\n",
|
|
" <img src=\"./images/cartpole.png\" alt=\"Cartpole image\" /> \n",
|
|
" </th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th><p>Fig 1. Cartpole problem schematic description (from <a href=\"https://towardsdatascience.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288\">towardsdatascience.com</a>).</p></th>\n",
|
|
" </tr>\n",
|
|
"</table>\n",
|
|
"\n",
|
|
"The goal here is to train an agent to keep the cartpole balanced by applying appropriate forces to the pivot point.\n",
|
|
"\n",
|
|
"See [this video](https://www.youtube.com/watch?v=XiigTGKZfks) for a real-world demonstration of cartpole problem."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Prerequisite\n",
|
|
"The user should have completed the Azure Machine Learning Tutorial: [Get started creating your first ML experiment with the Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup). You will need to make sure that you have a valid subscription ID, a resource group, and an Azure Machine Learning workspace. All datastores and datasets you use should be associated with your workspace."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Set up Development Environment\n",
|
|
"The following subsections show typical steps to setup your development environment. Setup includes:\n",
|
|
"\n",
|
|
"* Connecting to a workspace to enable communication between your local machine and remote resources\n",
|
|
"* Creating an experiment to track all your runs\n",
|
|
"* Creating a remote compute target to use for training"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Azure Machine Learning SDK \n",
|
|
"Display the Azure Machine Learning SDK version."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683056824182
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azureml.core\n",
|
|
"\n",
|
|
"print(\"Azure Machine Learning SDK version:\", azureml.core.VERSION)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Get Azure Machine Learning workspace\n",
|
|
"Get a reference to an existing Azure Machine Learning workspace."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683056825821
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Workspace\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print(ws.name, ws.location, ws.resource_group, sep = ' | ')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create a new compute resource or attach an existing one\n",
|
|
"\n",
|
|
"A compute target is a designated compute resource where you run your training and simulation scripts. This location may be your local machine or a cloud-based compute resource. The code below shows how to create a cloud-based compute target. For more information see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target)\n",
|
|
"\n",
|
|
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
|
|
"\n",
|
|
"**Note: Creation of a compute resource can take several minutes**. Please make sure to change `STANDARD_D2_V2` to a [size available in your region](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=virtual-machines)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683056826903
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
|
|
"import os\n",
|
|
"\n",
|
|
"# Choose a name and maximum size for your cluster\n",
|
|
"compute_name = \"cpu-cluster-d2\"\n",
|
|
"compute_min_nodes = 0\n",
|
|
"compute_max_nodes = 4\n",
|
|
"vm_size = \"STANDARD_D2_V2\"\n",
|
|
"\n",
|
|
"if compute_name in ws.compute_targets:\n",
|
|
" print(\"Found an existing compute target of name: \" + compute_name)\n",
|
|
" compute_target = ws.compute_targets[compute_name]\n",
|
|
" # Note: you may want to make sure compute_target is of type AmlCompute \n",
|
|
"else:\n",
|
|
" print(\"Creating new compute target...\")\n",
|
|
" provisioning_config = AmlCompute.provisioning_configuration(\n",
|
|
" vm_size=vm_size,\n",
|
|
" min_nodes=compute_min_nodes, \n",
|
|
" max_nodes=compute_max_nodes)\n",
|
|
" \n",
|
|
" # Create the cluster\n",
|
|
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
|
|
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
|
"\n",
|
|
"print(compute_target.get_status().serialize())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create Azure Machine Learning experiment\n",
|
|
"Create an experiment to track the runs in your workspace. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683056827252
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.experiment import Experiment\n",
|
|
"\n",
|
|
"experiment_name = 'CartPole-v1-SC'\n",
|
|
"experiment = Experiment(workspace=ws, name=experiment_name)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1646417962898
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Environment\n",
|
|
"import os\n",
|
|
"\n",
|
|
"ray_environment_name = 'cartpole-ray-sc'\n",
|
|
"ray_environment_dockerfile_path = os.path.join(os.getcwd(), 'files', 'docker', 'Dockerfile')\n",
|
|
"\n",
|
|
"# Build environment image\n",
|
|
"ray_environment = Environment. \\\n",
|
|
" from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path). \\\n",
|
|
" register(workspace=ws)\n",
|
|
"ray_env_build_details = ray_environment.build(workspace=ws)\n",
|
|
"\n",
|
|
"ray_env_build_details.wait_for_completion(show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Train Cartpole Agent\n",
|
|
"In this section, we show how to use Azure Machine Learning jobs and Ray/RLlib framework to train a cartpole playing agent. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create reinforcement learning training run\n",
|
|
"\n",
|
|
"The code below submits the training run using a `ScriptRunConfig`. By providing the\n",
|
|
"command to run the training, and a `RunConfig` object configured with your\n",
|
|
"compute target, number of nodes, and environment image to use."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683059658819
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Environment\n",
|
|
"from azureml.core import RunConfiguration, ScriptRunConfig, Experiment\n",
|
|
"from azureml.core.runconfig import DockerConfiguration, RunConfiguration\n",
|
|
"\n",
|
|
"config_name = 'cartpole-ppo.yaml'\n",
|
|
"script_name = 'cartpole_training.py'\n",
|
|
"video_capture = True\n",
|
|
"script_arguments = [\n",
|
|
" '--config', config_name\n",
|
|
"]\n",
|
|
"command=[\"python\", script_name, *script_arguments]\n",
|
|
"\n",
|
|
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
|
|
"aml_run_config_ml.target = compute_target\n",
|
|
"aml_run_config_ml.node_count = 1\n",
|
|
"aml_run_config_ml.environment = ray_environment\n",
|
|
"\n",
|
|
"if video_capture:\n",
|
|
" command = [\"xvfb-run -s '-screen 0 640x480x16 -ac +extension GLX +render' \"] + command\n",
|
|
" aml_run_config_ml.environment_variables[\"SDL_VIDEODRIVER\"] = \"dummy\"\n",
|
|
"\n",
|
|
"training_config = ScriptRunConfig(source_directory='./files',\n",
|
|
" command=command,\n",
|
|
" run_config = aml_run_config_ml\n",
|
|
" )\n",
|
|
"training_run = experiment.submit(training_config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Training configuration\n",
|
|
"\n",
|
|
"This is the training configuration (in yaml) that we use to train an agent to solve the CartPole problem using\n",
|
|
"the PPO algorithm.\n",
|
|
"\n",
|
|
"```yaml\n",
|
|
"cartpole-ppo:\n",
|
|
" env: CartPole-v1\n",
|
|
" run: PPO\n",
|
|
" stop:\n",
|
|
" episode_reward_mean: 475\n",
|
|
" time_total_s: 300\n",
|
|
" checkpoint_config:\n",
|
|
" checkpoint_frequency: 2\n",
|
|
" checkpoint_at_end: true\n",
|
|
" config:\n",
|
|
" # Works for both torch and tf.\n",
|
|
" framework: torch\n",
|
|
" gamma: 0.99\n",
|
|
" lr: 0.0003\n",
|
|
" num_workers: 1\n",
|
|
" observation_filter: MeanStdFilter\n",
|
|
" num_sgd_iter: 6\n",
|
|
" vf_loss_coeff: 0.01\n",
|
|
" model:\n",
|
|
" fcnet_hiddens: [32]\n",
|
|
" fcnet_activation: linear\n",
|
|
" vf_share_layers: true\n",
|
|
" enable_connectors: true\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Monitor experiment\n",
|
|
"\n",
|
|
"Azure Machine Learning provides a Jupyter widget to show the status of an experiment run. You could use this widget to monitor the status of the runs."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060289002
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"\n",
|
|
"RunDetails(training_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Stop the run\n",
|
|
"To stop the run, call `training_run.cancel()`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Uncomment line below to cancel the run\n",
|
|
"# training_run.cancel()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Wait for completion\n",
|
|
"Wait for the run to complete before proceeding.\n",
|
|
"\n",
|
|
"**Note: The length of the run depends on the provisioning time of the compute target and it may take several minutes to complete.**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060297005
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"training_run.wait_for_completion()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Get access to training artifacts\n",
|
|
"We can simply use run id to get a handle to an in-progress or a previously concluded run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060517858
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Run\n",
|
|
"\n",
|
|
"run_id = training_run.id # Or set to run id of a completed run (e.g. 'rl-cartpole-v0_1587572312_06e04ace_head')\n",
|
|
"run = Run(experiment, run_id=run_id)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we can use the Run API to download policy training artifacts (saved model and checkpoints) to local compute."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060521847
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from os import path\n",
|
|
"from distutils import dir_util\n",
|
|
"\n",
|
|
"training_artifacts_path = path.join(\"logs\", \"cartpole-ppo\")\n",
|
|
"print(\"Training artifacts path:\", training_artifacts_path)\n",
|
|
"\n",
|
|
"if path.exists(training_artifacts_path):\n",
|
|
" dir_util.remove_tree(training_artifacts_path)\n",
|
|
"\n",
|
|
"# Download run artifacts to local compute\n",
|
|
"training_run.download_files(training_artifacts_path)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Display movies of selected training episodes\n",
|
|
"\n",
|
|
"Ray creates video output of selected training episodes in mp4 format. Here we will display two of these, i.e. the first and the last recorded videos, so you could see the improvement of the agent after training.\n",
|
|
"\n",
|
|
"First we introduce a few helper functions: a function to download the movies from our dataset, another one to find mp4 movies in a local directory, and one more to display a downloaded movie."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060867182
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import shutil\n",
|
|
"\n",
|
|
"# A helper function to find movies in a directory\n",
|
|
"def find_movies(movie_path):\n",
|
|
" print(\"Looking in path:\", movie_path)\n",
|
|
" mp4_movies = []\n",
|
|
" for root, _, files in os.walk(movie_path):\n",
|
|
" for name in files:\n",
|
|
" if name.endswith('.mp4'):\n",
|
|
" mp4_movies.append(path.join(root, name))\n",
|
|
" print('Found {} movies'.format(len(mp4_movies)))\n",
|
|
"\n",
|
|
" return mp4_movies\n",
|
|
"\n",
|
|
"\n",
|
|
"# A helper function to display a movie\n",
|
|
"from IPython.core.display import Video\n",
|
|
"from IPython.display import display\n",
|
|
"def display_movie(movie_file):\n",
|
|
" display(Video(movie_file, embed=True, html_attributes='controls'))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Look for the downloaded movies in the local directory and sort them."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060871682
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"mp4_files = find_movies(training_artifacts_path)\n",
|
|
"mp4_files.sort()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Display a movie of the first training episode. This is how the agent performs with no training."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060900828
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"first_movie = mp4_files[0] if len(mp4_files) > 0 else None\n",
|
|
"print(\"First movie:\", first_movie)\n",
|
|
"\n",
|
|
"if first_movie:\n",
|
|
" display_movie(first_movie)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Display a movie of the last training episode. This is how a fully-trained agent performs."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683060914790
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"last_movie = mp4_files[-1] if len(mp4_files) > 0 else None\n",
|
|
"print(\"Last movie:\", last_movie)\n",
|
|
"\n",
|
|
"if last_movie:\n",
|
|
" display_movie(last_movie)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Evaluate Trained Agent and See Results\n",
|
|
"\n",
|
|
"We can evaluate a previously trained policy using the `rollout.py` helper script provided by RLlib (see [Evaluating Trained Policies](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) for more details). Here we use an adaptation of this script to reconstruct a policy from a checkpoint taken and saved during training. We took these checkpoints by setting `checkpoint-freq` and `checkpoint-at-end` parameters above.\n",
|
|
"In this section we show how to use these checkpoints to evaluate the trained policy."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Evaluate a trained policy\n",
|
|
"In this section, we submit another job, to evalute a trained policy. The entrypoint for this job is\n",
|
|
"`cartpole-rollout.py` script, and we we pass the checkpoints dataset to this script as a dataset refrence.\n",
|
|
"\n",
|
|
"We are using script parameters to pass in the same algorithm and the same environment used during training. We also specify the checkpoint number of the checkpoint we wish to evaluate, `checkpoint-number`, and number of the steps we shall run the rollout, `steps`.\n",
|
|
"\n",
|
|
"The training artifacts dataset will be accessible to the rollout script as a mounted folder. The mounted folder and the checkpoint number, passed in via `checkpoint-number`, will be used to create a path to the checkpoint we are going to evaluate. The created checkpoint path then will be passed into RLlib rollout script for evaluation.\n",
|
|
"\n",
|
|
"Let's find the checkpoints and the last checkpoint number first."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683061167899
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A helper function to find all of the checkpoint directories located within a larger directory tree\n",
|
|
"def find_checkpoints(file_path):\n",
|
|
" print(\"Looking in path:\", file_path)\n",
|
|
" checkpoints = []\n",
|
|
" for root, dirs, files in os.walk(file_path):\n",
|
|
" trimmed_root = root[len(file_path)+1:]\n",
|
|
" for name in dirs:\n",
|
|
" if name.startswith('checkpoint_'):\n",
|
|
" checkpoints.append(path.join(trimmed_root, name))\n",
|
|
" return checkpoints"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683061170184
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Find checkpoints and last checkpoint number\n",
|
|
"checkpoint_files = find_checkpoints(training_artifacts_path)\n",
|
|
"\n",
|
|
"last_checkpoint_path = None\n",
|
|
"last_checkpoint_number = -1\n",
|
|
"for checkpoint_file in checkpoint_files:\n",
|
|
" checkpoint_number = int(os.path.basename(checkpoint_file).split('_')[1])\n",
|
|
" if checkpoint_number > last_checkpoint_number:\n",
|
|
" last_checkpoint_path = checkpoint_file\n",
|
|
" last_checkpoint_number = checkpoint_number\n",
|
|
"\n",
|
|
"print(\"Last checkpoint number:\", last_checkpoint_number)\n",
|
|
"print(\"Last checkpoint path:\", last_checkpoint_path)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683061176740
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Upload the checkpoint files and create a DataSet\n",
|
|
"from azureml.data.dataset_factory import FileDatasetFactory\n",
|
|
"\n",
|
|
"datastore = ws.get_default_datastore()\n",
|
|
"checkpoint_ds = FileDatasetFactory.upload_directory(training_artifacts_path, (datastore, 'cartpole_checkpoints_' + training_run.id), overwrite=False, show_progress=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can submit the training run using a `ScriptRunConfig`. By providing the\n",
|
|
"command to run the training, and a `RunConfig` object configured w"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683062377151
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"ray_environment_name = 'cartpole-ray-sc'\n",
|
|
"\n",
|
|
"experiment_name = 'CartPole-v1-SC'\n",
|
|
"training_algorithm = 'PPO'\n",
|
|
"rl_environment = 'CartPole-v1'\n",
|
|
"\n",
|
|
"experiment = Experiment(workspace=ws, name=experiment_name)\n",
|
|
"ray_environment = Environment.get(workspace=ws, name=ray_environment_name)\n",
|
|
"\n",
|
|
"script_name = 'cartpole_rollout.py'\n",
|
|
"script_arguments = [\n",
|
|
" '--steps', '2000',\n",
|
|
" '--checkpoint', last_checkpoint_path,\n",
|
|
" '--algo', 'PPO',\n",
|
|
" '--render', 'true',\n",
|
|
" '--dataset_path', checkpoint_ds.as_named_input('dataset_path').as_mount()\n",
|
|
"]\n",
|
|
"\n",
|
|
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
|
|
"aml_run_config_ml.target = compute_target\n",
|
|
"aml_run_config_ml.node_count = 1\n",
|
|
"aml_run_config_ml.environment = ray_environment\n",
|
|
"aml_run_config_ml.data\n",
|
|
"\n",
|
|
"rollout_config = ScriptRunConfig(\n",
|
|
" source_directory='./files',\n",
|
|
" script=script_name,\n",
|
|
" arguments=script_arguments,\n",
|
|
" run_config = aml_run_config_ml\n",
|
|
" )\n",
|
|
" \n",
|
|
"rollout_run = experiment.submit(rollout_config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"And then, similar to the training section, we can monitor the real-time progress of the rollout run and its chid as follows. If you browse logs of the child run you can see the evaluation results recorded in driver_log.txt file. Note that you may need to wait several minutes before these results become available."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683062379999
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"RunDetails(rollout_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Wait for completion of the rollout run before moving to the next section, or you may cancel the run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683062451723
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Uncomment line below to cancel the run\n",
|
|
"#rollout_run.cancel()\n",
|
|
"rollout_run.wait_for_completion()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Display movies of selected rollout episodes\n",
|
|
"\n",
|
|
"To display recorded movies first we download recorded videos to local machine. Here again we create a dataset of rollout artifacts and use the helper functions introduced above to download and displays rollout videos."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683062747822
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Download rollout artifacts\n",
|
|
"rollout_artifacts_path = path.join(\"logs\", \"rollout\")\n",
|
|
"print(\"Rollout artifacts path:\", rollout_artifacts_path)\n",
|
|
"\n",
|
|
"if path.exists(rollout_artifacts_path):\n",
|
|
" dir_util.remove_tree(rollout_artifacts_path)\n",
|
|
"\n",
|
|
"# Download videos to local compute\n",
|
|
"rollout_run.download_files(\"logs/video\", output_directory = rollout_artifacts_path)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, similar to the training section, we look for the last video."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683062752847
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Look for the downloaded movie in local directory\n",
|
|
"mp4_files = find_movies(rollout_artifacts_path)\n",
|
|
"mp4_files.sort()\n",
|
|
"last_movie = mp4_files[-1] if len(mp4_files) > 1 else None\n",
|
|
"print(\"Last movie:\", last_movie)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Display last video recorded during the rollout."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1683062763275
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"last_movie = mp4_files[-1] if len(mp4_files) > 0 else None\n",
|
|
"print(\"Last movie:\", last_movie)\n",
|
|
"\n",
|
|
"if last_movie:\n",
|
|
" display_movie(last_movie)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Cleaning up\n",
|
|
"For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# To archive the created experiment:\n",
|
|
"#exp.archive()\n",
|
|
"\n",
|
|
"# To delete the compute target:\n",
|
|
"#compute_target.delete()\n",
|
|
"\n",
|
|
"# To delete downloaded training artifacts\n",
|
|
"#if os.path.exists(training_artifacts_path):\n",
|
|
"# dir_util.remove_tree(training_artifacts_path)\n",
|
|
"\n",
|
|
"# To delete downloaded rollout videos\n",
|
|
"#if path.exists(rollout_artifacts_path):\n",
|
|
"# dir_util.remove_tree(rollout_artifacts_path)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Next\n",
|
|
"This example was about running Reinforcement Learning in Azure Machine Learning (Ray/RLlib Framework) on a single compute. Please see [Pong Problem](../atari-on-distributed-compute/pong_rllib.ipynb)\n",
|
|
"example which uses Ray RLlib to train a Pong playing agent on a multi-node cluster."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "hoazari"
|
|
},
|
|
{
|
|
"name": "dasommer"
|
|
}
|
|
],
|
|
"categories": [
|
|
"how-to-use-azureml",
|
|
"reinforcement-learning"
|
|
],
|
|
"kernel_info": {
|
|
"name": "python38-azureml"
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3.8 - AzureML",
|
|
"language": "python",
|
|
"name": "python38-azureml"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.8.5"
|
|
},
|
|
"microsoft": {
|
|
"host": {
|
|
"AzureML": {
|
|
"notebookHasBeenCompleted": true
|
|
}
|
|
},
|
|
"ms_spell_check": {
|
|
"ms_spell_check_language": "en"
|
|
}
|
|
},
|
|
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
|
|
"nteract": {
|
|
"version": "nteract-front-end@1.0.0"
|
|
},
|
|
"vscode": {
|
|
"interpreter": {
|
|
"hash": "00c28698cbad9eaca051e9759b1181630e646922505b47b4c6352eb5aa72ddfc"
|
|
}
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 0
|
|
} |