{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Reinforcement Learning Sample - Cartpole Problem on Compute Instance\n",
"\n",
"Azure ML Reinforcement Learning (Azure ML RL) is a managed service for running reinforcement learning training and simulation. With Azure MLRL, data scientists can start developing RL systems on one machine, and scale to compute clusters with 100\u00e2\u20ac\u2122s of nodes if needed.\n",
|
|
"\n",
|
|
"This example shows how to use Azure ML RL to train a Cartpole playing agent on a compute instance."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Cartpole problem\n",
|
|
"\n",
|
|
"Cartpole, also known as [Inverted Pendulum](https://en.wikipedia.org/wiki/Inverted_pendulum), is a pendulum with a center of mass above its pivot point. This formation is essentially unstable and will easily fall over but can be kept balanced by applying appropriate horizontal forces to the pivot point.\n",
|
|
"\n",
|
|
"<table style=\"width:50%\">\n",
|
|
" <tr>\n",
|
|
" <th>\n",
|
|
" <img src=\"./images/cartpole.png\" alt=\"Cartpole image\" /> \n",
|
|
" </th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th><p>Fig 1. Cartpole problem schematic description (from <a href=\"https://towardsdatascience.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288\">towardsdatascience.com</a>).</p></th>\n",
|
|
" </tr>\n",
|
|
"</table>\n",
|
|
"\n",
|
|
"The goal here is to train an agent to keep the cartpole balanced by applying appropriate forces to the pivot point.\n",
|
|
"\n",
|
|
"See [this video](https://www.youtube.com/watch?v=XiigTGKZfks) for a real-world demonstration of cartpole problem."
|
|
]
|
|
},
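{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before training at scale, it can help to look at the environment itself. The next cell is an optional, minimal sketch that steps through `CartPole-v0` with random actions. It assumes the `gym` package (with the classic API, where `reset()` returns only the observation) is available in the notebook kernel; it is not required for the rest of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: a quick look at the CartPole-v0 environment (assumes the `gym` package is installed)\n",
"import gym\n",
"\n",
"env = gym.make('CartPole-v0')\n",
"print('Observation space:', env.observation_space)  # cart position/velocity, pole angle/velocity\n",
"print('Action space:', env.action_space)            # 0 = push cart left, 1 = push cart right\n",
"\n",
"# Run one episode with random actions to see how quickly an untrained policy fails\n",
"observation = env.reset()\n",
"total_reward = 0\n",
"done = False\n",
"while not done:\n",
"    action = env.action_space.sample()\n",
"    observation, reward, done, info = env.step(action)\n",
"    total_reward += reward\n",
"print('Episode reward with random actions:', total_reward)"
]
},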
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisite\n",
"The user should have completed the Azure Machine Learning Tutorial: [Get started creating your first ML experiment with the Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup). You will need to make sure that you have a valid subscription id, a resource group and a workspace. All datastores and datasets you use should be associated with your workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Development Environment\n",
"The following subsections show typical steps to set up your development environment. Setup includes:\n",
"\n",
"* Connecting to a workspace to enable communication between your local machine and remote resources\n",
"* Creating an experiment to track all your runs\n",
"* Using a Compute Instance as compute target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure ML SDK \n",
"Display the Azure ML SDK version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Azure ML workspace\n",
"Get a reference to an existing Azure ML workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = ' | ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use Compute Instance as compute target\n",
"\n",
"A compute target is a designated compute resource where you run your training and simulation scripts. This location may be your local machine or a cloud-based compute resource. For more information see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target)\n",
"\n",
"The code below shows how to use the current compute instance as a compute target. First some helper functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os.path\n",
"\n",
"\n",
"# Get information about the currently running compute instance (notebook VM), like its name and prefix.\n",
"def load_nbvm():\n",
"    if not os.path.isfile(\"/mnt/azmnt/.nbvm\"):\n",
"        return None\n",
"    with open(\"/mnt/azmnt/.nbvm\", 'r') as file:\n",
"        return {key:value for (key, value) in [line.strip().split('=') for line in file]}\n",
"\n",
"\n",
"# Get information about the capabilities of an azureml.core.compute.AmlCompute target\n",
"# In particular how much RAM + GPU + HDD it has.\n",
"def get_compute_size(self, workspace):\n",
"    for size in self.supported_vmsizes(workspace):\n",
"        if size['name'].upper() == self.vm_size:\n",
"            return size\n",
"\n",
"azureml.core.compute.ComputeTarget.size = get_compute_size\n",
"del(get_compute_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use these helper functions to get a handle to the current compute instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load current compute instance info\n",
"current_compute_instance = load_nbvm()\n",
"print(\"Current compute instance:\", current_compute_instance)\n",
"\n",
"# For this demo, let's use the current compute instance as the compute target, if available\n",
"if current_compute_instance:\n",
"    instance_name = current_compute_instance['instance']\n",
"else:\n",
"    instance_name = next(iter(ws.compute_targets))\n",
"\n",
"compute_target = ws.compute_targets[instance_name]\n",
"\n",
"print(\"Compute target status:\")\n",
"print(compute_target.get_status().serialize())\n",
"\n",
"print(\"Compute target size:\")\n",
"print(compute_target.size(ws))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure ML experiment\n",
"Create an experiment to track the runs in your workspace. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.experiment import Experiment\n",
"\n",
"experiment_name = 'CartPole-v0-CI'\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train Cartpole Agent Using Azure ML RL\n",
"To facilitate reinforcement learning, Azure Machine Learning Python SDK provides a high level abstraction, the _ReinforcementLearningEstimator_ class, which allows users to easily construct RL run configurations for the underlying RL framework. Azure ML RL initially supports the [Ray framework](https://ray.io/) and its highly customizable [RLlib](https://ray.readthedocs.io/en/latest/rllib.html#rllib-scalable-reinforcement-learning). In this section we show how to use _ReinforcementLearningEstimator_ and Ray/RLlib framework to train a cartpole playing agent. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create reinforcement learning estimator\n",
"\n",
"The code below creates an instance of *ReinforcementLearningEstimator*, `training_estimator`, which then will be used to submit a job to Azure Machine Learning to start the Ray experiment run.\n",
"\n",
"Note that this example is purposely simplified to the minimum. Here is a short description of the parameters we are passing into the constructor:\n",
"\n",
"- `source_directory`, local directory containing your training script(s) and helper modules,\n",
"- `entry_script`, path to your entry script relative to the source directory,\n",
"- `script_params`, constant parameters to be passed to each run of training script,\n",
"- `compute_target`, reference to the compute target in which the trainer and worker(s) jobs will be executed,\n",
"- `rl_framework`, the RL framework to be used (currently must be Ray).\n",
"\n",
"We use the `script_params` parameter to pass in general and algorithm-specific parameters to the training script.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray\n",
"\n",
"training_algorithm = \"PPO\"\n",
"rl_environment = \"CartPole-v0\"\n",
"\n",
"script_params = {\n",
"\n",
"    # Training algorithm\n",
"    \"--run\": training_algorithm,\n",
" \n",
"    # Training environment\n",
"    \"--env\": rl_environment,\n",
" \n",
"    # Algorithm-specific parameters\n",
"    \"--config\": '\\'{\"num_gpus\": 0, \"num_workers\": 1}\\'',\n",
" \n",
"    # Stop conditions\n",
"    \"--stop\": '\\'{\"episode_reward_mean\": 200, \"time_total_s\": 300}\\'',\n",
" \n",
"    # Frequency of taking checkpoints\n",
"    \"--checkpoint-freq\": 2,\n",
" \n",
"    # If a checkpoint should be taken at the end - optional argument with no value\n",
"    \"--checkpoint-at-end\": \"\",\n",
" \n",
"    # Log directory\n",
"    \"--local-dir\": './logs'\n",
"}\n",
"\n",
"training_estimator = ReinforcementLearningEstimator(\n",
"\n",
"    # Location of source files\n",
"    source_directory='files',\n",
" \n",
"    # Python script file\n",
"    entry_script='cartpole_training.py',\n",
" \n",
"    # A dictionary of arguments to pass to the training script specified in ``entry_script``\n",
"    script_params=script_params,\n",
" \n",
"    # The Azure ML compute target set up for Ray head nodes\n",
"    compute_target=compute_target,\n",
" \n",
"    # RL framework. Currently must be Ray.\n",
"    rl_framework=Ray()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training script\n",
"\n",
"As recommended in the RLlib documentation, we use the Ray Tune API to run the training algorithm. All the RLlib built-in trainers are compatible with the Tune API. Here we use `tune.run()` to execute a built-in training algorithm. For convenience, the relevant part of the entry script is shown below.\n",
"\n",
"This is the list of parameters we are passing into `tune.run()` via the `script_params` parameter:\n",
"\n",
"- `run_or_experiment`: name of the [built-in algorithm](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#rllib-algorithms), 'PPO' in our example,\n",
"- `config`: Algorithm-specific configuration. This includes specifying the environment, `env`, which in our example is the gym **[CartPole-v0](https://gym.openai.com/envs/CartPole-v0/)** environment,\n",
"- `stop`: stopping conditions, which could be any of the metrics returned by the trainer. Here we use \"mean of episode reward\" and \"total training time in seconds\" as stop conditions, and\n",
"- `checkpoint_freq` and `checkpoint_at_end`: Frequency of taking checkpoints (number of training iterations between checkpoints), and if a checkpoint should be taken at the end.\n",
"\n",
"We also specify the `local_dir`, the directory in which the training logs, checkpoints and other training artifacts will be recorded.\n",
"\n",
"See [RLlib Training APIs](https://ray.readthedocs.io/en/latest/rllib-training.html#rllib-training-apis) for more details, and also [Training (tune.run, tune.Experiment)](https://ray.readthedocs.io/en/latest/tune/api_docs/execution.html#training-tune-run-tune-experiment) for the complete list of parameters.\n",
"\n",
"```python\n",
"import ray\n",
"import ray.tune as tune\n",
"\n",
"if __name__ == \"__main__\":\n",
"\n",
"    # parse arguments ...\n",
" \n",
"    # Initialize Ray\n",
"    ray.init(address=args.ray_address)\n",
"\n",
"    # Run training task using tune.run\n",
"    tune.run(\n",
"        run_or_experiment=args.run,\n",
"        config=dict(args.config, env=args.env),\n",
"        stop=args.stop,\n",
"        checkpoint_freq=args.checkpoint_freq,\n",
"        checkpoint_at_end=args.checkpoint_at_end,\n",
"        local_dir=args.local_dir\n",
"    )\n",
"```"
]
},
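{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `--config` and `--stop` values above are passed to the entry script as JSON strings. The next cell is a minimal, hypothetical sketch (not the actual `cartpole_training.py`, which ships in the `files` directory) of how such command-line arguments could be parsed into the dictionaries that `tune.run()` expects; the argument names simply mirror the `script_params` keys used above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch of the argument parsing elided above (\"# parse arguments ...\");\n",
"# the actual cartpole_training.py in the 'files' directory may differ.\n",
"import argparse\n",
"import json\n",
"\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--run', type=str, default='PPO')                # training algorithm\n",
"parser.add_argument('--env', type=str, default='CartPole-v0')        # gym environment id\n",
"parser.add_argument('--config', type=json.loads, default='{}')       # JSON string -> dict\n",
"parser.add_argument('--stop', type=json.loads, default='{}')         # JSON string -> dict\n",
"parser.add_argument('--checkpoint-freq', type=int, default=0)\n",
"parser.add_argument('--checkpoint-at-end', action='store_true')      # flag-style argument with no value\n",
"parser.add_argument('--local-dir', type=str, default='./logs')\n",
"parser.add_argument('--ray-address', type=str, default=None)\n",
"\n",
"# Example: roughly the values the estimator's script_params would produce on the command line\n",
"args = parser.parse_args([\n",
"    '--run', 'PPO',\n",
"    '--env', 'CartPole-v0',\n",
"    '--config', '{\"num_gpus\": 0, \"num_workers\": 1}',\n",
"    '--stop', '{\"episode_reward_mean\": 200, \"time_total_s\": 300}',\n",
"    '--checkpoint-freq', '2',\n",
"    '--checkpoint-at-end'\n",
"])\n",
"print(args.config, args.stop, args.checkpoint_freq, args.checkpoint_at_end)"
]
},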
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the estimator to start the experiment\n",
"Now we use the *training_estimator* to submit a run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_run = exp.submit(training_estimator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor experiment\n",
"Azure ML provides a Jupyter widget to show the real-time status of an experiment run. You can use this widget to monitor the status of the runs.\n",
"\n",
"Note that _ReinforcementLearningEstimator_ creates at least two runs: (a) a parent run, i.e. the run returned above, and (b) a collection of child runs. The number of the child runs depends on the configuration of the reinforcement learning estimator. In our simple scenario, configured above, only one child run will be created.\n",
"\n",
"The widget will show a list of the child runs as well. You can click on the link under **Status** to see the details of a child run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(training_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the run\n",
"\n",
"To cancel the run, call `training_run.cancel()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"# training_run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait for completion\n",
"Wait for the run to complete before proceeding.\n",
"\n",
"**Note: The run may take a few minutes to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get a handle to the child run\n",
"You can obtain a handle to the child run as follows. In our scenario there is only one child run, which we call `child_run_0`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"child_run_0 = None\n",
"timeout = 30\n",
"while timeout > 0 and not child_run_0:\n",
"    child_runs = list(training_run.get_children())\n",
"    print('Number of child runs:', len(child_runs))\n",
"    if len(child_runs) > 0:\n",
"        child_run_0 = child_runs[0]\n",
"        break\n",
"    time.sleep(2) # Wait for 2 seconds\n",
"    timeout -= 2\n",
"\n",
"print('Child run info:')\n",
"print(child_run_0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate Trained Agent and See Results\n",
"\n",
"We can evaluate a previously trained policy using the `rollout.py` helper script provided by RLlib (see [Evaluating Trained Policies](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) for more details). Here we use an adaptation of this script to reconstruct a policy from a checkpoint taken and saved during training. We took these checkpoints by setting the `checkpoint-freq` and `checkpoint-at-end` parameters above.\n",
"\n",
"In this section we show how to access the checkpoint data, and then how to use it to evaluate the trained policy."
]
},
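{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, the next cell is an optional, hypothetical sketch of what policy evaluation from a checkpoint looks like with RLlib directly: restore a PPO trainer from a checkpoint file and step the environment with its actions. It assumes a Ray 0.8-era API (`ray.rllib.agents`), the classic `gym` API, and that `checkpoint_path` points to a checkpoint file such as the ones produced by the training run above. In this notebook, the equivalent work is done remotely by the `cartpole_rollout.py` entry script, which may differ from this sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical local sketch of checkpoint evaluation; the actual evaluation in this notebook\n",
"# is performed remotely by the cartpole_rollout.py entry script.\n",
"import gym\n",
"import ray\n",
"from ray.rllib.agents.ppo import PPOTrainer\n",
"\n",
"checkpoint_path = '<path-to-a-checkpoint-file>'  # placeholder; set to a real checkpoint file before running\n",
"\n",
"ray.init(ignore_reinit_error=True)\n",
"\n",
"# Rebuild a trainer with the same algorithm/environment used for training, then load the checkpoint\n",
"agent = PPOTrainer(config={'num_gpus': 0, 'num_workers': 1}, env='CartPole-v0')\n",
"agent.restore(checkpoint_path)\n",
"\n",
"# Roll out one episode with the trained policy\n",
"env = gym.make('CartPole-v0')\n",
"observation = env.reset()\n",
"episode_reward, done = 0, False\n",
"while not done:\n",
"    action = agent.compute_action(observation)\n",
"    observation, reward, done, info = env.step(action)\n",
"    episode_reward += reward\n",
"print('Episode reward with trained policy:', episode_reward)"
]
},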
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a dataset of training artifacts\n",
"To evaluate a trained policy (a checkpoint) we need to make the checkpoint accessible to the rollout script. All the training artifacts are stored in the workspace's default datastore under the **azureml/<run_id>** directory.\n",
"\n",
"Here we create a file dataset from the stored artifacts, and then use this dataset to feed the data to the rollout estimator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"\n",
"run_id = child_run_0.id # Or set to run id of a completed run (e.g. 'rl-cartpole-v0_1587572312_06e04ace_head')\n",
"run_artifacts_path = os.path.join('azureml', run_id)\n",
"print(\"Run artifacts path:\", run_artifacts_path)\n",
"\n",
"# Create a file dataset object from the files stored on the default datastore\n",
"datastore = ws.get_default_datastore()\n",
"training_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(run_artifacts_path, '**')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify, we can print out the number (and paths) of all the files in the dataset, as follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"artifacts_paths = training_artifacts_ds.to_path()\n",
"print(\"Number of files in dataset:\", len(artifacts_paths))\n",
"\n",
"# Uncomment line below to print all file paths\n",
"#print(\"Artifacts dataset file paths: \", artifacts_paths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluate a trained policy\n",
"We need to configure another reinforcement learning estimator, `rollout_estimator`, and then use it to submit another run. Note that the entry script for this estimator now points to the `cartpole_rollout.py` script.\n",
"Also note how we pass the checkpoints dataset to this script using the `inputs` parameter of the _ReinforcementLearningEstimator_.\n",
"\n",
"We use script parameters to pass in the same algorithm and the same environment used during training. We also specify the checkpoint number of the checkpoint we wish to evaluate, `checkpoint-number`, and the number of steps to run the rollout, `steps`.\n",
"\n",
"The checkpoints dataset will be accessible to the rollout script as a mounted folder. The mounted folder and the checkpoint number, passed in via `checkpoint-number`, will be used to create a path to the checkpoint we are going to evaluate. The checkpoint path created this way is then passed to the RLlib rollout script for evaluation.\n",
"\n",
"Let's find the checkpoints and the last checkpoint number first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Find checkpoints and last checkpoint number\n",
"from os import path\n",
"checkpoint_files = [\n",
"    os.path.basename(file) for file in training_artifacts_ds.to_path() \\\n",
"        if os.path.basename(file).startswith('checkpoint-') and \\\n",
"            not os.path.basename(file).endswith('tune_metadata')\n",
"]\n",
"\n",
"checkpoint_numbers = []\n",
"for file in checkpoint_files:\n",
"    checkpoint_numbers.append(int(file.split('-')[1]))\n",
"\n",
"print(\"Checkpoints:\", checkpoint_numbers)\n",
"\n",
"last_checkpoint_number = max(checkpoint_numbers)\n",
"print(\"Last checkpoint number:\", last_checkpoint_number)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's configure the rollout estimator. Note that we use the last checkpoint for evaluation, on the assumption that the last checkpoint points to our best-trained agent. You may change this to any of the checkpoint numbers printed above and observe the effect."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_params = {\n",
"    # Checkpoint number of the checkpoint from which to roll out\n",
"    \"--checkpoint-number\": last_checkpoint_number,\n",
"\n",
"    # Training algorithm\n",
"    \"--run\": training_algorithm,\n",
" \n",
"    # Training environment\n",
"    \"--env\": rl_environment,\n",
" \n",
"    # Algorithm-specific parameters\n",
"    \"--config\": '{}',\n",
" \n",
"    # Number of rollout steps \n",
"    \"--steps\": 2000,\n",
" \n",
"    # Whether to suppress rendering of the environment\n",
"    \"--no-render\": \"\"\n",
"}\n",
"\n",
"rollout_estimator = ReinforcementLearningEstimator(\n",
"    # Location of source files\n",
"    source_directory='files',\n",
" \n",
"    # Python script file\n",
"    entry_script='cartpole_rollout.py',\n",
" \n",
"    # A dictionary of arguments to pass to the rollout script specified in ``entry_script``\n",
"    script_params=script_params,\n",
" \n",
"    # Data inputs\n",
"    inputs=[\n",
"        training_artifacts_ds.as_named_input('artifacts_dataset'),\n",
"        training_artifacts_ds.as_named_input('artifacts_path').as_mount()],\n",
" \n",
"    # The Azure ML compute target\n",
"    compute_target=compute_target,\n",
" \n",
"    # RL framework. Currently must be Ray.\n",
"    rl_framework=Ray(),\n",
" \n",
"    # Additional pip packages to install\n",
"    pip_packages=['azureml-dataprep[fuse,pandas]'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Same as before, we use the *rollout_estimator* to submit a run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rollout_run = exp.submit(rollout_estimator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then, similar to the training section, we can monitor the real-time progress of the rollout run and its child run as follows. If you browse the logs of the child run, you can see the evaluation results recorded in the driver_log.txt file. Note that you may need to wait several minutes before these results become available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(rollout_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait for the rollout run to complete, or cancel the run if you prefer not to wait."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"#rollout_run.cancel()\n",
"rollout_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To archive the created experiment:\n",
"#exp.archive()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"This example was about running Azure ML RL (Ray/RLlib framework) on a compute instance. Please see the [Cartpole problem](../cartpole-on-single-compute/cartpole_cc.ipynb)\n",
"example, which uses Ray/RLlib to train a Cartpole playing agent on a single-node remote compute target.\n"
]
}
],
"metadata": {
"authors": [
{
"name": "adrosa"
},
{
"name": "hoazari"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 4
}