{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reinforcement Learning in Azure Machine Learning - Cartpole Problem on Compute Instance\n",
"\n",
"Reinforcement Learning in Azure Machine Learning is a managed service for running reinforcement learning training and simulation. With Reinforcement Learning in Azure Machine Learning, data scientists can start developing reinforcement learning systems on one machine, and scale to compute targets with hundreds of nodes if needed.\n",
"\n",
"This example shows how to use Reinforcement Learning in Azure Machine Learning to train a cartpole-playing agent on a compute instance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cartpole problem\n",
"\n",
"Cartpole, also known as [Inverted Pendulum](https://en.wikipedia.org/wiki/Inverted_pendulum), is a pendulum with a center of mass above its pivot point. This configuration is inherently unstable and will easily fall over, but it can be kept balanced by applying appropriate horizontal forces to the pivot point.\n",
"\n",
"<table style=\"width:50%\">\n",
" <tr>\n",
" <th>\n",
" <img src=\"./images/cartpole.png\" alt=\"Cartpole image\" />\n",
" </th>\n",
" </tr>\n",
" <tr>\n",
" <th><p>Fig 1. Cartpole problem schematic description (from <a href=\"https://towardsdatascience.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288\">towardsdatascience.com</a>).</p></th>\n",
" </tr>\n",
"</table>\n",
"\n",
"The goal here is to train an agent to keep the cartpole balanced by applying appropriate forces to the pivot point.\n",
"\n",
"See [this video](https://www.youtube.com/watch?v=XiigTGKZfks) for a real-world demonstration of the cartpole problem."
]
},
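{
"cell_type": "markdown",
"metadata": {},
"source": [
"To build intuition for what the agent must learn, the cell below runs a single episode of the gym **CartPole-v0** environment with a random policy. This is an optional, illustrative sketch and not part of the training pipeline; it assumes the classic `gym` package (pre-0.26 API) is installed in your kernel."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gym\n",
"\n",
"env = gym.make('CartPole-v0')\n",
"obs = env.reset()  # observation: [cart position, cart velocity, pole angle, pole angular velocity]\n",
"\n",
"done, total_reward = False, 0.0\n",
"while not done:\n",
"    action = env.action_space.sample()  # random policy: push cart left (0) or right (1)\n",
"    obs, reward, done, info = env.step(action)\n",
"    total_reward += reward\n",
"env.close()\n",
"\n",
"# A random policy rarely keeps the pole up for more than a few dozen steps;\n",
"# a trained policy reaches the CartPole-v0 maximum of 200.\n",
"print('Episode reward with a random policy:', total_reward)"
]
},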
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"You should have completed the Azure Machine Learning tutorial [Get started creating your first ML experiment with the Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup). Make sure you have a valid subscription ID, a resource group, and an Azure Machine Learning workspace. All datastores and datasets you use should be associated with your workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Development Environment\n",
"The following subsections show the typical steps to set up your development environment. Setup includes:\n",
"\n",
"* Connecting to a workspace to enable communication between your local machine and remote resources\n",
"* Creating an experiment to track all your runs\n",
"* Using a compute instance as the compute target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure Machine Learning SDK\n",
"Display the Azure Machine Learning SDK version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646344676671
}
},
"outputs": [],
"source": [
"import azureml.core\n",
"print(\"Azure Machine Learning SDK Version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Azure Machine Learning workspace\n",
"Get a reference to an existing Azure Machine Learning workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646344680982
}
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep=' | ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use Compute Instance as compute target\n",
"\n",
"A compute target is a designated compute resource where you run your training and simulation scripts. This location may be your local machine or a cloud-based compute resource. For more information see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target)\n",
"\n",
"The code below shows how to use the current compute instance as the compute target. First, a helper function that detects whether the notebook is running on a compute instance:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646344684217
}
},
"outputs": [],
"source": [
"import os.path\n",
"\n",
"# Get information about the currently running compute instance (notebook VM), like its name and prefix.\n",
"def load_nbvm():\n",
"    if not os.path.isfile(\"/mnt/azmnt/.nbvm\"):\n",
"        return None\n",
"    with open(\"/mnt/azmnt/.nbvm\", 'r') as nbvm_file:\n",
"        return {key: value for (key, value) in [line.strip().split('=') for line in nbvm_file if '=' in line]}\n"
]
},
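{
"cell_type": "markdown",
"metadata": {},
"source": [
"On a compute instance, `/mnt/azmnt/.nbvm` contains simple `key=value` pairs, which `load_nbvm` parses into a dictionary. The only key used below is `instance` (the compute instance name); the file is assumed to contain a line like `instance=<your-compute-instance-name>`."
]
},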
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use this helper function to get a handle to the current compute instance. If the notebook is not running on a compute instance, a new one is created."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646344690768
}
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeInstance\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"import random\n",
"import string\n",
"\n",
"# Load current compute instance info\n",
"current_compute_instance = load_nbvm()\n",
"\n",
"# For this demo, let's use the current compute instance as the compute target, if available\n",
"if current_compute_instance:\n",
"    print(\"Current compute instance:\", current_compute_instance)\n",
"    instance_name = current_compute_instance['instance']\n",
"else:\n",
"    # Compute instance name needs to be unique across all existing compute instances within an Azure region\n",
"    instance_name = \"cartpole-ci-\" + \"\".join(random.choice(string.ascii_lowercase) for _ in range(5))\n",
"    try:\n",
"        instance = ComputeInstance(workspace=ws, name=instance_name)\n",
"        print('Found existing instance, use it.')\n",
"    except ComputeTargetException:\n",
"        print(\"Creating new compute instance...\")\n",
"        compute_config = ComputeInstance.provisioning_configuration(\n",
"            vm_size='STANDARD_D2_V2'\n",
"        )\n",
"        instance = ComputeInstance.create(ws, instance_name, compute_config)\n",
"        instance.wait_for_completion(show_output=True)\n",
"    print(\"Instance name:\", instance_name)\n",
"\n",
"compute_target = ws.compute_targets[instance_name]\n",
"\n",
"print(\"Compute target status:\")\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure Machine Learning experiment\n",
"Create an experiment to track the runs in your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646344835579
}
},
"outputs": [],
"source": [
"from azureml.core.experiment import Experiment\n",
"\n",
"experiment_name = 'CartPole-v0-CI'\n",
"experiment = Experiment(workspace=ws, name=experiment_name)"
]
},
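{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Ray environment\n",
"Next, build the Docker-based environment that provides Ray and RLlib to the training and rollout runs. The environment is built from the Dockerfile under `files/docker` and registered in the workspace so it can be reused."
]
},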
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646346293902
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"import os\n",
"\n",
"ray_environment_name = 'cartpole-ray-ci'\n",
"ray_environment_dockerfile_path = os.path.join(os.getcwd(), 'files', 'docker', 'Dockerfile')\n",
"\n",
"# Build the environment image from the Dockerfile and register it in the workspace\n",
"ray_environment = Environment.from_dockerfile(\n",
"    name=ray_environment_name,\n",
"    dockerfile=ray_environment_dockerfile_path\n",
").register(workspace=ws)\n",
"ray_env_build_details = ray_environment.build(workspace=ws)\n",
"\n",
"ray_env_build_details.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train Cartpole Agent\n",
"In this section, we show how to use Azure Machine Learning jobs and the Ray/RLlib framework to train a cartpole-playing agent."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create reinforcement learning training run\n",
"\n",
"The code below submits the training run using a `ScriptRunConfig` that specifies the training script and its arguments, together with a `RunConfiguration` object configured with your compute target, node count, and the environment image to use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347120585
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Experiment, ScriptRunConfig\n",
"from azureml.core.runconfig import DockerConfiguration, RunConfiguration\n",
"\n",
"training_algorithm = 'PPO'\n",
"rl_environment = 'CartPole-v0'\n",
"\n",
"script_name = 'cartpole_training.py'\n",
"script_arguments = [\n",
"    '--run', training_algorithm,\n",
"    '--env', rl_environment,\n",
"    '--config', '{\"num_gpus\": 0, \"num_workers\": 1}',\n",
"    '--stop', '{\"episode_reward_mean\": 200, \"time_total_s\": 300}',\n",
"    '--checkpoint-freq', '2',\n",
"    '--checkpoint-at-end',\n",
"    '--local-dir', './logs'\n",
"]\n",
"\n",
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
"aml_run_config_ml.target = compute_target\n",
"aml_run_config_ml.docker = DockerConfiguration(use_docker=True)\n",
"aml_run_config_ml.node_count = 1\n",
"aml_run_config_ml.environment = ray_environment\n",
"\n",
"training_config = ScriptRunConfig(source_directory='./files',\n",
"                                  script=script_name,\n",
"                                  arguments=script_arguments,\n",
"                                  run_config=aml_run_config_ml)\n",
"training_run = experiment.submit(training_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training script\n",
"\n",
"As recommended in the RLlib documentation, we use the Ray Tune API to run the training algorithm. All the RLlib built-in trainers are compatible with the Tune API. Here we use `tune.run()` to execute a built-in training algorithm. For convenience, below you can see the part of the entry script where we make this call.\n",
"\n",
"This is the list of parameters the script passes to `tune.run()`, populated from the `script_arguments` defined above:\n",
"\n",
"- `run_or_experiment`: name of the [built-in algorithm](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#rllib-algorithms), 'PPO' in our example,\n",
"- `config`: algorithm-specific configuration, including the environment, `env`, which in our example is the gym **[CartPole-v0](https://gym.openai.com/envs/CartPole-v0/)** environment,\n",
"- `stop`: stopping conditions, which could be any of the metrics returned by the trainer. Here we use the mean episode reward and the total training time in seconds as stop conditions, and\n",
"- `checkpoint_freq` and `checkpoint_at_end`: the frequency of taking checkpoints (number of training iterations between checkpoints), and whether a checkpoint should be taken at the end.\n",
"\n",
"We also specify `local_dir`, the directory in which the training logs, checkpoints and other training artifacts will be recorded.\n",
"\n",
"See [RLlib Training APIs](https://ray.readthedocs.io/en/latest/rllib-training.html#rllib-training-apis) for more details, and also [Training (tune.run, tune.Experiment)](https://ray.readthedocs.io/en/latest/tune/api_docs/execution.html#training-tune-run-tune-experiment) for the complete list of parameters.\n",
"\n",
"```python\n",
"import os\n",
"import ray\n",
"import ray.tune as tune\n",
"\n",
"if __name__ == \"__main__\":\n",
"\n",
"    # parse arguments ...\n",
"\n",
"    # Start ray head (single node)\n",
"    os.system('ray start --head')\n",
"    ray.init(address='auto')\n",
"\n",
"    # Run training task using tune.run\n",
"    tune.run(\n",
"        run_or_experiment=args.run,\n",
"        config=dict(args.config, env=args.env),\n",
"        stop=args.stop,\n",
"        checkpoint_freq=args.checkpoint_freq,\n",
"        checkpoint_at_end=args.checkpoint_at_end,\n",
"        local_dir=args.local_dir\n",
"    )\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor experiment\n",
"Azure Machine Learning provides a Jupyter widget to show the status of an experiment run. You can use this widget to monitor the status of the runs.\n",
"\n",
"You can click on the link under **Status** to see the details of a child run. It will also show the metrics being logged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347127671
}
},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(training_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the run\n",
"\n",
"To stop the run, call `training_run.cancel()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"# training_run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait for completion\n",
"Wait for the run to complete before proceeding.\n",
"\n",
"**Note: The run may take a few minutes to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347318682
}
},
"outputs": [],
"source": [
"training_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate Trained Agent and See Results\n",
"\n",
"We can evaluate a previously trained policy using the `cartpole_rollout.py` helper script provided by RLlib (see [Evaluating Trained Policies](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) for more details). Here we use an adaptation of this script to reconstruct a policy from a checkpoint taken and saved during training. We took these checkpoints by setting the `checkpoint-freq` and `checkpoint-at-end` parameters above.\n",
"\n",
"In this section we show how to access the checkpoint data, and then how to use it to evaluate the trained policy."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a dataset of training artifacts\n",
"To evaluate a trained policy (a checkpoint) we need to make the checkpoint accessible to the rollout script.\n",
"We can use the Run API to download policy training artifacts (saved model and checkpoints) to local compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347328505
}
},
"outputs": [],
"source": [
"from os import path\n",
"from distutils import dir_util\n",
"\n",
"training_artifacts_path = path.join(\"logs\", training_algorithm)\n",
"print(\"Training artifacts path:\", training_artifacts_path)\n",
"\n",
"if path.exists(training_artifacts_path):\n",
"    dir_util.remove_tree(training_artifacts_path)\n",
"\n",
"# Download run artifacts to local compute\n",
"training_run.download_files(training_artifacts_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's find the checkpoints and the last checkpoint number."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347334571
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"# A helper function to find checkpoint files in a directory\n",
"def find_checkpoints(file_path):\n",
"    print(\"Looking in path:\", file_path)\n",
"    checkpoints = []\n",
"    for root, _, files in os.walk(file_path):\n",
"        for name in files:\n",
"            if os.path.basename(root).startswith('checkpoint_'):\n",
"                checkpoints.append(path.join(root, name))\n",
"    return checkpoints"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347337724
}
},
"outputs": [],
"source": [
"# Find checkpoints and the last checkpoint number\n",
"checkpoint_files = find_checkpoints(training_artifacts_path)\n",
"\n",
"checkpoint_numbers = []\n",
"for file in checkpoint_files:\n",
"    file = os.path.basename(file)\n",
"    # Checkpoint files are named 'checkpoint-<number>' (plus a '.tune_metadata' companion file)\n",
"    if file.startswith('checkpoint-') and not file.endswith('.tune_metadata'):\n",
"        checkpoint_numbers.append(int(file.split('-')[1]))\n",
"\n",
"print(\"Checkpoints:\", checkpoint_numbers)\n",
"\n",
"last_checkpoint_number = max(checkpoint_numbers)\n",
"print(\"Last checkpoint number:\", last_checkpoint_number)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we upload the checkpoints to the default datastore and create a file dataset. This dataset will be used to pass the checkpoints to the rollout script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347346085
}
},
"outputs": [],
"source": [
"# Upload the checkpoint files and create a file dataset\n",
"from azureml.core import Dataset\n",
"\n",
"datastore = ws.get_default_datastore()\n",
"checkpoint_dataref = datastore.upload_files(checkpoint_files, target_path='cartpole_checkpoints_' + training_run.id, overwrite=True)\n",
"checkpoint_ds = Dataset.File.from_files(checkpoint_dataref)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify, we can print out the number (and paths) of all the files in the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347354726
}
},
"outputs": [],
"source": [
"artifacts_paths = checkpoint_ds.to_path()\n",
"print(\"Number of files in dataset:\", len(artifacts_paths))\n",
"\n",
"# Uncomment line below to print all file paths\n",
"#print(\"Artifacts dataset file paths: \", artifacts_paths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run the rollout script\n",
"\n",
"Now we submit a rollout run that executes `cartpole_rollout.py`, passing in the checkpoint dataset and the last checkpoint number so the script can restore the trained policy and evaluate it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347414835
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"ray_environment_name = 'cartpole-ray-ci'\n",
"\n",
"experiment_name = 'CartPole-v0-CI'\n",
"training_algorithm = 'PPO'\n",
"rl_environment = 'CartPole-v0'\n",
"\n",
"experiment = Experiment(workspace=ws, name=experiment_name)\n",
"ray_environment = Environment.get(workspace=ws, name=ray_environment_name)\n",
"\n",
"script_name = 'cartpole_rollout.py'\n",
"script_arguments = [\n",
"    '--run', training_algorithm,\n",
"    '--env', rl_environment,\n",
"    '--config', '{}',\n",
"    '--steps', '2000',\n",
"    '--checkpoint-number', str(last_checkpoint_number),\n",
"    '--no-render',\n",
"    '--artifacts-dataset', checkpoint_ds.as_named_input('artifacts_dataset'),\n",
"    '--artifacts-path', checkpoint_ds.as_named_input('artifacts_path').as_mount()\n",
"]\n",
"\n",
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
"aml_run_config_ml.target = compute_target\n",
"aml_run_config_ml.docker = DockerConfiguration(use_docker=True)\n",
"aml_run_config_ml.node_count = 1\n",
"aml_run_config_ml.environment = ray_environment\n",
"\n",
"rollout_config = ScriptRunConfig(\n",
"    source_directory='./files',\n",
"    script=script_name,\n",
"    arguments=script_arguments,\n",
"    run_config=aml_run_config_ml\n",
")\n",
"\n",
"rollout_run = experiment.submit(rollout_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, similar to the training section, we can monitor the real-time progress of the rollout run and its child runs as follows. If you browse the logs of the child run you can see the evaluation results recorded in the driver_log.txt file. Note that you may need to wait several minutes before these results become available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646347429626
}
},
"outputs": [],
"source": [
"RunDetails(rollout_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait for the rollout run to complete, or cancel it by uncommenting the line below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"#rollout_run.cancel()\n",
"rollout_run.wait_for_completion()"
]
},
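{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the run has completed, you can also list and download its recorded files programmatically with the Run API. The sketch below looks for the driver log mentioned above; the exact file path under the run's logs may vary between SDK versions, so we match on the name rather than hard-coding a path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List files recorded for the rollout run and download any driver logs found\n",
"for file_name in rollout_run.get_file_names():\n",
"    if 'driver_log' in file_name:\n",
"        print('Downloading', file_name)\n",
"        rollout_run.download_file(file_name, output_file_path='./rollout_logs/' + file_name.split('/')[-1])"
]
},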
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To archive the created experiment:\n",
"#experiment.archive()\n",
"\n",
"# To delete the compute instance created above (skipped if the notebook ran on the current compute instance)\n",
"if not current_compute_instance:\n",
"    compute_target.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"This example was about running Reinforcement Learning in Azure Machine Learning (Ray/RLlib framework) on a compute instance. See the [Cartpole Problem on Single Compute](../cartpole-on-single-compute/cartpole_sc.ipynb)\n",
"example, which uses Ray/RLlib to train a cartpole-playing agent on a single-node remote compute target.\n"
]
}
],
"metadata": {
"authors": [
{
"name": "adrosa"
},
{
"name": "hoazari"
}
],
"categories": [
"how-to-use-azureml",
"reinforcement-learning"
],
"interpreter": {
"hash": "13382f70c1d0595120591d2e358c8d446daf961bf951d1fba9a32631e205d5ab"
},
"kernel_info": {
"name": "python3-azureml"
},
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
},
"microsoft": {
"host": {
"AzureML": {
"notebookHasBeenCompleted": true
}
}
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"nteract": {
"version": "nteract-front-end@1.0.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}