MachineLearningNotebooks/training/07.tensorboard/07.tensorboard.ipynb
2018-10-12 14:39:33 -04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 07. Tensorboard Integration with Run History\n",
"\n",
"1. Run a TensorFlow job locally and view its Tensorboard output live.\n",
"2. Do the same on a DSVM.\n",
"3. Do it once more on a Batch AI cluster.\n",
"4. Finally, collect all of these historical runs into a single Tensorboard instance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set experiment name and create project\n",
"Choose a name for your run history container in the workspace, and create a folder for the project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from os import path, makedirs\n",
"experiment_name = 'tensorboard-demo'\n",
"\n",
"# experiment folder\n",
"exp_dir = './sample_projects/' + experiment_name\n",
"\n",
"if not path.exists(exp_dir):\n",
" makedirs(exp_dir)\n",
"\n",
"# runs we started in this session, for the finale\n",
"runs = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download TensorFlow Tensorboard demo code\n",
"\n",
"TensorFlow's repository has an MNIST demo with extensive Tensorboard instrumentation, which we'll use here.\n",
"\n",
"Note that no code changes are needed: the script runs unmodified from the TensorFlow repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
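"import requests\n",
"import os\n",
"\n",
"# Fetch the instrumented MNIST example from the TensorFlow r1.8 branch.\n",
"tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n",
"# Fail loudly instead of silently saving an error response as the script.\n",
"tf_code.raise_for_status()\n",
"with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n",
"    file.write(tf_code.text)"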
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure and run locally\n",
"\n",
"We'll start by running this locally. Using this feature for a local run might not seem useful at first (why not just point Tensorboard at the locally generated files?), but it still adds value: your local run is registered in the run history, and your Tensorboard logs are uploaded to the artifact store associated with the run. Later, you can restore the logs from any run, regardless of where it happened.\n",
"\n",
"Note that for this run, you need to install TensorFlow on your local machine yourself. In addition, the Tensorboard module (the one included with TensorFlow) must be importable from this notebook's kernel, because the local machine is what runs Tensorboard."
]
},
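{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because Tensorboard runs inside this notebook's kernel, it can help to confirm up front that the kernel can locate the `tensorflow` and `tensorboard` modules. The cell below is a minimal sketch using only the standard library (`importlib.util.find_spec`). The helper name `missing_modules` is hypothetical, and it only checks that the modules can be found, not that their versions suit the demo script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import importlib.util\n",
"\n",
"def missing_modules(names=(\"tensorflow\", \"tensorboard\")):\n",
"    \"\"\"Hypothetical helper: return the top-level module names this kernel cannot locate.\"\"\"\n",
"    return [name for name in names if importlib.util.find_spec(name) is None]\n",
"\n",
"missing = missing_modules()\n",
"if missing:\n",
"    print(\"Not importable from this kernel:\", \", \".join(missing))\n",
"else:\n",
"    print(\"TensorFlow and Tensorboard modules found.\")"
]
},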
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"\n",
"# Create a run configuration.\n",
"run_config = RunConfiguration()\n",
"run_config.environment.python.user_managed_dependencies = True\n",
"\n",
"# You can choose a specific Python environment by pointing to its interpreter path, e.g.\n",
"#run_config.environment.python.interpreter_path = '/path/to/your/env/bin/python'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"# Imported only to confirm TensorFlow is installed locally; the local run needs it.\n",
"import tensorflow as tf\n",
"\n",
"logs_dir = os.path.join(os.curdir, \"logs\")\n",
"tensorflow_logs_dir = os.path.join(logs_dir, \"tensorflow\")\n",
"\n",
"if not path.exists(tensorflow_logs_dir):\n",
"    makedirs(tensorflow_logs_dir)\n",
"\n",
"os.environ[\"TEST_TMPDIR\"] = logs_dir\n",
"\n",
"# Writing logs to ./logs results in their being uploaded to the Artifact Service,\n",
"# and thus made accessible to our Tensorboard instance.\n",
"arguments_list = [\"--log_dir\", logs_dir]\n",
"\n",
"# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n",
"# arguments_list += [\"--max_steps\", \"5000\"]\n",
"\n",
"# Create an experiment\n",
"exp = Experiment(ws, experiment_name)\n",
"\n",
"# Script arguments are passed through ScriptRunConfig's arguments parameter.\n",
"script = ScriptRunConfig(exp_dir,\n",
"                         script=\"mnist_with_summaries.py\",\n",
"                         run_config=run_config,\n",
"                         arguments=arguments_list)\n",
"\n",
"run = exp.submit(script)\n",
"# You can also wait for the run to complete\n",
"# run.wait_for_completion(show_output=True)\n",
"runs.append(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start Tensorboard\n",
"\n",
"Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin streaming logs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.tensorboard import Tensorboard\n",
"\n",
"# The Tensorboard constructor takes a list of runs, so be sure to pass this run in as a single-element list.\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
},
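{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since it is easy to forget `stop()`, one option is to wrap the instance in a context manager so that `stop()` runs even if the cell is interrupted or raises an error. This sketch assumes only what this notebook states about the `Tensorboard` object: `start()` returns the instance URI and `stop()` shuts it down. The `running` helper is hypothetical and not part of `azureml.contrib.tensorboard`. Note that the `with` block keeps Tensorboard alive only while it executes, so it suits cases where you wait on the run inside the block."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from contextlib import contextmanager\n",
"\n",
"@contextmanager\n",
"def running(tensorboard):\n",
"    \"\"\"Hypothetical helper: start a Tensorboard-like object and always stop it on exit.\"\"\"\n",
"    uri = tensorboard.start()  # per this notebook, start() returns the instance URI\n",
"    try:\n",
"        yield uri\n",
"    finally:\n",
"        # Runs whether the block finishes normally or raises (including KeyboardInterrupt).\n",
"        tensorboard.stop()\n",
"\n",
"# Example usage (not run here): keep Tensorboard up only while waiting for a run.\n",
"# with running(Tensorboard([run])) as uri:\n",
"#     print(uri)\n",
"#     run.wait_for_completion(show_output=True)"
]
},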
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now, with a DSVM\n",
"\n",
"Uploading Tensorboard logs works with all compute targets. Here we demonstrate it with a DSVM.\n",
"Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n",
"\n",
"If you are unfamiliar with DSVM configuration, check [04. Train in a remote VM (Ubuntu DSVM)](04.train-on-remote-vm.ipynb) for a more detailed breakdown."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import DsvmCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"compute_target_name = 'cpu-dsvm'\n",
"\n",
"try:\n",
" compute_target = DsvmCompute(workspace = ws, name = compute_target_name)\n",
" print('found existing:', compute_target.name)\n",
"except ComputeTargetException:\n",
" print('creating new.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
" compute_target = DsvmCompute.create(ws, name = compute_target_name, provisioning_configuration = dsvm_config)\n",
" compute_target.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit run using TensorFlow estimator\n",
"\n",
"Instead of manually configuring the DSVM environment, we can use the TensorFlow estimator and everything is set up automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import TensorFlow\n",
"\n",
"script_params = {\"--log_dir\": \"./logs\"}\n",
"\n",
"# If you want the run to go longer, set --max_steps to a higher number.\n",
"# script_params[\"--max_steps\"] = \"5000\"\n",
"\n",
"tf_estimator = TensorFlow(source_directory=exp_dir,\n",
" compute_target=compute_target,\n",
" entry_script='mnist_with_summaries.py',\n",
" script_params=script_params)\n",
"\n",
"run = exp.submit(tf_estimator)\n",
"\n",
"runs.append(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start Tensorboard with this run\n",
"\n",
"Just like before."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes a list of runs, so be sure to pass this run in as a single-element list.\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Once more, with a Batch AI cluster\n",
"\n",
"Just to prove we can, let's create a Batch AI cluster using MLC and run our demo there as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import BatchAiCompute\n",
"\n",
"clust_name = ws.name + \"cpu\"\n",
"\n",
"try:\n",
" # If you already have a cluster named this, we don't need to make a new one.\n",
" cts = ws.compute_targets() \n",
" compute_target = cts[clust_name]\n",
" assert compute_target.type == 'BatchAI'\n",
"except Exception:\n",
" # Let's make a new one here.\n",
" provisioning_config = BatchAiCompute.provisioning_configuration(cluster_max_nodes=2, \n",
" autoscale_enabled=True, \n",
" cluster_min_nodes=1,\n",
" vm_size='Standard_D11_V2')\n",
" \n",
" compute_target = BatchAiCompute.create(ws, clust_name, provisioning_config)\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)\n",
"print(compute_target.name)\n",
"# For a more detailed view of current BatchAI cluster status, use the 'status' property\n",
"# print(compute_target.status.serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit run using TensorFlow estimator\n",
"\n",
"Again, we can use the TensorFlow estimator and everything is set up automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_params = {\"--log_dir\": \"./logs\"}\n",
"\n",
"# If you want the run to go longer, set --max_steps to a higher number.\n",
"# script_params[\"--max_steps\"] = \"5000\"\n",
"\n",
"tf_estimator = TensorFlow(source_directory=exp_dir,\n",
" compute_target=compute_target,\n",
" entry_script='mnist_with_summaries.py',\n",
" script_params=script_params)\n",
"\n",
"run = exp.submit(tf_estimator)\n",
"\n",
"runs.append(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start Tensorboard with this run\n",
"\n",
"Once more..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes a list of runs, so be sure to pass this run in as a single-element list.\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finale\n",
"\n",
"If you've paid close attention, you'll have noticed that we've been saving the run objects in a list as we went along. We can start a single Tensorboard instance that combines all of these runs, so you can compare historical runs side by side. This even works with live runs: if you made some of the previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well."
]
},
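{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see which of the collected runs will still stream live logs into the combined instance, you can check each run's status first. This minimal sketch relies on `Run.get_status()`; the set of in-progress status names is an assumption, so adjust it if your SDK version reports different values. The helper `split_runs_by_status` is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumed in-progress status names; check them against your SDK version.\n",
"ACTIVE_STATES = {\"NotStarted\", \"Starting\", \"Provisioning\", \"Preparing\", \"Queued\", \"Running\", \"Finalizing\"}\n",
"\n",
"def split_runs_by_status(run_list, active_states=ACTIVE_STATES):\n",
"    \"\"\"Hypothetical helper: partition runs into (in progress, not in progress) lists.\"\"\"\n",
"    active, done = [], []\n",
"    for r in run_list:\n",
"        (active if r.get_status() in active_states else done).append(r)\n",
"    return active, done\n",
"\n",
"# Usage with the runs collected in this notebook:\n",
"# active_runs, done_runs = split_runs_by_status(runs)\n",
"# print(len(active_runs), 'run(s) still in progress;', len(done_runs), 'not in progress')"
]
},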
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes a list of runs...\n",
"# and it turns out that we have been building one of those all along.\n",
"tb = Tensorboard(runs)\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"As before, make sure to call the `stop()` method of the Tensorboard object, or it will keep running until you shut down the kernel associated with this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}