Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Tensorboard Integration with Run History

This notebook walks through four steps:

1. Run a Tensorflow job locally and view its Tensorboard (TB) output live.
2. Do the same on a DSVM.
3. Do it once more on an AmlCompute cluster.
4. Finally, collect all of these historical runs into a single Tensorboard graph.

## Prerequisites
* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning
* Go through the [configuration notebook](../../../configuration.ipynb) to:
    * install the AML SDK
    * create a workspace and its configuration file (`config.json`)

In [None]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

Install the Azure ML TensorBoard package.

In [None]:
!pip install azureml-contrib-tensorboard

## Diagnostics
Opt in to diagnostics collection to help improve the experience, quality, and security of future releases.

In [None]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

## Initialize Workspace

Initialize a workspace object from persisted configuration.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

## Set experiment name and create project
Choose a name for your run history container in the workspace, and create a folder for the project.

In [None]:
from os import path, makedirs
experiment_name = 'tensorboard-demo'

# experiment folder
exp_dir = './sample_projects/' + experiment_name

if not path.exists(exp_dir):
    makedirs(exp_dir)

# runs we started in this session, for the finale
runs = []

## Download Tensorflow Tensorboard demo code

Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. We'll use it here for our purposes.

Note that we don't need to make any code changes at all - the code works without modification from the Tensorflow repository.

In [None]:
import requests
import os

tf_code = requests.get("https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py")
# Fail fast on an HTTP error instead of saving an error page as the script
tf_code.raise_for_status()
with open(os.path.join(exp_dir, "mnist_with_summaries.py"), "w") as file:
    file.write(tf_code.text)

## Configure and run locally

We'll start by running this locally. This feature might not seem useful for a local run (why not just run TB against the files generated locally?), but it has value even here. Your local run will be registered in the run history, and your Tensorboard logs will be uploaded to the artifact store associated with this run. Later, you'll be able to restore the logs from any run, regardless of where it happened.

Note that for this run, you need to install Tensorflow on your local machine yourself. In addition, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, because the local machine is what runs Tensorboard.
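
Before submitting, it can help to confirm that the kernel can actually see both packages. The sketch below is not part of the original notebook; it uses only the standard library, and the helper name `module_available` is made up for illustration.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` is importable by this interpreter."""
    # find_spec locates the module without importing it, so the check stays cheap.
    return importlib.util.find_spec(name) is not None


for module in ("tensorflow", "tensorboard"):
    status = "found" if module_available(module) else "MISSING"
    print(f"{module}: {status}")
```

If either line reports `MISSING`, install the package into the environment that backs this kernel before continuing.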

In [None]:
from azureml.core.runconfig import RunConfiguration

# Create a run configuration.
run_config = RunConfiguration()
run_config.environment.python.user_managed_dependencies = True

# You can choose a specific Python environment by pointing to a Python path 
#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'

In [None]:
from azureml.core import Experiment
from azureml.core.script_run_config import ScriptRunConfig

logs_dir = os.path.join(os.curdir, "logs")
data_dir = os.path.abspath(os.path.join(os.curdir, "mnist_data"))

if not path.exists(data_dir):
    makedirs(data_dir)

os.environ["TEST_TMPDIR"] = data_dir

# Logs written to ./logs are uploaded to the Artifact Service,
# which makes them accessible to our Tensorboard instance.
arguments_list = ["--log_dir", logs_dir]

# Create an experiment
exp = Experiment(ws, experiment_name)

# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:
# arguments_list += ["--max_steps", "5000"]

script = ScriptRunConfig(exp_dir,
                         script="mnist_with_summaries.py",
                         run_config=run_config,
                         arguments=arguments_list)

run = exp.submit(script)
# You can also wait for the run to complete
# run.wait_for_completion(show_output=True)
runs.append(run)

## Start Tensorboard

Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin streaming logs.

In [None]:
from azureml.contrib.tensorboard import Tensorboard

# The Tensorboard constructor takes a list of runs, so be sure to pass the run in as a single-element list here
tb = Tensorboard([run])

# If successful, start() returns a string with the URI of the instance.
tb.start()

## Stop Tensorboard

When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes.
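
One way to make sure `stop()` always runs, even if a cell raises partway through, is to wrap the instance in a context manager. This is a sketch rather than an SDK feature: the helper `running` is hypothetical and assumes only that the object exposes the `start()` and `stop()` methods used above.

```python
from contextlib import contextmanager


@contextmanager
def running(board):
    """Start `board`, yield whatever start() returned, and always call stop()."""
    url = board.start()
    try:
        yield url
    finally:
        # Runs on normal exit and on exceptions, so the process is not left behind.
        board.stop()


# Intended use in this notebook (requires a live run, so it is left commented out):
# with running(Tensorboard([run])) as url:
#     print("Tensorboard is serving at", url)
#     input("Press Enter to shut it down")
```

The trade-off is that the instance lives only as long as the `with` block, so this suits scripted use better than leaving Tensorboard open across several notebook cells.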

In [None]:
tb.stop()

## Now, with a DSVM

Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.
Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.

If you are unfamiliar with DSVM configuration, check [Train in a remote VM](../../training/train-on-remote-vm/train-on-remote-vm.ipynb) for a more detailed breakdown.

**Note**: To streamline the compute that Azure Machine Learning creates, we are updating the SDK so that it creates only single-node to multi-node `AmlCompute`. The `DSVMCompute` class will be deprecated in a later release, but you can still create a DSVM with the single command below and then attach it (like any VM) using the sample code that follows. Also note that we support only Linux VMs for remote execution from AML, so the command below creates a Linux VM.

```shell
# create a DSVM in your resource group
# note you need to be at least a contributor to the resource group in order to execute this command successfully.
(myenv) $ az vm create --resource-group <resource_group_name> --name <some_vm_name> --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username <username> --admin-password <password> --generate-ssh-keys --authentication-type password
```
You can also use [this URL](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal.
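
The attach cell that follows reads the VM's username and address from the `AZUREML_DSVM_USERNAME` and `AZUREML_DSVM_ADDRESS` environment variables and falls back to angle-bracket placeholders. A small pre-flight check can catch values that were never filled in. The helper `unresolved_settings` is an illustrative name, not part of the SDK.

```python
import os


def unresolved_settings(env_vars, environ=os.environ):
    """Return the names of variables that are unset, empty, or still <placeholders>."""
    missing = []
    for name in env_vars:
        value = environ.get(name, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


missing = unresolved_settings(["AZUREML_DSVM_USERNAME", "AZUREML_DSVM_ADDRESS"])
if missing:
    print("Set these before attaching the DSVM:", ", ".join(missing))
```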

In [None]:
from azureml.core.compute import RemoteCompute
from azureml.core.compute_target import ComputeTargetException

username = os.getenv('AZUREML_DSVM_USERNAME', default='<my_username>')
address = os.getenv('AZUREML_DSVM_ADDRESS', default='<ip_address_or_fqdn>')

compute_target_name = 'cpudsvm'
# The attach call below authenticates with an SSH private key (add private_key_passphrase if the key is protected).
# To connect with username/password instead, pass the password parameter in place of private_key_file.
try:
    attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)
    print('found existing:', attached_dsvm_compute.name)
except ComputeTargetException:
    attached_dsvm_compute = RemoteCompute.attach(workspace=ws,
                                                 name=compute_target_name,
                                                 username=username,
                                                 address=address,
                                                 ssh_port=22,
                                                 private_key_file='./.ssh/id_rsa')
    
    attached_dsvm_compute.wait_for_completion(show_output=True)

## Submit run using TensorFlow estimator

Instead of manually configuring the DSVM environment, we can use the TensorFlow estimator and everything is set up automatically.
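
The local run earlier passed its options as a flat argument list, while the estimator cells take a `script_params` dictionary. As a rough illustration of how the two forms relate, the sketch below flattens such a dictionary into a list. The helper `params_to_args` is hypothetical and makes no claim about how the SDK converts the dictionary internally.

```python
def params_to_args(script_params):
    """Flatten {"--flag": value} pairs into ["--flag", "value", ...] in insertion order."""
    args = []
    for flag, value in script_params.items():
        args.append(flag)
        # Command-line arguments are strings, so numbers are converted explicitly.
        args.append(str(value))
    return args


print(params_to_args({"--log_dir": "./logs", "--max_steps": 5000}))
```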

In [None]:
from azureml.train.dnn import TensorFlow

script_params = {"--log_dir": "./logs"}

# If you want the run to go longer, set --max_steps to a higher number.
# script_params["--max_steps"] = "5000"

tf_estimator = TensorFlow(source_directory=exp_dir,
                          compute_target=attached_dsvm_compute,
                          entry_script='mnist_with_summaries.py',
                          script_params=script_params)

run = exp.submit(tf_estimator)

runs.append(run)

## Start Tensorboard with this run

Just like before.

In [None]:
# The Tensorboard constructor takes a list of runs, so be sure to pass the run in as a single-element list here
tb = Tensorboard([run])

# If successful, start() returns a string with the URI of the instance.
tb.start()

## Stop Tensorboard

When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes.

In [None]:
tb.stop()

## Once more, with an AmlCompute cluster

Just to prove we can, let's create an AmlCompute CPU cluster, and run our demo there, as well.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute

# choose a name for your cluster
cluster_name = "cpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

## Submit run using TensorFlow estimator

Again, we can use the TensorFlow estimator and everything is set up automatically.

In [None]:
script_params = {"--log_dir": "./logs"}

# If you want the run to go longer, set --max_steps to a higher number.
# script_params["--max_steps"] = "5000"

tf_estimator = TensorFlow(source_directory=exp_dir,
                          compute_target=compute_target,
                          entry_script='mnist_with_summaries.py',
                          script_params=script_params)

run = exp.submit(tf_estimator)

runs.append(run)

## Start Tensorboard with this run

Once more...

In [None]:
# The Tensorboard constructor takes a list of runs, so be sure to pass the run in as a single-element list here
tb = Tensorboard([run])

# If successful, start() returns a string with the URI of the instance.
tb.start()

## Stop Tensorboard

When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes.

In [None]:
tb.stop()

## Finale

If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along. We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well.
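
To see which of the collected runs are still live before launching the combined instance, you could group them by status first. This is a minimal sketch: it assumes each object in the list exposes `get_status()`, as `azureml.core.Run` does, and the helper name `group_by_status` is invented for this example.

```python
from collections import defaultdict


def group_by_status(runs):
    """Map each status string (for example "Running") to the runs reporting it."""
    groups = defaultdict(list)
    for r in runs:
        groups[r.get_status()].append(r)
    return dict(groups)


# Intended use with the list built up in this notebook (left commented out
# because it needs the submitted runs):
# for status, members in group_by_status(runs).items():
#     print(status, len(members))
```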

In [None]:
# The Tensorboard constructor takes an array of runs...
# and it turns out that we have been building one of those all along.
tb = Tensorboard(runs)

# If successful, start() returns a string with the URI of the instance.
tb.start()

## Stop Tensorboard

As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least).

In [None]:
tb.stop()