Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Train using Azure Machine Learning Compute Instance

* Initialize Workspace
* Introduction to ComputeInstance
* Create an Experiment
* Submit ComputeInstance run
* Additional operations to perform on ComputeInstance

## Prerequisites
If you are using an Azure Machine Learning ComputeInstance, you are all set. Otherwise, work through the [configuration](../../../configuration.ipynb) notebook first, if you haven't already, to establish your connection to the AzureML Workspace.

In [None]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

## Initialize Workspace

Initialize a workspace object from the saved workspace configuration file.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

## Introduction to ComputeInstance


Azure Machine Learning compute instance is a fully managed, cloud-based workstation optimized for your machine learning development environment. It is created **within your workspace region**.

For more information on ComputeInstance, please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance).

**Note**: As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

### Create ComputeInstance
First, let's check which VM sizes are available in your region. Azure is a regional service, and some specialized SKUs (especially GPUs) are only available in certain regions. Since a ComputeInstance is created in the region of your workspace, we use the `supported_vmsizes()` function to check whether the VM size we want to use (`STANDARD_D3_V2`) is supported.

You can also pass a different region to check availability there, and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb).

In [None]:
from azureml.core.compute import ComputeTarget, ComputeInstance

ComputeInstance.supported_vmsizes(workspace = ws)
# ComputeInstance.supported_vmsizes(workspace = ws, location='eastus')
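Rather than scanning the printed list by eye, you can test for a specific size programmatically. A minimal sketch, assuming each entry returned by `supported_vmsizes()` is a dict with a `'name'` key (check this against the output of the cell above); `is_vm_size_supported` is a hypothetical helper and the sample list is made-up data, not real regional availability:

```python
def is_vm_size_supported(vm_sizes, wanted):
    """Return True if `wanted` appears among the VM size descriptions.

    Assumes each entry is a dict with a 'name' key, as printed by
    ComputeInstance.supported_vmsizes(); names are compared case-insensitively.
    """
    wanted = wanted.lower()
    return any(size.get('name', '').lower() == wanted for size in vm_sizes)


# Hypothetical sample shaped like the supported_vmsizes() output
sample_sizes = [
    {'name': 'Standard_D3_v2'},
    {'name': 'Standard_DS11_v2'},
]

print(is_vm_size_supported(sample_sizes, 'STANDARD_D3_V2'))  # True
print(is_vm_size_supported(sample_sizes, 'Standard_NC6'))    # False
```

In the notebook you would pass `ComputeInstance.supported_vmsizes(workspace=ws)` as `vm_sizes`. The case-insensitive comparison matters because the notebook spells the size `STANDARD_D3_V2`.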

In [None]:
import datetime
import time

from azureml.core.compute import ComputeTarget, ComputeInstance
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your instance
# Compute instance name should be unique across the azure region
compute_name = "ci{}".format(ws._workspace_id)[:10]

# Verify that instance does not exist already
try:
    instance = ComputeInstance(workspace=ws, name=compute_name)
    print('Found existing instance, use it.')
except ComputeTargetException:
    compute_config = ComputeInstance.provisioning_configuration(
        vm_size='STANDARD_D3_V2',
        ssh_public_access=False,
        # vnet_resourcegroup_name='<my-resource-group>',
        # vnet_name='<my-vnet-name>',
        # subnet_name='default',
        # admin_user_ssh_public_key='<my-sshkey>'
    )
    instance = ComputeInstance.create(ws, compute_name, compute_config)
    instance.wait_for_completion(show_output=True)
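The `[:10]` slice in the cell above applies to the whole formatted string, so the instance name is `ci` followed by only the first eight characters of the workspace ID. A minimal sketch of that derivation; `derive_compute_name` is a hypothetical helper, and the GUID-shaped ID is made up to stand in for `ws._workspace_id`:

```python
def derive_compute_name(workspace_id, prefix="ci", max_len=10):
    # Prepend the prefix, then truncate the combined string to max_len characters,
    # mirroring "ci{}".format(ws._workspace_id)[:10] in the cell above.
    return "{}{}".format(prefix, workspace_id)[:max_len]


# Hypothetical workspace ID, not a real workspace
example_id = "0a1b2c3d-4e5f-6789-abcd-ef0123456789"
name = derive_compute_name(example_id)
print(name)       # ci0a1b2c3d
print(len(name))  # 10
```

Because only eight characters of the ID survive the slice, a fully custom name may be the safer choice if the derived name turns out to clash with an existing instance in the region.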

## Create an Experiment

An **experiment** is a logical container in an Azure ML Workspace. It holds run records, which can include run metrics and output artifacts from your experiments.

In [None]:
from azureml.core import Experiment
experiment_name = 'train-on-computeinstance'
experiment = Experiment(workspace = ws, name = experiment_name)
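To make the container relationship concrete, here is a toy model in plain Python: an experiment holds run records, and each run carries its own metrics and artifact names. This only illustrates the concept and is not the `azureml.core.Experiment` API; `ToyExperiment`, `RunRecord`, the metric value, and the artifact path are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    # Hypothetical stand-in for a run: an ID plus logged metrics and output artifacts
    run_id: str
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)


@dataclass
class ToyExperiment:
    # Hypothetical stand-in for an experiment: a named container of run records
    name: str
    runs: list = field(default_factory=list)

    def add_run(self, run_id):
        run = RunRecord(run_id)
        self.runs.append(run)
        return run


exp = ToyExperiment('train-on-computeinstance')
r = exp.add_run('run-1')
r.metrics['accuracy'] = 0.9               # toy value, not a real result
r.artifacts.append('outputs/model.pkl')   # hypothetical artifact path
print(len(exp.runs), exp.runs[0].metrics)  # 1 {'accuracy': 0.9}
```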

## Submit ComputeInstance run
The training script `train.py` has already been created for you in this directory.

### Create environment

Create an environment with scikit-learn installed.

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment("myenv")
myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])

### Configure & Run

In [None]:
from azureml.core import ScriptRunConfig

# Use the current directory (which contains train.py) as the source directory
src = ScriptRunConfig(source_directory='.', script='train.py')

# Set compute target to the one created in previous step
src.run_config.target = instance

# Set environment
src.run_config.environment = myenv

run = experiment.submit(config=src)

Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

You can use `get_active_runs()` to list the jobs that are currently running or queued on the compute instance.

In [None]:
import time

# Wait for the run to leave the Preparing state (reach Queued, Running, or a terminal state)
status = run.get_status()
while status not in ['Queued', 'Running', 'Completed', 'Failed', 'Canceled']:
    print('Run status: {}'.format(status))
    time.sleep(10)
    status = run.get_status()
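The polling loop above has no upper bound on how long it waits. A minimal sketch of the same pattern with a timeout, written against any zero-argument status function so it can be tried without a workspace; `wait_for_status` is a hypothetical helper, and in the notebook you would pass `run.get_status` as `get_status`:

```python
import time


def wait_for_status(get_status, targets, poll_interval=10.0, timeout=600.0):
    """Poll get_status() until it returns a value in `targets`.

    Returns the last observed status; raises TimeoutError if no target
    status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status not in targets:
        if time.monotonic() >= deadline:
            raise TimeoutError('Last status: {}'.format(status))
        print('Run status: {}'.format(status))
        time.sleep(poll_interval)
        status = get_status()
    return status


# Fake status source that reports Preparing twice, then Running
fake_statuses = iter(['Preparing', 'Preparing', 'Running'])
final = wait_for_status(lambda: next(fake_statuses),
                        {'Queued', 'Running', 'Completed', 'Failed', 'Canceled'},
                        poll_interval=0.01, timeout=5)
print(final)  # Running
```

`time.monotonic()` is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.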

In [None]:
# get active runs which are in Queued or Running state
active_runs = instance.get_active_runs()
for active_run in active_runs:
    print(active_run.run_id, ',', active_run.status)
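When several jobs share the instance, it can help to tally the active runs by status instead of printing each one. A small sketch using `collections.Counter`; `count_by_status` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the run objects, assuming only the `status` attribute used in the cell above:

```python
from collections import Counter
from types import SimpleNamespace


def count_by_status(runs):
    # Tally runs by their .status attribute (e.g. 'Queued', 'Running')
    return Counter(r.status for r in runs)


# Hypothetical stand-ins for the objects returned by instance.get_active_runs()
stub_runs = [
    SimpleNamespace(run_id='a', status='Running'),
    SimpleNamespace(run_id='b', status='Queued'),
    SimpleNamespace(run_id='c', status='Queued'),
]
print(count_by_status(stub_runs))  # Counter({'Queued': 2, 'Running': 1})
```

In the notebook you would call `count_by_status(instance.get_active_runs())`.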

In [None]:
run.wait_for_completion()
print(run.get_metrics())
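`get_metrics()` returns a dictionary keyed by metric name; a metric logged more than once is typically returned as a list of the logged values. A minimal sketch for reducing that to the last logged value of each metric; `latest_metric_values` is a hypothetical helper, and the sample dictionary is made up, not output from `train.py`:

```python
def latest_metric_values(metrics):
    # Reduce non-empty list-valued entries (metrics logged several times) to their
    # last value; scalar entries and empty lists are returned unchanged.
    return {name: (value[-1] if isinstance(value, list) and value else value)
            for name, value in metrics.items()}


# Hypothetical metrics dictionary shaped like run.get_metrics() output
sample_metrics = {'alpha': 0.1, 'mse': [3.2, 2.7, 2.5]}
print(latest_metric_values(sample_metrics))  # {'alpha': 0.1, 'mse': 2.5}
```

In the notebook you would pass `run.get_metrics()` as `metrics`.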

## Additional operations to perform on ComputeInstance

You can perform more operations on a ComputeInstance, such as getting its status, changing its state, or deleting it.

In [None]:
# get_status() gets the latest status of the ComputeInstance target
instance.get_status()

In [None]:
# stop() is used to stop the ComputeInstance
# Stopping the ComputeInstance stops the billing meter and persists its state on disk.
# Available quota is not changed by this operation.
instance.stop(wait_for_completion=True, show_output=True)

In [None]:
# start() is used to start the ComputeInstance if it is in the stopped state
instance.start(wait_for_completion=True, show_output=True)

In [None]:
# restart() is used to restart the ComputeInstance
instance.restart(wait_for_completion=True, show_output=True)

In [None]:
# delete() is used to delete the ComputeInstance target. This is useful if you want to reuse the compute name.
# instance.delete(wait_for_completion=True, show_output=True)
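The operations above are only meaningful from certain states; for example, `start()` applies to a stopped instance. A simplified model of those transitions in plain Python, for intuition only: the state names and allowed transitions here are an assumption covering the happy path, and the real service also reports transitional states (such as starting or stopping) that this sketch ignores. `TRANSITIONS` and `apply_operation` are hypothetical names:

```python
# Simplified, hypothetical model of ComputeInstance lifecycle transitions.
# Maps (current_state, operation) -> resulting state.
TRANSITIONS = {
    ('Running', 'stop'): 'Stopped',
    ('Stopped', 'start'): 'Running',
    ('Running', 'restart'): 'Running',
    ('Running', 'delete'): 'Deleted',
    ('Stopped', 'delete'): 'Deleted',
}


def apply_operation(state, operation):
    """Return the state after `operation`, or raise ValueError if it is not allowed."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError('{} is not allowed in state {}'.format(operation, state)) from None


# Walk through the same sequence as the cells above: stop, start, restart
state = 'Running'
for op in ['stop', 'start', 'restart']:
    state = apply_operation(state, op)
    print(op, '->', state)
```

Checking `instance.get_status()` before calling `stop()`, `start()`, or `restart()` follows the same idea against the real service.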