mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-20 01:27:06 -05:00
406 lines
12 KiB
Plaintext
406 lines
12 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Train using Azure Machine Learning Compute Instance\n",
|
|
"\n",
|
|
"* Initialize Workspace\n",
|
|
"* Introduction to ComputeInstance\n",
|
|
"* Create an Experiment\n",
|
|
"* Submit ComputeInstance run\n",
|
|
"* Additional operations to perform on ComputeInstance"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Prerequisites\n",
|
|
"If you are using an Azure Machine Learning ComputeInstance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Check core SDK version number\n",
|
|
"import azureml.core\n",
|
|
"\n",
|
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialize Workspace\n",
|
|
"\n",
|
|
"Initialize a workspace object"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": [
|
|
"create workspace"
|
|
]
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Workspace\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Introduction to ComputeInstance\n",
|
|
"\n",
|
|
"\n",
|
|
"Azure Machine Learning compute instance is a fully-managed cloud-based workstation optimized for your machine learning development environment. It is created **within your workspace region**.\n",
|
|
"\n",
|
|
"For more information on ComputeInstance, please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance)\n",
|
|
"\n",
|
|
"**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create ComputeInstance\n",
|
|
"First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since ComputeInstance is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D3_V2') is supported.\n",
|
|
"\n",
|
|
"You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"msdoc": "how-to-auto-train-remote.md",
|
|
"name": "check_region"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.compute import ComputeTarget, ComputeInstance\n",
|
|
"\n",
|
|
"ComputeInstance.supported_vmsizes(workspace = ws)\n",
|
|
"# ComputeInstance.supported_vmsizes(workspace = ws, location='eastus')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"msdoc": "how-to-auto-train-remote.md",
|
|
"name": "create_instance"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import datetime\n",
|
|
"import time\n",
|
|
"\n",
|
|
"from azureml.core.compute import ComputeTarget, ComputeInstance\n",
|
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
|
"\n",
|
|
"# Choose a name for your instance\n",
|
|
"# Compute instance name should be unique across the azure region\n",
|
|
"compute_name = \"ci{}\".format(ws._workspace_id)[:10]\n",
|
|
"\n",
|
|
"# Verify that instance does not exist already\n",
|
|
"try:\n",
|
|
" instance = ComputeInstance(workspace=ws, name=compute_name)\n",
|
|
" print('Found existing instance, use it.')\n",
|
|
"except ComputeTargetException:\n",
|
|
" compute_config = ComputeInstance.provisioning_configuration(\n",
|
|
" vm_size='STANDARD_D3_V2',\n",
|
|
" ssh_public_access=False,\n",
|
|
" # vnet_resourcegroup_name='<my-resource-group>',\n",
|
|
" # vnet_name='<my-vnet-name>',\n",
|
|
" # subnet_name='default',\n",
|
|
" # admin_user_ssh_public_key='<my-sshkey>'\n",
|
|
" )\n",
|
|
" instance = ComputeInstance.create(ws, compute_name, compute_config)\n",
|
|
" instance.wait_for_completion(show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create An Experiment\n",
|
|
"\n",
|
|
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Experiment\n",
|
|
"experiment_name = 'train-on-computeinstance'\n",
|
|
"experiment = Experiment(workspace = ws, name = experiment_name)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Submit ComputeInstance run\n",
|
|
"The training script `train.py` is already created for you"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create environment\n",
|
|
"\n",
|
|
"Create an environment with scikit-learn installed."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Environment\n",
|
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
|
"\n",
|
|
"myenv = Environment(\"myenv\")\n",
|
|
"myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Configure & Run"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import ScriptRunConfig\n",
|
|
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
|
|
"\n",
|
|
"src = ScriptRunConfig(source_directory='', script='train.py')\n",
|
|
"\n",
|
|
"# Set compute target to the one created in previous step\n",
|
|
"src.run_config.target = instance\n",
|
|
"\n",
|
|
"# Set environment\n",
|
|
"src.run_config.environment = myenv\n",
|
|
" \n",
|
|
"run = experiment.submit(config=src)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"RunDetails(run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can use the get_active_runs() to get the currently running or queued jobs on the compute instance"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# wait for the run to reach Queued or Running state if it is in Preparing state\n",
|
|
"status = run.get_status()\n",
|
|
"while status not in ['Queued', 'Running', 'Completed', 'Failed', 'Canceled']:\n",
|
|
" state = run.get_status()\n",
|
|
" print('Run status: {}'.format(status))\n",
|
|
" time.sleep(10)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# get active runs which are in Queued or Running state\n",
|
|
"active_runs = instance.get_active_runs()\n",
|
|
"for active_run in active_runs:\n",
|
|
" print(active_run.run_id, ',', active_run.status)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"run.wait_for_completion()\n",
|
|
"print(run.get_metrics())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Additional operations to perform on ComputeInstance\n",
|
|
"\n",
|
|
"You can perform more operations on ComputeInstance such as get status, change the state or deleting the compute."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"msdoc": "how-to-auto-train-remote.md",
|
|
"name": "get_status"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# get_status() gets the latest status of the ComputeInstance target\n",
|
|
"instance.get_status()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"msdoc": "how-to-auto-train-remote.md",
|
|
"name": "stop"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# stop() is used to stop the ComputeInstance\n",
|
|
"# Stopping ComputeInstance will stop the billing meter and persist the state on the disk.\n",
|
|
"# Available Quota will not be changed with this operation.\n",
|
|
"instance.stop(wait_for_completion=True, show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"msdoc": "how-to-auto-train-remote.md",
|
|
"name": "start"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# start() is used to start the ComputeInstance if it is in stopped state\n",
|
|
"instance.start(wait_for_completion=True, show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# restart() is used to restart the ComputeInstance\n",
|
|
"instance.restart(wait_for_completion=True, show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name \n",
|
|
"# instance.delete(wait_for_completion=True, show_output=True)"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "ramagott"
|
|
}
|
|
],
|
|
"category": "training",
|
|
"compute": [
|
|
"Compute Instance"
|
|
],
|
|
"datasets": [
|
|
"Diabetes"
|
|
],
|
|
"deployment": [
|
|
"None"
|
|
],
|
|
"exclude_from_index": false,
|
|
"framework": [
|
|
"None"
|
|
],
|
|
"friendly_name": "Train on Azure Machine Learning Compute Instance",
|
|
"index_order": 1,
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.7"
|
|
},
|
|
"tags": [
|
|
"None"
|
|
],
|
|
"task": "Submit a run on Azure Machine Learning Compute Instance."
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |