{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-amlcompute/train-on-computeinstance.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Train using Azure Machine Learning Compute Instance\n", "\n", "* Initialize Workspace\n", "* Introduction to ComputeInstance\n", "* Create an Experiment\n", "* Submit ComputeInstance run\n", "* Additional operations to perform on ComputeInstance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "If you are using an Azure Machine Learning ComputeInstance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check core SDK version number\n", "import azureml.core\n", "\n", "print(\"SDK version:\", azureml.core.VERSION)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize Workspace\n", "\n", "Initialize a workspace object" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "create workspace" ] }, "outputs": [], "source": [ "from azureml.core import Workspace\n", "\n", "ws = Workspace.from_config()\n", "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to ComputeInstance\n", "\n", "\n", "Azure Machine Learning compute instance is a fully-managed cloud-based workstation optimized for your machine learning development environment. It is created **within your workspace region**.\n", "\n", "For more information on ComputeInstance, please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance)\n", "\n", "**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create ComputeInstance\n", "First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since ComputeInstance is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D3_V2') is supported.\n", "\n", "You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "msdoc": "how-to-auto-train-remote.md", "name": "check_region" }, "outputs": [], "source": [ "from azureml.core.compute import ComputeTarget, ComputeInstance\n", "\n", "ComputeInstance.supported_vmsizes(workspace = ws)\n", "# ComputeInstance.supported_vmsizes(workspace = ws, location='eastus')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "msdoc": "how-to-auto-train-remote.md", "name": "create_instance" }, "outputs": [], "source": [ "import datetime\n", "import time\n", "\n", "from azureml.core.compute import ComputeTarget, ComputeInstance\n", "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your instance\n", "# Compute instance name should be unique across the azure region\n", "compute_name = \"ci{}\".format(ws._workspace_id)[:10]\n", "\n", "# Verify that instance does not exist already\n", "try:\n", " instance = ComputeInstance(workspace=ws, name=compute_name)\n", " print('Found existing instance, use it.')\n", "except ComputeTargetException:\n", " compute_config = ComputeInstance.provisioning_configuration(\n", " vm_size='STANDARD_D3_V2',\n", " ssh_public_access=False,\n", " # vnet_resourcegroup_name='',\n", " # vnet_name='',\n", " # subnet_name='default',\n", " # admin_user_ssh_public_key=''\n", " )\n", " instance = ComputeInstance.create(ws, compute_name, compute_config)\n", " instance.wait_for_completion(show_output=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create An Experiment\n", "\n", "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Experiment\n", "experiment_name = 'train-on-computeinstance'\n", "experiment = Experiment(workspace = ws, name = experiment_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit ComputeInstance run\n", "The training script `train.py` is already created for you" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create environment\n", "\n", "Create an environment with scikit-learn installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Environment\n", "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", "myenv = Environment(\"myenv\")\n", "myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure & Run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import ScriptRunConfig\n", "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n", "\n", "src = ScriptRunConfig(source_directory='', script='train.py')\n", "\n", "# Set compute target to the one created in previous step\n", "src.run_config.target = instance\n", "\n", "# Set environment\n", "src.run_config.environment = myenv\n", " \n", "run = experiment.submit(config=src)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.widgets import RunDetails\n", "RunDetails(run).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the get_active_runs() to get the currently running or queued jobs on the compute instance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# wait for the run to reach Queued or Running state if it is in Preparing state\n", "status = run.get_status()\n", "while status not in ['Queued', 'Running', 'Completed', 'Failed', 'Canceled']:\n", " state = run.get_status()\n", " print('Run status: {}'.format(status))\n", " time.sleep(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get active runs which are in Queued or Running state\n", "active_runs = instance.get_active_runs()\n", "for active_run in active_runs:\n", " print(active_run.run_id, ',', active_run.status)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run.wait_for_completion()\n", "print(run.get_metrics())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional operations to perform on ComputeInstance\n", "\n", "You can perform more operations on ComputeInstance such as get status, change the state or deleting the compute." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "msdoc": "how-to-auto-train-remote.md", "name": "get_status" }, "outputs": [], "source": [ "# get_status() gets the latest status of the ComputeInstance target\n", "instance.get_status()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "msdoc": "how-to-auto-train-remote.md", "name": "stop" }, "outputs": [], "source": [ "# stop() is used to stop the ComputeInstance\n", "# Stopping ComputeInstance will stop the billing meter and persist the state on the disk.\n", "# Available Quota will not be changed with this operation.\n", "instance.stop(wait_for_completion=True, show_output=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "msdoc": "how-to-auto-train-remote.md", "name": "start" }, "outputs": [], "source": [ "# start() is used to start the ComputeInstance if it is in stopped state\n", "instance.start(wait_for_completion=True, show_output=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# restart() is used to restart the ComputeInstance\n", "instance.restart(wait_for_completion=True, show_output=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name \n", "# instance.delete(wait_for_completion=True, show_output=True)" ] } ], "metadata": { "authors": [ { "name": "ramagott" } ], "category": "training", "compute": [ "Compute Instance" ], "datasets": [ "Diabetes" ], "deployment": [ "None" ], "exclude_from_index": false, "framework": [ "None" ], "friendly_name": "Train on Azure Machine Learning Compute Instance", "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" }, "tags": [ "None" ], "task": "Submit a run on Azure Machine Learning Compute Instance." }, "nbformat": 4, "nbformat_minor": 2 }