{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.png)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" } }, "source": [ "# How to use Estimator in Azure ML\n", "\n", "## Introduction\n", "This tutorial shows how to use the Estimator pattern in Azure Machine Learning SDK. Estimator is a convenient object in Azure Machine Learning that wraps run configuration information to help simplify the tasks of specifying how a script is executed.\n", "\n", "\n", "## Prerequisite:\n", "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", " * install the AML SDK\n", " * create a workspace and its configuration file (`config.json`)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get started. First let's import some Python libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbpresent": { "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" } }, "outputs": [], "source": [ "import azureml.core\n", "from azureml.core import Workspace\n", "\n", "# check core SDK version number\n", "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize workspace\n", "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ws = Workspace.from_config()\n", "print('Workspace name: ' + ws.name, \n", " 'Azure region: ' + ws.location, \n", " 'Subscription id: ' + ws.subscription_id, \n", " 'Resource group: ' + ws.resource_group, sep = '\\n')" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" } }, "source": [ "## Create an Azure ML experiment\n", "Let's create an experiment named \"estimator-test\". The script runs will be recorded under this experiment in Azure." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbpresent": { "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" } }, "outputs": [], "source": [ "from azureml.core import Experiment\n", "\n", "exp = Experiment(workspace=ws, name='estimator-test')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create or Attach existing AmlCompute\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", "1. create the configuration (this step is local and only takes a second)\n", "2. create the cluster (this step will take about **20 seconds**)\n", "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.compute import ComputeTarget, AmlCompute\n", "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# choose a name for your cluster\n", "cluster_name = \"cpu-cluster\"\n", "\n", "try:\n", " cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n", " print('Found existing compute target')\n", "except ComputeTargetException:\n", " print('Creating a new compute target...')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)\n", "\n", " # create the cluster\n", " cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n", "\n", " # can poll for a minimum number of nodes and for a specific timeout. \n", " # if no min node count is provided it uses the scale settings for the cluster\n", " cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(cpu_cluster.get_status().serialize())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "compute_targets = ws.compute_targets\n", "for name, ct in compute_targets.items():\n", " print(name, ct.type, ct.provisioning_state)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" } }, "source": [ "## Use a simple script\n", "We have already created a simple \"hello world\" script. This is the script that we will submit through the estimator pattern. It prints a hello-world message, and if Azure ML SDK is installed, it will also logs an array of values ([Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number)). The script takes as input the number of Fibonacci numbers in the sequence to log." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open('./dummy_train.py', 'r') as f:\n", " print(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create A Generic Estimator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we import the Estimator class and also a widget to visualize a run." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.train.estimator import Estimator\n", "from azureml.widgets import RunDetails" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest estimator is to submit the current folder to the local computer. Estimator by default will attempt to use Docker-based execution. Let's turn that off for now. 
, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also enable Docker and let the estimator pick the default CPU image supplied by Azure ML for execution. You can target an AmlCompute cluster (or any other supported compute target type)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use a conda environment on the default Docker image in an AmlCompute cluster\n", "script_params = {\n", "    '--numbers-in-sequence': 10\n", "}\n", "est = Estimator(source_directory='.', script_params=script_params, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)\n", "run = exp.submit(est)\n", "RunDetails(run).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can customize the conda environment by adding conda and/or pip packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add a conda package\n", "script_params = {\n", "    '--numbers-in-sequence': 10\n", "}\n", "est = Estimator(source_directory='.',\n", "                script_params=script_params,\n", "                compute_target='local',\n", "                entry_script='dummy_train.py',\n", "                use_docker=False,\n", "                conda_packages=['scikit-learn'])\n", "run = exp.submit(est)\n", "RunDetails(run).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also specify a custom Docker image for execution. In this case, you probably want to tell the system not to build a new conda environment for you. Instead, you can point to an existing Python environment in the custom Docker image. If no custom Docker image information is specified, Azure ML uses the default Docker image to run your training. For more information about the Docker containers used in Azure ML training, please see the [Azure ML Containers repository](https://github.com/Azure/AzureML-Containers).\n", "\n", "**Note**: since the example below points to the preinstalled Python environment in the miniconda3 image maintained by continuum.io on Docker Hub, where the Azure ML SDK is not present, the metric-logging code is not triggered. A run history record is still created, however." ] }
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use a custom Docker image\n", "from azureml.core.container_registry import ContainerRegistry\n", "\n", "# this is an image available in Docker Hub\n", "image_name = 'continuumio/miniconda3'\n", "\n", "# you can also point to an image in a private ACR\n", "image_registry_details = ContainerRegistry()\n", "image_registry_details.address = \"myregistry.azurecr.io\"\n", "image_registry_details.username = \"username\"\n", "image_registry_details.password = \"password\"\n", "\n", "# don't let the system build a new conda environment\n", "user_managed_dependencies = True\n", "\n", "# submit to a local Docker container. if you don't have Docker engine running locally, you can set compute_target to cpu_cluster.\n", "script_params = {\n", " '--numbers-in-sequence': 10\n", "}\n", "est = Estimator(source_directory='.', \n", " script_params=script_params, \n", " compute_target='local', \n", " entry_script='dummy_train.py',\n", " custom_docker_image=image_name,\n", " # uncomment below line to use your private ACR\n", " #image_registry_details=image_registry_details,\n", " user_managed=user_managed_dependencies\n", " )\n", "\n", "run = exp.submit(est)\n", "RunDetails(run).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to passing in a python file, you can also pass in a Jupyter notebook as the `entry_script`. [notebook_example.ipynb](notebook_example.ipynb) uses pm.record() to log key-value pairs which will appear in Azure Portal and shown in below widget.\n", "\n", "In order to run below, make sure `azureml-contrib-notebook` package is installed in current environment with `pip intall azureml-contrib-notebook`.\n", "\n", "This code snippet specifies the following parameters to the `Estimator` constructor. For more information on `Estimator`, please see [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) or [API doc](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py).\n", "\n", "| Parameter | Description |\n", "| ------------- | ------------- |\n", "| source_directory | (str) Local directory that contains all of your code needed for the training job. This folder gets copied from your local machine to the remote compute |\n", "| compute_target | ([AbstractComputeTarget](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute_target.abstractcomputetarget?view=azure-ml-py) or str) Remote compute target that your training script will run on, in this case a previously created persistent compute cluster (cpu_cluster) |\n", "| entry_script | (str) Filepath (relative to the source_directory) of the training script/notebook to be run on the remote compute. This file, and any additional files it depends on, should be located in this folder |\n", "| script_params | (dict) A dictionary containing parameters to the `entry_script`. 
, { "cell_type": "markdown", "metadata": {}, "source": [ "### Intelligent hyperparameter tuning\n", "\n", "The simple \"hello world\" script above lets the user set, via a parameter, the number of Fibonacci numbers in the sequence to log. Similarly, when training models, you can fix the values of parameters of the training algorithm itself, for example the learning rate, the number of layers, or the number of nodes in each layer of a neural network. These adjustable parameters that govern the training process are referred to as the hyperparameters of the model. The goal of hyperparameter tuning is to search across various hyperparameter configurations and find the configuration that results in the best performance.\n", "\n", "\n", "To demonstrate how Azure Machine Learning can help you automate the process of hyperparameter tuning, we will launch multiple runs with different values for the numbers-in-sequence parameter. First, let's define the parameter space using random sampling." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", "from azureml.train.hyperdrive import choice\n", "\n", "ps = RandomParameterSampling(\n", "    {\n", "        '--numbers-in-sequence': choice(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)\n", "    }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will create a new estimator without the numbers-in-sequence parameter above, since that will be passed in later." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "est = Estimator(source_directory='.', script_params={}, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will look at training metrics and early termination policies. When training a model, users are interested in logging and optimizing certain metrics of the model, e.g. maximizing the accuracy of the model or minimizing the loss. This metric is logged by the training script for each run. In our simple script above, we are logging Fibonacci numbers in a sequence, but a training script could just as easily log other metrics like accuracy or loss, which can be used to evaluate the performance of a given training run.\n", "\n", "The intelligent hyperparameter tuning capability in Azure Machine Learning automatically terminates poorly performing runs using an early termination policy. Early termination reduces the waste of compute resources and instead uses these resources to explore other hyperparameter configurations. In this example, we use the BanditPolicy, which checks the job every 2 iterations. If the primary metric (defined later) falls outside the top 10% range, Azure ML terminates the training run. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are ready to configure a run configuration object for hyperparameter tuning. We need to call out the primary metric that we want the experiment to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script, and we specify that we are looking to maximize this value. Next, we control the resource budget for the experiment by setting the maximum total number of training runs to 10. We also set the maximum number of training runs that can run concurrently to 4, which is the same as the number of nodes in our compute cluster." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hdc = HyperDriveConfig(estimator=est,\n", "                       hyperparameter_sampling=ps,\n", "                       policy=policy,\n", "                       primary_metric_name='Fibonacci numbers',\n", "                       primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", "                       max_total_runs=10,\n", "                       max_concurrent_runs=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's launch the hyperparameter tuning job." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hdr = exp.submit(config=hdc)\n", "RunDetails(hdr).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When all the runs complete, we can find the run with the best performance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "best_run = hdr.get_best_run_by_primary_metric()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can register the model from the best run and use it to deploy a web service for inferencing. Details on how you can do this can be found in the sample folders for the other types of estimators.\n" ] }
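, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of that registration step (the model name and path below are hypothetical, and assume your training script saved a model file to its `./outputs` folder; the dummy script in this notebook does not actually produce one):\n", "\n", "```python\n", "# inspect the metrics logged by the best run\n", "print(best_run.get_metrics())\n", "\n", "# illustrative only: register a model file that the run wrote to ./outputs\n", "model = best_run.register_model(model_name='fibonacci-model', model_path='outputs/model.pkl')\n", "print(model.name, model.version)\n", "```" ] }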
, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next Steps\n", "Now you can proceed to explore the other types of estimators, such as the TensorFlow estimator, the PyTorch estimator, etc., in the sample folders." ] } ], "metadata": { "authors": [ { "name": "ninhu" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }