diff --git a/configuration.ipynb b/configuration.ipynb index 5db62bab..a6f93629 100644 --- a/configuration.ipynb +++ b/configuration.ipynb @@ -1,383 +1,383 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/configuration.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Configuration\n", - "\n", - "_**Setting up your Azure Machine Learning services workspace and configuring your notebook library**_\n", - "\n", - "---\n", - "---\n", - "\n", - "## Table of Contents\n", - "\n", - "1. [Introduction](#Introduction)\n", - " 1. What is an Azure Machine Learning workspace\n", - "1. [Setup](#Setup)\n", - " 1. Azure subscription\n", - " 1. Azure ML SDK and other library installation\n", - " 1. Azure Container Instance registration\n", - "1. [Configure your Azure ML Workspace](#Configure%20your%20Azure%20ML%20workspace)\n", - " 1. Workspace parameters\n", - " 1. Access your workspace\n", - " 1. Create a new workspace\n", - " 1. Create compute resources\n", - "1. [Next steps](#Next%20steps)\n", - "\n", - "---\n", - "\n", - "## Introduction\n", - "\n", - "This notebook configures your library of notebooks to connect to an Azure Machine Learning (ML) workspace. In this case, a library contains all of the notebooks in the current folder and any nested folders. You can configure this notebook library to use an existing workspace or create a new workspace.\n", - "\n", - "Typically you will need to run this notebook only once per notebook library as all other notebooks will use connection information that is written here. 
If you want to redirect your notebook library to work with a different workspace, then you should re-run this notebook.\n", - "\n", - "In this notebook you will\n", - "* Learn about getting an Azure subscription\n", - "* Specify your workspace parameters\n", - "* Access or create your workspace\n", - "* Add a default compute cluster for your workspace\n", - "\n", - "### What is an Azure Machine Learning workspace\n", - "\n", - "An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setup\n", - "\n", - "This section describes activities required before you can access any Azure ML services functionality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1. Azure Subscription\n", - "\n", - "In order to create an Azure ML Workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com). Later in this notebook you will need information such as your subscription ID in order to create and access AML workspaces.\n", - "\n", - "### 2. Azure ML SDK and other library installation\n", - "\n", - "If you are running in your own environment, follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment). 
If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed.\n", - "\n", - "Also install following libraries to your environment. Many of the example notebooks depend on them\n", - "\n", - "```\n", - "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", - "```\n", - "\n", - "Once installation is complete, the following cell checks the Azure ML SDK version:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "install" - ] - }, - "outputs": [], - "source": [ - "import azureml.core\n", - "\n", - "print(\"This notebook was created using version 1.0.43 of the Azure ML SDK\")\n", - "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you are using an older version of the SDK then this notebook was created using, you should upgrade your SDK.\n", - "\n", - "### 3. Azure Container Instance registration\n", - "Azure Machine Learning uses of [Azure Container Instance (ACI)](https://azure.microsoft.com/services/container-instances) to deploy dev/test web services. An Azure subscription needs to be registered to use ACI. If you or the subscription owner have not yet registered ACI on your subscription, you will need to use the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and execute the following commands. Note that if you ran through the AML [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) you have already registered ACI. 
\n", - "\n", - "```shell\n", - "# check to see if ACI is already registered\n", - "(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n", - "\n", - "# if ACI is not registered, run this command.\n", - "# note you need to be the subscription owner in order to execute this command successfully.\n", - "(myenv) $ az provider register -n Microsoft.ContainerInstance\n", - "```\n", - "\n", - "---" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure your Azure ML workspace\n", - "\n", - "### Workspace parameters\n", - "\n", - "To use an AML Workspace, you will need to import the Azure ML SDK and supply the following information:\n", - "* Your subscription id\n", - "* A resource group name\n", - "* (optional) The region that will host your workspace\n", - "* A name for your workspace\n", - "\n", - "You can get your subscription ID from the [Azure portal](https://portal.azure.com).\n", - "\n", - "You will also need access to a [_resource group_](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n", - "\n", - "The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data.\n", - "\n", - "The name for your workspace is unique within the subscription and should be descriptive enough to discern among other AML Workspaces. 
The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.\n", - "\n", - "The following cell allows you to specify your workspace parameters. This cell uses the python method `os.getenv` to read values from environment variables which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. \n", - "\n", - "If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n", - "\n", - "Replace the default values in the cell below with your workspace parameters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "subscription_id = os.getenv(\"SUBSCRIPTION_ID\", default=\"\")\n", - "resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"\")\n", - "workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"\")\n", - "workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Access your workspace\n", - "\n", - "The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "try:\n", - " ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n", - " # write the details of the workspace to a configuration file to the notebook library\n", - " ws.write_config()\n", - " print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n", - "except:\n", - " print(\"Workspace not accessible. Change your parameters or create a new workspace below\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a new workspace\n", - "\n", - "If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n", - "\n", - "**Note**: As with other Azure services, there are limits on certain resources (for example AmlCompute quota) associated with the Azure ML service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", - "\n", - "This cell will create an Azure ML workspace for you in a subscription provided you have the correct permissions.\n", - "\n", - "This will fail if:\n", - "* You do not have permission to create a workspace in the resource group\n", - "* You do not have permission to create a resource group if it's non-existing.\n", - "* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n", - "\n", - "If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "# Create the workspace using the specified parameters\n", - "ws = Workspace.create(name = workspace_name,\n", - " subscription_id = subscription_id,\n", - " resource_group = resource_group, \n", - " location = workspace_region,\n", - " create_resource_group = True,\n", - " exist_ok = True)\n", - "ws.get_details()\n", - "\n", - "# write the details of the workspace to a configuration file to the notebook library\n", - "ws.write_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create compute resources for your training experiments\n", - "\n", - "Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n", - "\n", - "To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n", - "\n", - "The cluster parameters are:\n", - "* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n", - "\n", - "```shell\n", - "az vm list-skus -o tsv\n", - "```\n", - "* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while not in use. 
Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n", - "* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n", - "\n", - "\n", - "To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpu-cluster\"\n", - "\n", - "# Verify that cluster does not exist already\n", - "try:\n", - " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", - " print(\"Found existing cpu-cluster\")\n", - "except ComputeTargetException:\n", - " print(\"Creating new cpu-cluster\")\n", - " \n", - " # Specify the configuration for the new cluster\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n", - " min_nodes=0,\n", - " max_nodes=4)\n", - "\n", - " # Create the cluster with the specified name and configuration\n", - " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", - " \n", - " # Wait for the cluster to complete, show the output log\n", - " cpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# Choose a name for your GPU cluster\n", - "gpu_cluster_name = \"gpu-cluster\"\n", - "\n", - "# Verify that cluster does not exist already\n", - "try:\n", - " gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n", - " print(\"Found existing gpu cluster\")\n", - "except ComputeTargetException:\n", - " print(\"Creating new gpu-cluster\")\n", - " \n", - " # Specify the configuration for the new cluster\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n", - " min_nodes=0,\n", - " max_nodes=4)\n", - " # Create the cluster with the specified name and configuration\n", - " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n", - "\n", - " # Wait for the cluster to complete, show the output log\n", - " gpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "## Next steps\n", - "\n", - "In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n", - "\n", - "If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/configuration.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Configuration\n", + "\n", + "_**Setting up your Azure Machine Learning services workspace and configuring your notebook library**_\n", + "\n", + "---\n", + "---\n", + "\n", + "## Table of Contents\n", + "\n", + "1. [Introduction](#Introduction)\n", + " 1. What is an Azure Machine Learning workspace\n", + "1. [Setup](#Setup)\n", + " 1. Azure subscription\n", + " 1. Azure ML SDK and other library installation\n", + " 1. Azure Container Instance registration\n", + "1. [Configure your Azure ML Workspace](#Configure%20your%20Azure%20ML%20workspace)\n", + " 1. Workspace parameters\n", + " 1. Access your workspace\n", + " 1. Create a new workspace\n", + " 1. Create compute resources\n", + "1. [Next steps](#Next%20steps)\n", + "\n", + "---\n", + "\n", + "## Introduction\n", + "\n", + "This notebook configures your library of notebooks to connect to an Azure Machine Learning (ML) workspace. 
In this case, a library contains all of the notebooks in the current folder and any nested folders. You can configure this notebook library to use an existing workspace or create a new workspace.\n", + "\n", + "Typically you will need to run this notebook only once per notebook library, as all other notebooks will use the connection information that is written here. If you want to redirect your notebook library to work with a different workspace, then you should re-run this notebook.\n", + "\n", + "In this notebook you will:\n", + "* Learn about getting an Azure subscription\n", + "* Specify your workspace parameters\n", + "* Access or create your workspace\n", + "* Add a default compute cluster for your workspace\n", + "\n", + "### What is an Azure Machine Learning workspace\n", + "\n", + "An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources, providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "This section describes activities required before you can access any Azure ML services functionality." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Azure Subscription\n", + "\n", + "In order to create an Azure ML Workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com). 
Later in this notebook you will need information such as your subscription ID in order to create and access AML workspaces.\n", + "\n", + "### 2. Azure ML SDK and other library installation\n", + "\n", + "If you are running in your own environment, follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment). If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed.\n", + "\n", + "Also install the following libraries to your environment. Many of the example notebooks depend on them.\n", + "\n", + "```\n", + "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", + "```\n", + "\n", + "Once installation is complete, the following cell checks the Azure ML SDK version:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "install" + ] + }, + "outputs": [], + "source": [ + "import azureml.core\n", + "\n", + "print(\"This notebook was created using version AZUREML-SDK-VERSION of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you are using an older version of the SDK than the one this notebook was created with, you should upgrade your SDK.\n", + "\n", + "### 3. Azure Container Instance registration\n", + "Azure Machine Learning uses [Azure Container Instance (ACI)](https://azure.microsoft.com/services/container-instances) to deploy dev/test web services. An Azure subscription needs to be registered to use ACI. If you or the subscription owner have not yet registered ACI on your subscription, you will need to use the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and execute the following commands. 
Note that if you ran through the AML [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started), you have already registered ACI. \n", + "\n", + "```shell\n", + "# check to see if ACI is already registered\n", + "(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n", + "\n", + "# if ACI is not registered, run this command.\n", + "# note you need to be the subscription owner in order to execute this command successfully.\n", + "(myenv) $ az provider register -n Microsoft.ContainerInstance\n", + "```\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure your Azure ML workspace\n", + "\n", + "### Workspace parameters\n", + "\n", + "To use an AML Workspace, you will need to import the Azure ML SDK and supply the following information:\n", + "* Your subscription ID\n", + "* A resource group name\n", + "* (optional) The region that will host your workspace\n", + "* A name for your workspace\n", + "\n", + "You can get your subscription ID from the [Azure portal](https://portal.azure.com).\n", + "\n", + "You will also need access to a [_resource group_](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see which resource groups you have access to, or create a new one, in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n", + "\n", + "The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). 
You should pick a region that is close to your location or that contains your data.\n", + "\n", + "The name for your workspace is unique within the subscription and should be descriptive enough to discern among other AML Workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.\n", + "\n", + "The following cell allows you to specify your workspace parameters. This cell uses the Python function `os.getenv` to read values from environment variables, which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. \n", + "\n", + "If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view the *config.json* file, and copy-paste the values for subscription ID, resource group, and workspace name below.\n", + "\n", + "Replace the default values in the cell below with your workspace parameters." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "subscription_id = os.getenv(\"SUBSCRIPTION_ID\", default=\"\")\n", + "resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"\")\n", + "workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"\")\n", + "workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Access your workspace\n", + "\n", + "The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. 
The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "try:\n", + " ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n", + " # write the details of the workspace to a configuration file in the notebook library\n", + " ws.write_config()\n", + " print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n", + "except Exception:\n", + " print(\"Workspace not accessible. Change your parameters or create a new workspace below\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a new workspace\n", + "\n", + "If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n", + "\n", + "**Note**: As with other Azure services, there are limits on certain resources (for example AmlCompute quota) associated with the Azure ML service. 
Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", + "\n", + "This cell will create an Azure ML workspace for you in a subscription, provided you have the correct permissions.\n", + "\n", + "This will fail if:\n", + "* You do not have permission to create a workspace in the resource group\n", + "* You do not have permission to create a resource group and it does not already exist\n", + "* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n", + "\n", + "If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "# Create the workspace using the specified parameters\n", + "ws = Workspace.create(name = workspace_name,\n", + " subscription_id = subscription_id,\n", + " resource_group = resource_group, \n", + " location = workspace_region,\n", + " create_resource_group = True,\n", + " exist_ok = True)\n", + "ws.get_details()\n", + "\n", + "# write the details of the workspace to a configuration file in the notebook library\n", + "ws.write_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create compute resources for your training experiments\n", + "\n", + "Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n", + "\n", + "To create a cluster, you need to specify a compute configuration that defines the type of machine to be used and the scaling behavior. 
Then you choose a name for the cluster, unique within the workspace, that can be used to address the cluster later.\n", + "\n", + "The cluster parameters are:\n", + "* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n", + "\n", + "```shell\n", + "az vm list-skus -o tsv\n", + "```\n", + "* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0, the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n", + "* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and greater distributed processing of scale-out jobs.\n", + "\n", + "\n", + "To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + " print(\"Found existing cpu-cluster\")\n", + "except ComputeTargetException:\n", + " print(\"Creating new cpu-cluster\")\n", + " \n", + " # Specify the configuration for the new cluster\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n", + " min_nodes=0,\n", + " max_nodes=4)\n", + "\n", + " # Create the cluster with the specified name and configuration\n", + " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + " \n", + " # Wait for cluster provisioning to complete, showing the output log\n", + " cpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your GPU cluster\n", + "gpu_cluster_name = \"gpu-cluster\"\n", + "\n", + "# Verify that cluster does not exist already\n", + "try:\n", + "    gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n", + "    print(\"Found existing gpu-cluster\")\n", + "except ComputeTargetException:\n", + "    print(\"Creating new gpu-cluster\")\n", + "    \n", + "    # Specify the configuration for the new cluster\n", + "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n", + "                                                           min_nodes=0,\n", + "                                                           max_nodes=4)\n", + "    # Create the cluster with the specified name and configuration\n", + "    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n", + "\n", + "    # Wait for the cluster to complete, show the output log\n", + "    gpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Next steps\n", + "\n", + "In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to your workspace, or use it to bootstrap new workspaces completely.\n", + "\n", + "If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end-to-end machine learning process."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 } diff --git a/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb b/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb index 79d98eac..94cd67af 100644 --- a/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb +++ b/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb @@ -1,498 +1,498 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Enabling App Insights for Services in Production\n", - "With this notebook, you can learn how to enable App Insights for standard service monitoring, plus, we provide examples for doing custom logging within a scoring files in a model. \n", - "\n", - "\n", - "## What does Application Insights monitor?\n", - "It monitors request rates, response times, failure rates, etc. 
For more information visit [App Insights docs.](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-overview)\n", - "\n", - "\n", - "## What is different compared to standard production deployment process?\n", - "If you want to enable generic App Insights for a service, run:\n", - "```python\n", - "aks_service = Webservice(ws, \"aks-w-dc2\")\n", - "aks_service.update(enable_app_insights=True)```\n", - "Where \"aks-w-dc2\" is your service name. You can also do this from the Azure Portal under your Workspace--> deployments--> Select deployment--> Edit--> Advanced Settings--> Select \"Enable AppInsights diagnostics\"\n", - "\n", - "If you want to log custom traces, you will follow the standard deployment process for AKS and you will:\n", - "1. Update scoring file.\n", - "2. Update AKS configuration.\n", - "3. Build new image and deploy it. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Import your dependencies" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.compute import AksCompute, ComputeTarget\n", - "from azureml.core.webservice import AksWebservice\n", - "import azureml.core\n", - "import json\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. 
Set up your configuration and create a workspace\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Register Model\n", - "Register an existing trained model, adding a description and tags." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Register the model\n", - "from azureml.core.model import Model\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", - "                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", - "                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - "                       description = \"Ridge regression model to predict diabetes\",\n", - "                       workspace = ws)\n", - "\n", - "print(model.name, model.description, model.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. *Update your scoring file with custom print statements*\n", - "Here is an example:\n", - "### a. In your init function add:\n", - "```python\n", - "print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))```\n", - "\n", - "### b. 
In your run function add:\n", - "```python\n", - "print (\"Prediction created\" + time.strftime(\"%H:%M:%S\"))```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy \n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "from azureml.core.model import Model\n", - "import time\n", - "\n", - "def init():\n", - " global model\n", - " #Print statement for appinsights custom traces:\n", - " print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n", - " \n", - " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n", - " # this call should return the path to the model.pkl file on the local disk.\n", - " model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n", - " \n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - " \n", - "\n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " print (\"Prediction created\" + time.strftime(\"%H:%M:%S\"))\n", - " # you can return any datatype as long as it is JSON-serializable\n", - " return result.tolist()\n", - " except Exception as e:\n", - " error = str(e)\n", - " print (error + time.strftime(\"%H:%M:%S\"))\n", - " return error" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. 
*Create myenv.yml file*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 6. Create your new Image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " description = \"Image with ridge regression model\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", - " )\n", - "\n", - "image = ContainerImage.create(name = \"myimage1\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy to ACI (Optional)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", - " description = 'Predict diabetes using regression model',\n", - " enable_app_insights = True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'my-aci-service-4'\n", - "print(aci_service_name)\n", - 
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,28,13,45,54,6,57,8,8,10], \n", - " [101,9,8,37,6,45,4,3,2,41]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding='utf8')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state == \"Healthy\":\n", - " prediction = aci_service.run(input_data=test_sample)\n", - " print(prediction)\n", - "else:\n", - " raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aci_service.error)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 7. Deploy to AKS service" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create AKS compute if you haven't done so." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use the default configuration (can also provide parameters to customize)\n", - "prov_config = AksCompute.provisioning_configuration()\n", - "\n", - "aks_name = 'my-aks-test3' \n", - "# Create the cluster\n", - "aks_target = ComputeTarget.create(workspace = ws, \n", - " name = aks_name, \n", - " provisioning_configuration = prov_config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_target.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(aks_target.provisioning_state)\n", - "print(aks_target.provisioning_errors)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you already have a cluster you can attach the service to it:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```python \n", - "%%time\n", - "resource_id = '/subscriptions//resourcegroups//providers/Microsoft.ContainerService/managedClusters/'\n", - "create_name= 'myaks4'\n", - "attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", - "aks_target = ComputeTarget.attach(workspace = ws, \n", - " name = create_name, \n", - " attach_configuration=attach_config)\n", - "## Wait for the operation to complete\n", - "aks_target.wait_for_provisioning(True)```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### a. 
*Activate App Insights through updating AKS Webservice configuration*\n", - "In order to enable App Insights in your service you will need to update your AKS configuration file:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Set the web service configuration\n", - "aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### b. Deploy your service" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aks_target.provisioning_state== \"Succeeded\": \n", - " aks_service_name ='aks-w-dc5'\n", - " aks_service = Webservice.deploy_from_image(workspace = ws, \n", - " name = aks_service_name,\n", - " image = image,\n", - " deployment_config = aks_config,\n", - " deployment_target = aks_target\n", - " )\n", - " aks_service.wait_for_deployment(show_output = True)\n", - " print(aks_service.state)\n", - "else:\n", - " raise ValueError(\"AKS provisioning failed. Error: \", aks_service.error)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 8. Test your service " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,28,13,45,54,6,57,8,8,10], \n", - " [101,9,8,37,6,45,4,3,2,41]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding='utf8')\n", - "\n", - "if aks_service.state == \"Healthy\":\n", - " prediction = aks_service.run(input_data=test_sample)\n", - " print(prediction)\n", - "else:\n", - " raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aks_service.error)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 9. See your service telemetry in App Insights\n", - "1. Go to the [Azure Portal](https://portal.azure.com/)\n", - "2. 
All resources--> Select the subscription/resource group where you created your Workspace--> Select the App Insights type\n", - "3. Click on the AppInsights resource. You'll see a highlevel dashboard with information on Requests, Server response time and availability.\n", - "4. Click on the top banner \"Analytics\"\n", - "5. In the \"Schema\" section select \"traces\" and run your query.\n", - "6. Voila! All your custom traces should be there." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Disable App Insights" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "aks_service.update(enable_app_insights=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service.delete()\n", - "aci_service.delete()\n", - "image.delete()\n", - "model.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "shipatel" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.3" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Enabling App Insights for Services in Production\n", + "With this notebook, you can learn how to enable App Insights for standard service monitoring, plus, we provide examples for doing custom logging within a scoring files in a model. \n", + "\n", + "\n", + "## What does Application Insights monitor?\n", + "It monitors request rates, response times, failure rates, etc. 
For more information visit [App Insights docs.](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-overview)\n", + "\n", + "\n", + "## What is different compared to standard production deployment process?\n", + "If you want to enable generic App Insights for a service run:\n", + "```python\n", + "aks_service= Webservice(ws, \"aks-w-dc2\")\n", + "aks_service.update(enable_app_insights=True)```\n", + "Where \"aks-w-dc2\" is your service name. You can also do this from the Azure Portal under your Workspace--> deployments--> Select deployment--> Edit--> Advanced Settings--> Select \"Enable AppInsights diagnostics\"\n", + "\n", + "If you want to log custom traces, you will follow the standard deplyment process for AKS and you will:\n", + "1. Update scoring file.\n", + "2. Update aks configuration.\n", + "3. Build new image and deploy it. " + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Import your dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "from azureml.core.compute import AksCompute, ComputeTarget\n", + "from azureml.core.webservice import AksWebservice\n", + "import azureml.core\n", + "import json\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Set up your configuration and create a workspace\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Register Model\n", + "Register an existing trained model, adding a description and tags." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Register the model\n", + "from azureml.core.model import Model\n", + "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", + "                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", + "                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n", + "                       description = \"Ridge regression model to predict diabetes\",\n", + "                       workspace = ws)\n", + "\n", + "print(model.name, model.description, model.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. *Update your scoring file with custom print statements*\n", + "Here is an example:\n", + "### a. In your init function add:\n", + "```python\n", + "print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))```\n", + "\n", + "### b. 
In your run function add:\n", + "```python\n", + "print (\"Prediction created\" + time.strftime(\"%H:%M:%S\"))```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy \n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "import time\n", + "\n", + "def init():\n", + " global model\n", + " #Print statement for appinsights custom traces:\n", + " print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n", + " \n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n", + " # this call should return the path to the model.pkl file on the local disk.\n", + " model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n", + " \n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + " \n", + "\n", + "# note you can pass in multiple rows for scoring\n", + "def run(raw_data):\n", + " try:\n", + " data = json.loads(raw_data)['data']\n", + " data = numpy.array(data)\n", + " result = model.predict(data)\n", + " print (\"Prediction created\" + time.strftime(\"%H:%M:%S\"))\n", + " # you can return any datatype as long as it is JSON-serializable\n", + " return result.tolist()\n", + " except Exception as e:\n", + " error = str(e)\n", + " print (error + time.strftime(\"%H:%M:%S\"))\n", + " return error" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. 
*Create myenv.yml file*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Create your new Image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"Image with ridge regression model\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", + " )\n", + "\n", + "image = ContainerImage.create(name = \"myimage1\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy to ACI (Optional)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", + " description = 'Predict diabetes using regression model',\n", + " enable_app_insights = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'my-aci-service-4'\n", + "print(aci_service_name)\n", + 
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "test_sample = json.dumps({'data': [\n", + " [1,28,13,45,54,6,57,8,8,10], \n", + " [101,9,8,37,6,45,4,3,2,41]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding='utf8')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state == \"Healthy\":\n", + " prediction = aci_service.run(input_data=test_sample)\n", + " print(prediction)\n", + "else:\n", + " raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aci_service.error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Deploy to AKS service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create AKS compute if you haven't done so." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the default configuration (can also provide parameters to customize)\n", + "prov_config = AksCompute.provisioning_configuration()\n", + "\n", + "aks_name = 'my-aks-test3' \n", + "# Create the cluster\n", + "aks_target = ComputeTarget.create(workspace = ws, \n", + " name = aks_name, \n", + " provisioning_configuration = prov_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_target.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(aks_target.provisioning_state)\n", + "print(aks_target.provisioning_errors)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you already have a cluster you can attach the service to it:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python \n", + "%%time\n", + "resource_id = '/subscriptions//resourcegroups//providers/Microsoft.ContainerService/managedClusters/'\n", + "create_name= 'myaks4'\n", + "attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", + "aks_target = ComputeTarget.attach(workspace = ws, \n", + " name = create_name, \n", + " attach_configuration=attach_config)\n", + "## Wait for the operation to complete\n", + "aks_target.wait_for_provisioning(True)```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### a. 
*Activate App Insights through updating AKS Webservice configuration*\n", + "In order to enable App Insights in your service you will need to update your AKS configuration file:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Set the web service configuration\n", + "aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### b. Deploy your service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aks_target.provisioning_state== \"Succeeded\": \n", + " aks_service_name ='aks-w-dc5'\n", + " aks_service = Webservice.deploy_from_image(workspace = ws, \n", + " name = aks_service_name,\n", + " image = image,\n", + " deployment_config = aks_config,\n", + " deployment_target = aks_target\n", + " )\n", + " aks_service.wait_for_deployment(show_output = True)\n", + " print(aks_service.state)\n", + "else:\n", + " raise ValueError(\"AKS provisioning failed. Error: \", aks_service.error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. Test your service " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "test_sample = json.dumps({'data': [\n", + " [1,28,13,45,54,6,57,8,8,10], \n", + " [101,9,8,37,6,45,4,3,2,41]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding='utf8')\n", + "\n", + "if aks_service.state == \"Healthy\":\n", + " prediction = aks_service.run(input_data=test_sample)\n", + " print(prediction)\n", + "else:\n", + " raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aks_service.error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. See your service telemetry in App Insights\n", + "1. Go to the [Azure Portal](https://portal.azure.com/)\n", + "2. 
All resources--> Select the subscription/resource group where you created your Workspace--> Select the App Insights type\n", + "3. Click on the AppInsights resource. You'll see a highlevel dashboard with information on Requests, Server response time and availability.\n", + "4. Click on the top banner \"Analytics\"\n", + "5. In the \"Schema\" section select \"traces\" and run your query.\n", + "6. Voila! All your custom traces should be there." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Disable App Insights" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "aks_service.update(enable_app_insights=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service.delete()\n", + "aci_service.delete()\n", + "image.delete()\n", + "model.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "shipatel" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb b/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb index 2540321f..084e823b 100644 --- a/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb +++ 
b/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb @@ -1,478 +1,478 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Enabling Data Collection for Models in Production\n", - "With this notebook, you can learn how to collect input model data from your Azure Machine Learning service in Azure Blob storage. Once enabled, the collected data gives you the opportunity to:\n", - "\n", - "* Monitor data drifts as production data enters your model\n", - "* Make better decisions on when to retrain or optimize your model\n", - "* Retrain your model with the data collected\n", - "\n", - "## What data is collected?\n", - "* Model input data (voice, images, and video are not supported) from services deployed in an Azure Kubernetes Service (AKS) cluster\n", - "* Model predictions using production input data.\n", - "\n", - "**Note:** pre-aggregation or pre-calculations on this data are done by the user and not included in this version of the product.\n", - "\n", - "## What is different compared to standard production deployment process?\n", - "1. Update scoring file.\n", - "2. Update the yml file with the new dependency.\n", - "3. Update AKS configuration.\n", - "4. Build new image and deploy it. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. 
Import your dependencies" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.compute import AksCompute, ComputeTarget\n", - "from azureml.core.webservice import Webservice, AksWebservice\n", - "import azureml.core\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Set up your configuration and create a workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Register Model\n", - "Register an existing trained model, adding a description and tags." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Register the model\n", - "from azureml.core.model import Model\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", - "                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", - "                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - "                       description = \"Ridge regression model to predict diabetes\",\n", - "                       workspace = ws)\n", - "\n", - "print(model.name, model.description, model.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. *Update your scoring file with Data Collection*\n", - "The file below, compared to the file used in notebook 11, has the following changes:\n", - "### a. Import the module\n", - "```python \n", - "from azureml.monitoring import ModelDataCollector```\n", - "### b. 
In your init function add:\n", - "```python \n", - "global inputs_dc, prediction_d\n", - "inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"Feat6\"])\n", - "prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n", - " \n", - "* Identifier: Identifier is later used for building the folder structure in your Blob, it can be used to divide \"raw\" data versus \"processed\".\n", - "* CorrelationId: is an optional parameter, you do not need to set it up if your model doesn't require it. Having a correlationId in place does help you for easier mapping with other data. (Examples include: LoanNumber, CustomerId, etc.)\n", - "* Feature Names: These need to be set up in the order of your features in order for them to have column names when the .csv is created.\n", - "\n", - "### c. In your run function add:\n", - "```python\n", - "inputs_dc.collect(data)\n", - "prediction_dc.collect(result)```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy \n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "from azureml.core.model import Model\n", - "from azureml.monitoring import ModelDataCollector\n", - "import time\n", - "\n", - "def init():\n", - " global model\n", - " print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n", - " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n", - " # this call should return the path to the model.pkl file on the local disk.\n", - " model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - " global inputs_dc, 
prediction_dc\n", - " # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n", - " inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n", - " # this setup will help us save our ipredictions under the \"predictions\" path in our Azure Blob\n", - " prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n", - " \n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " global inputs_dc, prediction_dc\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n", - " inputs_dc.collect(data) #this call is saving our input data into our blob\n", - " prediction_dc.collect(result)#this call is saving our prediction data into our blob\n", - " print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n", - " # you can return any data type as long as it is JSON-serializable\n", - " return result.tolist()\n", - " except Exception as e:\n", - " error = str(e)\n", - " print (error + time.strftime(\"%H:%M:%S\"))\n", - " return error" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. *Update your myenv.yml file with the required module*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "myenv.add_pip_package(\"azureml-monitoring\")\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 6. 
Create your new Image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " description = \"Image with ridge regression model\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", - " )\n", - "\n", - "image = ContainerImage.create(name = \"myimage1\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(model.name, model.description, model.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 7. Deploy to AKS service" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create AKS compute if you haven't done so." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use the default configuration (can also provide parameters to customize)\n", - "prov_config = AksCompute.provisioning_configuration()\n", - "\n", - "aks_name = 'my-aks-test1' \n", - "# Create the cluster\n", - "aks_target = ComputeTarget.create(workspace = ws, \n", - " name = aks_name, \n", - " provisioning_configuration = prov_config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_target.wait_for_completion(show_output = True)\n", - "print(aks_target.provisioning_state)\n", - "print(aks_target.provisioning_errors)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you already have a cluster you can attach the service to it:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```python \n", - " %%time\n", - " resource_id = '/subscriptions//resourcegroups//providers/Microsoft.ContainerService/managedClusters/'\n", - " create_name= 'myaks4'\n", - " attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", - " aks_target = ComputeTarget.attach(workspace = ws, \n", - " name = create_name, \n", - " attach_configuration=attach_config)\n", - " ## Wait for the operation to complete\n", - " aks_target.wait_for_provisioning(True)```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### a. 
*Activate Data Collection and App Insights through updating AKS Webservice configuration*\n", - "In order to enable Data Collection and App Insights in your service you will need to update your AKS configuration file:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Set the web service configuration\n", - "aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### b. Deploy your service" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aks_target.provisioning_state== \"Succeeded\": \n", - " aks_service_name ='aks-w-dc0'\n", - " aks_service = Webservice.deploy_from_image(workspace = ws, \n", - " name = aks_service_name,\n", - " image = image,\n", - " deployment_config = aks_config,\n", - " deployment_target = aks_target\n", - " )\n", - " aks_service.wait_for_deployment(show_output = True)\n", - " print(aks_service.state)\n", - "else: \n", - " raise ValueError(\"aks provisioning failed, can't deploy service. Error: \", aks_service.error)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 8. 
Test your service and send some data\n", - "**Note**: It will take around 15 mins for your data to appear in your blob.\n", - "The data will appear in your Azure Blob following this format:\n", - "\n", - "/modeldata/subscriptionid/resourcegroupname/workspacename/webservicename/modelname/modelversion/identifier/year/month/day/data.csv " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,54,6,7,8,88,10], \n", - " [10,9,8,37,36,45,4,33,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "if aks_service.state == \"Healthy\":\n", - " prediction = aks_service.run(input_data=test_sample)\n", - " print(prediction)\n", - "else:\n", - " raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aks_service.error)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 9. Validate you data and analyze it\n", - "You can look into your data following this path format in your Azure Blob (it takes up to 15 minutes for the data to appear):\n", - "\n", - "/modeldata/**subscriptionid>**/**resourcegroupname>**/**workspacename>**/**webservicename>**/**modelname>**/**modelversion>>**/**identifier>**/*year/month/day*/data.csv \n", - "\n", - "For doing further analysis you have multiple options:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### a. 
Create DataBricks cluter and connect it to your blob\n", - "https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal or in your databricks workspace you can look for the template \"Azure Blob Storage Import Example Notebook\".\n", - "\n", - "\n", - "Here is an example for setting up the file location to extract the relevant data:\n", - "\n", - " file_location = \"wasbs://mycontainer@storageaccountname.blob.core.windows.net/unknown/unknown/unknown-bigdataset-unknown/my_iterate_parking_inputs/2018/°/°/data.csv\" \n", - "file_type = \"csv\"\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### b. Connect Blob to Power Bi (Small Data only)\n", - "1. Download and Open PowerBi Desktop\n", - "2. Select \"Get Data\" and click on \"Azure Blob Storage\" >> Connect\n", - "3. Add your storage account and enter your storage key.\n", - "4. Select the container where your Data Collection is stored and click on Edit. \n", - "5. In the query editor, click under \"Name\" column and add your Storage account Model path into the filter. Note: if you want to only look into files from a specific year or month, just expand the filter path. For example, just look into March data: /modeldata/subscriptionid>/resourcegroupname>/workspacename>/webservicename>/modelname>/modelversion>/identifier>/year>/3\n", - "6. Click on the double arrow aside the \"Content\" column to combine the files. \n", - "7. Click OK and the data will preload.\n", - "8. You can now click Close and Apply and start building your custom reports on your Model Input data." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Disable Data Collection" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "aks_service.update(collect_model_data=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service.delete()\n", - "image.delete()\n", - "model.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "shipatel" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.3" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.png)" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Enabling Data Collection for Models in Production\n", + "With this notebook, you can learn how to collect input model data from your Azure Machine Learning service in an Azure Blob storage. 
Once enabled, the collected data gives you the opportunity to:\n",
+    "\n",
+    "* Monitor data drift as production data enters your model\n",
+    "* Make better decisions on when to retrain or optimize your model\n",
+    "* Retrain your model with the data collected\n",
+    "\n",
+    "## What data is collected?\n",
+    "* Model input data (voice, images, and video are not supported) from services deployed in an Azure Kubernetes Service (AKS) cluster\n",
+    "* Model predictions using production input data.\n",
+    "\n",
+    "**Note:** pre-aggregation or pre-calculation of this data is done by the user and is not included in this version of the product.\n",
+    "\n",
+    "## What is different compared to the standard production deployment process?\n",
+    "1. Update the scoring file.\n",
+    "2. Update the yml file with the new dependency.\n",
+    "3. Update the AKS configuration.\n",
+    "4. Build a new image and deploy it. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Import your dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core import Workspace\n",
+    "from azureml.core.compute import AksCompute, ComputeTarget\n",
+    "from azureml.core.webservice import Webservice, AksWebservice\n",
+    "import azureml.core\n",
+    "print(azureml.core.VERSION)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Set up your configuration and create a workspace"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ws = Workspace.from_config()\n",
+    "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Register Model\n",
+    "Register an existing trained model, add a description and tags."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Register the model\n",
+    "from azureml.core.model import Model\n",
+    "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
+    "                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
+    "                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
+    "                       description = \"Ridge regression model to predict diabetes\",\n",
+    "                       workspace = ws)\n",
+    "\n",
+    "print(model.name, model.description, model.version)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. *Update your scoring file with Data Collection*\n",
+    "The file below, compared to the file used in notebook 11, has the following changes:\n",
+    "### a. Import the module\n",
+    "```python \n",
+    "from azureml.monitoring import ModelDataCollector```\n",
+    "### b. In your init function add:\n",
+    "```python \n",
+    "global inputs_dc, prediction_dc\n",
+    "inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"feat6\"])\n",
+    "prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n",
+    " \n",
+    "* Identifier: the identifier is later used to build the folder structure in your Blob; it can be used to separate \"raw\" data from \"processed\" data.\n",
+    "* CorrelationId: an optional parameter; you do not need to set it up if your model doesn't require it. Having a correlationId in place does make it easier to map the collected data to other data. (Examples include: LoanNumber, CustomerId, etc.)\n",
+    "* Feature Names: these need to be listed in the same order as your features so that they become the column names when the .csv is created.\n",
+    "\n",
+    "### c. 
In your run function add:\n",
+    "```python\n",
+    "inputs_dc.collect(data)\n",
+    "prediction_dc.collect(result)```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%writefile score.py\n",
+    "import pickle\n",
+    "import json\n",
+    "import numpy \n",
+    "from sklearn.externals import joblib\n",
+    "from sklearn.linear_model import Ridge\n",
+    "from azureml.core.model import Model\n",
+    "from azureml.monitoring import ModelDataCollector\n",
+    "import time\n",
+    "\n",
+    "def init():\n",
+    "    global model\n",
+    "    print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
+    "    # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
+    "    # this call should return the path to the model.pkl file on the local disk.\n",
+    "    model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
+    "    # deserialize the model file back into a sklearn model\n",
+    "    model = joblib.load(model_path)\n",
+    "    global inputs_dc, prediction_dc\n",
+    "    # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n",
+    "    inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n",
+    "    # this setup will help us save our predictions under the \"predictions\" path in our Azure Blob\n",
+    "    prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n",
+    "    \n",
+    "# note you can pass in multiple rows for scoring\n",
+    "def run(raw_data):\n",
+    "    global inputs_dc, prediction_dc\n",
+    "    try:\n",
+    "        data = json.loads(raw_data)['data']\n",
+    "        data = numpy.array(data)\n",
+    "        result = model.predict(data)\n",
+    "        print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
+    "        inputs_dc.collect(data) #this call is saving our input data into our blob\n",
+    "        
prediction_dc.collect(result)#this call is saving our prediction data into our blob\n", + " print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n", + " # you can return any data type as long as it is JSON-serializable\n", + " return result.tolist()\n", + " except Exception as e:\n", + " error = str(e)\n", + " print (error + time.strftime(\"%H:%M:%S\"))\n", + " return error" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. *Update your myenv.yml file with the required module*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "myenv.add_pip_package(\"azureml-monitoring\")\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. 
Create your new Image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"Image with ridge regression model\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", + " )\n", + "\n", + "image = ContainerImage.create(name = \"myimage1\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(model.name, model.description, model.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Deploy to AKS service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create AKS compute if you haven't done so." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the default configuration (can also provide parameters to customize)\n", + "prov_config = AksCompute.provisioning_configuration()\n", + "\n", + "aks_name = 'my-aks-test1' \n", + "# Create the cluster\n", + "aks_target = ComputeTarget.create(workspace = ws, \n", + " name = aks_name, \n", + " provisioning_configuration = prov_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_target.wait_for_completion(show_output = True)\n", + "print(aks_target.provisioning_state)\n", + "print(aks_target.provisioning_errors)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you already have a cluster you can attach the service to it:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python \n", + " %%time\n", + " resource_id = '/subscriptions//resourcegroups//providers/Microsoft.ContainerService/managedClusters/'\n", + " create_name= 'myaks4'\n", + " attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", + " aks_target = ComputeTarget.attach(workspace = ws, \n", + " name = create_name, \n", + " attach_configuration=attach_config)\n", + " ## Wait for the operation to complete\n", + " aks_target.wait_for_provisioning(True)```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### a. 
*Activate Data Collection and App Insights by updating the AKS webservice configuration*\n",
+    "To enable Data Collection and App Insights in your service, you will need to update your AKS deployment configuration:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Set the web service configuration\n",
+    "aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### b. Deploy your service"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if aks_target.provisioning_state == \"Succeeded\": \n",
+    "    aks_service_name = 'aks-w-dc0'\n",
+    "    aks_service = Webservice.deploy_from_image(workspace = ws, \n",
+    "                                               name = aks_service_name,\n",
+    "                                               image = image,\n",
+    "                                               deployment_config = aks_config,\n",
+    "                                               deployment_target = aks_target\n",
+    "                                               )\n",
+    "    aks_service.wait_for_deployment(show_output = True)\n",
+    "    print(aks_service.state)\n",
+    "else: \n",
+    "    raise ValueError(\"AKS provisioning failed, can't deploy service. Error: \", aks_target.provisioning_errors)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. 
Test your service and send some data\n",
+    "**Note**: It will take around 15 minutes for your data to appear in your blob.\n",
+    "The data will appear in your Azure Blob following this format:\n",
+    "\n",
+    "/modeldata/subscriptionid/resourcegroupname/workspacename/webservicename/modelname/modelversion/identifier/year/month/day/data.csv "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%time\n",
+    "import json\n",
+    "\n",
+    "test_sample = json.dumps({'data': [\n",
+    "    [1,2,3,4,54,6,7,8,88,10], \n",
+    "    [10,9,8,37,36,45,4,33,2,1]\n",
+    "]})\n",
+    "test_sample = bytes(test_sample,encoding = 'utf8')\n",
+    "\n",
+    "if aks_service.state == \"Healthy\":\n",
+    "    prediction = aks_service.run(input_data=test_sample)\n",
+    "    print(prediction)\n",
+    "else:\n",
+    "    raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aks_service.error)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Validate your data and analyze it\n",
+    "You can look into your data following this path format in your Azure Blob (it takes up to 15 minutes for the data to appear):\n",
+    "\n",
+    "/modeldata/**<subscriptionid>**/**<resourcegroupname>**/**<workspacename>**/**<webservicename>**/**<modelname>**/**<modelversion>**/**<identifier>**/*year/month/day*/data.csv \n",
+    "\n",
+    "For further analysis you have multiple options:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### a. 
Create a Databricks cluster and connect it to your blob\n",
+    "Follow https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal, or in your Databricks workspace you can look for the template \"Azure Blob Storage Import Example Notebook\".\n",
+    "\n",
+    "\n",
+    "Here is an example for setting up the file location to extract the relevant data:\n",
+    "\n",
+    "    file_location = \"wasbs://mycontainer@storageaccountname.blob.core.windows.net/unknown/unknown/unknown-bigdataset-unknown/my_iterate_parking_inputs/2018/*/*/data.csv\" \n",
+    "    file_type = \"csv\"\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### b. Connect Blob to Power BI (Small Data only)\n",
+    "1. Download and open Power BI Desktop\n",
+    "2. Select \"Get Data\" and click on \"Azure Blob Storage\" >> Connect\n",
+    "3. Add your storage account and enter your storage key.\n",
+    "4. Select the container where your Data Collection is stored and click on Edit. \n",
+    "5. In the query editor, click under the \"Name\" column and add your storage account model path into the filter. Note: if you want to only look into files from a specific year or month, just expand the filter path. For example, to look only into March data: /modeldata/<subscriptionid>/<resourcegroupname>/<workspacename>/<webservicename>/<modelname>/<modelversion>/<identifier>/<year>/3\n",
+    "6. Click on the double arrow beside the \"Content\" column to combine the files. \n",
+    "7. Click OK and the data will preload.\n",
+    "8. You can now click Close and Apply and start building your custom reports on your Model Input data."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Disable Data Collection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "aks_service.update(collect_model_data=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service.delete()\n", + "image.delete()\n", + "model.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "shipatel" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb b/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb index 9c147c3c..8aba52d8 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb @@ -1,443 +1,443 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# YOLO Real-time Object Detection using ONNX on AzureML\n", - "\n", - "This example shows how to convert the TinyYOLO model from CoreML to ONNX and operationalize it as a web service using Azure Machine Learning services and the ONNX Runtime.\n", - "\n", - "## What is ONNX\n", - "ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", - "\n", - "## YOLO Details\n", - "You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. For more information about YOLO, please visit the [YOLO website](https://pjreddie.com/darknet/yolo/)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "To make the best use of your time, make sure you have done the following:\n", - "\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration](../../../configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (config.json)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Install necessary packages\n", - "\n", - "You'll need to run the following commands to use this tutorial:\n", - "\n", - "```sh\n", - "pip install onnxmltools\n", - "pip install coremltools # use this on Linux and Mac\n", - "pip install git+https://github.com/apple/coremltools # use this on Windows\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Convert model to ONNX\n", - "\n", - "First we download the CoreML model. We use the CoreML model from [Matthijs Hollemans's tutorial](https://github.com/hollance/YOLO-CoreML-MPSNNGraph). This may take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import urllib.request\n", - "\n", - "coreml_model_url = \"https://github.com/hollance/YOLO-CoreML-MPSNNGraph/raw/master/TinyYOLO-CoreML/TinyYOLO-CoreML/TinyYOLO.mlmodel\"\n", - "urllib.request.urlretrieve(coreml_model_url, filename=\"TinyYOLO.mlmodel\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we use ONNXMLTools to convert the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import onnxmltools\n", - "import coremltools\n", - "\n", - "# Load a CoreML model\n", - "coreml_model = coremltools.utils.load_spec('TinyYOLO.mlmodel')\n", - "\n", - "# Convert from CoreML into ONNX\n", - "onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n", - "\n", - "# Save ONNX model\n", - "onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n", - "\n", - "import os\n", - "print(os.path.getsize('tinyyolov2.onnx'))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploying as a web service with Azure ML\n", - "\n", - "### Load Azure ML workspace\n", - "\n", - "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Registering your model with Azure ML\n", - "\n", - "Now we upload the model and register it in the workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model.register(model_path = \"tinyyolov2.onnx\",\n", - " model_name = \"tinyyolov2\",\n", - " tags = {\"onnx\": \"demo\"},\n", - " description = \"TinyYOLO\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Displaying your registered models\n", - "\n", - "You can optionally list out all the models that you have registered in this workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, m in models.items():\n", - " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write scoring file\n", - "\n", - "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import time\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import numpy as np # we're going to use numpy to process input and output data\n", - "import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n", - "\n", - "def init():\n", - " global session\n", - " model = Model.get_model_path(model_name = 'tinyyolov2')\n", - " session = onnxruntime.InferenceSession(model)\n", - "\n", - "def preprocess(input_data_json):\n", - " # convert the JSON data into the tensor input\n", - " return np.array(json.loads(input_data_json)['data']).astype('float32')\n", - "\n", - "def postprocess(result):\n", - " return np.array(result).tolist()\n", - "\n", - "def run(input_data_json):\n", - " try:\n", - " start = time.time() # start timer\n", - " input_data = preprocess(input_data_json)\n", - " input_name = session.get_inputs()[0].name # get the id of the first input of the model \n", - " result = session.run([], {input_name: input_data})\n", - " end = time.time() # stop timer\n", - " return {\"result\": postprocess(result),\n", - " \"time\": end - start}\n", - " except Exception as e:\n", - " 
result = str(e)\n", - " return {\"error\": result}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create container image\n", - "First we create a YAML file that specifies which dependencies we would like to see in our container." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime==0.4.0\",\"azureml-core\"])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we have Azure ML create the container. This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " docker_file = \"Dockerfile\",\n", - " description = \"TinyYOLO ONNX Demo\",\n", - " tags = {\"demo\": \"onnx\"}\n", - " )\n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnxyolo\",\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case you need to debug your code, the next line of code accesses the log file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all set! 
Let's get our model chugging.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'web service for TinyYOLO ONNX model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "from random import randint\n", - "\n", - "aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - " aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Success!\n", - "\n", - "If you've made it this far, you've deployed a working web service that does object detection using an ONNX model. You can get the URL for the webservice with the code below." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(aci_service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When you are eventually done using the web service, remember to delete it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#aci_service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "viswamy" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# YOLO Real-time Object Detection using ONNX on AzureML\n", + "\n", + "This example shows how to convert the TinyYOLO model from CoreML to ONNX and operationalize it as a web service using Azure Machine Learning services and the ONNX Runtime.\n", + "\n", + "## What is ONNX\n", + "ONNX is an open format for representing machine learning and deep learning models. 
ONNX enables open and interoperable AI by allowing data scientists and developers to use the tools of their choice without worrying about lock-in, and with the flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", + "\n", + "## YOLO Details\n", + "You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. For more information about YOLO, please visit the [YOLO website](https://pjreddie.com/darknet/yolo/)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "To make the best use of your time, make sure you have done the following:\n", + "\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set.
Otherwise, go through the [configuration](../../../configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (config.json)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Install necessary packages\n", + "\n", + "You'll need to run the following commands to use this tutorial:\n", + "\n", + "```sh\n", + "pip install onnxmltools\n", + "pip install coremltools # use this on Linux and Mac\n", + "pip install git+https://github.com/apple/coremltools # use this on Windows\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Convert model to ONNX\n", + "\n", + "First we download the CoreML model. We use the CoreML model from [Matthijs Hollemans's tutorial](https://github.com/hollance/YOLO-CoreML-MPSNNGraph). This may take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib.request\n", + "\n", + "coreml_model_url = \"https://github.com/hollance/YOLO-CoreML-MPSNNGraph/raw/master/TinyYOLO-CoreML/TinyYOLO-CoreML/TinyYOLO.mlmodel\"\n", + "urllib.request.urlretrieve(coreml_model_url, filename=\"TinyYOLO.mlmodel\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we use ONNXMLTools to convert the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxmltools\n", + "import coremltools\n", + "\n", + "# Load a CoreML model\n", + "coreml_model = coremltools.utils.load_spec('TinyYOLO.mlmodel')\n", + "\n", + "# Convert from CoreML into ONNX\n", + "onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n", + "\n", + "# Save ONNX model\n", + "onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n", + "\n", + "import os\n", + "print(os.path.getsize('tinyyolov2.onnx'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploying as a web service with Azure ML\n", + "\n", + "### Load Azure ML workspace\n", + "\n", + "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Registering your model with Azure ML\n", + "\n", + "Now we upload the model and register it in the workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path = \"tinyyolov2.onnx\",\n", + " model_name = \"tinyyolov2\",\n", + " tags = {\"onnx\": \"demo\"},\n", + " description = \"TinyYOLO\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Displaying your registered models\n", + "\n", + "You can optionally list out all the models that you have registered in this workspace." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, m in models.items():\n", + " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write scoring file\n", + "\n", + "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import time\n", + "import sys\n", + "import os\n", + "from azureml.core.model import Model\n", + "import numpy as np # we're going to use numpy to process input and output data\n", + "import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n", + "\n", + "def init():\n", + " global session\n", + " model = Model.get_model_path(model_name = 'tinyyolov2')\n", + " session = onnxruntime.InferenceSession(model)\n", + "\n", + "def preprocess(input_data_json):\n", + " # convert the JSON data into the tensor input\n", + " return np.array(json.loads(input_data_json)['data']).astype('float32')\n", + "\n", + "def postprocess(result):\n", + " return np.array(result).tolist()\n", + "\n", + "def run(input_data_json):\n", + " try:\n", + " start = time.time() # start timer\n", + " input_data = preprocess(input_data_json)\n", + " input_name = session.get_inputs()[0].name # get the id of the first input of the model \n", + " result = session.run([], {input_name: input_data})\n", + " end = time.time() # stop timer\n", + " return {\"result\": postprocess(result),\n", + " \"time\": end - start}\n", + " except Exception as e:\n", + " 
result = str(e)\n", + " return {\"error\": result}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create container image\n", + "First we create a YAML file that specifies which dependencies we would like to see in our container." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime==0.4.0\",\"azureml-core\"])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we have Azure ML create the container. This step will likely take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " docker_file = \"Dockerfile\",\n", + " description = \"TinyYOLO ONNX Demo\",\n", + " tags = {\"demo\": \"onnx\"}\n", + " )\n", + "\n", + "\n", + "image = ContainerImage.create(name = \"onnxyolo\",\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case you need to debug your code, the next line of code accesses the log file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all set! 
Let's get our model chugging.\n", + "\n", + "### Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'demo': 'onnx'}, \n", + " description = 'web service for TinyYOLO ONNX model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "from random import randint\n", + "\n", + "aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())\n", + " aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Success!\n", + "\n", + "If you've made it this far, you've deployed a working web service that does object detection using an ONNX model. You can get the URL for the webservice with the code below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(aci_service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you are eventually done using the web service, remember to delete it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#aci_service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "viswamy" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb b/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb index 3f3a0fd9..d31ca8b0 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb @@ -1,816 +1,816 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "Licensed under the MIT License." 
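Although the notebook stops at printing the scoring URI, it can help to see how the deployed TinyYOLO service might be invoked. The sketch below is an illustration, not part of the original notebook: the (1, 3, 416, 416) shape is TinyYOLOv2's standard image input, and `aci_service` refers to the web service deployed above.

```python
import json

import numpy as np

# TinyYOLOv2 takes a single 416x416 RGB image as a (1, 3, 416, 416) float32 tensor
dummy_image = np.zeros((1, 3, 416, 416), dtype=np.float32)

# score.py's run() parses json.loads(input_data)['data'], so the payload
# must be a JSON object with the tensor under the 'data' key
payload = json.dumps({'data': dummy_image.tolist()})

# Calling the live service would execute the run() function shown earlier and
# return a dict like {'result': ..., 'time': ...}:
# response = aci_service.run(input_data=payload)
```

In practice, replace `dummy_image` with a real photo resized to 416x416 and transposed to channel-first layout before serializing.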
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Facial Expression Recognition (FER+) using ONNX Runtime on Azure ML\n", - "\n", - "This example shows how to deploy an image classification neural network using the Facial Expression Recognition ([FER](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. This tutorial will show you how to deploy a FER+ model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", - "\n", - "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", - "\n", - "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization. We use the CPU version of ONNX Runtime in this tutorial, but will soon be releasing an additional tutorial for deploying this model using ONNX Runtime GPU.\n", - "\n", - "#### Tutorial Objectives:\n", - "\n", - "1. 
Describe the FER+ dataset and pretrained Convolutional Neural Net ONNX model for Emotion Recognition, stored in the ONNX model zoo.\n", - "2. Deploy and run the pretrained FER+ ONNX model on an Azure Machine Learning instance\n", - "3. Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "### 1. Install Azure ML SDK and create a new workspace\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, please follow the [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n", - "\n", - "### 2. Install additional packages needed for this Notebook\n", - "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed.\n", - "\n", - "```sh\n", - "(myenv) $ pip install matplotlib onnx opencv-python\n", - "```\n", - "\n", - "**Debugging tip**: Make sure to activate your virtual environment (myenv) before you re-launch this notebook using the `jupyter notebook` command. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n", - "\n", - "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", - "\n", - "In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus).
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# urllib is a built-in Python library to download files from URLs\n", - "\n", - "# Objective: retrieve the latest version of the ONNX Emotion FER+ model files from the\n", - "# ONNX Model Zoo and save them in the same folder as this tutorial\n", - "\n", - "import urllib.request\n", - "\n", - "onnx_model_url = \"https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz\"\n", - "\n", - "urllib.request.urlretrieve(onnx_model_url, filename=\"emotion_ferplus.tar.gz\")\n", - "\n", - "# the ! magic command tells our jupyter notebook kernel to run the following line of \n", - "# code from the command line instead of the notebook kernel\n", - "\n", - "# We use tar with the xvzf flags to extract the files we just retrieved from the ONNX model zoo\n", - "\n", - "!tar xvzf emotion_ferplus.tar.gz" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy a VM with your ONNX model in the Cloud\n", - "\n", - "### Load Azure ML workspace\n", - "\n", - "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Registering your model with Azure ML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_dir = \"emotion_ferplus\" # replace this with the location of your model files\n", - "\n", - "# leave as is if it's in the same folder as this notebook" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model.register(model_path = model_dir + \"/\" + \"model.onnx\",\n", - " model_name = \"onnx_emotion\",\n", - " tags = {\"onnx\": \"demo\"},\n", - " description = \"FER+ emotion recognition CNN from ONNX Model Zoo\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Optional: Displaying your registered models\n", - "\n", - "This step is not required, so feel free to skip it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, m in models.items():\n", - "    print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### ONNX FER+ Model Methodology\n", - "\n", - "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud-deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n", - "\n", - "The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 crowd-sourced reviewers, creating a more accurate basis for ground truth. \n", - "\n", - "You can see the difference in label quality in the sample model input below.
The FER labels are the first word below each image, and the FER+ labels are the second word below each image.\n", - "\n", - "![](https://raw.githubusercontent.com/Microsoft/FERPlus/master/FER+vsFER.png)\n", - "\n", - "***Input: Photos of cropped faces from FER+ Dataset***\n", - "\n", - "***Task: Classify each facial image into its appropriate emotions in the emotion table***\n", - "\n", - "``` emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7} ```\n", - "\n", - "***Output: Emotion prediction for input image***\n", - "\n", - "\n", - "Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# for images and plots in this notebook\n", - "import matplotlib.pyplot as plt  \n", - "\n", - "# display images inline\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Model Description\n", - "\n", - "The FER+ model from the ONNX Model Zoo is summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image from Barsoum et al.'s paper [\"Training Deep Networks for Facial Expression Recognition\n", - "with Crowd-Sourced Label Distribution\"](https://arxiv.org/pdf/1608.01041.pdf), with our (64 x 64) input images and our output probabilities for each of the labels." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![](https://raw.githubusercontent.com/vinitra/FERPlus/master/emotion_model_img.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify our Score and Environment Files" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime.
We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a YAML file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", - "\n", - "### Write Score File\n", - "\n", - "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import onnxruntime\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import time\n", - "\n", - "def init():\n", - "    global session, input_name, output_name\n", - "    model = Model.get_model_path(model_name = 'onnx_emotion')\n", - "    session = onnxruntime.InferenceSession(model, None)\n", - "    input_name = session.get_inputs()[0].name\n", - "    output_name = session.get_outputs()[0].name \n", - "    \n", - "def run(input_data):\n", - "    '''Purpose: evaluate test input in Azure Cloud using onnxruntime.\n", - "    We will call the run function later from our Jupyter Notebook \n", - "    so our Azure service can evaluate our model input in the cloud.
'''\n", - "\n", - "    try:\n", - "        # load in our data, convert to readable format\n", - "        data = np.array(json.loads(input_data)['data']).astype('float32')\n", - "        \n", - "        start = time.time()\n", - "        r = session.run([output_name], {input_name : data})\n", - "        end = time.time()\n", - "        \n", - "        result = emotion_map(postprocess(r[0]))\n", - "        \n", - "        result_dict = {\"result\": result,\n", - "                      \"time_in_sec\": [end - start]}\n", - "    except Exception as e:\n", - "        result_dict = {\"error\": str(e)}\n", - "    \n", - "    return json.dumps(result_dict)\n", - "\n", - "def emotion_map(classes, N=1):\n", - "    \"\"\"Takes the most probable labels (output of postprocess) and returns the \n", - "    top N emotional labels that fit the picture.\"\"\"\n", - "    \n", - "    emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n", - "                     'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", - "    \n", - "    emotion_keys = list(emotion_table.keys())\n", - "    emotions = []\n", - "    for i in range(N):\n", - "        emotions.append(emotion_keys[classes[i]])\n", - "    return emotions\n", - "\n", - "def softmax(x):\n", - "    \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", - "    x = x.reshape(-1)\n", - "    e_x = np.exp(x - np.max(x))\n", - "    return e_x / e_x.sum(axis=0)\n", - "\n", - "def postprocess(scores):\n", - "    \"\"\"This function takes the scores generated by the network and \n", - "    returns the class IDs in decreasing order of probability.\"\"\"\n", - "    prob = softmax(scores)\n", - "    prob = np.squeeze(prob)\n", - "    classes = np.argsort(prob)[::-1]\n", - "    return classes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write Environment File" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
- "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create the Container Image\n", - "\n", - "This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " docker_file = \"Dockerfile\",\n", - " description = \"Emotion ONNX Runtime container\",\n", - " tags = {\"demo\": \"onnx\"})\n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnximage\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case you need to debug your code, the next line of code accesses the log file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all done specifying what we want our virtual machine to do. 
Let's configure and deploy our container image.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'ONNX for emotion recognition model')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'onnx-demo-emotion'\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - "\n", - " # If your deployment fails, make sure to delete your aci_service before trying again!\n", - " # aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Success!\n", - "\n", - "If you've made it this far, you've deployed a working VM with a facial emotion recognition model running in the cloud using Azure ML. Congratulations!\n", - "\n", - "Let's see how well our model deals with our test images." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Testing and Evaluation\n", - "\n", - "### Useful Helper Functions\n", - "\n", - "We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def emotion_map(classes, N=1):\n", - " \"\"\"Take the most probable labels (output of postprocess) and returns the \n", - " top N emotional labels that fit the picture.\"\"\"\n", - " \n", - " emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n", - " 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", - " \n", - " emotion_keys = list(emotion_table.keys())\n", - " emotions = []\n", - " for c in range(N):\n", - " emotions.append(emotion_keys[classes[c]])\n", - " return emotions\n", - "\n", - "def softmax(x):\n", - " \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", - " x = x.reshape(-1)\n", - " e_x = np.exp(x - np.max(x))\n", - " return e_x / e_x.sum(axis=0)\n", - "\n", - "def postprocess(scores):\n", - " \"\"\"This function takes the scores generated by the network and \n", - " returns the class IDs in decreasing order of probability.\"\"\"\n", - " prob = softmax(scores)\n", - " prob = np.squeeze(prob)\n", - " classes = np.argsort(prob)[::-1]\n", - " return classes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Load Test Data\n", - "\n", - "These are already in your directory from your ONNX model download (from the model zoo).\n", - "\n", - "Notice that our Model Zoo files have a .pb extension. 
This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to manipulate our arrays\n", - "import numpy as np \n", - "\n", - "# read in test data protobuf files included with the model\n", - "import onnx\n", - "from onnx import numpy_helper\n", - "\n", - "# to use parsers to read in our model/data\n", - "import json\n", - "import os\n", - "\n", - "test_inputs = []\n", - "test_outputs = []\n", - "\n", - "# read in 3 testing images from .pb files\n", - "test_data_size = 3\n", - "\n", - "for num in np.arange(test_data_size):\n", - " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(num), 'input_0.pb')\n", - " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(num), 'output_0.pb')\n", - " \n", - " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", - " tensor = onnx.TensorProto()\n", - " with open(input_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " input_data = numpy_helper.to_array(tensor)\n", - " test_inputs.append(input_data)\n", - " \n", - " with open(output_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " output_data = numpy_helper.to_array(tensor)\n", - " output_processed = emotion_map(postprocess(output_data[0]))[0]\n", - " test_outputs.append(output_processed)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "### Show some sample images\n", - "We use `matplotlib` to plot 3 test images from the dataset." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "plt.figure(figsize = (20, 20))\n", - "for test_image in np.arange(3):\n", - " test_inputs[test_image].reshape(1, 64, 64)\n", - " plt.subplot(1, 8, test_image+1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n", - " plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.gray)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run evaluation / prediction" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure(figsize = (16, 6), frameon=False)\n", - "plt.subplot(1, 8, 1)\n", - "\n", - "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", - "plt.text(x = 6, y = 18, s = \"(64 x 64)\", fontsize = 12, color = 'black')\n", - "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", - "\n", - "\n", - "for i in np.arange(test_data_size):\n", - " \n", - " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", - "\n", - " # predict using the deployed model\n", - " r = json.loads(aci_service.run(input_data))\n", - " \n", - " if \"error\" in r:\n", - " print(r['error'])\n", - " break\n", - " \n", - " result = r['result'][0]\n", - " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", - " \n", - " ground_truth = test_outputs[i]\n", - " \n", - " # compare actual value vs. 
the predicted values:\n", - " plt.subplot(1, 8, i+2)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - "\n", - " # use different color for misclassified sample\n", - " font_color = 'red' if ground_truth != result else 'black'\n", - " clr_map = plt.cm.Greys if ground_truth != result else plt.cm.gray\n", - "\n", - " # ground truth labels are in blue\n", - " plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n", - " \n", - " # predictions are in black if correct, red if incorrect\n", - " plt.text(x = 10, y = -45, s = result, fontsize = 18, color = font_color)\n", - " plt.text(x = 5, y = -22, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", - "\n", - " \n", - " plt.imshow(test_inputs[i].reshape(64, 64), cmap = clr_map)\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Try classifying your own images!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Preprocessing functions take your image and format it so it can be passed\n", - "# as input into our ONNX model\n", - "\n", - "import cv2\n", - "\n", - "def rgb2gray(rgb):\n", - " \"\"\"Convert the input image into grayscale\"\"\"\n", - " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", - "\n", - "def resize_img(img_to_resize):\n", - " \"\"\"Resize image to FER+ model input dimensions\"\"\"\n", - " r_img = cv2.resize(img_to_resize, dsize=(64, 64), interpolation=cv2.INTER_AREA)\n", - " r_img.resize((1, 1, 64, 64))\n", - " return r_img\n", - "\n", - "def preprocess(img_to_preprocess):\n", - " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", - " if img_to_preprocess.shape == (64, 64):\n", - " img_to_preprocess.resize((1, 1, 64, 64))\n", - " return img_to_preprocess\n", - " \n", - " grayscale = rgb2gray(img_to_preprocess)\n", - " processed_img = resize_img(grayscale)\n", - " return processed_img" - ] - }, - { - "cell_type": "code", - 
"execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Replace the following string with your own path/test image\n", - "# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n", - "\n", - "# Any PNG or JPG image file should work\n", - "# Make sure to include the entire path with // instead of /\n", - "\n", - "# e.g. your_test_image = \"C:/Users/vinitra.swamy/Pictures/face.png\"\n", - "\n", - "your_test_image = \"\"\n", - "\n", - "import matplotlib.image as mpimg\n", - "\n", - "if your_test_image != \"\":\n", - " img = mpimg.imread(your_test_image)\n", - " plt.subplot(1,3,1)\n", - " plt.imshow(img, cmap = plt.cm.Greys)\n", - " print(\"Old Dimensions: \", img.shape)\n", - " img = preprocess(img)\n", - " print(\"New Dimensions: \", img.shape)\n", - "else:\n", - " img = None" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if img is None:\n", - " print(\"Add the path for your image data.\")\n", - "else:\n", - " input_data = json.dumps({'data': img.tolist()})\n", - "\n", - " try:\n", - " r = json.loads(aci_service.run(input_data))\n", - " result = r['result'][0]\n", - " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", - " except KeyError as e:\n", - " print(str(e))\n", - "\n", - " plt.figure(figsize = (16, 6))\n", - " plt.subplot(1,8,1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = -10, y = -40, s = \"Model prediction: \", fontsize = 14)\n", - " plt.text(x = -10, y = -25, s = \"Inference time: \", fontsize = 14)\n", - " plt.text(x = 100, y = -40, s = str(result), fontsize = 14)\n", - " plt.text(x = 100, y = -25, s = str(time_ms) + \" ms\", fontsize = 14)\n", - " plt.text(x = -10, y = -10, s = \"Model Input image: \", fontsize = 14)\n", - " plt.imshow(img.reshape((64, 64)), cmap = plt.cm.gray) \n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - 
"source": [ - "# remember to delete your service after you are done using it!\n", - "\n", - "# aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "Congratulations!\n", - "\n", - "In this tutorial, you have:\n", - "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", - "- understood a state-of-the-art convolutional neural net image classification model (FER+ in ONNX) and deployed it in the Azure ML cloud\n", - "- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n", - "\n", - "Next steps:\n", - "- If you have not already, check out another interesting ONNX/AML application that lets you set up a state-of-the-art [handwritten image classification model (MNIST)](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model for handwritten digit classification in an Azure ML virtual machine.\n", - "- Keep an eye out for an updated version of this tutorial that uses ONNX Runtime GPU.\n", - "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "viswamy" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - }, - "msauthor": "vinitra.swamy" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. 
All rights reserved. \n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Facial Expression Recognition (FER+) using ONNX Runtime on Azure ML\n", + "\n", + "This example shows how to deploy an image classification neural network using the Facial Expression Recognition ([FER](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. This tutorial will show you how to deploy a FER+ model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", + "\n", + "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", + "\n", + "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization. 
We use the CPU version of ONNX Runtime in this tutorial, but will soon be releasing an additional tutorial for deploying this model using ONNX Runtime GPU.\n", + "\n", + "#### Tutorial Objectives:\n", + "\n", + "1. Describe the FER+ dataset and pretrained Convolutional Neural Net ONNX model for Emotion Recognition, stored in the ONNX model zoo.\n", + "2. Deploy and run the pretrained FER+ ONNX model on an Azure Machine Learning instance\n", + "3. Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "### 1. Install Azure ML SDK and create a new workspace\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, please follow [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n", + "\n", + "### 2. Install additional packages needed for this Notebook\n", + "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed.\n", + "\n", + "```sh\n", + "(myenv) $ pip install matplotlib onnx opencv-python\n", + "```\n", + "\n", + "**Debugging tip**: Make sure to activate your virtual environment (myenv) before you re-launch this notebook using the `jupyter notebook` command. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n", + "\n", + "### 3. 
Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", + "\n", + "In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# urllib is a built-in Python library to download files from URLs\n", + "\n", + "# Objective: retrieve the latest version of the ONNX Emotion FER+ model files from the\n", + "# ONNX Model Zoo and save them in the same folder as this tutorial\n", + "\n", + "import urllib.request\n", + "\n", + "onnx_model_url = \"https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz\"\n", + "\n", + "urllib.request.urlretrieve(onnx_model_url, filename=\"emotion_ferplus.tar.gz\")\n", + "\n", + "# the ! prefix tells our jupyter notebook kernel to run the following line of \n", + "# code from the command line instead of the notebook kernel\n", + "\n", + "# We use tar with the xvzf flags to extract the files we just retrieved from the ONNX model zoo\n", + "\n", + "!tar xvzf emotion_ferplus.tar.gz" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy a VM with your ONNX model in the Cloud\n", + "\n", + "### Load Azure ML workspace\n", + "\n", + "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Registering your model with Azure ML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_dir = \"emotion_ferplus\" # replace this with the location of your model files\n", + "\n", + "# leave as is if it's in the same folder as this notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path = model_dir + \"/\" + \"model.onnx\",\n", + " model_name = \"onnx_emotion\",\n", + " tags = {\"onnx\": \"demo\"},\n", + " description = \"FER+ emotion recognition CNN from ONNX Model Zoo\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Optional: Displaying your registered models\n", + "\n", + "This step is not required, so feel free to skip it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, m in models.items():\n", + "    print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ONNX FER+ Model Methodology\n", + "\n", + "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud-deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n", + "\n", + "The original Facial Expression Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 crowd-sourced reviewers, creating a more accurate basis for ground truth. \n", + "\n", + "You can see the difference in label quality in the sample model input below. 
The FER labels are the first word below each image, and the FER+ labels are the second word below each image.\n", + "\n", + "![](https://raw.githubusercontent.com/Microsoft/FERPlus/master/FER+vsFER.png)\n", + "\n", + "***Input: Photos of cropped faces from FER+ Dataset***\n", + "\n", + "***Task: Classify each facial image into one of the emotions in the emotion table***\n", + "\n", + "``` emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7} ```\n", + "\n", + "***Output: Emotion prediction for input image***\n", + "\n", + "\n", + "Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# for images and plots in this notebook\n", + "import matplotlib.pyplot as plt \n", + "\n", + "# display images inline\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Description\n", + "\n", + "The FER+ model from the ONNX Model Zoo is summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image from Barsoum et al.'s paper [\"Training Deep Networks for Facial Expression Recognition\n", + "with Crowd-Sourced Label Distribution\"](https://arxiv.org/pdf/1608.01041.pdf), with our (64 x 64) input images and our output probabilities for each of the labels." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://raw.githubusercontent.com/vinitra/FERPlus/master/emotion_model_img.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify our Score and Environment Files" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. 
We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", + "\n", + "### Write Score File\n", + "\n", + "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import onnxruntime\n", + "import sys\n", + "import os\n", + "from azureml.core.model import Model\n", + "import time\n", + "\n", + "def init():\n", + "    global session, input_name, output_name\n", + "    model = Model.get_model_path(model_name = 'onnx_emotion')\n", + "    session = onnxruntime.InferenceSession(model, None)\n", + "    input_name = session.get_inputs()[0].name\n", + "    output_name = session.get_outputs()[0].name \n", + "    \n", + "def run(input_data):\n", + "    '''Purpose: evaluate test input in Azure Cloud using onnxruntime.\n", + "    We will call the run function later from our Jupyter Notebook \n", + "    so our azure service can evaluate our model input in the cloud. 
'''\n", + "\n", + " try:\n", + " # load in our data, convert to readable format\n", + " data = np.array(json.loads(input_data)['data']).astype('float32')\n", + " \n", + " start = time.time()\n", + " r = session.run([output_name], {input_name : data})\n", + " end = time.time()\n", + " \n", + " result = emotion_map(postprocess(r[0]))\n", + " \n", + " result_dict = {\"result\": result,\n", + " \"time_in_sec\": [end - start]}\n", + " except Exception as e:\n", + " result_dict = {\"error\": str(e)}\n", + " \n", + " return json.dumps(result_dict)\n", + "\n", + "def emotion_map(classes, N=1):\n", + " \"\"\"Take the most probable labels (output of postprocess) and returns the \n", + " top N emotional labels that fit the picture.\"\"\"\n", + " \n", + " emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n", + " 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", + " \n", + " emotion_keys = list(emotion_table.keys())\n", + " emotions = []\n", + " for i in range(N):\n", + " emotions.append(emotion_keys[classes[i]])\n", + " return emotions\n", + "\n", + "def softmax(x):\n", + " \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", + " x = x.reshape(-1)\n", + " e_x = np.exp(x - np.max(x))\n", + " return e_x / e_x.sum(axis=0)\n", + "\n", + "def postprocess(scores):\n", + " \"\"\"This function takes the scores generated by the network and \n", + " returns the class IDs in decreasing order of probability.\"\"\"\n", + " prob = softmax(scores)\n", + " prob = np.squeeze(prob)\n", + " classes = np.argsort(prob)[::-1]\n", + " return classes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write Environment File" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n", 
+ "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the Container Image\n", + "\n", + "This step will likely take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " docker_file = \"Dockerfile\",\n", + " description = \"Emotion ONNX Runtime container\",\n", + " tags = {\"demo\": \"onnx\"})\n", + "\n", + "\n", + "image = ContainerImage.create(name = \"onnximage\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case you need to debug your code, the next line of code accesses the log file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all done specifying what we want our virtual machine to do. 
Let's configure and deploy our container image.\n", + "\n", + "### Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'demo': 'onnx'}, \n", + " description = 'ONNX for emotion recognition model')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'onnx-demo-emotion'\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())\n", + "\n", + " # If your deployment fails, make sure to delete your aci_service before trying again!\n", + " # aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Success!\n", + "\n", + "If you've made it this far, you've deployed a working VM with a facial emotion recognition model running in the cloud using Azure ML. Congratulations!\n", + "\n", + "Let's see how well our model deals with our test images." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Testing and Evaluation\n", + "\n", + "### Useful Helper Functions\n", + "\n", + "We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def emotion_map(classes, N=1):\n", + " \"\"\"Take the most probable labels (output of postprocess) and returns the \n", + " top N emotional labels that fit the picture.\"\"\"\n", + " \n", + " emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n", + " 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", + " \n", + " emotion_keys = list(emotion_table.keys())\n", + " emotions = []\n", + " for c in range(N):\n", + " emotions.append(emotion_keys[classes[c]])\n", + " return emotions\n", + "\n", + "def softmax(x):\n", + " \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", + " x = x.reshape(-1)\n", + " e_x = np.exp(x - np.max(x))\n", + " return e_x / e_x.sum(axis=0)\n", + "\n", + "def postprocess(scores):\n", + " \"\"\"This function takes the scores generated by the network and \n", + " returns the class IDs in decreasing order of probability.\"\"\"\n", + " prob = softmax(scores)\n", + " prob = np.squeeze(prob)\n", + " classes = np.argsort(prob)[::-1]\n", + " return classes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load Test Data\n", + "\n", + "These are already in your directory from your ONNX model download (from the model zoo).\n", + "\n", + "Notice that our Model Zoo files have a .pb extension. 
This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# to manipulate our arrays\n", + "import numpy as np \n", + "\n", + "# read in test data protobuf files included with the model\n", + "import onnx\n", + "from onnx import numpy_helper\n", + "\n", + "# to use parsers to read in our model/data\n", + "import json\n", + "import os\n", + "\n", + "test_inputs = []\n", + "test_outputs = []\n", + "\n", + "# read in 3 testing images from .pb files\n", + "test_data_size = 3\n", + "\n", + "for num in np.arange(test_data_size):\n", + " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(num), 'input_0.pb')\n", + " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(num), 'output_0.pb')\n", + " \n", + " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", + " tensor = onnx.TensorProto()\n", + " with open(input_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " input_data = numpy_helper.to_array(tensor)\n", + " test_inputs.append(input_data)\n", + " \n", + " with open(output_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " output_data = numpy_helper.to_array(tensor)\n", + " output_processed = emotion_map(postprocess(output_data[0]))[0]\n", + " test_outputs.append(output_processed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "### Show some sample images\n", + "We use `matplotlib` to plot 3 test images from the dataset." 
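Before calling the deployed service, you can sanity-check the helper functions on a synthetic score vector locally. A minimal sketch (the helpers are reproduced here so the cell is self-contained, and the raw scores below are made up for illustration):

```python
import numpy as np

# The FER+ label order used by emotion_map above
emotion_keys = ['neutral', 'happiness', 'surprise', 'sadness',
                'anger', 'disgust', 'fear', 'contempt']

def softmax(x):
    """Map raw scores to probabilities that sum to 1 (as defined above)."""
    x = np.asarray(x, dtype=float).reshape(-1)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def postprocess(scores):
    """Class IDs in decreasing order of probability (as defined above)."""
    prob = np.squeeze(softmax(scores))
    return np.argsort(prob)[::-1]

# Synthetic raw network output in which class 1 ('happiness') dominates
raw_scores = np.array([0.1, 5.0, 0.2, 0.0, 0.3, 0.1, 0.2, 0.1])

probs = softmax(raw_scores)
classes = postprocess(raw_scores)
print(np.isclose(probs.sum(), 1.0))  # True: a valid probability distribution
print(emotion_keys[classes[0]])      # happiness
```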
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "plt.figure(figsize = (20, 20))\n", + "for test_image in np.arange(3):\n", + " plt.subplot(1, 8, test_image+1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n", + " plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.gray)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run evaluation / prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize = (16, 6), frameon=False)\n", + "plt.subplot(1, 8, 1)\n", + "\n", + "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", + "plt.text(x = 6, y = 18, s = \"(64 x 64)\", fontsize = 12, color = 'black')\n", + "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", + "\n", + "\n", + "for i in np.arange(test_data_size):\n", + " \n", + " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", + "\n", + " # predict using the deployed model\n", + " r = json.loads(aci_service.run(input_data))\n", + " \n", + " if \"error\" in r:\n", + " print(r['error'])\n", + " break\n", + " \n", + " result = r['result'][0]\n", + " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", + " \n", + " ground_truth = test_outputs[i]\n", + " \n", + " # compare actual value vs.
the predicted values:\n", + " plt.subplot(1, 8, i+2)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + "\n", + " # use different color for misclassified sample\n", + " font_color = 'red' if ground_truth != result else 'black'\n", + " clr_map = plt.cm.Greys if ground_truth != result else plt.cm.gray\n", + "\n", + " # ground truth labels are in blue\n", + " plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n", + " \n", + " # predictions are in black if correct, red if incorrect\n", + " plt.text(x = 10, y = -45, s = result, fontsize = 18, color = font_color)\n", + " plt.text(x = 5, y = -22, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", + "\n", + " \n", + " plt.imshow(test_inputs[i].reshape(64, 64), cmap = clr_map)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Try classifying your own images!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Preprocessing functions take your image and format it so it can be passed\n", + "# as input into our ONNX model\n", + "\n", + "import cv2\n", + "\n", + "def rgb2gray(rgb):\n", + " \"\"\"Convert the input image into grayscale\"\"\"\n", + " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", + "\n", + "def resize_img(img_to_resize):\n", + " \"\"\"Resize image to FER+ model input dimensions\"\"\"\n", + " r_img = cv2.resize(img_to_resize, dsize=(64, 64), interpolation=cv2.INTER_AREA)\n", + " r_img.resize((1, 1, 64, 64))\n", + " return r_img\n", + "\n", + "def preprocess(img_to_preprocess):\n", + " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", + " if img_to_preprocess.shape == (64, 64):\n", + " img_to_preprocess.resize((1, 1, 64, 64))\n", + " return img_to_preprocess\n", + " \n", + " grayscale = rgb2gray(img_to_preprocess)\n", + " processed_img = resize_img(grayscale)\n", + " return processed_img" + ] + }, + { + "cell_type": "code", + 
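The grayscale conversion above uses the standard ITU-R BT.601 luma weights, which sum to 1, so pixel intensities stay in the input range. A minimal check on synthetic data (no OpenCV needed for this part):

```python
import numpy as np

def rgb2gray(rgb):
    """Convert an H x W x 3 RGB image to grayscale (same weights as above)."""
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])

# Synthetic 64 x 64 RGB image with values in [0, 1]
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))

gray = rgb2gray(img)
print(gray.shape)  # (64, 64): the channel axis is collapsed

# A pure-white pixel stays at full intensity because the weights sum to 1
print(np.isclose(rgb2gray(np.ones((1, 1, 3)))[0, 0], 1.0))  # True
```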
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Replace the following string with your own path/test image\n", + "# Make sure your image is square and the dimensions are equal (e.g. 100 x 100 pixels or 64 x 64 pixels)\n", + "\n", + "# Any PNG or JPG image file should work\n", + "# Use forward slashes (/) in the path rather than unescaped backslashes\n", + "\n", + "# e.g. your_test_image = \"C:/Users/vinitra.swamy/Pictures/face.png\"\n", + "\n", + "your_test_image = \"\"\n", + "\n", + "import matplotlib.image as mpimg\n", + "\n", + "if your_test_image != \"\":\n", + " img = mpimg.imread(your_test_image)\n", + " plt.subplot(1,3,1)\n", + " plt.imshow(img, cmap = plt.cm.Greys)\n", + " print(\"Old Dimensions: \", img.shape)\n", + " img = preprocess(img)\n", + " print(\"New Dimensions: \", img.shape)\n", + "else:\n", + " img = None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if img is None:\n", + " print(\"Add the path for your image data.\")\n", + "else:\n", + " input_data = json.dumps({'data': img.tolist()})\n", + "\n", + " try:\n", + " r = json.loads(aci_service.run(input_data))\n", + " result = r['result'][0]\n", + " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", + " except KeyError as e:\n", + " print(str(e))\n", + "\n", + " plt.figure(figsize = (16, 6))\n", + " plt.subplot(1,8,1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = -10, y = -40, s = \"Model prediction: \", fontsize = 14)\n", + " plt.text(x = -10, y = -25, s = \"Inference time: \", fontsize = 14)\n", + " plt.text(x = 100, y = -40, s = str(result), fontsize = 14)\n", + " plt.text(x = 100, y = -25, s = str(time_ms) + \" ms\", fontsize = 14)\n", + " plt.text(x = -10, y = -10, s = \"Model Input image: \", fontsize = 14)\n", + " plt.imshow(img.reshape((64, 64)), cmap = plt.cm.gray) \n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [],
"source": [ + "# remember to delete your service after you are done using it!\n", + "\n", + "# aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "Congratulations!\n", + "\n", + "In this tutorial, you have:\n", + "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", + "- understood a state-of-the-art convolutional neural net image classification model (FER+ in ONNX) and deployed it in the Azure ML cloud\n", + "- verified that your deployed deep learning model works in the cloud on test data, and checked it against some of your own!\n", + "\n", + "Next steps:\n", + "- If you have not already, check out another interesting ONNX/AML application that lets you set up a state-of-the-art [handwritten image classification model (MNIST)](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb) in the cloud!
This tutorial deploys a pre-trained ONNX Computer Vision model for handwritten digit classification in an Azure ML virtual machine.\n", + "- Keep an eye out for an updated version of this tutorial that uses ONNX Runtime GPU.\n", + "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "viswamy" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + }, + "msauthor": "vinitra.swamy" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb b/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb index 43c22a09..a8163fed 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb @@ -1,820 +1,820 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Handwritten Digit Classification (MNIST) using ONNX Runtime on Azure ML\n", - "\n", - "This example shows how to deploy an image classification neural network using the Modified National Institute of Standards and Technology ([MNIST](http://yann.lecun.com/exdb/mnist/)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. This tutorial will show you how to take an MNIST model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions with ONNX Runtime, and deploy it as a web service in Azure.\n", - "\n", - "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon.
For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", - "\n", - "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization.\n", - "\n", - "#### Tutorial Objectives:\n", - "\n", - "- Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n", - "- Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n", - "- Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "### 1. Install Azure ML SDK and create a new workspace\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, please follow [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n", - "\n", - "### 2. Install additional packages needed for this tutorial notebook\n", - "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed. \n", - "\n", - "```sh\n", - "(myenv) $ pip install matplotlib onnx opencv-python\n", - "```\n", - "\n", - "**Debugging tip**: Make sure that you run the \"jupyter notebook\" command to launch this notebook after activating your virtual environment. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n", - "\n", - "### 3.
Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", - "\n", - "In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# urllib is a built-in Python library to download files from URLs\n", - "\n", - "# Objective: retrieve the latest version of the ONNX MNIST model files from the\n", - "# ONNX Model Zoo and save it in the same folder as this tutorial\n", - "\n", - "import urllib.request\n", - "\n", - "onnx_model_url = \"https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz\"\n", - "\n", - "urllib.request.urlretrieve(onnx_model_url, filename=\"mnist.tar.gz\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# the ! magic command tells our jupyter notebook kernel to run the following line of \n", - "# code from the command line instead of the notebook kernel\n", - "\n", - "# We use tar with the xvzf flags to extract the files we just retrieved from the ONNX model zoo\n", - "\n", - "!tar xvzf mnist.tar.gz" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy a VM with your ONNX model in the Cloud\n", - "\n", - "### Load Azure ML workspace\n", - "\n", - "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
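As an aside on the extraction step above: if the `tar` command is not available on your system (for example, on some Windows setups), Python's built-in `tarfile` module performs the same extraction — a minimal sketch:

```python
import tarfile

def extract_targz(archive_path, dest="."):
    """Extract a .tar.gz archive, equivalent to `tar xvzf archive_path`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=dest)

# Usage, assuming mnist.tar.gz was downloaded by the cell above:
# extract_targz("mnist.tar.gz")
```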
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Registering your model with Azure ML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_dir = \"mnist\" # replace this with the location of your model files\n", - "\n", - "# leave as is if it's in the same folder as this notebook" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model.register(workspace = ws,\n", - " model_path = model_dir + \"/\" + \"model.onnx\",\n", - " model_name = \"mnist_1\",\n", - " tags = {\"onnx\": \"demo\"},\n", - " description = \"MNIST image classification CNN from ONNX Model Zoo\",)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Optional: Displaying your registered models\n", - "\n", - "This step is not required, so feel free to skip it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, m in models.items():\n", - " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "### ONNX MNIST Model Methodology\n", - "\n", - "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/mnist) in the ONNX model zoo.\n", - "\n", - "***Input: Handwritten Images from MNIST Dataset***\n", - "\n", - "***Task: Classify each MNIST image into an appropriate digit***\n", - "\n", - "***Output: Digit prediction for input image***\n", - "\n", - "Run the cell below to look at some of the sample images from the MNIST dataset that we used to train this ONNX model. Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify!" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# for images and plots in this notebook\n", - "import matplotlib.pyplot as plt \n", - "from IPython.display import Image\n", - "\n", - "# display images inline\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Image(url=\"http://3.bp.blogspot.com/_UpN7DfJA0j4/TJtUBWPk0SI/AAAAAAAAABY/oWPMtmqJn3k/s1600/mnist_originals.png\", width=200, height=200)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify our Score and Environment Files" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a YAML file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", - "\n", - "### Write Score File\n", - "\n", - "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls."
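The pre- and post-processing steps in the score file are plain `json` + `numpy`, so they can be exercised locally before anything is deployed. A small sketch of the same round trip (the simulated scores below are made up for illustration):

```python
import json
import numpy as np

def preprocess(input_data_json):
    # convert the JSON payload into the float32 tensor the model expects
    return np.array(json.loads(input_data_json)['data']).astype('float32')

def postprocess(result):
    # pick the highest-confidence label with argmax
    return int(np.argmax(np.array(result).squeeze(), axis=0))

# Simulate a request body, as built later with json.dumps({'data': ...})
payload = json.dumps({'data': np.zeros((1, 1, 28, 28)).tolist()})
tensor = preprocess(payload)
print(tensor.shape, tensor.dtype)  # (1, 1, 28, 28) float32

# Simulate raw scores over the 10 digit classes; class 7 scores highest
fake_scores = [[0.0, 0.1, 0.0, 0.2, 0.0, 0.0, 0.0, 0.9, 0.1, 0.0]]
print(postprocess(fake_scores))    # 7
```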
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import onnxruntime\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import time\n", - "\n", - "\n", - "def init():\n", - " global session, input_name, output_name\n", - " model = Model.get_model_path(model_name = 'mnist_1')\n", - " session = onnxruntime.InferenceSession(model, None)\n", - " input_name = session.get_inputs()[0].name\n", - " output_name = session.get_outputs()[0].name \n", - " \n", - "\n", - "def preprocess(input_data_json):\n", - " # convert the JSON data into the tensor input\n", - " return np.array(json.loads(input_data_json)['data']).astype('float32')\n", - "\n", - "def postprocess(result):\n", - " # We use argmax to pick the highest confidence label\n", - " return int(np.argmax(np.array(result).squeeze(), axis=0))\n", - " \n", - "def run(input_data):\n", - "\n", - " try:\n", - " # load in our data, convert to readable format\n", - " data = preprocess(input_data)\n", - " \n", - " # start timer\n", - " start = time.time()\n", - " \n", - " r = session.run([output_name], {input_name: data})\n", - " \n", - " #end timer\n", - " end = time.time()\n", - " \n", - " result = postprocess(r)\n", - " result_dict = {\"result\": result,\n", - " \"time_in_sec\": end - start}\n", - " except Exception as e:\n", - " result_dict = {\"error\": str(e)}\n", - " \n", - " return result_dict\n", - "\n", - "def choose_class(result_prob):\n", - " \"\"\"We use argmax to determine the right label to choose from our output\"\"\"\n", - " return int(np.argmax(result_prob, axis=0))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write Environment File\n", - "\n", - "This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create the Container Image\n", - "This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " docker_file = \"Dockerfile\",\n", - " description = \"MNIST ONNX Runtime container\",\n", - " tags = {\"demo\": \"onnx\"}) \n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnximage\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case you need to debug your code, the next line of code accesses the log file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all done specifying what we want our virtual machine to do. 
Let's configure and deploy our container image.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'ONNX for mnist model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'onnx-demo-mnist'\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - "\n", - " # If your deployment fails, make sure to delete your aci_service or rename your service before trying again!\n", - " # aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Success!\n", - "\n", - "If you've made it this far, you've deployed a working VM with a handwritten digit classifier running in the cloud using Azure ML. Congratulations!\n", - "\n", - "You can get the URL for the webservice with the code below. Let's now see how well our model deals with our test images." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(aci_service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Testing and Evaluation\n", - "\n", - "### Load Test Data\n", - "\n", - "These are already in your directory from your ONNX model download (from the model zoo).\n", - "\n", - "Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to manipulate our arrays\n", - "import numpy as np \n", - "\n", - "# read in test data protobuf files included with the model\n", - "import onnx\n", - "from onnx import numpy_helper\n", - "\n", - "# to use parsers to read in our model/data\n", - "import json\n", - "import os\n", - "\n", - "test_inputs = []\n", - "test_outputs = []\n", - "\n", - "# read in 3 testing images from .pb files\n", - "test_data_size = 3\n", - "\n", - "for i in np.arange(test_data_size):\n", - " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n", - " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n", - " \n", - " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", - " tensor = onnx.TensorProto()\n", - " with open(input_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " input_data = numpy_helper.to_array(tensor)\n", - " test_inputs.append(input_data)\n", - " \n", - " with open(output_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " output_data = numpy_helper.to_array(tensor)\n", - " 
test_outputs.append(output_data)\n", - " \n", - "if len(test_inputs) == test_data_size:\n", - " print('Test data loaded successfully.')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "### Show some sample images\n", - "We use `matplotlib` to plot 3 test images from the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "plt.figure(figsize = (16, 6))\n", - "for test_image in np.arange(3):\n", - " plt.subplot(1, 15, test_image+1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.imshow(test_inputs[test_image].reshape(28, 28), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run evaluation / prediction" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure(figsize = (16, 6), frameon=False)\n", - "plt.subplot(1, 8, 1)\n", - "\n", - "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", - "plt.text(x = 6, y = 18, s = \"(28 x 28)\", fontsize = 12, color = 'black')\n", - "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", - "\n", - "\n", - "for i in np.arange(test_data_size):\n", - " \n", - " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", - " \n", - " # predict using the deployed model\n", - " r = aci_service.run(input_data)\n", - " \n", - " if \"error\" in r:\n", - " print(r['error'])\n", - " break\n", - " \n", - " result = r['result']\n", - " time_ms = np.round(r['time_in_sec'] * 1000, 
2)\n", - " \n", - " ground_truth = int(np.argmax(test_outputs[i]))\n", - " \n", - " # compare actual value vs. the predicted values:\n", - " plt.subplot(1, 8, i+2)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - "\n", - " # use different color for misclassified sample\n", - " font_color = 'red' if ground_truth != result else 'black'\n", - " clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n", - "\n", - " # ground truth labels are in blue\n", - " plt.text(x = 10, y = -30, s = ground_truth, fontsize = 18, color = 'blue')\n", - " \n", - " # predictions are in black if correct, red if incorrect\n", - " plt.text(x = 10, y = -20, s = result, fontsize = 18, color = font_color)\n", - " plt.text(x = 5, y = -10, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", - "\n", - " \n", - " plt.imshow(test_inputs[i].reshape(28, 28), cmap = clr_map)\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Try classifying your own images!\n", - "\n", - "Create your own handwritten image and pass it into the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Preprocessing functions take your image and format it so it can be passed\n", - "# as input into our ONNX model\n", - "\n", - "import cv2\n", - "\n", - "def rgb2gray(rgb):\n", - " \"\"\"Convert the input image into grayscale\"\"\"\n", - " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", - "\n", - "def resize_img(img_to_resize):\n", - " \"\"\"Resize image to MNIST model input dimensions\"\"\"\n", - " r_img = cv2.resize(img_to_resize, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n", - " r_img.resize((1, 1, 28, 28))\n", - " return r_img\n", - "\n", - "def preprocess(img_to_preprocess):\n", - " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", - " if img_to_preprocess.shape == (28, 28):\n", - " img_to_preprocess.resize((1, 1, 28, 28))\n", - " return img_to_preprocess\n", - " \n", - " grayscale = rgb2gray(img_to_preprocess)\n", - " processed_img = resize_img(grayscale)\n", - " return processed_img" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Replace this string with your own path/test image\n", - "# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n", - "\n", - "# Any PNG or JPG image file should work\n", - "\n", - "your_test_image = \"\"\n", - "\n", - "# e.g. 
your_test_image = \"C:/Users/vinitra.swamy/Pictures/handwritten_digit.png\"\n", - "\n", - "import matplotlib.image as mpimg\n", - "\n", - "if your_test_image != \"\":\n", - " img = mpimg.imread(your_test_image)\n", - " plt.subplot(1,3,1)\n", - " plt.imshow(img, cmap = plt.cm.Greys)\n", - " print(\"Old Dimensions: \", img.shape)\n", - " img = preprocess(img)\n", - " print(\"New Dimensions: \", img.shape)\n", - "else:\n", - " img = None" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if img is None:\n", - " print(\"Add the path for your image data.\")\n", - "else:\n", - " input_data = json.dumps({'data': img.tolist()})\n", - "\n", - " try:\n", - " r = aci_service.run(input_data)\n", - " result = r['result']\n", - " time_ms = np.round(r['time_in_sec'] * 1000, 2)\n", - " except KeyError as e:\n", - " print(str(e))\n", - "\n", - " plt.figure(figsize = (16, 6))\n", - " plt.subplot(1, 15,1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = -100, y = -20, s = \"Model prediction: \", fontsize = 14)\n", - " plt.text(x = -100, y = -10, s = \"Inference time: \", fontsize = 14)\n", - " plt.text(x = 0, y = -20, s = str(result), fontsize = 14)\n", - " plt.text(x = 0, y = -10, s = str(time_ms) + \" ms\", fontsize = 14)\n", - " plt.text(x = -100, y = 14, s = \"Input image: \", fontsize = 14)\n", - " plt.imshow(img.reshape(28, 28), cmap = plt.cm.gray) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional: How does our ONNX MNIST model work? \n", - "#### A brief explanation of Convolutional Neural Networks\n", - "\n", - "A [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN, or ConvNet) is a type of [feed-forward](https://en.wikipedia.org/wiki/Feedforward_neural_network) artificial neural network made up of neurons that have learnable weights and biases. The CNNs take advantage of the spatial nature of the data. 
In nature, we perceive different objects by their shapes, size and colors. For example, objects in a natural scene are typically edges, corners/vertices (defined by two of more edges), color patches etc. These primitives are often identified using different detectors (e.g., edge detection, color detector) or combination of detectors interacting to facilitate image interpretation (object classification, region of interest detection, scene description etc.) in real world vision related tasks. These detectors are also known as filters. Convolution is a mathematical operator that takes an image and a filter as input and produces a filtered output (representing say edges, corners, or colors in the input image). \n", - "\n", - "Historically, these filters are a set of weights that were often hand crafted or modeled with mathematical functions (e.g., [Gaussian](https://en.wikipedia.org/wiki/Gaussian_filter) / [Laplacian](http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm) / [Canny](https://en.wikipedia.org/wiki/Canny_edge_detector) filter). The filter outputs are mapped through non-linear activation functions mimicking human brain cells called [neurons](https://en.wikipedia.org/wiki/Neuron). Popular deep CNNs or ConvNets (such as [AlexNet](https://en.wikipedia.org/wiki/AlexNet), [VGG](https://arxiv.org/abs/1409.1556), [Inception](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf), [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf)) that are used for various [computer vision](https://en.wikipedia.org/wiki/Computer_vision) tasks have many of these architectural primitives (inspired from biology). \n", - "\n", - "### Convolution Layer\n", - "\n", - "A convolution layer is a set of filters. 
Each filter is defined by a weight (**W**) matrix, and bias ($b$).\n", - "\n", - "![](https://www.cntk.ai/jup/cntk103d_filterset_v2.png)\n", - "\n", - "These filters are scanned across the image performing the dot product between the weights and corresponding input value ($x$). The bias value is added to the output of the dot product and the resulting sum is optionally mapped through an activation function. This process is illustrated in the following animation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Image(url=\"https://www.cntk.ai/jup/cntk103d_conv2d_final.gif\", width= 200)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Model Description\n", - "\n", - "The MNIST model from the ONNX Model Zoo uses maxpooling to update the weights in its convolutions, summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image, with our input images and our output probabilities of each of our 10 labels. If you're interested in exploring the logic behind creating a Deep Learning model further, please look at the [training tutorial for our ONNX MNIST Convolutional Neural Network](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_103D_MNIST_ConvolutionalNeuralNetwork.ipynb). 
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Max-Pooling for Convolutional Neural Nets\n", - "\n", - "![](http://www.cntk.ai/jup/c103d_max_pooling.gif)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Pre-Trained Model Architecture\n", - "\n", - "![](http://www.cntk.ai/jup/conv103d_mnist-conv-mp.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# remember to delete your service after you are done using it!\n", - "\n", - "# aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "Congratulations!\n", - "\n", - "In this tutorial, you have:\n", - "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", - "- understood a state-of-the-art convolutional neural net image classification model (MNIST in ONNX) and deployed it in Azure ML cloud\n", - "- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n", - "\n", - "Next steps:\n", - "- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb) in the cloud! 
This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine.\n", - "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "viswamy" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - }, - "msauthor": "vinitra.swamy" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Handwritten Digit Classification (MNIST) using ONNX Runtime on Azure ML\n", + "\n", + "This example shows how to deploy an image classification neural network using the Modified National Institute of Standards and Technology ([MNIST](http://yann.lecun.com/exdb/mnist/)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing number from 0 to 9. 
This tutorial will show you how to deploy an MNIST model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", + "\n", + "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", + "\n", + "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization.\n", + "\n", + "#### Tutorial Objectives:\n", + "\n", + "- Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n", + "- Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n", + "- Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "### 1. Install Azure ML SDK and create a new workspace\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, please follow the [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n", + "\n", + "### 2. Install additional packages needed for this tutorial notebook\n", + "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed. 
\n", + "\n", + "```sh\n", + "(myenv) $ pip install matplotlib onnx opencv-python\n", + "```\n", + "\n", + "**Debugging tip**: Make sure that you run the \"jupyter notebook\" command to launch this notebook after activating your virtual environment. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n", + "\n", + "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo\n", + "\n", + "In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# urllib is a built-in Python library to download files from URLs\n", + "\n", + "# Objective: retrieve the latest version of the ONNX MNIST model files from the\n", + "# ONNX Model Zoo and save it in the same folder as this tutorial\n", + "\n", + "import urllib.request\n", + "\n", + "onnx_model_url = \"https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz\"\n", + "\n", + "urllib.request.urlretrieve(onnx_model_url, filename=\"mnist.tar.gz\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# the ! 
magic command tells our jupyter notebook kernel to run the following line of \n", + "# code from the command line instead of the notebook kernel\n", + "\n", + "# We use tar with the xvzf flags to extract the files we just retrieved from the ONNX model zoo\n", + "\n", + "!tar xvzf mnist.tar.gz" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy a VM with your ONNX model in the Cloud\n", + "\n", + "### Load Azure ML workspace\n", + "\n", + "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Registering your model with Azure ML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_dir = \"mnist\" # replace this with the location of your model files\n", + "\n", + "# leave as is if it's in the same folder as this notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(workspace = ws,\n", + " model_path = model_dir + \"/\" + \"model.onnx\",\n", + " model_name = \"mnist_1\",\n", + " tags = {\"onnx\": \"demo\"},\n", + " description = \"MNIST image classification CNN from ONNX Model Zoo\",)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Optional: Displaying your registered 
models\n", + "\n", + "This step is not required, so feel free to skip it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, m in models.items():\n", + " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "### ONNX MNIST Model Methodology\n", + "\n", + "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/mnist) in the ONNX model zoo.\n", + "\n", + "***Input: Handwritten Images from MNIST Dataset***\n", + "\n", + "***Task: Classify each MNIST image into an appropriate digit***\n", + "\n", + "***Output: Digit prediction for input image***\n", + "\n", + "Run the cell below to look at some of the sample images from the MNIST dataset that we used to train this ONNX model. Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# for images and plots in this notebook\n", + "import matplotlib.pyplot as plt \n", + "from IPython.display import Image\n", + "\n", + "# display images inline\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image(url=\"http://3.bp.blogspot.com/_UpN7DfJA0j4/TJtUBWPk0SI/AAAAAAAAABY/oWPMtmqJn3k/s1600/mnist_originals.png\", width=200, height=200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify our Score and Environment Files" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", + "\n", + "### Write Score File\n", + "\n", + "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import onnxruntime\n", + "import sys\n", + "import os\n", + "from azureml.core.model import Model\n", + "import time\n", + "\n", + "\n", + "def init():\n", + " global session, input_name, output_name\n", + " model = Model.get_model_path(model_name = 'mnist_1')\n", + " session = onnxruntime.InferenceSession(model, None)\n", + " input_name = session.get_inputs()[0].name\n", + " output_name = session.get_outputs()[0].name \n", + " \n", + "\n", + "def preprocess(input_data_json):\n", + " # convert the JSON data into the tensor input\n", + " return np.array(json.loads(input_data_json)['data']).astype('float32')\n", + "\n", + "def postprocess(result):\n", + " # We use argmax to pick the highest confidence label\n", + " return int(np.argmax(np.array(result).squeeze(), axis=0))\n", + " \n", + "def run(input_data):\n", + "\n", + " try:\n", + " # load in our data, convert to readable format\n", + " data = preprocess(input_data)\n", + " \n", + " # start timer\n", + " start = time.time()\n", + " \n", + " r = session.run([output_name], {input_name: data})\n", + " \n", + " #end timer\n", + " end = time.time()\n", + " \n", + " result = postprocess(r)\n", + " result_dict = {\"result\": result,\n", + " \"time_in_sec\": end - start}\n", + " except Exception as e:\n", + " result_dict = {\"error\": str(e)}\n", + " \n", + " return result_dict\n", + "\n", + "def choose_class(result_prob):\n", + " \"\"\"We use argmax to determine the right label to choose from our output\"\"\"\n", + " return int(np.argmax(result_prob, axis=0))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write Environment File\n", + "\n", + "This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the Container Image\n", + "This step will likely take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " docker_file = \"Dockerfile\",\n", + " description = \"MNIST ONNX Runtime container\",\n", + " tags = {\"demo\": \"onnx\"}) \n", + "\n", + "\n", + "image = ContainerImage.create(name = \"onnximage\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case you need to debug your code, the next line of code accesses the log file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all done specifying what we want our virtual machine to do. 
Let's configure and deploy our container image.\n", + "\n", + "### Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'demo': 'onnx'}, \n", + " description = 'ONNX for mnist model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'onnx-demo-mnist'\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())\n", + "\n", + " # If your deployment fails, make sure to delete your aci_service or rename your service before trying again!\n", + " # aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Success!\n", + "\n", + "If you've made it this far, you've deployed a working VM with a handwritten digit classifier running in the cloud using Azure ML. Congratulations!\n", + "\n", + "You can get the URL for the webservice with the code below. Let's now see how well our model deals with our test images." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(aci_service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Testing and Evaluation\n", + "\n", + "### Load Test Data\n", + "\n", + "These are already in your directory from your ONNX model download (from the model zoo).\n", + "\n", + "Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# to manipulate our arrays\n", + "import numpy as np \n", + "\n", + "# read in test data protobuf files included with the model\n", + "import onnx\n", + "from onnx import numpy_helper\n", + "\n", + "# to use parsers to read in our model/data\n", + "import json\n", + "import os\n", + "\n", + "test_inputs = []\n", + "test_outputs = []\n", + "\n", + "# read in 3 testing images from .pb files\n", + "test_data_size = 3\n", + "\n", + "for i in np.arange(test_data_size):\n", + " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n", + " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n", + " \n", + " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", + " tensor = onnx.TensorProto()\n", + " with open(input_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " input_data = numpy_helper.to_array(tensor)\n", + " test_inputs.append(input_data)\n", + " \n", + " with open(output_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " output_data = numpy_helper.to_array(tensor)\n", + " 
test_outputs.append(output_data)\n", + " \n", + "if len(test_inputs) == test_data_size:\n", + " print('Test data loaded successfully.')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "### Show some sample images\n", + "We use `matplotlib` to plot 3 test images from the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "plt.figure(figsize = (16, 6))\n", + "for test_image in np.arange(3):\n", + " plt.subplot(1, 15, test_image+1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.imshow(test_inputs[test_image].reshape(28, 28), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run evaluation / prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize = (16, 6), frameon=False)\n", + "plt.subplot(1, 8, 1)\n", + "\n", + "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", + "plt.text(x = 6, y = 18, s = \"(28 x 28)\", fontsize = 12, color = 'black')\n", + "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", + "\n", + "\n", + "for i in np.arange(test_data_size):\n", + " \n", + " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", + " \n", + " # predict using the deployed model\n", + " r = aci_service.run(input_data)\n", + " \n", + " if \"error\" in r:\n", + " print(r['error'])\n", + " break\n", + " \n", + " result = r['result']\n", + " time_ms = np.round(r['time_in_sec'] * 1000, 
2)\n", + " \n", + " ground_truth = int(np.argmax(test_outputs[i]))\n", + " \n", + " # compare actual value vs. the predicted values:\n", + " plt.subplot(1, 8, i+2)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + "\n", + " # use different color for misclassified sample\n", + " font_color = 'red' if ground_truth != result else 'black'\n", + " clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n", + "\n", + " # ground truth labels are in blue\n", + " plt.text(x = 10, y = -30, s = ground_truth, fontsize = 18, color = 'blue')\n", + " \n", + " # predictions are in black if correct, red if incorrect\n", + " plt.text(x = 10, y = -20, s = result, fontsize = 18, color = font_color)\n", + " plt.text(x = 5, y = -10, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", + "\n", + " \n", + " plt.imshow(test_inputs[i].reshape(28, 28), cmap = clr_map)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Try classifying your own images!\n", + "\n", + "Create your own handwritten image and pass it into the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Preprocessing functions take your image and format it so it can be passed\n", + "# as input into our ONNX model\n", + "\n", + "import cv2\n", + "\n", + "def rgb2gray(rgb):\n", + " \"\"\"Convert the input image into grayscale\"\"\"\n", + " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", + "\n", + "def resize_img(img_to_resize):\n", + " \"\"\"Resize image to MNIST model input dimensions\"\"\"\n", + " r_img = cv2.resize(img_to_resize, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n", + " r_img.resize((1, 1, 28, 28))\n", + " return r_img\n", + "\n", + "def preprocess(img_to_preprocess):\n", + " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", + " if img_to_preprocess.shape == (28, 28):\n", + " img_to_preprocess.resize((1, 1, 28, 28))\n", + " return img_to_preprocess\n", + " \n", + " grayscale = rgb2gray(img_to_preprocess)\n", + " processed_img = resize_img(grayscale)\n", + " return processed_img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Replace this string with your own path/test image\n", + "# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n", + "\n", + "# Any PNG or JPG image file should work\n", + "\n", + "your_test_image = \"\"\n", + "\n", + "# e.g. 
your_test_image = \"C:/Users/vinitra.swamy/Pictures/handwritten_digit.png\"\n", + "\n", + "import matplotlib.image as mpimg\n", + "\n", + "if your_test_image != \"\":\n", + " img = mpimg.imread(your_test_image)\n", + " plt.subplot(1,3,1)\n", + " plt.imshow(img, cmap = plt.cm.Greys)\n", + " print(\"Old Dimensions: \", img.shape)\n", + " img = preprocess(img)\n", + " print(\"New Dimensions: \", img.shape)\n", + "else:\n", + " img = None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if img is None:\n", + " print(\"Add the path for your image data.\")\n", + "else:\n", + " input_data = json.dumps({'data': img.tolist()})\n", + "\n", + " try:\n", + " r = aci_service.run(input_data)\n", + " result = r['result']\n", + " time_ms = np.round(r['time_in_sec'] * 1000, 2)\n", + " except KeyError as e:\n", + " print(str(e))\n", + "\n", + " plt.figure(figsize = (16, 6))\n", + " plt.subplot(1, 15,1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = -100, y = -20, s = \"Model prediction: \", fontsize = 14)\n", + " plt.text(x = -100, y = -10, s = \"Inference time: \", fontsize = 14)\n", + " plt.text(x = 0, y = -20, s = str(result), fontsize = 14)\n", + " plt.text(x = 0, y = -10, s = str(time_ms) + \" ms\", fontsize = 14)\n", + " plt.text(x = -100, y = 14, s = \"Input image: \", fontsize = 14)\n", + " plt.imshow(img.reshape(28, 28), cmap = plt.cm.gray) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optional: How does our ONNX MNIST model work? \n", + "#### A brief explanation of Convolutional Neural Networks\n", + "\n", + "A [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN, or ConvNet) is a type of [feed-forward](https://en.wikipedia.org/wiki/Feedforward_neural_network) artificial neural network made up of neurons that have learnable weights and biases. The CNNs take advantage of the spatial nature of the data. 
In nature, we perceive different objects by their shapes, sizes, and colors. For example, objects in a natural scene are typically edges, corners/vertices (defined by two or more edges), color patches, etc. These primitives are often identified using different detectors (e.g., edge detection, color detection) or combinations of detectors interacting to facilitate image interpretation (object classification, region-of-interest detection, scene description, etc.) in real-world vision tasks. These detectors are also known as filters. Convolution is a mathematical operator that takes an image and a filter as input and produces a filtered output (representing, say, edges, corners, or colors in the input image). \n", + "\n", + "Historically, these filters were sets of weights that were often hand-crafted or modeled with mathematical functions (e.g., [Gaussian](https://en.wikipedia.org/wiki/Gaussian_filter) / [Laplacian](http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm) / [Canny](https://en.wikipedia.org/wiki/Canny_edge_detector) filters). The filter outputs are mapped through non-linear activation functions mimicking human brain cells called [neurons](https://en.wikipedia.org/wiki/Neuron). Popular deep CNNs or ConvNets (such as [AlexNet](https://en.wikipedia.org/wiki/AlexNet), [VGG](https://arxiv.org/abs/1409.1556), [Inception](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf), [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf)) that are used for various [computer vision](https://en.wikipedia.org/wiki/Computer_vision) tasks incorporate many of these architectural primitives (inspired by biology). \n", + "\n", + "### Convolution Layer\n", + "\n", + "A convolution layer is a set of filters. 
Each filter is defined by a weight matrix (**W**) and a bias ($b$).\n", + "\n", + "![](https://www.cntk.ai/jup/cntk103d_filterset_v2.png)\n", + "\n", + "These filters are scanned across the image, computing the dot product between the weights and the corresponding input values ($x$). The bias value is added to the output of the dot product, and the resulting sum is optionally mapped through an activation function. This process is illustrated in the following animation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image(url=\"https://www.cntk.ai/jup/cntk103d_conv2d_final.gif\", width= 200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Description\n", + "\n", + "The MNIST model from the ONNX Model Zoo uses max-pooling to downsample the feature maps between its convolutional layers, summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image, with our input images and our output probabilities for each of the 10 labels. If you're interested in exploring the logic behind creating a Deep Learning model further, please look at the [training tutorial for our ONNX MNIST Convolutional Neural Network](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_103D_MNIST_ConvolutionalNeuralNetwork.ipynb). 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Max-Pooling for Convolutional Neural Nets\n", + "\n", + "![](http://www.cntk.ai/jup/c103d_max_pooling.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Pre-Trained Model Architecture\n", + "\n", + "![](http://www.cntk.ai/jup/conv103d_mnist-conv-mp.png)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# remember to delete your service after you are done using it!\n", + "\n", + "# aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "Congratulations!\n", + "\n", + "In this tutorial, you have:\n", + "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", + "- understood a state-of-the-art convolutional neural net image classification model (MNIST in ONNX) and deployed it in Azure ML cloud\n", + "- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n", + "\n", + "Next steps:\n", + "- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb) in the cloud! 
This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine.\n", + "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "viswamy" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + }, + "msauthor": "vinitra.swamy" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb b/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb index 1213c12f..5c4164b4 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb @@ -1,427 +1,427 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# ResNet50 Image Classification using ONNX and AzureML\n", - "\n", - "This example shows how to deploy the ResNet50 ONNX model as a web service using Azure Machine Learning services and the ONNX Runtime.\n", - "\n", - "## What is ONNX\n", - "ONNX is an open format for representing machine learning and deep learning models. 
ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in, and gives them the flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", - "\n", - "## ResNet50 Details\n", - "ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. More information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo GitHub](https://github.com/onnx/models/tree/master/models/image_classification/resnet). " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "To make the best use of your time, make sure you have done the following:\n", - "\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (config.json)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Download pre-trained ONNX model from ONNX Model Zoo.\n", - "\n", - "Download the [ResNet50v2 model and test data](https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz) and extract it in the same folder as this tutorial notebook.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import urllib.request\n", - "\n", - "onnx_model_url = \"https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz\"\n", - "urllib.request.urlretrieve(onnx_model_url, filename=\"resnet50v2.tar.gz\")\n", - "\n", - "!tar xvzf resnet50v2.tar.gz" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploying as a web service with Azure ML" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Load your Azure ML workspace\n", - "\n", - "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register your model with Azure ML\n", - "\n", - "Now we upload the model and register it in the workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model.register(model_path = \"resnet50v2/resnet50v2.onnx\",\n", - " model_name = \"resnet50v2\",\n", - " tags = {\"onnx\": \"demo\"},\n", - " description = \"ResNet50v2 from ONNX Model Zoo\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Displaying your registered models\n", - "\n", - "You can optionally list out all the models that you have registered in this workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, m in models.items():\n", - " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write scoring file\n", - "\n", - "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import time\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import numpy as np # we're going to use numpy to process input and output data\n", - "import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n", - "\n", - "def softmax(x):\n", - " x = x.reshape(-1)\n", - " e_x = np.exp(x - np.max(x))\n", - " return e_x / e_x.sum(axis=0)\n", - "\n", - "def init():\n", - " global session\n", - " model = Model.get_model_path(model_name = 'resnet50v2')\n", - " session = onnxruntime.InferenceSession(model, None)\n", - "\n", - "def preprocess(input_data_json):\n", - " # convert the JSON data into the tensor input\n", - " img_data = np.array(json.loads(input_data_json)['data']).astype('float32')\n", - " \n", - " #normalize\n", - " mean_vec = np.array([0.485, 0.456, 0.406])\n", - " stddev_vec = np.array([0.229, 0.224, 0.225])\n", - " norm_img_data = np.zeros(img_data.shape).astype('float32')\n", - " for i in range(img_data.shape[0]):\n", - " norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]\n", - "\n", - " return norm_img_data\n", - "\n", - "def postprocess(result):\n", - " return softmax(np.array(result)).tolist()\n", - "\n", - "def run(input_data_json):\n", - " try:\n", - " start = time.time()\n", - " # load in our data which is expected as NCHW 224x224 image\n", - " input_data = preprocess(input_data_json)\n", - " input_name = session.get_inputs()[0].name # get the id of the first input of the model \n", - " result = session.run([], {input_name: input_data})\n", - " end = time.time() # stop timer\n", - " return {\"result\": postprocess(result),\n", - " \"time\": end - start}\n", - " except Exception as e:\n", - " result = str(e)\n", - " return {\"error\": result}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 
Create container image" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First we create a YAML file that specifies which dependencies we would like to see in our container." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we have Azure ML create the container. This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " docker_file = \"Dockerfile\",\n", - " description = \"ONNX ResNet50 Demo\",\n", - " tags = {\"demo\": \"onnx\"}\n", - " )\n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnxresnet50v2\",\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case you need to debug your code, the next line of code accesses the log file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all set! 
Let's get our model chugging.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'web service for ResNet50 ONNX model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "from random import randint\n", - "\n", - "aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - " aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Success!\n", - "\n", - "If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(aci_service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When you are eventually done using the web service, remember to delete it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#aci_service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "viswamy" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ResNet50 Image Classification using ONNX and AzureML\n", + "\n", + "This example shows how to deploy the ResNet50 ONNX model as a web service using Azure Machine Learning services and the ONNX Runtime.\n", + "\n", + "## What is ONNX\n", + "ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. 
ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", + "\n", + "## ResNet50 Details\n", + "ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. More information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo GitHub](https://github.com/onnx/models/tree/master/models/image_classification/resnet). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "To make the best use of your time, make sure you have done the following:\n", + "\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (config.json)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download pre-trained ONNX model from ONNX Model Zoo.\n", + "\n", + "Download the [ResNet50v2 model and test data](https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz) and extract it in the same folder as this tutorial notebook.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib.request\n", + "\n", + "onnx_model_url = \"https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz\"\n", + 
"urllib.request.urlretrieve(onnx_model_url, filename=\"resnet50v2.tar.gz\")\n", + "\n", + "!tar xvzf resnet50v2.tar.gz" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploying as a web service with Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load your Azure ML workspace\n", + "\n", + "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register your model with Azure ML\n", + "\n", + "Now we upload the model and register it in the workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path = \"resnet50v2/resnet50v2.onnx\",\n", + " model_name = \"resnet50v2\",\n", + " tags = {\"onnx\": \"demo\"},\n", + " description = \"ResNet50v2 from ONNX Model Zoo\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Displaying your registered models\n", + "\n", + "You can optionally list out all the models that you have registered in this workspace." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, m in models.items():\n", + " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write scoring file\n", + "\n", + "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import time\n", + "import sys\n", + "import os\n", + "from azureml.core.model import Model\n", + "import numpy as np # we're going to use numpy to process input and output data\n", + "import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n", + "\n", + "def softmax(x):\n", + " x = x.reshape(-1)\n", + " e_x = np.exp(x - np.max(x))\n", + " return e_x / e_x.sum(axis=0)\n", + "\n", + "def init():\n", + " global session\n", + " model = Model.get_model_path(model_name = 'resnet50v2')\n", + " session = onnxruntime.InferenceSession(model, None)\n", + "\n", + "def preprocess(input_data_json):\n", + " # convert the JSON data into the tensor input\n", + " img_data = np.array(json.loads(input_data_json)['data']).astype('float32')\n", + " \n", + " #normalize\n", + " mean_vec = np.array([0.485, 0.456, 0.406])\n", + " stddev_vec = np.array([0.229, 0.224, 0.225])\n", + " norm_img_data = np.zeros(img_data.shape).astype('float32')\n", + " for i in range(img_data.shape[0]):\n", + " norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]\n", + "\n", + " return norm_img_data\n", + "\n", + "def 
postprocess(result):\n", + " return softmax(np.array(result)).tolist()\n", + "\n", + "def run(input_data_json):\n", + " try:\n", + " start = time.time()\n", + " # load in our data which is expected as NCHW 224x224 image\n", + " input_data = preprocess(input_data_json)\n", + " input_name = session.get_inputs()[0].name # get the id of the first input of the model \n", + " result = session.run([], {input_name: input_data})\n", + " end = time.time() # stop timer\n", + " return {\"result\": postprocess(result),\n", + " \"time\": end - start}\n", + " except Exception as e:\n", + " result = str(e)\n", + " return {\"error\": result}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create container image" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we create a YAML file that specifies which dependencies we would like to see in our container." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we have Azure ML create the container. This step will likely take a few minutes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " docker_file = \"Dockerfile\",\n", + " description = \"ONNX ResNet50 Demo\",\n", + " tags = {\"demo\": \"onnx\"}\n", + " )\n", + "\n", + "\n", + "image = ContainerImage.create(name = \"onnxresnet50v2\",\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case you need to debug your code, the next line of code accesses the log file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all set! Let's get our model chugging.\n", + "\n", + "### Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'demo': 'onnx'}, \n", + " description = 'web service for ResNet50 ONNX model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "from random import randint\n", + "\n", + "aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())\n", + " aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Success!\n", + "\n", + "If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(aci_service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you are eventually done using the web service, remember to delete it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#aci_service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "viswamy" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb b/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb index 9a7f2035..43c58f0a 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb @@ -1,673 +1,673 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# MNIST Handwritten Digit Classification using ONNX and AzureML\n", - "\n", - "This example shows how to train a model on the MNIST data using PyTorch, save it as an ONNX model, and deploy it as a web service using Azure Machine Learning services and the ONNX Runtime.\n", - "\n", - "## What is ONNX\n", - "ONNX is an open format for representing machine learning and deep learning models. 
ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in, and gives them the flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", - "\n", - "## MNIST Details\n", - "The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing numbers from 0 to 9. For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/). More information about the MNIST model and how it was created can be found on the [ONNX Model Zoo GitHub](https://github.com/onnx/models/tree/master/mnist). " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. 
`Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model\n", - "\n", - "### Create a remote compute target\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an Azure Machine Learning Compute (`AmlCompute`) cluster as your training compute resource, matching the `AmlCompute` provisioning code below. This code creates a cluster for you if it does not already exist in your workspace.\n", - "\n", - "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace, this code will skip the cluster creation process." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=6)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - "compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# Use the 'status' property to get a detailed status for the current cluster. \n", - "print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './pytorch-mnist'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script [`mnist.py`](mnist.py) into your project directory. 
Make sure the training script has the following code to create an ONNX file:\n", - "```python\n", - "dummy_input = torch.randn(1, 1, 28, 28, device=device)\n", - "model_path = os.path.join(output_dir, 'mnist.onnx')\n", - "torch.onnx.export(model, dummy_input, model_path)\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this PyTorch MNIST tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'pytorch1-mnist'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a PyTorch estimator\n", - "The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information, see the [PyTorch estimator documentation](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code defines a single-node PyTorch job."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import PyTorch\n", - "\n", - "estimator = PyTorch(source_directory=project_folder, \n", - " script_params={'--output-dir': './outputs'},\n", - " compute_target=compute_target,\n", - " entry_script='mnist.py',\n", - " use_gpu=True)\n", - "\n", - "# upgrade to PyTorch 1.0 Preview, which has better support for ONNX\n", - "estimator.conda_dependencies.remove_conda_package('pytorch=0.4.0')\n", - "estimator.conda_dependencies.add_conda_package('pytorch-nightly')\n", - "estimator.conda_dependencies.add_channel('pytorch')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary of command-line arguments to pass to your training script `entry_script`. Please note the following:\n", - "- We specified the output directory as `./outputs`. The `outputs` directory is treated specially by AML: all content in this directory is uploaded to your workspace as part of your run history, so the files written to this directory remain accessible even after your remote run has finished. In this tutorial, we will save our trained model to this output directory.\n", - "\n", - "To leverage the Azure VM's GPU for training, we set `use_gpu=True`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. 
Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download the model (optional)\n", - "\n", - "Once the run completes, you can choose to download the ONNX model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# list all the files from the run\n", - "run.get_file_names()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_path = os.path.join('outputs', 'mnist.onnx')\n", - "run.download_file(model_path, output_file_path=model_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register the model\n", - "You can also register the model from your run to your workspace. The `model_path` parameter takes in the relative path on the remote VM to the model file in your `outputs` directory. You can then deploy this registered model as a web service through the AML SDK." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = run.register_model(model_name='mnist', model_path=model_path)\n", - "print(model.name, model.id, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Displaying your registered models (optional)\n", - "\n", - "You can optionally list out all the models that you have registered in this workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, m in models.items():\n", - " print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploying as a web service\n", - "\n", - "### Write scoring file\n", - "\n", - "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import time\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import numpy as np # we're going to use numpy to process input and output data\n", - "import onnxruntime # to run inference on ONNX models, we use the ONNX Runtime\n", - "\n", - "def init():\n", - " global session\n", - " model = Model.get_model_path(model_name = 'mnist')\n", - " session = onnxruntime.InferenceSession(model)\n", - "\n", - "def preprocess(input_data_json):\n", - " # convert the JSON data into the tensor input\n", - " return np.array(json.loads(input_data_json)['data']).astype('float32')\n", - "\n", - "def postprocess(result):\n", - " # We use argmax to pick the highest confidence label\n", - " return int(np.argmax(np.array(result).squeeze(), axis=0))\n", - "\n", - "def run(input_data_json):\n", - " try:\n", - " start = time.time() # start timer\n", - " input_data = preprocess(input_data_json)\n", - " input_name = session.get_inputs()[0].name # get the name of the first input of the model \n", - " result = session.run([], {input_name: input_data})\n", - " end = time.time() # stop timer\n", - " return {\"result\": postprocess(result),\n", - " \"time\": end - start}\n", - " except Exception as e:\n", - " result = str(e)\n", - " return {\"error\": result}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create container image\n", - "First we create a YAML file that specifies the dependencies we need in our container."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we have Azure ML create the container image. This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " docker_file = \"Dockerfile\",\n", - " description = \"MNIST ONNX Demo\",\n", - " tags = {\"demo\": \"onnx\"}\n", - " )\n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnxmnistdemo\",\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you need to debug the image build, the next line prints the URI of the build log." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all set! 
Let's deploy the model as a web service.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'web service for MNIST ONNX model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "from random import randint\n", - "\n", - "aci_service_name = 'onnx-demo-mnist'+str(randint(0,100))\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If the deployment fails, you can check the logs. Make sure to delete your `aci_service` before trying again." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - " aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Success!\n", - "\n", - "If you've made it this far, you've deployed a working web service that performs handwritten digit classification using an ONNX model. You can get the URL for the web service with the code below."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(aci_service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When you are eventually done using the web service, remember to delete it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#aci_service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "viswamy" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": { - "c899ddfc2b134ca9b89a4f278ac7c997": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.1.0", - "model_name": "LayoutModel", - "state": {} - }, - "d146cbdbd4e04710b3eebc15a66957ce": { - "model_module": "azureml_widgets", - "model_module_version": "1.0.0", - "model_name": "ShowRunDetailsModel", - "state": { - "child_runs_metrics": {}, - "compute_target_status": { - "current_node_count": 1, - "node_state_counts": { - "idleNodeCount": 1, - "leavingNodeCount": 0, - "preparingNodeCount": 0, - "runningNodeCount": 0, - "unusableNodeCount": 0 - }, - "provisioning_errors": null, - "provisioning_state": "Succeeded", - "requested_node_count": 1, - "scale_settings": { - "autoScale": { - "initialNodeCount": 0, - "maximumNodeCount": 4, - "minimumNodeCount": 0 - }, - "manual": null - }, - "vm_size": "STANDARD_NC6" - }, - "error": "", - "layout": "IPY_MODEL_c899ddfc2b134ca9b89a4f278ac7c997", - "run_id": "pytorch1-mnist_1537876563990", - "run_logs": "Uploading experiment status to history service.\nAdding run profile attachment 
azureml-logs/60_control_log.txt\nUploading experiment status to history service.\nAdding run profile attachment azureml-logs/80_driver_log.txt\nScript process exited with code 0\nUploading driver log...\nFinalizing run...\n\nDownloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\nProcessing...\nDone!\nTrain Epoch: 1 [0/60000 (0%)]\tLoss: 2.365850\nTrain Epoch: 1 [640/60000 (1%)]\tLoss: 2.305295\nTrain Epoch: 1 [1280/60000 (2%)]\tLoss: 2.301407\nTrain Epoch: 1 [1920/60000 (3%)]\tLoss: 2.316538\nTrain Epoch: 1 [2560/60000 (4%)]\tLoss: 2.255810\nTrain Epoch: 1 [3200/60000 (5%)]\tLoss: 2.224511\nTrain Epoch: 1 [3840/60000 (6%)]\tLoss: 2.216569\nTrain Epoch: 1 [4480/60000 (7%)]\tLoss: 2.181396\nTrain Epoch: 1 [5120/60000 (9%)]\tLoss: 2.116898\nTrain Epoch: 1 [5760/60000 (10%)]\tLoss: 2.045963\nTrain Epoch: 1 [6400/60000 (11%)]\tLoss: 1.973494\nTrain Epoch: 1 [7040/60000 (12%)]\tLoss: 1.968609\nTrain Epoch: 1 [7680/60000 (13%)]\tLoss: 1.787280\nTrain Epoch: 1 [8320/60000 (14%)]\tLoss: 1.735044\nTrain Epoch: 1 [8960/60000 (15%)]\tLoss: 1.680426\nTrain Epoch: 1 [9600/60000 (16%)]\tLoss: 1.486279\nTrain Epoch: 1 [10240/60000 (17%)]\tLoss: 1.545747\nTrain Epoch: 1 [10880/60000 (18%)]\tLoss: 1.193543\nTrain Epoch: 1 [11520/60000 (19%)]\tLoss: 1.652350\nTrain Epoch: 1 [12160/60000 (20%)]\tLoss: 0.982182\nTrain Epoch: 1 [12800/60000 (21%)]\tLoss: 1.331902\nTrain Epoch: 1 [13440/60000 (22%)]\tLoss: 1.089598\nTrain Epoch: 1 [14080/60000 (23%)]\tLoss: 0.998703\nTrain Epoch: 1 [14720/60000 (25%)]\tLoss: 0.992036\nTrain Epoch: 1 [15360/60000 (26%)]\tLoss: 0.979473\nTrain Epoch: 1 [16000/60000 (27%)]\tLoss: 1.141276\nTrain Epoch: 1 [16640/60000 (28%)]\tLoss: 0.836921\nTrain Epoch: 1 [17280/60000 (29%)]\tLoss: 0.764657\nTrain Epoch: 1 [17920/60000 
(30%)]\tLoss: 0.826818\nTrain Epoch: 1 [18560/60000 (31%)]\tLoss: 0.837834\nTrain Epoch: 1 [19200/60000 (32%)]\tLoss: 0.899033\nTrain Epoch: 1 [19840/60000 (33%)]\tLoss: 0.868245\nTrain Epoch: 1 [20480/60000 (34%)]\tLoss: 0.930491\nTrain Epoch: 1 [21120/60000 (35%)]\tLoss: 0.795202\nTrain Epoch: 1 [21760/60000 (36%)]\tLoss: 0.575117\nTrain Epoch: 1 [22400/60000 (37%)]\tLoss: 0.577884\nTrain Epoch: 1 [23040/60000 (38%)]\tLoss: 0.708801\nTrain Epoch: 1 [23680/60000 (39%)]\tLoss: 0.927512\nTrain Epoch: 1 [24320/60000 (41%)]\tLoss: 0.598836\nTrain Epoch: 1 [24960/60000 (42%)]\tLoss: 0.944021\nTrain Epoch: 1 [25600/60000 (43%)]\tLoss: 0.811654\nTrain Epoch: 1 [26240/60000 (44%)]\tLoss: 0.590322\nTrain Epoch: 1 [26880/60000 (45%)]\tLoss: 0.555104\nTrain Epoch: 1 [27520/60000 (46%)]\tLoss: 0.795565\nTrain Epoch: 1 [28160/60000 (47%)]\tLoss: 0.603378\nTrain Epoch: 1 [28800/60000 (48%)]\tLoss: 0.552437\nTrain Epoch: 1 [29440/60000 (49%)]\tLoss: 0.662064\nTrain Epoch: 1 [30080/60000 (50%)]\tLoss: 0.682541\nTrain Epoch: 1 [30720/60000 (51%)]\tLoss: 0.659051\nTrain Epoch: 1 [31360/60000 (52%)]\tLoss: 0.781052\nTrain Epoch: 1 [32000/60000 (53%)]\tLoss: 0.595491\nTrain Epoch: 1 [32640/60000 (54%)]\tLoss: 0.367289\nTrain Epoch: 1 [33280/60000 (55%)]\tLoss: 0.459428\nTrain Epoch: 1 [33920/60000 (57%)]\tLoss: 0.819237\nTrain Epoch: 1 [34560/60000 (58%)]\tLoss: 0.773166\nTrain Epoch: 1 [35200/60000 (59%)]\tLoss: 0.557691\nTrain Epoch: 1 [35840/60000 (60%)]\tLoss: 0.854719\nTrain Epoch: 1 [36480/60000 (61%)]\tLoss: 0.497524\nTrain Epoch: 1 [37120/60000 (62%)]\tLoss: 0.582861\nTrain Epoch: 1 [37760/60000 (63%)]\tLoss: 0.839674\nTrain Epoch: 1 [38400/60000 (64%)]\tLoss: 0.557275\nTrain Epoch: 1 [39040/60000 (65%)]\tLoss: 0.419819\nTrain Epoch: 1 [39680/60000 (66%)]\tLoss: 0.694659\nTrain Epoch: 1 [40320/60000 (67%)]\tLoss: 0.678524\nTrain Epoch: 1 [40960/60000 (68%)]\tLoss: 0.514364\nTrain Epoch: 1 [41600/60000 (69%)]\tLoss: 0.400510\nTrain Epoch: 1 [42240/60000 (70%)]\tLoss: 
0.526099\nTrain Epoch: 1 [42880/60000 (71%)]\tLoss: 0.387087\nTrain Epoch: 1 [43520/60000 (72%)]\tLoss: 0.730123\nTrain Epoch: 1 [44160/60000 (74%)]\tLoss: 0.678924\nTrain Epoch: 1 [44800/60000 (75%)]\tLoss: 0.425195\nTrain Epoch: 1 [45440/60000 (76%)]\tLoss: 0.656437\nTrain Epoch: 1 [46080/60000 (77%)]\tLoss: 0.348130\nTrain Epoch: 1 [46720/60000 (78%)]\tLoss: 0.487442\nTrain Epoch: 1 [47360/60000 (79%)]\tLoss: 0.649533\nTrain Epoch: 1 [48000/60000 (80%)]\tLoss: 0.541395\nTrain Epoch: 1 [48640/60000 (81%)]\tLoss: 0.464202\nTrain Epoch: 1 [49280/60000 (82%)]\tLoss: 0.750336\nTrain Epoch: 1 [49920/60000 (83%)]\tLoss: 0.548484\nTrain Epoch: 1 [50560/60000 (84%)]\tLoss: 0.421382\nTrain Epoch: 1 [51200/60000 (85%)]\tLoss: 0.680766\nTrain Epoch: 1 [51840/60000 (86%)]\tLoss: 0.483003\nTrain Epoch: 1 [52480/60000 (87%)]\tLoss: 0.610840\nTrain Epoch: 1 [53120/60000 (88%)]\tLoss: 0.483278\nTrain Epoch: 1 [53760/60000 (90%)]\tLoss: 0.553161\nTrain Epoch: 1 [54400/60000 (91%)]\tLoss: 0.465237\nTrain Epoch: 1 [55040/60000 (92%)]\tLoss: 0.558884\nTrain Epoch: 1 [55680/60000 (93%)]\tLoss: 0.528969\nTrain Epoch: 1 [56320/60000 (94%)]\tLoss: 0.370189\nTrain Epoch: 1 [56960/60000 (95%)]\tLoss: 0.379404\nTrain Epoch: 1 [57600/60000 (96%)]\tLoss: 0.263894\nTrain Epoch: 1 [58240/60000 (97%)]\tLoss: 0.432745\nTrain Epoch: 1 [58880/60000 (98%)]\tLoss: 0.455681\nTrain Epoch: 1 [59520/60000 (99%)]\tLoss: 0.483901\n/azureml-envs/azureml_de892a6d0f01a442356c3959dd42e13b/lib/python3.6/site-packages/torch/nn/functional.py:54: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.\n warnings.warn(warning.format(ret))\n\nTest set: Average loss: 0.2073, Accuracy: 9384/10000 (94%)\n\nTrain Epoch: 2 [0/60000 (0%)]\tLoss: 0.390797\nTrain Epoch: 2 [640/60000 (1%)]\tLoss: 0.214512\nTrain Epoch: 2 [1280/60000 (2%)]\tLoss: 0.226415\nTrain Epoch: 2 [1920/60000 (3%)]\tLoss: 0.491764\nTrain Epoch: 2 [2560/60000 (4%)]\tLoss: 0.333604\nTrain Epoch: 2 [3200/60000 
(5%)]\tLoss: 0.514239\nTrain Epoch: 2 [3840/60000 (6%)]\tLoss: 0.430618\nTrain Epoch: 2 [4480/60000 (7%)]\tLoss: 0.579474\nTrain Epoch: 2 [5120/60000 (9%)]\tLoss: 0.259456\nTrain Epoch: 2 [5760/60000 (10%)]\tLoss: 0.651198\nTrain Epoch: 2 [6400/60000 (11%)]\tLoss: 0.338269\nTrain Epoch: 2 [7040/60000 (12%)]\tLoss: 0.335233\nTrain Epoch: 2 [7680/60000 (13%)]\tLoss: 0.518132\nTrain Epoch: 2 [8320/60000 (14%)]\tLoss: 0.363488\nTrain Epoch: 2 [8960/60000 (15%)]\tLoss: 0.437092\nTrain Epoch: 2 [9600/60000 (16%)]\tLoss: 0.362660\nTrain Epoch: 2 [10240/60000 (17%)]\tLoss: 0.432337\nTrain Epoch: 2 [10880/60000 (18%)]\tLoss: 0.360611\nTrain Epoch: 2 [11520/60000 (19%)]\tLoss: 0.305427\nTrain Epoch: 2 [12160/60000 (20%)]\tLoss: 0.347859\nTrain Epoch: 2 [12800/60000 (21%)]\tLoss: 0.408770\nTrain Epoch: 2 [13440/60000 (22%)]\tLoss: 0.469975\nTrain Epoch: 2 [14080/60000 (23%)]\tLoss: 0.673716\nTrain Epoch: 2 [14720/60000 (25%)]\tLoss: 0.388876\nTrain Epoch: 2 [15360/60000 (26%)]\tLoss: 0.462371\nTrain Epoch: 2 [16000/60000 (27%)]\tLoss: 0.530107\nTrain Epoch: 2 [16640/60000 (28%)]\tLoss: 0.448767\nTrain Epoch: 2 [17280/60000 (29%)]\tLoss: 0.412764\nTrain Epoch: 2 [17920/60000 (30%)]\tLoss: 0.301494\nTrain Epoch: 2 [18560/60000 (31%)]\tLoss: 0.465599\nTrain Epoch: 2 [19200/60000 (32%)]\tLoss: 0.434249\nTrain Epoch: 2 [19840/60000 (33%)]\tLoss: 0.324006\nTrain Epoch: 2 [20480/60000 (34%)]\tLoss: 0.447446\nTrain Epoch: 2 [21120/60000 (35%)]\tLoss: 0.291222\nTrain Epoch: 2 [21760/60000 (36%)]\tLoss: 0.557065\nTrain Epoch: 2 [22400/60000 (37%)]\tLoss: 0.552659\nTrain Epoch: 2 [23040/60000 (38%)]\tLoss: 0.378901\nTrain Epoch: 2 [23680/60000 (39%)]\tLoss: 0.360550\nTrain Epoch: 2 [24320/60000 (41%)]\tLoss: 0.283795\nTrain Epoch: 2 [24960/60000 (42%)]\tLoss: 0.475816\nTrain Epoch: 2 [25600/60000 (43%)]\tLoss: 0.283652\nTrain Epoch: 2 [26240/60000 (44%)]\tLoss: 0.276265\nTrain Epoch: 2 [26880/60000 (45%)]\tLoss: 0.527902\nTrain Epoch: 2 [27520/60000 (46%)]\tLoss: 0.437130\nTrain Epoch: 
2 [28160/60000 (47%)]\tLoss: 0.277132\nTrain Epoch: 2 [28800/60000 (48%)]\tLoss: 0.471580\nTrain Epoch: 2 [29440/60000 (49%)]\tLoss: 0.380154\nTrain Epoch: 2 [30080/60000 (50%)]\tLoss: 0.232072\nTrain Epoch: 2 [30720/60000 (51%)]\tLoss: 0.366567\nTrain Epoch: 2 [31360/60000 (52%)]\tLoss: 0.469628\nTrain Epoch: 2 [32000/60000 (53%)]\tLoss: 0.440017\nTrain Epoch: 2 [32640/60000 (54%)]\tLoss: 0.421814\nTrain Epoch: 2 [33280/60000 (55%)]\tLoss: 0.367687\nTrain Epoch: 2 [33920/60000 (57%)]\tLoss: 0.448384\nTrain Epoch: 2 [34560/60000 (58%)]\tLoss: 0.550283\nTrain Epoch: 2 [35200/60000 (59%)]\tLoss: 0.609798\nTrain Epoch: 2 [35840/60000 (60%)]\tLoss: 0.461334\nTrain Epoch: 2 [36480/60000 (61%)]\tLoss: 0.443838\nTrain Epoch: 2 [37120/60000 (62%)]\tLoss: 0.306666\nTrain Epoch: 2 [37760/60000 (63%)]\tLoss: 0.432083\nTrain Epoch: 2 [38400/60000 (64%)]\tLoss: 0.277025\nTrain Epoch: 2 [39040/60000 (65%)]\tLoss: 0.298752\nTrain Epoch: 2 [39680/60000 (66%)]\tLoss: 0.427435\nTrain Epoch: 2 [40320/60000 (67%)]\tLoss: 0.374736\nTrain Epoch: 2 [40960/60000 (68%)]\tLoss: 0.246496\nTrain Epoch: 2 [41600/60000 (69%)]\tLoss: 0.662259\nTrain Epoch: 2 [42240/60000 (70%)]\tLoss: 0.497635\nTrain Epoch: 2 [42880/60000 (71%)]\tLoss: 0.237556\nTrain Epoch: 2 [43520/60000 (72%)]\tLoss: 0.194535\nTrain Epoch: 2 [44160/60000 (74%)]\tLoss: 0.258943\nTrain Epoch: 2 [44800/60000 (75%)]\tLoss: 0.437360\nTrain Epoch: 2 [45440/60000 (76%)]\tLoss: 0.355489\nTrain Epoch: 2 [46080/60000 (77%)]\tLoss: 0.335020\nTrain Epoch: 2 [46720/60000 (78%)]\tLoss: 0.565189\nTrain Epoch: 2 [47360/60000 (79%)]\tLoss: 0.430366\nTrain Epoch: 2 [48000/60000 (80%)]\tLoss: 0.266303\nTrain Epoch: 2 [48640/60000 (81%)]\tLoss: 0.172954\nTrain Epoch: 2 [49280/60000 (82%)]\tLoss: 0.245803\nTrain Epoch: 2 [49920/60000 (83%)]\tLoss: 0.426530\nTrain Epoch: 2 [50560/60000 (84%)]\tLoss: 0.468984\nTrain Epoch: 2 [51200/60000 (85%)]\tLoss: 0.370892\nTrain Epoch: 2 [51840/60000 (86%)]\tLoss: 0.300021\nTrain Epoch: 2 [52480/60000 
(87%)]\tLoss: 0.392199\nTrain Epoch: 2 [53120/60000 (88%)]\tLoss: 0.510658\nTrain Epoch: 2 [53760/60000 (90%)]\tLoss: 0.376290\nTrain Epoch: 2 [54400/60000 (91%)]\tLoss: 0.273752\nTrain Epoch: 2 [55040/60000 (92%)]\tLoss: 0.234505\nTrain Epoch: 2 [55680/60000 (93%)]\tLoss: 0.610978\nTrain Epoch: 2 [56320/60000 (94%)]\tLoss: 0.154850\nTrain Epoch: 2 [56960/60000 (95%)]\tLoss: 0.374254\nTrain Epoch: 2 [57600/60000 (96%)]\tLoss: 0.292167\nTrain Epoch: 2 [58240/60000 (97%)]\tLoss: 0.478376\nTrain Epoch: 2 [58880/60000 (98%)]\tLoss: 0.303128\nTrain Epoch: 2 [59520/60000 (99%)]\tLoss: 0.376779\n\nTest set: Average loss: 0.1297, Accuracy: 9597/10000 (96%)\n\nTrain Epoch: 3 [0/60000 (0%)]\tLoss: 0.450588\nTrain Epoch: 3 [640/60000 (1%)]\tLoss: 0.361118\nTrain Epoch: 3 [1280/60000 (2%)]\tLoss: 0.374497\nTrain Epoch: 3 [1920/60000 (3%)]\tLoss: 0.312127\nTrain Epoch: 3 [2560/60000 (4%)]\tLoss: 0.353896\nTrain Epoch: 3 [3200/60000 (5%)]\tLoss: 0.320840\nTrain Epoch: 3 [3840/60000 (6%)]\tLoss: 0.218477\nTrain Epoch: 3 [4480/60000 (7%)]\tLoss: 0.295629\nTrain Epoch: 3 [5120/60000 (9%)]\tLoss: 0.339400\nTrain Epoch: 3 [5760/60000 (10%)]\tLoss: 0.170357\nTrain Epoch: 3 [6400/60000 (11%)]\tLoss: 0.416447\nTrain Epoch: 3 [7040/60000 (12%)]\tLoss: 0.320326\nTrain Epoch: 3 [7680/60000 (13%)]\tLoss: 0.318410\nTrain Epoch: 3 [8320/60000 (14%)]\tLoss: 0.384793\nTrain Epoch: 3 [8960/60000 (15%)]\tLoss: 0.343415\nTrain Epoch: 3 [9600/60000 (16%)]\tLoss: 0.284627\nTrain Epoch: 3 [10240/60000 (17%)]\tLoss: 0.151805\nTrain Epoch: 3 [10880/60000 (18%)]\tLoss: 0.401332\nTrain Epoch: 3 [11520/60000 (19%)]\tLoss: 0.253159\nTrain Epoch: 3 [12160/60000 (20%)]\tLoss: 0.339563\nTrain Epoch: 3 [12800/60000 (21%)]\tLoss: 0.237430\nTrain Epoch: 3 [13440/60000 (22%)]\tLoss: 0.311402\nTrain Epoch: 3 [14080/60000 (23%)]\tLoss: 0.241667\nTrain Epoch: 3 [14720/60000 (25%)]\tLoss: 0.265347\nTrain Epoch: 3 [15360/60000 (26%)]\tLoss: 0.367453\nTrain Epoch: 3 [16000/60000 (27%)]\tLoss: 0.190671\nTrain Epoch: 3 
[16640/60000 (28%)]\tLoss: 0.313052\nTrain Epoch: 3 [17280/60000 (29%)]\tLoss: 0.368028\nTrain Epoch: 3 [17920/60000 (30%)]\tLoss: 0.268639\nTrain Epoch: 3 [18560/60000 (31%)]\tLoss: 0.341066\nTrain Epoch: 3 [19200/60000 (32%)]\tLoss: 0.457961\nTrain Epoch: 3 [19840/60000 (33%)]\tLoss: 0.732400\nTrain Epoch: 3 [20480/60000 (34%)]\tLoss: 0.330679\nTrain Epoch: 3 [21120/60000 (35%)]\tLoss: 0.279778\nTrain Epoch: 3 [21760/60000 (36%)]\tLoss: 0.305972\nTrain Epoch: 3 [22400/60000 (37%)]\tLoss: 0.402131\nTrain Epoch: 3 [23040/60000 (38%)]\tLoss: 0.345302\nTrain Epoch: 3 [23680/60000 (39%)]\tLoss: 0.251726\nTrain Epoch: 3 [24320/60000 (41%)]\tLoss: 0.152062\nTrain Epoch: 3 [24960/60000 (42%)]\tLoss: 0.149305\nTrain Epoch: 3 [25600/60000 (43%)]\tLoss: 0.364678\nTrain Epoch: 3 [26240/60000 (44%)]\tLoss: 0.067165\nTrain Epoch: 3 [26880/60000 (45%)]\tLoss: 0.229927\nTrain Epoch: 3 [27520/60000 (46%)]\tLoss: 0.236894\nTrain Epoch: 3 [28160/60000 (47%)]\tLoss: 0.486373\nTrain Epoch: 3 [28800/60000 (48%)]\tLoss: 0.453053\nTrain Epoch: 3 [29440/60000 (49%)]\tLoss: 0.283823\nTrain Epoch: 3 [30080/60000 (50%)]\tLoss: 0.185119\nTrain Epoch: 3 [30720/60000 (51%)]\tLoss: 0.381274\nTrain Epoch: 3 [31360/60000 (52%)]\tLoss: 0.394533\nTrain Epoch: 3 [32000/60000 (53%)]\tLoss: 0.392791\nTrain Epoch: 3 [32640/60000 (54%)]\tLoss: 0.230672\nTrain Epoch: 3 [33280/60000 (55%)]\tLoss: 0.393846\nTrain Epoch: 3 [33920/60000 (57%)]\tLoss: 0.676802\nTrain Epoch: 3 [34560/60000 (58%)]\tLoss: 0.160434\nTrain Epoch: 3 [35200/60000 (59%)]\tLoss: 0.211318\nTrain Epoch: 3 [35840/60000 (60%)]\tLoss: 0.245763\nTrain Epoch: 3 [36480/60000 (61%)]\tLoss: 0.198454\nTrain Epoch: 3 [37120/60000 (62%)]\tLoss: 0.243536\nTrain Epoch: 3 [37760/60000 (63%)]\tLoss: 0.151804\nTrain Epoch: 3 [38400/60000 (64%)]\tLoss: 0.176093\nTrain Epoch: 3 [39040/60000 (65%)]\tLoss: 0.237228\nTrain Epoch: 3 [39680/60000 (66%)]\tLoss: 0.146441\nTrain Epoch: 3 [40320/60000 (67%)]\tLoss: 0.345162\nTrain Epoch: 3 [40960/60000 
(68%)]\tLoss: 0.400378\nTrain Epoch: 3 [41600/60000 (69%)]\tLoss: 0.259152\nTrain Epoch: 3 [42240/60000 (70%)]\tLoss: 0.569659\nTrain Epoch: 3 [42880/60000 (71%)]\tLoss: 0.166401\nTrain Epoch: 3 [43520/60000 (72%)]\tLoss: 0.220592\nTrain Epoch: 3 [44160/60000 (74%)]\tLoss: 0.303227\nTrain Epoch: 3 [44800/60000 (75%)]\tLoss: 0.193691\nTrain Epoch: 3 [45440/60000 (76%)]\tLoss: 0.257408\nTrain Epoch: 3 [46080/60000 (77%)]\tLoss: 0.391211\nTrain Epoch: 3 [46720/60000 (78%)]\tLoss: 0.419841\nTrain Epoch: 3 [47360/60000 (79%)]\tLoss: 0.121861\nTrain Epoch: 3 [48000/60000 (80%)]\tLoss: 0.176442\nTrain Epoch: 3 [48640/60000 (81%)]\tLoss: 0.534631\nTrain Epoch: 3 [49280/60000 (82%)]\tLoss: 0.296596\nTrain Epoch: 3 [49920/60000 (83%)]\tLoss: 0.190096\nTrain Epoch: 3 [50560/60000 (84%)]\tLoss: 0.360826\nTrain Epoch: 3 [51200/60000 (85%)]\tLoss: 0.427482\nTrain Epoch: 3 [51840/60000 (86%)]\tLoss: 0.251076\nTrain Epoch: 3 [52480/60000 (87%)]\tLoss: 0.319904\nTrain Epoch: 3 [53120/60000 (88%)]\tLoss: 0.228778\nTrain Epoch: 3 [53760/60000 (90%)]\tLoss: 0.180340\nTrain Epoch: 3 [54400/60000 (91%)]\tLoss: 0.236512\nTrain Epoch: 3 [55040/60000 (92%)]\tLoss: 0.206779\nTrain Epoch: 3 [55680/60000 (93%)]\tLoss: 0.323677\nTrain Epoch: 3 [56320/60000 (94%)]\tLoss: 0.406382\nTrain Epoch: 3 [56960/60000 (95%)]\tLoss: 0.426768\nTrain Epoch: 3 [57600/60000 (96%)]\tLoss: 0.595419\nTrain Epoch: 3 [58240/60000 (97%)]\tLoss: 0.175457\nTrain Epoch: 3 [58880/60000 (98%)]\tLoss: 0.301019\nTrain Epoch: 3 [59520/60000 (99%)]\tLoss: 0.419139\n\nTest set: Average loss: 0.1049, Accuracy: 9686/10000 (97%)\n\nTrain Epoch: 4 [0/60000 (0%)]\tLoss: 0.352631\nTrain Epoch: 4 [640/60000 (1%)]\tLoss: 0.343671\nTrain Epoch: 4 [1280/60000 (2%)]\tLoss: 0.170439\nTrain Epoch: 4 [1920/60000 (3%)]\tLoss: 0.289486\nTrain Epoch: 4 [2560/60000 (4%)]\tLoss: 0.096597\nTrain Epoch: 4 [3200/60000 (5%)]\tLoss: 0.263759\nTrain Epoch: 4 [3840/60000 (6%)]\tLoss: 0.369941\nTrain Epoch: 4 [4480/60000 (7%)]\tLoss: 0.326594\nTrain 
Epoch: 4 [5120/60000 (9%)]\tLoss: 0.174094\n[... per-batch training output for epoch 4 truncated ...]\n\nTest set: Average loss: 0.0843, Accuracy: 9739/10000 (97%)\n\n[... per-batch training output for epoch 5 truncated ...]\n\nTest set: Average loss: 0.0797, Accuracy: 9758/10000 (98%)\n\n[... per-batch training output for epoch 6 truncated ...]\n\nTest set: Average loss: 0.0690, Accuracy: 9776/10000 (98%)\n\n[... per-batch training output for epoch 7 truncated ...]\n\nTest set: Average loss: 0.0657, Accuracy: 9801/10000 (98%)\n\n[... per-batch training output for epoch 8 truncated ...]\n\nTest set: Average loss: 0.0632, Accuracy: 9807/10000 (98%)\n\n[... per-batch training output for epoch 9 truncated ...]\n\nTest set: Average loss: 0.0559, Accuracy: 9813/10000 (98%)\n\n[... per-batch training output for epoch 10 truncated ...]\nTrain Epoch: 
10 [57600/60000 (96%)]\tLoss: 0.127536\nTrain Epoch: 10 [58240/60000 (97%)]\tLoss: 0.233154\nTrain Epoch: 10 [58880/60000 (98%)]\tLoss: 0.113188\nTrain Epoch: 10 [59520/60000 (99%)]\tLoss: 0.282389\n\nTest set: Average loss: 0.0531, Accuracy: 9837/10000 (98%)\n\n\n\nThe experiment completed successfully. Finalizing run...\nLogging experiment finalizing status in history service\n\n\nRun is completed.", - "run_properties": { - "SendToClient": "1", - "arguments": "--output-dir ./outputs", - "created_utc": "2018-09-25T11:56:04.832205Z", - "distributed_processes": [], - "end_time_utc": "2018-09-25T12:15:57.841467Z", - "log_files": { - "azureml-logs/55_batchai_execution.txt": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/55_batchai_execution.txt?sv=2017-04-17&sr=b&sig=NNkIC62xdG1h6156XtjtgwTJ1ScXlfxhBiBicNNoExE%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r", - "azureml-logs/60_control_log.txt": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/60_control_log.txt?sv=2017-04-17&sr=b&sig=i2mtPt6w5xHkEjpkyfl%2BSD1GPpIdpzIbY6sVUQ62QMo%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r", - "azureml-logs/80_driver_log.txt": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/80_driver_log.txt?sv=2017-04-17&sr=b&sig=CvqNHP18huWuXWdi%2BeiPcnztgJfI1iQQ6fV6Li25z1Y%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r", - "azureml-logs/azureml.log": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/azureml.log?sv=2017-04-17&sr=b&sig=UTaxvUU4Ua%2FpsXPwQnSIV%2FbKK1zERtclIIjcTfbcSzQ%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r" - }, - "properties": { - "ContentSnapshotId": "727976ee-33bf-44c7-af65-ef1a1cbd2980", - "azureml.runsource": "experiment" - }, 
- "run_duration": "0:19:53",
- "run_id": "pytorch1-mnist_1537876563990",
- "script_name": "mnist.py",
- "status": "Completed",
- "tags": {}
- },
- "widget_settings": {},
- "workbench_uri": "https://mlworkspace.azure.ai/portal/subscriptions/75f78a03-482f-4fd8-8c71-5ddc08f92726/resourceGroups/onnxdemos/providers/Microsoft.MachineLearningServices/workspaces/onnx-aml-ignite-demo/experiment/pytorch1-mnist/run/pytorch1-mnist_1537876563990"
- }
- }
- }
- },
- "version_major": 2,
- "version_minor": 0
- }
- }
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved. \n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
},
- "nbformat": 4,
- "nbformat_minor": 2
-} \ No newline at end of file
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# MNIST Handwritten Digit Classification using ONNX and AzureML\n",
+ "\n",
+ "This example shows how to train a model on the MNIST data using PyTorch, save it as an ONNX model, and deploy it as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
+ "\n",
+ "## What is ONNX\n",
+ "ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by allowing data scientists and developers to use the tools of their choice without worrying about lock-in, and gives them the flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
+ "\n",
+ "## MNIST Details\n",
+ "The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 grayscale images. 
Each image is a 28x28-pixel handwritten digit from 0 to 9. For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/). More information about the MNIST model and how it was created can be found in the [ONNX Model Zoo on GitHub](https://github.com/onnx/models/tree/master/mnist)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name,\n", + "      'Azure region: ' + ws.location,\n", + "      'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model\n", + "\n", + "### Create a remote compute target\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Machine Learning Compute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) (`AmlCompute`) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", + "\n", + "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace, this code skips the creation step." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=6)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + "compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# Use the 'status' property to get a detailed status for the current cluster. \n", + "print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './pytorch-mnist'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script [`mnist.py`](mnist.py) into your project directory. 
Make sure the training script has the following code to create an ONNX file:\n", + "```python\n", + "dummy_input = torch.randn(1, 1, 28, 28, device=device)\n", + "model_path = os.path.join(output_dir, 'mnist.onnx')\n", + "torch.onnx.export(model, dummy_input, model_path)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this PyTorch MNIST tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'pytorch1-mnist'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a PyTorch estimator\n", + "The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, see [the documentation](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code defines a single-node PyTorch job." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import PyTorch\n", + "\n", + "estimator = PyTorch(source_directory=project_folder, \n", + " script_params={'--output-dir': './outputs'},\n", + " compute_target=compute_target,\n", + " entry_script='mnist.py',\n", + " use_gpu=True)\n", + "\n", + "# upgrade to PyTorch 1.0 Preview, which has better support for ONNX\n", + "estimator.conda_dependencies.remove_conda_package('pytorch=0.4.0')\n", + "estimator.conda_dependencies.add_conda_package('pytorch-nightly')\n", + "estimator.conda_dependencies.add_channel('pytorch')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:\n", + "- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by AML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.\n", + "\n", + "To leverage the Azure VM's GPU for training, we set `use_gpu=True`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. 
Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download the model (optional)\n", + "\n", + "Once the run completes, you can choose to download the ONNX model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# list all the files from the run\n", + "run.get_file_names()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_path = os.path.join('outputs', 'mnist.onnx')\n", + "run.download_file(model_path, output_file_path=model_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the model\n", + "You can also register the model from your run to your workspace. The `model_path` parameter takes in the relative path on the remote VM to the model file in your `outputs` directory. You can then deploy this registered model as a web service through the AML SDK." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = run.register_model(model_name='mnist', model_path=model_path)\n", + "print(model.name, model.id, model.version, sep='\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Displaying your registered models (optional)\n", + "\n", + "You can optionally list all the models that you have registered in this workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, m in models.items():\n", + "    print(\"Name:\", name, \"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploying as a web service\n", + "\n", + "### Write scoring file\n", + "\n", + "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a `score.py` file that will be invoked by the web service call. The `init()` function is called once when the container starts, so that is where we load the model into a global ONNX Runtime session object." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import time\n", + "from azureml.core.model import Model\n", + "import numpy as np    # we're going to use numpy to process input and output data\n", + "import onnxruntime    # to inference ONNX models, we use the ONNX Runtime\n", + "\n", + "def init():\n", + "    global session\n", + "    model = Model.get_model_path(model_name='mnist')\n", + "    session = onnxruntime.InferenceSession(model)\n", + "\n", + "def preprocess(input_data_json):\n", + "    # convert the JSON data into the tensor input\n", + "    return np.array(json.loads(input_data_json)['data']).astype('float32')\n", + "\n", + "def postprocess(result):\n", + "    # we use argmax to pick the highest-confidence label\n", + "    return int(np.argmax(np.array(result).squeeze(), axis=0))\n", + "\n", + "def run(input_data_json):\n", + "    try:\n", + "        start = time.time()  # start timer\n", + "        input_data = preprocess(input_data_json)\n", + "        input_name = session.get_inputs()[0].name  # get the name of the first input of the model\n", + "        result = session.run([], {input_name: input_data})\n", + "        end = time.time()  # stop timer\n", + "        return {\"result\": postprocess(result),\n", + "                \"time\": end - start}\n", + "    except Exception as e:\n", + "        result = str(e)\n", + "        return {\"error\": result}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create container image\n", + "First, we create a YAML file that specifies the dependencies our container needs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n", + "\n", + "with open(\"myenv.yml\", \"w\") as f:\n", + "    f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we have Azure ML create the container image. This step will likely take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(\n", + "    execution_script=\"score.py\",\n", + "    runtime=\"python\",\n", + "    conda_file=\"myenv.yml\",\n", + "    docker_file=\"Dockerfile\",\n", + "    description=\"MNIST ONNX Demo\",\n", + "    tags={\"demo\": \"onnx\"})\n", + "\n", + "image = ContainerImage.create(name=\"onnxmnistdemo\",\n", + "                              models=[model],\n", + "                              image_config=image_config,\n", + "                              workspace=ws)\n", + "\n", + "image.wait_for_creation(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you need to debug the image, the next line prints the URI of the image build log." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all set! 
Let's get our model chugging.\n", + "\n", + "### Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(\n", + "    cpu_cores=1,\n", + "    memory_gb=1,\n", + "    tags={'demo': 'onnx'},\n", + "    description='web service for MNIST ONNX model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "from random import randint\n", + "\n", + "aci_service_name = 'onnx-demo-mnist' + str(randint(0, 100))\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config=aciconfig,\n", + "                                           image=image,\n", + "                                           name=aci_service_name,\n", + "                                           workspace=ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case the deployment fails, you can check the logs. Make sure to delete your `aci_service` before trying again." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + "    # run this command for debugging.\n", + "    print(aci_service.get_logs())\n", + "    aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Success!\n", + "\n", + "If you've made it this far, you've deployed a working web service that does handwritten digit classification using an ONNX model. You can get the URL for the web service with the code below." 
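+ ,
+ "\n",
+ "\n",
+ "You can also send the service a test request. The following is a sketch (assuming the model was exported with a 1x1x28x28 input, as in the export snippet earlier): build a JSON payload with a `data` array matching the shape the scoring script expects, then call `Webservice.run()`, which posts the payload to the scoring endpoint for you:\n",
+ "\n",
+ "```python\n",
+ "import json\n",
+ "import numpy as np\n",
+ "\n",
+ "# a dummy all-zeros image in the 1x1x28x28 shape the scoring script expects;\n",
+ "# replace it with a real MNIST image to get a meaningful prediction\n",
+ "sample = np.zeros((1, 1, 28, 28), dtype=np.float32)\n",
+ "input_data = json.dumps({'data': sample.tolist()})\n",
+ "\n",
+ "result = aci_service.run(input_data=input_data)\n",
+ "print(result)  # a dict with the predicted digit under 'result' and the latency under 'time'\n",
+ "```"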
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(aci_service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you are eventually done using the web service, remember to delete it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#aci_service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "viswamy" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "c899ddfc2b134ca9b89a4f278ac7c997": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.1.0", + "model_name": "LayoutModel", + "state": {} + }, + "d146cbdbd4e04710b3eebc15a66957ce": { + "model_module": "azureml_widgets", + "model_module_version": "1.0.0", + "model_name": "ShowRunDetailsModel", + "state": { + "child_runs_metrics": {}, + "compute_target_status": { + "current_node_count": 1, + "node_state_counts": { + "idleNodeCount": 1, + "leavingNodeCount": 0, + "preparingNodeCount": 0, + "runningNodeCount": 0, + "unusableNodeCount": 0 + }, + "provisioning_errors": null, + "provisioning_state": "Succeeded", + "requested_node_count": 1, + "scale_settings": { + "autoScale": { + "initialNodeCount": 0, + "maximumNodeCount": 4, + "minimumNodeCount": 0 + }, + "manual": null + }, + "vm_size": "STANDARD_NC6" + }, + "error": "", + "layout": "IPY_MODEL_c899ddfc2b134ca9b89a4f278ac7c997", + "run_id": "pytorch1-mnist_1537876563990", + "run_logs": "Uploading experiment status to history service.\nAdding run profile attachment 
azureml-logs/60_control_log.txt\nUploading experiment status to history service.\nAdding run profile attachment azureml-logs/80_driver_log.txt\nScript process exited with code 0\nUploading driver log...\nFinalizing run...\n\nDownloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\nProcessing...\nDone!\nTrain Epoch: 1 [0/60000 (0%)]\tLoss: 2.365850\nTrain Epoch: 1 [640/60000 (1%)]\tLoss: 2.305295\nTrain Epoch: 1 [1280/60000 (2%)]\tLoss: 2.301407\nTrain Epoch: 1 [1920/60000 (3%)]\tLoss: 2.316538\nTrain Epoch: 1 [2560/60000 (4%)]\tLoss: 2.255810\nTrain Epoch: 1 [3200/60000 (5%)]\tLoss: 2.224511\nTrain Epoch: 1 [3840/60000 (6%)]\tLoss: 2.216569\nTrain Epoch: 1 [4480/60000 (7%)]\tLoss: 2.181396\nTrain Epoch: 1 [5120/60000 (9%)]\tLoss: 2.116898\nTrain Epoch: 1 [5760/60000 (10%)]\tLoss: 2.045963\nTrain Epoch: 1 [6400/60000 (11%)]\tLoss: 1.973494\nTrain Epoch: 1 [7040/60000 (12%)]\tLoss: 1.968609\nTrain Epoch: 1 [7680/60000 (13%)]\tLoss: 1.787280\nTrain Epoch: 1 [8320/60000 (14%)]\tLoss: 1.735044\nTrain Epoch: 1 [8960/60000 (15%)]\tLoss: 1.680426\nTrain Epoch: 1 [9600/60000 (16%)]\tLoss: 1.486279\nTrain Epoch: 1 [10240/60000 (17%)]\tLoss: 1.545747\nTrain Epoch: 1 [10880/60000 (18%)]\tLoss: 1.193543\nTrain Epoch: 1 [11520/60000 (19%)]\tLoss: 1.652350\nTrain Epoch: 1 [12160/60000 (20%)]\tLoss: 0.982182\nTrain Epoch: 1 [12800/60000 (21%)]\tLoss: 1.331902\nTrain Epoch: 1 [13440/60000 (22%)]\tLoss: 1.089598\nTrain Epoch: 1 [14080/60000 (23%)]\tLoss: 0.998703\nTrain Epoch: 1 [14720/60000 (25%)]\tLoss: 0.992036\nTrain Epoch: 1 [15360/60000 (26%)]\tLoss: 0.979473\nTrain Epoch: 1 [16000/60000 (27%)]\tLoss: 1.141276\nTrain Epoch: 1 [16640/60000 (28%)]\tLoss: 0.836921\nTrain Epoch: 1 [17280/60000 (29%)]\tLoss: 0.764657\nTrain Epoch: 1 [17920/60000 
(30%)]\tLoss: 0.826818\nTrain Epoch: 1 [18560/60000 (31%)]\tLoss: 0.837834\nTrain Epoch: 1 [19200/60000 (32%)]\tLoss: 0.899033\nTrain Epoch: 1 [19840/60000 (33%)]\tLoss: 0.868245\nTrain Epoch: 1 [20480/60000 (34%)]\tLoss: 0.930491\nTrain Epoch: 1 [21120/60000 (35%)]\tLoss: 0.795202\nTrain Epoch: 1 [21760/60000 (36%)]\tLoss: 0.575117\nTrain Epoch: 1 [22400/60000 (37%)]\tLoss: 0.577884\nTrain Epoch: 1 [23040/60000 (38%)]\tLoss: 0.708801\nTrain Epoch: 1 [23680/60000 (39%)]\tLoss: 0.927512\nTrain Epoch: 1 [24320/60000 (41%)]\tLoss: 0.598836\nTrain Epoch: 1 [24960/60000 (42%)]\tLoss: 0.944021\nTrain Epoch: 1 [25600/60000 (43%)]\tLoss: 0.811654\nTrain Epoch: 1 [26240/60000 (44%)]\tLoss: 0.590322\nTrain Epoch: 1 [26880/60000 (45%)]\tLoss: 0.555104\nTrain Epoch: 1 [27520/60000 (46%)]\tLoss: 0.795565\nTrain Epoch: 1 [28160/60000 (47%)]\tLoss: 0.603378\nTrain Epoch: 1 [28800/60000 (48%)]\tLoss: 0.552437\nTrain Epoch: 1 [29440/60000 (49%)]\tLoss: 0.662064\nTrain Epoch: 1 [30080/60000 (50%)]\tLoss: 0.682541\nTrain Epoch: 1 [30720/60000 (51%)]\tLoss: 0.659051\nTrain Epoch: 1 [31360/60000 (52%)]\tLoss: 0.781052\nTrain Epoch: 1 [32000/60000 (53%)]\tLoss: 0.595491\nTrain Epoch: 1 [32640/60000 (54%)]\tLoss: 0.367289\nTrain Epoch: 1 [33280/60000 (55%)]\tLoss: 0.459428\nTrain Epoch: 1 [33920/60000 (57%)]\tLoss: 0.819237\nTrain Epoch: 1 [34560/60000 (58%)]\tLoss: 0.773166\nTrain Epoch: 1 [35200/60000 (59%)]\tLoss: 0.557691\nTrain Epoch: 1 [35840/60000 (60%)]\tLoss: 0.854719\nTrain Epoch: 1 [36480/60000 (61%)]\tLoss: 0.497524\nTrain Epoch: 1 [37120/60000 (62%)]\tLoss: 0.582861\nTrain Epoch: 1 [37760/60000 (63%)]\tLoss: 0.839674\nTrain Epoch: 1 [38400/60000 (64%)]\tLoss: 0.557275\nTrain Epoch: 1 [39040/60000 (65%)]\tLoss: 0.419819\nTrain Epoch: 1 [39680/60000 (66%)]\tLoss: 0.694659\nTrain Epoch: 1 [40320/60000 (67%)]\tLoss: 0.678524\nTrain Epoch: 1 [40960/60000 (68%)]\tLoss: 0.514364\nTrain Epoch: 1 [41600/60000 (69%)]\tLoss: 0.400510\nTrain Epoch: 1 [42240/60000 (70%)]\tLoss: 
0.526099\nTrain Epoch: 1 [42880/60000 (71%)]\tLoss: 0.387087\nTrain Epoch: 1 [43520/60000 (72%)]\tLoss: 0.730123\nTrain Epoch: 1 [44160/60000 (74%)]\tLoss: 0.678924\nTrain Epoch: 1 [44800/60000 (75%)]\tLoss: 0.425195\nTrain Epoch: 1 [45440/60000 (76%)]\tLoss: 0.656437\nTrain Epoch: 1 [46080/60000 (77%)]\tLoss: 0.348130\nTrain Epoch: 1 [46720/60000 (78%)]\tLoss: 0.487442\nTrain Epoch: 1 [47360/60000 (79%)]\tLoss: 0.649533\nTrain Epoch: 1 [48000/60000 (80%)]\tLoss: 0.541395\nTrain Epoch: 1 [48640/60000 (81%)]\tLoss: 0.464202\nTrain Epoch: 1 [49280/60000 (82%)]\tLoss: 0.750336\nTrain Epoch: 1 [49920/60000 (83%)]\tLoss: 0.548484\nTrain Epoch: 1 [50560/60000 (84%)]\tLoss: 0.421382\nTrain Epoch: 1 [51200/60000 (85%)]\tLoss: 0.680766\nTrain Epoch: 1 [51840/60000 (86%)]\tLoss: 0.483003\nTrain Epoch: 1 [52480/60000 (87%)]\tLoss: 0.610840\nTrain Epoch: 1 [53120/60000 (88%)]\tLoss: 0.483278\nTrain Epoch: 1 [53760/60000 (90%)]\tLoss: 0.553161\nTrain Epoch: 1 [54400/60000 (91%)]\tLoss: 0.465237\nTrain Epoch: 1 [55040/60000 (92%)]\tLoss: 0.558884\nTrain Epoch: 1 [55680/60000 (93%)]\tLoss: 0.528969\nTrain Epoch: 1 [56320/60000 (94%)]\tLoss: 0.370189\nTrain Epoch: 1 [56960/60000 (95%)]\tLoss: 0.379404\nTrain Epoch: 1 [57600/60000 (96%)]\tLoss: 0.263894\nTrain Epoch: 1 [58240/60000 (97%)]\tLoss: 0.432745\nTrain Epoch: 1 [58880/60000 (98%)]\tLoss: 0.455681\nTrain Epoch: 1 [59520/60000 (99%)]\tLoss: 0.483901\n/azureml-envs/azureml_de892a6d0f01a442356c3959dd42e13b/lib/python3.6/site-packages/torch/nn/functional.py:54: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.\n warnings.warn(warning.format(ret))\n\nTest set: Average loss: 0.2073, Accuracy: 9384/10000 (94%)\n\nTrain Epoch: 2 [0/60000 (0%)]\tLoss: 0.390797\nTrain Epoch: 2 [640/60000 (1%)]\tLoss: 0.214512\nTrain Epoch: 2 [1280/60000 (2%)]\tLoss: 0.226415\nTrain Epoch: 2 [1920/60000 (3%)]\tLoss: 0.491764\nTrain Epoch: 2 [2560/60000 (4%)]\tLoss: 0.333604\nTrain Epoch: 2 [3200/60000 
(5%)]\tLoss: 0.514239\nTrain Epoch: 2 [3840/60000 (6%)]\tLoss: 0.430618\nTrain Epoch: 2 [4480/60000 (7%)]\tLoss: 0.579474\nTrain Epoch: 2 [5120/60000 (9%)]\tLoss: 0.259456\nTrain Epoch: 2 [5760/60000 (10%)]\tLoss: 0.651198\nTrain Epoch: 2 [6400/60000 (11%)]\tLoss: 0.338269\nTrain Epoch: 2 [7040/60000 (12%)]\tLoss: 0.335233\nTrain Epoch: 2 [7680/60000 (13%)]\tLoss: 0.518132\nTrain Epoch: 2 [8320/60000 (14%)]\tLoss: 0.363488\nTrain Epoch: 2 [8960/60000 (15%)]\tLoss: 0.437092\nTrain Epoch: 2 [9600/60000 (16%)]\tLoss: 0.362660\nTrain Epoch: 2 [10240/60000 (17%)]\tLoss: 0.432337\nTrain Epoch: 2 [10880/60000 (18%)]\tLoss: 0.360611\nTrain Epoch: 2 [11520/60000 (19%)]\tLoss: 0.305427\nTrain Epoch: 2 [12160/60000 (20%)]\tLoss: 0.347859\nTrain Epoch: 2 [12800/60000 (21%)]\tLoss: 0.408770\nTrain Epoch: 2 [13440/60000 (22%)]\tLoss: 0.469975\nTrain Epoch: 2 [14080/60000 (23%)]\tLoss: 0.673716\nTrain Epoch: 2 [14720/60000 (25%)]\tLoss: 0.388876\nTrain Epoch: 2 [15360/60000 (26%)]\tLoss: 0.462371\nTrain Epoch: 2 [16000/60000 (27%)]\tLoss: 0.530107\nTrain Epoch: 2 [16640/60000 (28%)]\tLoss: 0.448767\nTrain Epoch: 2 [17280/60000 (29%)]\tLoss: 0.412764\nTrain Epoch: 2 [17920/60000 (30%)]\tLoss: 0.301494\nTrain Epoch: 2 [18560/60000 (31%)]\tLoss: 0.465599\nTrain Epoch: 2 [19200/60000 (32%)]\tLoss: 0.434249\nTrain Epoch: 2 [19840/60000 (33%)]\tLoss: 0.324006\nTrain Epoch: 2 [20480/60000 (34%)]\tLoss: 0.447446\nTrain Epoch: 2 [21120/60000 (35%)]\tLoss: 0.291222\nTrain Epoch: 2 [21760/60000 (36%)]\tLoss: 0.557065\nTrain Epoch: 2 [22400/60000 (37%)]\tLoss: 0.552659\nTrain Epoch: 2 [23040/60000 (38%)]\tLoss: 0.378901\nTrain Epoch: 2 [23680/60000 (39%)]\tLoss: 0.360550\nTrain Epoch: 2 [24320/60000 (41%)]\tLoss: 0.283795\nTrain Epoch: 2 [24960/60000 (42%)]\tLoss: 0.475816\nTrain Epoch: 2 [25600/60000 (43%)]\tLoss: 0.283652\nTrain Epoch: 2 [26240/60000 (44%)]\tLoss: 0.276265\nTrain Epoch: 2 [26880/60000 (45%)]\tLoss: 0.527902\nTrain Epoch: 2 [27520/60000 (46%)]\tLoss: 0.437130\nTrain Epoch: 
2 [28160/60000 (47%)]\tLoss: 0.277132\nTrain Epoch: 2 [28800/60000 (48%)]\tLoss: 0.471580\nTrain Epoch: 2 [29440/60000 (49%)]\tLoss: 0.380154\nTrain Epoch: 2 [30080/60000 (50%)]\tLoss: 0.232072\nTrain Epoch: 2 [30720/60000 (51%)]\tLoss: 0.366567\nTrain Epoch: 2 [31360/60000 (52%)]\tLoss: 0.469628\nTrain Epoch: 2 [32000/60000 (53%)]\tLoss: 0.440017\nTrain Epoch: 2 [32640/60000 (54%)]\tLoss: 0.421814\nTrain Epoch: 2 [33280/60000 (55%)]\tLoss: 0.367687\nTrain Epoch: 2 [33920/60000 (57%)]\tLoss: 0.448384\nTrain Epoch: 2 [34560/60000 (58%)]\tLoss: 0.550283\nTrain Epoch: 2 [35200/60000 (59%)]\tLoss: 0.609798\nTrain Epoch: 2 [35840/60000 (60%)]\tLoss: 0.461334\nTrain Epoch: 2 [36480/60000 (61%)]\tLoss: 0.443838\nTrain Epoch: 2 [37120/60000 (62%)]\tLoss: 0.306666\nTrain Epoch: 2 [37760/60000 (63%)]\tLoss: 0.432083\nTrain Epoch: 2 [38400/60000 (64%)]\tLoss: 0.277025\nTrain Epoch: 2 [39040/60000 (65%)]\tLoss: 0.298752\nTrain Epoch: 2 [39680/60000 (66%)]\tLoss: 0.427435\nTrain Epoch: 2 [40320/60000 (67%)]\tLoss: 0.374736\nTrain Epoch: 2 [40960/60000 (68%)]\tLoss: 0.246496\nTrain Epoch: 2 [41600/60000 (69%)]\tLoss: 0.662259\nTrain Epoch: 2 [42240/60000 (70%)]\tLoss: 0.497635\nTrain Epoch: 2 [42880/60000 (71%)]\tLoss: 0.237556\nTrain Epoch: 2 [43520/60000 (72%)]\tLoss: 0.194535\nTrain Epoch: 2 [44160/60000 (74%)]\tLoss: 0.258943\nTrain Epoch: 2 [44800/60000 (75%)]\tLoss: 0.437360\nTrain Epoch: 2 [45440/60000 (76%)]\tLoss: 0.355489\nTrain Epoch: 2 [46080/60000 (77%)]\tLoss: 0.335020\nTrain Epoch: 2 [46720/60000 (78%)]\tLoss: 0.565189\nTrain Epoch: 2 [47360/60000 (79%)]\tLoss: 0.430366\nTrain Epoch: 2 [48000/60000 (80%)]\tLoss: 0.266303\nTrain Epoch: 2 [48640/60000 (81%)]\tLoss: 0.172954\nTrain Epoch: 2 [49280/60000 (82%)]\tLoss: 0.245803\nTrain Epoch: 2 [49920/60000 (83%)]\tLoss: 0.426530\nTrain Epoch: 2 [50560/60000 (84%)]\tLoss: 0.468984\nTrain Epoch: 2 [51200/60000 (85%)]\tLoss: 0.370892\nTrain Epoch: 2 [51840/60000 (86%)]\tLoss: 0.300021\nTrain Epoch: 2 [52480/60000 
(87%)]\tLoss: 0.392199\nTrain Epoch: 2 [53120/60000 (88%)]\tLoss: 0.510658\nTrain Epoch: 2 [53760/60000 (90%)]\tLoss: 0.376290\nTrain Epoch: 2 [54400/60000 (91%)]\tLoss: 0.273752\nTrain Epoch: 2 [55040/60000 (92%)]\tLoss: 0.234505\nTrain Epoch: 2 [55680/60000 (93%)]\tLoss: 0.610978\nTrain Epoch: 2 [56320/60000 (94%)]\tLoss: 0.154850\nTrain Epoch: 2 [56960/60000 (95%)]\tLoss: 0.374254\nTrain Epoch: 2 [57600/60000 (96%)]\tLoss: 0.292167\nTrain Epoch: 2 [58240/60000 (97%)]\tLoss: 0.478376\nTrain Epoch: 2 [58880/60000 (98%)]\tLoss: 0.303128\nTrain Epoch: 2 [59520/60000 (99%)]\tLoss: 0.376779\n\nTest set: Average loss: 0.1297, Accuracy: 9597/10000 (96%)\n\nTrain Epoch: 3 [0/60000 (0%)]\tLoss: 0.450588\nTrain Epoch: 3 [640/60000 (1%)]\tLoss: 0.361118\nTrain Epoch: 3 [1280/60000 (2%)]\tLoss: 0.374497\nTrain Epoch: 3 [1920/60000 (3%)]\tLoss: 0.312127\nTrain Epoch: 3 [2560/60000 (4%)]\tLoss: 0.353896\nTrain Epoch: 3 [3200/60000 (5%)]\tLoss: 0.320840\nTrain Epoch: 3 [3840/60000 (6%)]\tLoss: 0.218477\nTrain Epoch: 3 [4480/60000 (7%)]\tLoss: 0.295629\nTrain Epoch: 3 [5120/60000 (9%)]\tLoss: 0.339400\nTrain Epoch: 3 [5760/60000 (10%)]\tLoss: 0.170357\nTrain Epoch: 3 [6400/60000 (11%)]\tLoss: 0.416447\nTrain Epoch: 3 [7040/60000 (12%)]\tLoss: 0.320326\nTrain Epoch: 3 [7680/60000 (13%)]\tLoss: 0.318410\nTrain Epoch: 3 [8320/60000 (14%)]\tLoss: 0.384793\nTrain Epoch: 3 [8960/60000 (15%)]\tLoss: 0.343415\nTrain Epoch: 3 [9600/60000 (16%)]\tLoss: 0.284627\nTrain Epoch: 3 [10240/60000 (17%)]\tLoss: 0.151805\nTrain Epoch: 3 [10880/60000 (18%)]\tLoss: 0.401332\nTrain Epoch: 3 [11520/60000 (19%)]\tLoss: 0.253159\nTrain Epoch: 3 [12160/60000 (20%)]\tLoss: 0.339563\nTrain Epoch: 3 [12800/60000 (21%)]\tLoss: 0.237430\nTrain Epoch: 3 [13440/60000 (22%)]\tLoss: 0.311402\nTrain Epoch: 3 [14080/60000 (23%)]\tLoss: 0.241667\nTrain Epoch: 3 [14720/60000 (25%)]\tLoss: 0.265347\nTrain Epoch: 3 [15360/60000 (26%)]\tLoss: 0.367453\nTrain Epoch: 3 [16000/60000 (27%)]\tLoss: 0.190671\nTrain Epoch: 3 
[16640/60000 (28%)]\tLoss: 0.313052\n[... per-batch training-loss lines for epochs 3-10 truncated; per-epoch test results retained below ...]\n\nTest set: Average loss: 0.1049, Accuracy: 9686/10000 (97%)\n\nTest set: Average loss: 0.0843, Accuracy: 9739/10000 (97%)\n\nTest set: Average loss: 0.0797, Accuracy: 9758/10000 (98%)\n\nTest set: Average loss: 0.0690, Accuracy: 9776/10000 (98%)\n\nTest set: Average loss: 0.0657, Accuracy: 9801/10000 (98%)\n\nTest set: Average loss: 0.0632, Accuracy: 9807/10000 (98%)\n\nTest set: Average loss: 0.0559, Accuracy: 9813/10000 (98%)\n\nTrain Epoch: 
10 [57600/60000 (96%)]\tLoss: 0.127536\nTrain Epoch: 10 [58240/60000 (97%)]\tLoss: 0.233154\nTrain Epoch: 10 [58880/60000 (98%)]\tLoss: 0.113188\nTrain Epoch: 10 [59520/60000 (99%)]\tLoss: 0.282389\n\nTest set: Average loss: 0.0531, Accuracy: 9837/10000 (98%)\n\n\n\nThe experiment completed successfully. Finalizing run...\nLogging experiment finalizing status in history service\n\n\nRun is completed.", + "run_properties": { + "SendToClient": "1", + "arguments": "--output-dir ./outputs", + "created_utc": "2018-09-25T11:56:04.832205Z", + "distributed_processes": [], + "end_time_utc": "2018-09-25T12:15:57.841467Z", + "log_files": { + "azureml-logs/55_batchai_execution.txt": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/55_batchai_execution.txt?sv=2017-04-17&sr=b&sig=NNkIC62xdG1h6156XtjtgwTJ1ScXlfxhBiBicNNoExE%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r", + "azureml-logs/60_control_log.txt": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/60_control_log.txt?sv=2017-04-17&sr=b&sig=i2mtPt6w5xHkEjpkyfl%2BSD1GPpIdpzIbY6sVUQ62QMo%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r", + "azureml-logs/80_driver_log.txt": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/80_driver_log.txt?sv=2017-04-17&sr=b&sig=CvqNHP18huWuXWdi%2BeiPcnztgJfI1iQQ6fV6Li25z1Y%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r", + "azureml-logs/azureml.log": "https://onnxamlistorageekgyifen.blob.core.windows.net/azureml/ExperimentRun/pytorch1-mnist_1537876563990/azureml-logs/azureml.log?sv=2017-04-17&sr=b&sig=UTaxvUU4Ua%2FpsXPwQnSIV%2FbKK1zERtclIIjcTfbcSzQ%3D&st=2018-09-25T12%3A06%3A00Z&se=2018-09-25T20%3A16%3A00Z&sp=r" + }, + "properties": { + "ContentSnapshotId": "727976ee-33bf-44c7-af65-ef1a1cbd2980", + "azureml.runsource": "experiment" + }, 
+ "run_duration": "0:19:53", + "run_id": "pytorch1-mnist_1537876563990", + "script_name": "mnist.py", + "status": "Completed", + "tags": {} + }, + "widget_settings": {}, + "workbench_uri": "https://mlworkspace.azure.ai/portal/subscriptions/75f78a03-482f-4fd8-8c71-5ddc08f92726/resourceGroups/onnxdemos/providers/Microsoft.MachineLearningServices/workspaces/onnx-aml-ignite-demo/experiment/pytorch1-mnist/run/pytorch1-mnist_1537876563990" + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb b/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb index 292eb621..704041a2 100644 --- a/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb +++ b/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb @@ -1,477 +1,477 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deploying a web service to Azure Kubernetes Service (AKS)\n", - "This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. \n", - "We then test and delete the service, image and model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.compute import AksCompute, ComputeTarget\n", - "from azureml.core.webservice import Webservice, AksWebservice\n", - "from azureml.core.image import Image\n", - "from azureml.core.model import Model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Get workspace\n", - "Load existing workspace from the config file info." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Register the model\n", - "Register an existing trained model, add descirption and tags." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Register the model\n", - "from azureml.core.model import Model\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", - " model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - " description = \"Ridge regression model to predict diabetes\",\n", - " workspace = ws)\n", - "\n", - "print(model.name, model.description, model.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Create an image\n", - "Create an image using the registered model the script that will load and run the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", - " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", - " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " # you can return any data type as long as it is JSON-serializable\n", - " return result.tolist()\n", - " except Exception as e:\n", - " error = str(e)\n", - " return error" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " description = \"Image with ridge regression model\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", - " )\n", - "\n", - 
"image = ContainerImage.create(name = \"myimage1\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Use a custom Docker image\n", - "\n", - "You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n", - "\n", - "Only Supported for `ContainerImage`(from azureml.core.image) with `python` runtime.\n", - "```python\n", - "# use an image available in public Container Registry without authentication\n", - "image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n", - "\n", - "# or, use an image available in a private Container Registry\n", - "image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n", - "image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n", - "image_config.base_image_registry.username = \"username\"\n", - "image_config.base_image_registry.password = \"password\"\n", - "\n", - "# or, use an image built during training.\n", - "image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n", - "```\n", - "You can get the address of training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Provision the AKS Cluster\n", - "This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. 
If you delete the cluster or the resource group that contains it, then you would have to recreate it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use the default configuration (can also provide parameters to customize)\n", - "prov_config = AksCompute.provisioning_configuration()\n", - "\n", - "aks_name = 'my-aks-9' \n", - "# Create the cluster\n", - "aks_target = ComputeTarget.create(workspace = ws, \n", - " name = aks_name, \n", - " provisioning_configuration = prov_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Create AKS Cluster in an existing virtual network (optional)\n", - "See code snippet below. Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-virtual-network#use-azure-kubernetes-service) for more details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "'''\n", - "from azureml.core.compute import ComputeTarget, AksCompute\n", - "\n", - "# Create the compute configuration and set virtual network information\n", - "config = AksCompute.provisioning_configuration(location=\"eastus2\")\n", - "config.vnet_resourcegroup_name = \"mygroup\"\n", - "config.vnet_name = \"mynetwork\"\n", - "config.subnet_name = \"default\"\n", - "config.service_cidr = \"10.0.0.0/16\"\n", - "config.dns_service_ip = \"10.0.0.10\"\n", - "config.docker_bridge_cidr = \"172.17.0.1/16\"\n", - "\n", - "# Create the compute target\n", - "aks_target = ComputeTarget.create(workspace = ws,\n", - " name = \"myaks\",\n", - " provisioning_configuration = config)\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Enable SSL on the AKS Cluster (optional)\n", - "See code snippet below. 
Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service) for more details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# provisioning_config = AksCompute.provisioning_configuration(ssl_cert_pem_file=\"cert.pem\", ssl_key_pem_file=\"key.pem\", ssl_cname=\"www.contoso.com\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_target.wait_for_completion(show_output = True)\n", - "print(aks_target.provisioning_state)\n", - "print(aks_target.provisioning_errors)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional step: Attach existing AKS cluster\n", - "\n", - "If you have existing AKS cluster in your Azure subscription, you can attach it to the Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "'''\n", - "# Use the default configuration (can also provide parameters to customize)\n", - "resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n", - "\n", - "create_name='my-existing-aks' \n", - "# Create the cluster\n", - "attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", - "aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n", - "# Wait for the operation to complete\n", - "aks_target.wait_for_completion(True)\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deploy web service to AKS" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Set the web service configuration (using default here)\n", - "aks_config = AksWebservice.deploy_configuration()" - ] 
- }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service_name ='aks-service-1'\n", - "\n", - "aks_service = Webservice.deploy_from_image(workspace = ws, \n", - " name = aks_service_name,\n", - " image = image,\n", - " deployment_config = aks_config,\n", - " deployment_target = aks_target)\n", - "aks_service.wait_for_deployment(show_output = True)\n", - "print(aks_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Test the web service using run method\n", - "We test the web sevice by passing data.\n", - "Run() method retrieves API keys behind the scenes to make sure that call is authenticated." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,5,6,7,8,9,10], \n", - " [10,9,8,7,6,5,4,3,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "prediction = aks_service.run(input_data = test_sample)\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Test the web service using raw HTTP request (optional)\n", - "Alternatively you can construct a raw HTTP request and send it to the service. In this case you need to explicitly pass the HTTP header. This process is shown in the next 2 cells." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# retreive the API keys. 
AML generates two keys.\n", - "'''\n", - "key1, Key2 = aks_service.get_keys()\n", - "print(key1)\n", - "'''" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# construct raw HTTP request and send to the service\n", - "'''\n", - "%%time\n", - "\n", - "import requests\n", - "\n", - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,5,6,7,8,9,10], \n", - " [10,9,8,7,6,5,4,3,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "# Don't forget to add key to the HTTP header.\n", - "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", - "\n", - "resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n", - "\n", - "\n", - "print(\"prediction:\", resp.text)\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Clean up\n", - "Delete the service, image and model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service.delete()\n", - "image.delete()\n", - "model.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "aashishb" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploying a web service to Azure Kubernetes Service (AKS)\n", + "This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (a one-time action), and deploying a service to it. \n", + "We then test and delete the service, image and model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "from azureml.core.compute import AksCompute, ComputeTarget\n", + "from azureml.core.webservice import Webservice, AksWebservice\n", + "from azureml.core.image import Image\n", + "from azureml.core.model import Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Get workspace\n", + "Load the existing workspace from the config file info." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register the model\n", + "Register an existing trained model, and add a description and tags."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Register the model\n", + "from azureml.core.model import Model\n", + "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", + "                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", + "                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n", + "                       description = \"Ridge regression model to predict diabetes\",\n", + "                       workspace = ws)\n", + "\n", + "print(model.name, model.description, model.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create an image\n", + "Create an image using the registered model and the script that will load and run the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + "    global model\n", + "    # note: \"sklearn_regression_model.pkl\" is the name the model was registered under;\n", + "    # this is a different behavior than when the code is run locally, even though the code is the same.\n", + "    model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + "    # deserialize the model file back into a sklearn model\n", + "    model = joblib.load(model_path)\n", + "\n", + "# note you can pass in multiple rows for scoring\n", + "def run(raw_data):\n", + "    try:\n", + "        data = json.loads(raw_data)['data']\n", + "        data = numpy.array(data)\n", + "        result = model.predict(data)\n", + "        # you can return any data type as long as it is JSON-serializable\n", + "        return result.tolist()\n", + "    except Exception as e:\n", + "        error = str(e)\n", + "        return error" + ] + }, + { + "cell_type": "code",
+ "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"Image with ridge regression model\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", + " )\n", + "\n", + "image = ContainerImage.create(name = \"myimage1\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Use a custom Docker image\n", + "\n", + "You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. 
Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\* and Python (3.5.\* or 3.6.\*).\n", + "\n", + "Only supported for `ContainerImage` (from azureml.core.image) with the `python` runtime.\n", + "```python\n", + "# use an image available in a public Container Registry without authentication\n", + "image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n", + "\n", + "# or, use an image available in a private Container Registry\n", + "image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n", + "image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n", + "image_config.base_image_registry.username = \"username\"\n", + "image_config.base_image_registry.password = \"password\"\n", + "\n", + "# or, use an image built during training.\n", + "image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n", + "```\n", + "You can get the address of the training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Provision the AKS Cluster\n", + "This is a one-time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the default configuration (can also provide parameters to customize)\n", + "prov_config = AksCompute.provisioning_configuration()\n", + "\n", + "aks_name = 'my-aks-9' \n", + "# Create the cluster\n", + "aks_target = ComputeTarget.create(workspace = ws, \n", + " name = aks_name, \n", + " provisioning_configuration = prov_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create AKS Cluster in an existing virtual network (optional)\n", + "See code snippet below. Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-virtual-network#use-azure-kubernetes-service) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'''\n", + "from azureml.core.compute import ComputeTarget, AksCompute\n", + "\n", + "# Create the compute configuration and set virtual network information\n", + "config = AksCompute.provisioning_configuration(location=\"eastus2\")\n", + "config.vnet_resourcegroup_name = \"mygroup\"\n", + "config.vnet_name = \"mynetwork\"\n", + "config.subnet_name = \"default\"\n", + "config.service_cidr = \"10.0.0.0/16\"\n", + "config.dns_service_ip = \"10.0.0.10\"\n", + "config.docker_bridge_cidr = \"172.17.0.1/16\"\n", + "\n", + "# Create the compute target\n", + "aks_target = ComputeTarget.create(workspace = ws,\n", + " name = \"myaks\",\n", + " provisioning_configuration = config)\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Enable SSL on the AKS Cluster (optional)\n", + "See code snippet below. 
Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# provisioning_config = AksCompute.provisioning_configuration(ssl_cert_pem_file=\"cert.pem\", ssl_key_pem_file=\"key.pem\", ssl_cname=\"www.contoso.com\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_target.wait_for_completion(show_output = True)\n", + "print(aks_target.provisioning_state)\n", + "print(aks_target.provisioning_errors)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optional step: Attach existing AKS cluster\n", + "\n", + "If you have an existing AKS cluster in your Azure subscription, you can attach it to the Workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'''\n", + "# Use the default configuration (can also provide parameters to customize)\n", + "resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n", + "\n", + "create_name='my-existing-aks' \n", + "# Create the cluster\n", + "attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", + "aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n", + "# Wait for the operation to complete\n", + "aks_target.wait_for_completion(True)\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploy web service to AKS" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set the web service configuration (using default here)\n", + "aks_config = AksWebservice.deploy_configuration()" + ]
+ }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service_name ='aks-service-1'\n", + "\n", + "aks_service = Webservice.deploy_from_image(workspace = ws, \n", + "                                           name = aks_service_name,\n", + "                                           image = image,\n", + "                                           deployment_config = aks_config,\n", + "                                           deployment_target = aks_target)\n", + "aks_service.wait_for_deployment(show_output = True)\n", + "print(aks_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Test the web service using the run method\n", + "We test the web service by passing data.\n", + "The run() method retrieves the API keys behind the scenes to make sure that the call is authenticated." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "import json\n", + "\n", + "test_sample = json.dumps({'data': [\n", + "    [1,2,3,4,5,6,7,8,9,10], \n", + "    [10,9,8,7,6,5,4,3,2,1]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding = 'utf8')\n", + "\n", + "prediction = aks_service.run(input_data = test_sample)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Test the web service using a raw HTTP request (optional)\n", + "Alternatively, you can construct a raw HTTP request and send it to the service. In this case you need to explicitly pass the HTTP header. This process is shown in the next 2 cells." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# retrieve the API keys. 
AML generates two keys.\n", + "'''\n", + "key1, key2 = aks_service.get_keys()\n", + "print(key1)\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# construct a raw HTTP request and send it to the service\n", + "'''\n", + "%%time\n", + "\n", + "import requests\n", + "\n", + "import json\n", + "\n", + "test_sample = json.dumps({'data': [\n", + "    [1,2,3,4,5,6,7,8,9,10], \n", + "    [10,9,8,7,6,5,4,3,2,1]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding = 'utf8')\n", + "\n", + "# Don't forget to add the key to the HTTP header.\n", + "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", + "\n", + "resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n", + "\n", + "\n", + "print(\"prediction:\", resp.text)\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Clean up\n", + "Delete the service, image and model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service.delete()\n", + "image.delete()\n", + "model.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "aashishb" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb b/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb index ba5ccd15..bf34fb7b 100644 ---
a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb +++ b/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb @@ -1,453 +1,453 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register Model, Create Image and Deploy Service\n", - "\n", - "This example shows how to deploy a web service in step-by-step fashion:\n", - "\n", - " 1. Register model\n", - " 2. Query versions of models and select one to deploy\n", - " 3. Create Docker image\n", - " 4. Query versions of images\n", - " 5. Deploy the image as web service\n", - " \n", - "**IMPORTANT**:\n", - " * This notebook requires you to first complete [train-within-notebook](../../training/train-within-notebook/train-within-notebook.ipynb) example\n", - " \n", - "The train-within-notebook example taught you how to deploy a web service directly from model in one step. This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n", - "\n", - "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "import sklearn\n", - "\n", - "library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n", - "\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n", - " model_name = \"sklearn_regression_model.pkl\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n", - " description = \"Ridge regression model to predict diabetes\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "regression_models = Model.list(workspace=ws, tags=['area'])\n", - "for m in regression_models:\n", - " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can pick a specific model to deploy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(model.name, model.description, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Docker Image" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_linreg_model.pkl` registered under the workspace. 
It is NOT referenceing the local file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", - " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", - " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " # you can return any datatype as long as it is JSON-serializable\n", - " return result.tolist()\n", - " except Exception as e:\n", - " error = str(e)\n", - " return error" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that following command can take few minutes. \n", - "\n", - "You can add tags and descriptions to images. Also, an image can contain multiple models." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - " description = \"Image with ridge regression model\")\n", - "\n", - "image = Image.create(name = \"myimage1\",\n", - " # this is the model object. note you can pass in 0-n models via this list-type parameter\n", - " # in case you need to reference multiple models, or none at all, in your scoring script.\n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Use a custom Docker image\n", - "\n", - "You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. 
Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n", - "\n", - "Only Supported for `ContainerImage`(from azureml.core.image) with `python` runtime.\n", - "```python\n", - "# use an image available in public Container Registry without authentication\n", - "image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n", - "\n", - "# or, use an image available in a private Container Registry\n", - "image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n", - "image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n", - "image_config.base_image_registry.username = \"username\"\n", - "image_config.base_image_registry.password = \"password\"\n", - "\n", - "# or, use an image built during training.\n", - "image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n", - "```\n", - "You can get the address of training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "List images by tag and find out the detailed build log for debugging." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "for i in Image.list(workspace = ws,tags = [\"area\"]):\n", - " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy image as web service on Azure Container Instance\n", - "\n", - "Note that the service creation can take few minutes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", - " description = 'Predict diabetes using regression model')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'my-aci-service-2'\n", - "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test web service" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Call the web service with some dummy input data to get a prediction." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,5,6,7,8,9,10], \n", - " [10,9,8,7,6,5,4,3,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "prediction = aci_service.run(input_data=test_sample)\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Delete ACI to clean up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "aci_service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "aashishb" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register Model, Create Image and Deploy Service\n", + "\n", + "This example shows how to deploy a web service in step-by-step fashion:\n", + "\n", + " 1. Register model\n", + " 2. 
Query versions of models and select one to deploy\n", + " 3. Create Docker image\n", + " 4. Query versions of images\n", + " 5. Deploy the image as a web service\n", + " \n", + "**IMPORTANT**:\n", + " * This notebook requires you to first complete the [train-within-notebook](../../training/train-within-notebook/train-within-notebook.ipynb) example\n", + " \n", + "The train-within-notebook example taught you how to deploy a web service directly from a model in one step. This notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags and descriptions to your models. Note you need to have a `sklearn_regression_model.pkl` file in the current directory. This file is generated by the train-within-notebook example. 
The call below registers that file as a model with the same name `sklearn_regression_model.pkl` in the workspace.\n", + "\n", + "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "import sklearn\n", + "\n", + "library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n", + "\n", + "model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n", + " model_name = \"sklearn_regression_model.pkl\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n", + " description = \"Ridge regression model to predict diabetes\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can explore the registered models within your workspace and query by tag. Models are versioned. If you call `Model.register` many times with the same model name, you will get multiple versions of the model with increasing version numbers."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "regression_models = Model.list(workspace=ws, tags=['area'])\n", + "for m in regression_models:\n", + " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can pick a specific model to deploy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(model.name, model.description, model.version, sep = '\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Docker Image" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_regression_model.pkl` registered under the workspace. It is NOT referencing the local file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global model\n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace.\n", + " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", + " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + "\n", + "# note you can pass in multiple rows for scoring\n", + "def run(raw_data):\n", + " try:\n", + " data = json.loads(raw_data)['data']\n", + " data = numpy.array(data)\n", + " result = model.predict(data)\n", + " # you can return any datatype as long as it is JSON-serializable\n", + " return result.tolist()\n", + " except Exception as e:\n", + " error = str(e)\n", + " return error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the following command can take a few minutes. \n", + "\n", + "You can add tags and descriptions to images. Also, an image can contain multiple models."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.image import Image, ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", + " execution_script=\"score.py\",\n", + " conda_file=\"myenv.yml\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", + " description = \"Image with ridge regression model\")\n", + "\n", + "image = Image.create(name = \"myimage1\",\n", + " # this is the model object. note you can pass in 0-n models via this list-type parameter\n", + " # in case you need to reference multiple models, or none at all, in your scoring script.\n", + " models = [model],\n", + " image_config = image_config, \n", + " workspace = ws)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Use a custom Docker image\n", + "\n", + "You can also specify a custom Docker image to be used as the base image if you don't want to use the default base image provided by Azure ML. 
Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\* and Python (3.5.\* or 3.6.\*).\n", + "\n", + "This is only supported for `ContainerImage` (from azureml.core.image) with the `python` runtime.\n", + "```python\n", + "# use an image available in public Container Registry without authentication\n", + "image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n", + "\n", + "# or, use an image available in a private Container Registry\n", + "image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n", + "image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n", + "image_config.base_image_registry.username = \"username\"\n", + "image_config.base_image_registry.password = \"password\"\n", + "\n", + "# or, use an image built during training.\n", + "image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n", + "```\n", + "You can get the address of the training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "List images by tag and find out the detailed build log for debugging." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "for i in Image.list(workspace = ws,tags = [\"area\"]):\n", + " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy image as a web service on Azure Container Instance\n", + "\n", + "Note that the service creation can take a few minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", + " description = 'Predict diabetes using regression model')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'my-aci-service-2'\n", + "print(aci_service_name)\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test web service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call the web service with some dummy input data to get a prediction." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "test_sample = json.dumps({'data': [\n", + " [1,2,3,4,5,6,7,8,9,10], \n", + " [10,9,8,7,6,5,4,3,2,1]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding = 'utf8')\n", + "\n", + "prediction = aci_service.run(input_data=test_sample)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete ACI to clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "aci_service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "aashishb" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.ipynb b/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.ipynb index b236705b..cb2118b7 100644 --- a/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.ipynb +++ b/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.ipynb @@ -1,606 +1,606 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train using Azure Machine Learning Compute\n", - "\n", - "* Initialize a Workspace\n", - "* Create an Experiment\n", - "* Introduction to AmlCompute\n", - "* Submit an AmlCompute run in a few different ways\n", - " - Provision as a run based compute target \n", - " - Provision as a persistent compute target (Basic)\n", - " - Provision as a persistent compute target (Advanced)\n", - "* Additional operations to perform on AmlCompute\n", - "* Download model explanation data from the Run History Portal\n", - "* Print the explanation data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize a Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create An Experiment\n", - "\n", - "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "experiment_name = 'explainer-remote-run-on-amlcompute'\n", - "experiment = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Introduction to AmlCompute\n", - "\n", - "Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user. \n", - "\n", - "Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. 
\n", - "\n", - "For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)\n", - "\n", - "If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement)\n", - "\n", - "**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", - "\n", - "\n", - "The training script `run_explainer.py` is already created for you. Let's have a look." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit an AmlCompute run in a few different ways\n", - "\n", - "First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. 
Since AmlCompute is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n", - "\n", - "You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "\n", - "AmlCompute.supported_vmsizes(workspace=ws)\n", - "# AmlCompute.supported_vmsizes(workspace=ws, location='southcentralus')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create project directory\n", - "\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import shutil\n", - "\n", - "project_folder = './explainer-remote-run-on-amlcompute'\n", - "os.makedirs(project_folder, exist_ok=True)\n", - "shutil.copy('run_explainer.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Provision as a run based compute target\n", - "\n", - "You can provision AmlCompute as a compute target at run-time. In this case, the compute is auto-created for your run, scales up to max_nodes that you specify, and then **deleted automatically** after the run completes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n", - "\n", - "# create a new runconfig object\n", - "run_config = RunConfiguration()\n", - "\n", - "# signal that you want to use AmlCompute to execute script.\n", - "run_config.target = \"amlcompute\"\n", - "\n", - "# AmlCompute will be created in the same region as workspace\n", - "# Set vm size for AmlCompute\n", - "run_config.amlcompute.vm_size = 'STANDARD_D2_V2'\n", - "\n", - "# enable Docker \n", - "run_config.environment.docker.enabled = True\n", - "\n", - "# set Docker base image to the default CPU-based image\n", - "run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n", - "\n", - "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", - "run_config.environment.python.user_managed_dependencies = False\n", - "\n", - "azureml_pip_packages = [\n", - " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", - " 'azureml-explain-model'\n", - "]\n", - "\n", - "# specify CondaDependencies obj\n", - "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", - " pip_packages=azureml_pip_packages)\n", - "\n", - "# Now submit a run on AmlCompute\n", - "from azureml.core.script_run_config import ScriptRunConfig\n", - "\n", - "script_run_config = ScriptRunConfig(source_directory=project_folder,\n", - " script='run_explainer.py',\n", - " run_config=run_config)\n", - "\n", - "run = experiment.submit(script_run_config)\n", - "\n", - "# Show run details\n", - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these 
instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Provision as a persistent compute target (Basic)\n", - "\n", - "You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n", - "\n", - "* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() above\n", - "* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpucluster\"\n", - "\n", - "# Verify that cluster does not exist already\n", - "try:\n", - " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", - " print('Found existing cluster, use it.')\n", - "except ComputeTargetException:\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", - " max_nodes=4)\n", - " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", - "\n", - "cpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure & Run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - 
"outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# create a new RunConfig object\n", - "run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "# Set compute target to AmlCompute target created in previous step\n", - "run_config.target = cpu_cluster.name\n", - "\n", - "# enable Docker \n", - "run_config.environment.docker.enabled = True\n", - "\n", - "azureml_pip_packages = [\n", - " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", - " 'azureml-explain-model'\n", - "]\n", - "\n", - "# specify CondaDependencies obj\n", - "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", - " pip_packages=azureml_pip_packages)\n", - "\n", - "from azureml.core import Run\n", - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory=project_folder, \n", - " script='run_explainer.py', \n", - " run_config=run_config) \n", - "run = experiment.submit(config=src)\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Provision as a persistent compute target (Advanced)\n", - "\n", - "You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. 
This is useful when you want a dedicated cluster of 4 nodes (for example you can set the min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n", - "\n", - "In addition to `vm_size` and `max_nodes`, you can specify:\n", - "* `min_nodes`: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n", - "* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n", - "* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n", - "* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n", - "* `vnet_name`: Name of VNet\n", - "* `subnet_name`: Name of SubNet within the VNet" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpucluster\"\n", - "\n", - "# Verify that cluster does not exist already\n", - "try:\n", - " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", - " print('Found existing cluster, use it.')\n", - "except ComputeTargetException:\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", - " vm_priority='lowpriority',\n", - " min_nodes=2,\n", - " max_nodes=4,\n", - " idle_seconds_before_scaledown='300',\n", - " vnet_resourcegroup_name='',\n", - " vnet_name='',\n", - " subnet_name='')\n", - " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", - "\n", - "cpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, 
- "source": [ - "### Configure & Run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# create a new RunConfig object\n", - "run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "# Set compute target to AmlCompute target created in previous step\n", - "run_config.target = cpu_cluster.name\n", - "\n", - "# enable Docker \n", - "run_config.environment.docker.enabled = True\n", - "\n", - "azureml_pip_packages = [\n", - " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", - " 'azureml-explain-model'\n", - "]\n", - "\n", - "# specify CondaDependencies obj\n", - "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", - " pip_packages=azureml_pip_packages)\n", - "\n", - "from azureml.core import Run\n", - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory=project_folder, \n", - " script='run_explainer.py', \n", - " run_config=run_config) \n", - "run = experiment.submit(config=src)\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", - "\n", - "client = ExplanationClient.from_run(run)\n", - "# Get the top k (e.g., 4) most important features with their importance values\n", - "explanation = 
client.download_model_explanation(top_k=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Additional operations to perform on AmlCompute\n", - "\n", - "You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get_status () gets the latest status of the AmlCompute target\n", - "cpu_cluster.get_status().serialize()\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Update () takes in the min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target\n", - "# cpu_cluster.update(min_nodes=1)\n", - "# cpu_cluster.update(max_nodes=10)\n", - "cpu_cluster.update(idle_seconds_before_scaledown=300)\n", - "# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Delete () is used to deprovision and delete the AmlCompute target. 
Useful if you want to re-use the compute name \n", - "# 'cpucluster' in this case but use a different VM family for instance.\n", - "\n", - "# cpu_cluster.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download Model Explanation Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", - "\n", - "# Get model explanation data\n", - "client = ExplanationClient.from_run(run)\n", - "explanation = client.download_model_explanation()\n", - "local_importance_values = explanation.local_importance_values\n", - "expected_values = explanation.expected_values\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Or you can use the saved run.id to retrive the feature importance values\n", - "client = ExplanationClient.from_run_id(ws, experiment_name, run.id)\n", - "explanation = client.download_model_explanation()\n", - "local_importance_values = explanation.local_importance_values\n", - "expected_values = explanation.expected_values" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the top k (e.g., 4) most important features with their importance values\n", - "explanation = client.download_model_explanation(top_k=4)\n", - "global_importance_values = explanation.get_ranked_global_values()\n", - "global_importance_names = explanation.get_ranked_global_names()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('global importance values: {}'.format(global_importance_values))\n", - "print('global importance names: {}'.format(global_importance_names))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Success!\n", - "Great, you are ready to move on to the 
remaining notebooks." - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train using Azure Machine Learning Compute\n", + "\n", + "* Initialize a Workspace\n", + "* Create an Experiment\n", + "* Introduction to AmlCompute\n", + "* Submit an AmlCompute run in a few different ways\n", + " - Provision as a run based compute target \n", + " - Provision as a persistent compute target (Basic)\n", + " - Provision as a persistent compute target (Advanced)\n", + "* Additional operations to perform on AmlCompute\n", + "* Download model explanation data from the Run History Portal\n", + "* Print the explanation data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize a Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create An Experiment\n", + "\n", + "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "experiment_name = 'explainer-remote-run-on-amlcompute'\n", + "experiment = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction to AmlCompute\n", + "\n", + "Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user. \n", + "\n", + "Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. 
\n", + "\n", + "For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute).\n", + "\n", + "If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement).\n", + "\n", + "**Note**: As with other Azure services, there are limits on certain resources (e.g., AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", + "\n", + "\n", + "The training script `run_explainer.py` is already created for you. Let's have a look." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit an AmlCompute run in a few different ways\n", + "\n", + "First, let's check which VM families are available in your region. Azure is a regional service, and some specialized SKUs (especially GPUs) are only available in certain regions. 
Since AmlCompute is created in the region of your workspace, we will use the supported_vmsizes() function to see whether the VM family we want to use ('STANDARD_D2_V2') is supported.\n", + "\n", + "You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "\n", + "AmlCompute.supported_vmsizes(workspace=ws)\n", + "# AmlCompute.supported_vmsizes(workspace=ws, location='southcentralus')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create project directory\n", + "\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import shutil\n", + "\n", + "project_folder = './explainer-remote-run-on-amlcompute'\n", + "os.makedirs(project_folder, exist_ok=True)\n", + "shutil.copy('run_explainer.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Provision as a run-based compute target\n", + "\n", + "You can provision AmlCompute as a compute target at run time. In this case, the compute is auto-created for your run, scales up to the max_nodes you specify, and is then **deleted automatically** after the run completes."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n", + "\n", + "# create a new runconfig object\n", + "run_config = RunConfiguration()\n", + "\n", + "# signal that you want to use AmlCompute to execute script.\n", + "run_config.target = \"amlcompute\"\n", + "\n", + "# AmlCompute will be created in the same region as workspace\n", + "# Set vm size for AmlCompute\n", + "run_config.amlcompute.vm_size = 'STANDARD_D2_V2'\n", + "\n", + "# enable Docker \n", + "run_config.environment.docker.enabled = True\n", + "\n", + "# set Docker base image to the default CPU-based image\n", + "run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n", + "\n", + "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", + "run_config.environment.python.user_managed_dependencies = False\n", + "\n", + "azureml_pip_packages = [\n", + " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", + " 'azureml-explain-model'\n", + "]\n", + "\n", + "# specify CondaDependencies obj\n", + "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", + " pip_packages=azureml_pip_packages)\n", + "\n", + "# Now submit a run on AmlCompute\n", + "from azureml.core.script_run_config import ScriptRunConfig\n", + "\n", + "script_run_config = ScriptRunConfig(source_directory=project_folder,\n", + " script='run_explainer.py',\n", + " run_config=run_config)\n", + "\n", + "run = experiment.submit(script_run_config)\n", + "\n", + "# Show run details\n", + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these 
instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Provision as a persistent compute target (Basic)\n", + "\n", + "You can provision a persistent AmlCompute resource by defining just two parameters, thanks to smart defaults. By default, it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously reuse the same target, debug it between jobs, or simply share the resource with other users of your workspace.\n", + "\n", + "* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() output above\n", + "* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "\n", + "# Verify that the cluster does not exist already\n", + "try:\n", + "    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + "    print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + "                                                           max_nodes=4)\n", + "    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + "\n", + "cpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure & Run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# create a new RunConfig object\n", + "run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to AmlCompute target created in previous step\n", + "run_config.target = cpu_cluster.name\n", + "\n", + "# enable Docker \n", + "run_config.environment.docker.enabled = True\n", + "\n", + "azureml_pip_packages = [\n", + " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", + " 'azureml-explain-model'\n", + "]\n", + "\n", + "# specify CondaDependencies obj\n", + "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", + " pip_packages=azureml_pip_packages)\n", + "\n", + "from azureml.core import Run\n", + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory=project_folder, \n", + " script='run_explainer.py', \n", + " run_config=run_config) \n", + "run = experiment.submit(config=src)\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Provision as a persistent compute target (Advanced)\n", + "\n", + "You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. 
This is useful when you want a dedicated cluster of 4 nodes (for example, you can set both min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n", + "\n", + "In addition to `vm_size` and `max_nodes`, you can specify:\n", + "* `min_nodes`: Minimum nodes (default 0) to downscale to while running a job on AmlCompute\n", + "* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low-priority VMs use Azure's excess capacity and are thus cheaper, but risk your run being pre-empted\n", + "* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling down to min_nodes\n", + "* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n", + "* `vnet_name`: Name of the VNet\n", + "* `subnet_name`: Name of the subnet within the VNet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "\n", + "# Verify that the cluster does not exist already\n", + "try:\n", + "    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + "    print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + "                                                           vm_priority='lowpriority',\n", + "                                                           min_nodes=2,\n", + "                                                           max_nodes=4,\n", + "                                                           idle_seconds_before_scaledown=300,\n", + "                                                           vnet_resourcegroup_name='',\n", + "                                                           vnet_name='',\n", + "                                                           subnet_name='')\n", + "    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + "\n", + "cpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": 
{}, + "source": [ + "### Configure & Run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# create a new RunConfig object\n", + "run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to AmlCompute target created in previous step\n", + "run_config.target = cpu_cluster.name\n", + "\n", + "# enable Docker \n", + "run_config.environment.docker.enabled = True\n", + "\n", + "azureml_pip_packages = [\n", + " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", + " 'azureml-explain-model'\n", + "]\n", + "\n", + "# specify CondaDependencies obj\n", + "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", + " pip_packages=azureml_pip_packages)\n", + "\n", + "from azureml.core import Run\n", + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory=project_folder, \n", + " script='run_explainer.py', \n", + " run_config=run_config) \n", + "run = experiment.submit(config=src)\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", + "\n", + "client = ExplanationClient.from_run(run)\n", + "# Get the top k (e.g., 4) most important features with their importance values\n", + "explanation = 
client.download_model_explanation(top_k=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Additional operations to perform on AmlCompute\n", + "\n", + "You can perform more operations on AmlCompute, such as updating the node counts or deleting the compute. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get_status() gets the latest status of the AmlCompute target\n", + "cpu_cluster.get_status().serialize()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# update() takes min_nodes, max_nodes, and idle_seconds_before_scaledown and updates the AmlCompute target\n", + "# cpu_cluster.update(min_nodes=1)\n", + "# cpu_cluster.update(max_nodes=10)\n", + "cpu_cluster.update(idle_seconds_before_scaledown=300)\n", + "# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# delete() is used to deprovision and delete the AmlCompute target. 
Useful if you want to reuse the compute name \n", + "# ('cpu-cluster' in this case) but with a different VM family, for instance.\n", + "\n", + "# cpu_cluster.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download Model Explanation Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", + "\n", + "# Get model explanation data\n", + "client = ExplanationClient.from_run(run)\n", + "explanation = client.download_model_explanation()\n", + "local_importance_values = explanation.local_importance_values\n", + "expected_values = explanation.expected_values\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Or you can use the saved run.id to retrieve the feature importance values\n", + "client = ExplanationClient.from_run_id(ws, experiment_name, run.id)\n", + "explanation = client.download_model_explanation()\n", + "local_importance_values = explanation.local_importance_values\n", + "expected_values = explanation.expected_values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the top k (e.g., 4) most important features with their importance values\n", + "explanation = client.download_model_explanation(top_k=4)\n", + "global_importance_values = explanation.get_ranked_global_values()\n", + "global_importance_names = explanation.get_ranked_global_names()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('global importance values: {}'.format(global_importance_values))\n", + "print('global importance names: {}'.format(global_importance_names))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Success!\n", + "Great, you are ready to move on to 
the remaining notebooks." + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.ipynb b/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.ipynb index 2c57d6a6..0caaf4d6 100644 --- a/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.ipynb +++ b/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.ipynb @@ -1,279 +1,279 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Breast cancer diagnosis classification with scikit-learn (run model explainer locally)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explain a model with the AML explain-model package\n", - "\n", - "1. Train a SVM classification model using Scikit-learn\n", - "2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n", - "3. 
Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services\n", - "4. Visualize the global and local explanations with the visualization dashboard." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn import svm\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 1. Run model explainer locally with full data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load the breast cancer diagnosis data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "breast_cancer_data = load_breast_cancer()\n", - "classes = breast_cancer_data.target_names.tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Split data into train and test\n", - "from sklearn.model_selection import train_test_split\n", - "x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train a SVM classification model, which you want to explain" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "clf = svm.SVC(gamma=0.001, C=100., probability=True)\n", - "model = clf.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain predictions on your local machine" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tabular_explainer = TabularExplainer(model, x_train, features=breast_cancer_data.feature_names, classes=classes)" - ] - }, - { - "cell_type": 
"markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions (global explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", - "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", - "global_explanation = tabular_explainer.explain_global(x_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Sorted SHAP values\n", - "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n", - "# Corresponding feature names\n", - "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n", - "# feature ranks (based on original order of features)\n", - "print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n", - "# per class feature names\n", - "print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n", - "# per class feature importance values\n", - "print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions as a collection of local (instance-level) explanations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# feature shap values for all features and all data points in the training data\n", - "print('local importance values: 
{}'.format(global_explanation.local_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain local data points (individual instances)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# explain the first member of the test set\n", - "instance_num = 0\n", - "local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get the prediction for the first member of the test set and explain why model made that prediction\n", - "prediction_value = clf.predict(x_test)[instance_num]\n", - "\n", - "sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n", - "sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n", - "\n", - "\n", - "dict(zip(sorted_local_importance_names, sorted_local_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 2. 
Load visualization dashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Note you will need to have extensions enabled prior to jupyter kernel starting\n", - "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "# Or, in Jupyter Labs, uncomment below\n", - "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", - "# jupyter labextension install microsoft-mli-widget" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.explain.model.visualize import ExplanationDashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ExplanationDashboard(global_explanation, model, x_test)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Breast cancer diagnosis classification with scikit-learn (run model explainer locally)" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explain a model with the AML explain-model package\n", + "\n", + "1. Train an SVM classification model using Scikit-learn.\n", + "2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services.\n", + "3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services.\n", + "4. Visualize the global and local explanations with the visualization dashboard." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn import svm\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. Run model explainer locally with full data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load the breast cancer diagnosis data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "breast_cancer_data = load_breast_cancer()\n", + "classes = breast_cancer_data.target_names.tolist()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split data into train and test\n", + "from sklearn.model_selection import train_test_split\n", + "x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train an SVM classification model, which you want to explain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "clf = svm.SVC(gamma=0.001, C=100., probability=True)\n", + "model 
= clf.fit(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain predictions on your local machine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_explainer = TabularExplainer(model, x_train, features=breast_cancer_data.feature_names, classes=classes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions (global explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", + "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", + "global_explanation = tabular_explainer.explain_global(x_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sorted SHAP values\n", + "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n", + "# Corresponding feature names\n", + "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n", + "# feature ranks (based on original order of features)\n", + "print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n", + "# per class feature names\n", + "print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n", + "# per class feature importance values\n", + "print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))" + ] + }, 
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions as a collection of local (instance-level) explanations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# feature shap values for all features and all data points in the training data\n", + "print('local importance values: {}'.format(global_explanation.local_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain local data points (individual instances)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# explain the first member of the test set\n", + "instance_num = 0\n", + "local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get the prediction for the first member of the test set and explain why model made that prediction\n", + "prediction_value = clf.predict(x_test)[instance_num]\n", + "\n", + "sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n", + "sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n", + "\n", + "\n", + "dict(zip(sorted_local_importance_names, sorted_local_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. 
Load visualization dashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Note: you will need to have the extensions enabled before the Jupyter kernel starts\n", + "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "# Or, in JupyterLab, uncomment below\n", + "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", + "# jupyter labextension install microsoft-mli-widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.explain.model.visualize import ExplanationDashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ExplanationDashboard(global_explanation, model, x_test)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.ipynb b/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.ipynb index 0f8dd7ce..30b71ecc 100644 --- a/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.ipynb +++ b/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.ipynb @@ -1,280 +1,280 @@ { - "cells": [ - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "# Iris flower classification with scikit-learn (run model explainer locally)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explain a model with the AML explain-model package\n", - "\n", - "1. Train a SVM classification model using Scikit-learn\n", - "2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n", - "3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services\n", - "4. Visualize the global and local explanations with the visualization dashboard." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_iris\n", - "from sklearn import svm\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 1. 
Run model explainer locally with full data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load the breast cancer diagnosis data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iris = load_iris()\n", - "X = iris['data']\n", - "y = iris['target']\n", - "classes = iris['target_names']\n", - "feature_names = iris['feature_names']" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Split data into train and test\n", - "from sklearn.model_selection import train_test_split\n", - "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train a SVM classification model, which you want to explain" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "clf = svm.SVC(gamma=0.001, C=100., probability=True)\n", - "model = clf.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain predictions on your local machine" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tabular_explainer = TabularExplainer(model, x_train, features = feature_names, classes=classes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions (global explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "global_explanation = tabular_explainer.explain_global(x_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Sorted SHAP values\n", - "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n", - "# Corresponding 
feature names\n", - "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n", - "# feature ranks (based on original order of features)\n", - "print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n", - "# per class feature names\n", - "print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n", - "# per class feature importance values\n", - "print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions as a collection of local (instance-level) explanations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# feature shap values for all features and all data points in the training data\n", - "print('local importance values: {}'.format(global_explanation.local_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain local data points (individual instances)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# explain the first member of the test set\n", - "instance_num = 0\n", - "local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get the prediction for the first member of the test set and explain why model made that prediction\n", - "prediction_value = clf.predict(x_test)[instance_num]\n", - "\n", - "sorted_local_importance_values = 
local_explanation.get_ranked_local_values()[prediction_value]\n", - "sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n", - "\n", - "\n", - "dict(zip(sorted_local_importance_names, sorted_local_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load visualization dashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Note you will need to have extensions enabled prior to jupyter kernel starting\n", - "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "# Or, in Jupyter Labs, uncomment below\n", - "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", - "# jupyter labextension install microsoft-mli-widget" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.explain.model.visualize import ExplanationDashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ExplanationDashboard(global_explanation, model, x_test)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Iris flower classification with scikit-learn (run model explainer locally)" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explain a model with the AML explain-model package\n", + "\n", + "1. Train an SVM classification model using Scikit-learn.\n", + "2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services.\n", + "3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services.\n", + "4. Visualize the global and local explanations with the visualization dashboard." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "from sklearn import svm\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. 
Run model explainer locally with full data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load the iris flower data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iris = load_iris()\n", + "X = iris['data']\n", + "y = iris['target']\n", + "classes = iris['target_names']\n", + "feature_names = iris['feature_names']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split data into train and test\n", + "from sklearn.model_selection import train_test_split\n", + "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train an SVM classification model, which you want to explain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "clf = svm.SVC(gamma=0.001, C=100., probability=True)\n", + "model = clf.fit(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain predictions on your local machine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_explainer = TabularExplainer(model, x_train, features=feature_names, classes=classes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions (global explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "global_explanation = tabular_explainer.explain_global(x_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sorted SHAP values\n", + "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n", + "# Corresponding 
feature names\n", + "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n", + "# feature ranks (based on original order of features)\n", + "print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n", + "# per class feature names\n", + "print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n", + "# per class feature importance values\n", + "print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions as a collection of local (instance-level) explanations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# feature shap values for all features and all data points in the training data\n", + "print('local importance values: {}'.format(global_explanation.local_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain local data points (individual instances)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# explain the first member of the test set\n", + "instance_num = 0\n", + "local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get the prediction for the first member of the test set and explain why model made that prediction\n", + "prediction_value = clf.predict(x_test)[instance_num]\n", + "\n", + "sorted_local_importance_values = 
local_explanation.get_ranked_local_values()[prediction_value]\n", + "sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n", + "\n", + "\n", + "dict(zip(sorted_local_importance_names, sorted_local_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load visualization dashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Note: you will need to have the extensions enabled before the Jupyter kernel starts\n", + "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "# Or, in JupyterLab, uncomment below\n", + "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", + "# jupyter labextension install microsoft-mli-widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.explain.model.visualize import ExplanationDashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ExplanationDashboard(global_explanation, model, x_test)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.ipynb b/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.ipynb index 
926144fe..6707ce35 100644 --- a/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.ipynb +++ b/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.ipynb @@ -1,272 +1,272 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Boston Housing Price Prediction with scikit-learn (run model explainer locally)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explain a model with the AML explain-model package\n", - "\n", - "1. Train a GradientBoosting regression model using Scikit-learn\n", - "2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n", - "3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n", - "4. Visualize the global and local explanations with the visualization dashboard." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn import datasets\n", - "from sklearn.ensemble import GradientBoostingRegressor\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 1. 
Run model explainer locally with full data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load the Boston house price data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "boston_data = datasets.load_boston()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Split data into train and test\n", - "from sklearn.model_selection import train_test_split\n", - "x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train a GradientBoosting Regression model, which you want to explain" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n", - " learning_rate=0.1, loss='huber',\n", - " random_state=1)\n", - "model = reg.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain predictions on your local machine" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tabular_explainer = TabularExplainer(model, x_train, features = boston_data.feature_names)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions (global explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", - "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", - "global_explanation = tabular_explainer.explain_global(x_test)" - ] - }, - { - "cell_type": 
"code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Sorted SHAP values \n", - "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n", - "# Corresponding feature names\n", - "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n", - "# feature ranks (based on original order of features)\n", - "print('global importance rank: {}'.format(global_explanation.global_importance_rank))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions as a collection of local (instance-level) explanations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# feature shap values for all features and all data points in the training data\n", - "print('local importance values: {}'.format(global_explanation.local_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain local data points (individual instances)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_explanation = tabular_explainer.explain_local(x_test[0,:])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# sorted local feature importance information; reflects the original feature order\n", - "sorted_local_importance_names = local_explanation.get_ranked_local_names()\n", - "sorted_local_importance_values = local_explanation.get_ranked_local_values()\n", - "\n", - "print('sorted local importance names: {}'.format(sorted_local_importance_names))\n", - "print('sorted local importance 
values: {}'.format(sorted_local_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load visualization dashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Note you will need to have extensions enabled prior to jupyter kernel starting\n", - "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "# Or, in Jupyter Labs, uncomment below\n", - "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", - "# jupyter labextension install microsoft-mli-widget" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.explain.model.visualize import ExplanationDashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ExplanationDashboard(global_explanation, model, x_test)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Boston Housing Price Prediction with scikit-learn (run model explainer locally)" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explain a model with the AML explain-model package\n", + "\n", + "1. Train a GradientBoosting regression model using Scikit-learn\n", + "2. Run 'explain_model' with the full dataset in local mode, which doesn't contact any Azure services.\n", + "3. Run 'explain_model' with a summarized dataset in local mode, which doesn't contact any Azure services.\n", + "4. Visualize the global and local explanations with the visualization dashboard." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn import datasets\n", + "from sklearn.ensemble import GradientBoostingRegressor\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. 
Run model explainer locally with full data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load the Boston house price data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "boston_data = datasets.load_boston()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split data into train and test\n", + "from sklearn.model_selection import train_test_split\n", + "x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train a GradientBoosting Regression model, which you want to explain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n", + " learning_rate=0.1, loss='huber',\n", + " random_state=1)\n", + "model = reg.fit(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain predictions on your local machine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_explainer = TabularExplainer(model, x_train, features = boston_data.feature_names)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions (global explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", + "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", + "global_explanation = tabular_explainer.explain_global(x_test)" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sorted SHAP values \n", + "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n", + "# Corresponding feature names\n", + "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n", + "# feature ranks (based on original order of features)\n", + "print('global importance rank: {}'.format(global_explanation.global_importance_rank))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions as a collection of local (instance-level) explanations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# feature shap values for all features and all data points in the training data\n", + "print('local importance values: {}'.format(global_explanation.local_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain local data points (individual instances)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_explanation = tabular_explainer.explain_local(x_test[0,:])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# sorted local feature importance information; reflects the original feature order\n", + "sorted_local_importance_names = local_explanation.get_ranked_local_names()\n", + "sorted_local_importance_values = local_explanation.get_ranked_local_values()\n", + "\n", + "print('sorted local importance names: {}'.format(sorted_local_importance_names))\n", + "print('sorted local importance 
values: {}'.format(sorted_local_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load visualization dashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Note you will need to have extensions enabled prior to jupyter kernel starting\n", + "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "# Or, in Jupyter Labs, uncomment below\n", + "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", + "# jupyter labextension install microsoft-mli-widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.explain.model.visualize import ExplanationDashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ExplanationDashboard(global_explanation, model, x_test)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.ipynb b/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.ipynb index b98487cc..ee2744bf 100644 --- a/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.ipynb +++ 
b/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.ipynb @@ -1,302 +1,337 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Summary\n", - "From raw data that is a mixture of categoricals and numeric, featurize the categoricals using one hot encoding. Use tabular explainer to get explain object and then get raw feature importances" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explain a model with the AML explain-model package on raw features\n", - "\n", - "1. Train a Logistic Regression model using Scikit-learn\n", - "2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n", - "3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n", - "4. Visualize the global and local explanations with the visualization dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This example needs sklearn-pandas. If it is not installed, uncomment and run the following line." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#!pip install sklearn-pandas" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.pipeline import Pipeline\n", - "from sklearn.impute import SimpleImputer\n", - "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", - "from sklearn.linear_model import LogisticRegression\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer\n", - "from sklearn_pandas import DataFrameMapper\n", - "import pandas as pd\n", - "import numpy as np" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "titanic_url = ('https://raw.githubusercontent.com/amueller/'\n", - " 'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n", - "data = pd.read_csv(titanic_url)\n", - "# fill missing values\n", - "data = data.fillna(method=\"ffill\")\n", - "data = data.fillna(method=\"bfill\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 1. 
Run model explainer locally with full data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Similar to example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "numeric_features = ['age', 'fare']\n", - "categorical_features = ['embarked', 'sex', 'pclass']\n", - "\n", - "y = data['survived'].values\n", - "X = data[categorical_features + numeric_features]\n", - "\n", - "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.pipeline import Pipeline\n", - "from sklearn.impute import SimpleImputer\n", - "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", - "from sklearn_pandas import DataFrameMapper\n", - "\n", - "# Impute, standardize the numeric features and one-hot encode the categorical features. 
\n", - "\n", - "transformations = [\n", - " ([\"age\", \"fare\"], Pipeline(steps=[\n", - " ('imputer', SimpleImputer(strategy='median')),\n", - " ('scaler', StandardScaler())\n", - " ])),\n", - " ([\"embarked\"], Pipeline(steps=[\n", - " (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n", - " (\"encoder\", OneHotEncoder(sparse=False))])),\n", - " ([\"sex\", \"pclass\"], OneHotEncoder(sparse=False)) \n", - "]\n", - "\n", - "\n", - "# Append classifier to preprocessing pipeline.\n", - "# Now we have a full prediction pipeline.\n", - "clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n", - " ('classifier', LogisticRegression(solver='lbfgs'))])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train a Logistic Regression model, which you want to explain" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = clf.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain predictions on your local machine" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tabular_explainer = TabularExplainer(clf.steps[-1][1], initialization_examples=x_train, features=x_train.columns, transformations=transformations)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", - "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", - "global_explanation = tabular_explainer.explain_global(x_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sorted_global_importance_values = global_explanation.get_ranked_global_values()\n", 
- "sorted_global_importance_names = global_explanation.get_ranked_global_names()\n", - "dict(zip(sorted_global_importance_names, sorted_global_importance_values))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions as a collection of local (instance-level) explanations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# explain the first member of the test set\n", - "local_explanation = tabular_explainer.explain_local(x_test[:1])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get the prediction for the first member of the test set and explain why model made that prediction\n", - "prediction_value = clf.predict(x_test)[0]\n", - "\n", - "sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n", - "sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n", - "\n", - "# Sorted local SHAP values\n", - "print('ranked local importance values: {}'.format(sorted_local_importance_values))\n", - "# Corresponding feature names\n", - "print('ranked local importance names: {}'.format(sorted_local_importance_names))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 2. 
Load visualization dashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Note you will need to have extensions enabled prior to jupyter kernel starting\n", - "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", - "# Or, in Jupyter Labs, uncomment below\n", - "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", - "# jupyter labextension install microsoft-mli-widget" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.explain.model.visualize import ExplanationDashboard" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ExplanationDashboard(global_explanation, model, x_test)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Summary\n", + "From raw data that is a mixture of categoricals and numeric, featurize the categoricals using one hot encoding. Use tabular explainer to get explain object and then get raw feature importances" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explain a model with the AML explain-model package on raw features\n", + "\n", + "1. Train a Logistic Regression model using Scikit-learn\n", + "2. Run 'explain_model' with the full dataset in local mode, which doesn't contact any Azure services.\n", + "3. Run 'explain_model' with a summarized dataset in local mode, which doesn't contact any Azure services.\n", + "4. Visualize the global and local explanations with the visualization dashboard." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.pipeline import Pipeline\n", + "from sklearn.impute import SimpleImputer\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.linear_model import LogisticRegression\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer\n", + "import pandas as pd\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "titanic_url = ('https://raw.githubusercontent.com/amueller/'\n", + "               'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n", + "data = pd.read_csv(titanic_url)\n", + "# fill missing values\n", + "data = data.fillna(method=\"ffill\")\n", + "data = data.fillna(method=\"bfill\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. 
Run model explainer locally with full data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Similar to the example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "numeric_features = ['age', 'fare']\n", + "categorical_features = ['embarked', 'sex', 'pclass']\n", + "\n", + "y = data['survived'].values\n", + "X = data[categorical_features + numeric_features]\n", + "\n", + "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "sklearn imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.pipeline import Pipeline\n", + "from sklearn.impute import SimpleImputer\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can explain raw features using either a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. To run the example with a list of fitted transformer tuples instead, comment out the cell below and uncomment the cell that follows it. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.compose import ColumnTransformer\n", + "\n", + "transformations = ColumnTransformer([\n", + " (\"age_fare\", Pipeline(steps=[\n", + " ('imputer', SimpleImputer(strategy='median')),\n", + " ('scaler', StandardScaler())\n", + " ]), [\"age\", \"fare\"]),\n", + " (\"embarked\", Pipeline(steps=[\n", + " (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n", + " (\"encoder\", OneHotEncoder(sparse=False))]), [\"embarked\"]),\n", + " (\"sex_pclass\", OneHotEncoder(sparse=False), [\"sex\", \"pclass\"]) \n", + "])\n", + "\n", + "\n", + "# Append classifier to preprocessing pipeline.\n", + "# Now we have a full prediction pipeline.\n", + "clf = Pipeline(steps=[('preprocessor', transformations),\n", + " ('classifier', LogisticRegression(solver='lbfgs'))])\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'''\n", + "# Uncomment below if sklearn-pandas is not installed\n", + "#!pip install sklearn-pandas\n", + "from sklearn_pandas import DataFrameMapper\n", + "\n", + "# Impute, standardize the numeric features and one-hot encode the categorical features. 
\n", + "\n", + "transformations = [\n", + " ([\"age\", \"fare\"], Pipeline(steps=[\n", + " ('imputer', SimpleImputer(strategy='median')),\n", + " ('scaler', StandardScaler())\n", + " ])),\n", + " ([\"embarked\"], Pipeline(steps=[\n", + " (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n", + " (\"encoder\", OneHotEncoder(sparse=False))])),\n", + " ([\"sex\", \"pclass\"], OneHotEncoder(sparse=False)) \n", + "]\n", + "\n", + "\n", + "# Append classifier to preprocessing pipeline.\n", + "# Now we have a full prediction pipeline.\n", + "clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n", + " ('classifier', LogisticRegression(solver='lbfgs'))])\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train a Logistic Regression model, which you want to explain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = clf.fit(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain predictions on your local machine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_explainer = TabularExplainer(clf.steps[-1][1], initialization_examples=x_train, features=x_train.columns, transformations=transformations)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", + "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", + "global_explanation = tabular_explainer.explain_global(x_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sorted_global_importance_values = 
global_explanation.get_ranked_global_values()\n", + "sorted_global_importance_names = global_explanation.get_ranked_global_names()\n", + "dict(zip(sorted_global_importance_names, sorted_global_importance_values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions as a collection of local (instance-level) explanations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# explain the first member of the test set\n", + "local_explanation = tabular_explainer.explain_local(x_test[:1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get the prediction for the first member of the test set and explain why model made that prediction\n", + "prediction_value = clf.predict(x_test)[0]\n", + "\n", + "sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n", + "sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n", + "\n", + "# Sorted local SHAP values\n", + "print('ranked local importance values: {}'.format(sorted_local_importance_values))\n", + "# Corresponding feature names\n", + "print('ranked local importance names: {}'.format(sorted_local_importance_names))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. 
Load visualization dashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Note you will need to have extensions enabled prior to jupyter kernel starting\n", + "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n", + "# Or, in Jupyter Labs, uncomment below\n", + "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", + "# jupyter labextension install microsoft-mli-widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.explain.model.visualize import ExplanationDashboard" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ExplanationDashboard(global_explanation, model, x_test)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.ipynb b/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.ipynb index 0ac1032a..c6b0765c 100644 --- a/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.ipynb +++ b/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.ipynb @@ -1,262 +1,262 @@ { - "cells": [ - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "# Breast cancer diagnosis classification with scikit-learn (save model explanations via AML Run History)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explain a model with the AML explain-model package\n", - "\n", - "1. Train a SVM classification model using Scikit-learn\n", - "2. Run 'explain_model' with AML Run History, which leverages run history service to store and manage the explanation data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn import svm\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 1. 
Run model explainer locally with full data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load the breast cancer diagnosis data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "breast_cancer_data = load_breast_cancer()\n", - "classes = breast_cancer_data.target_names.tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Split data into train and test\n", - "from sklearn.model_selection import train_test_split\n", - "x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train a SVM classification model, which you want to explain" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "clf = svm.SVC(gamma=0.001, C=100., probability=True)\n", - "model = clf.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain predictions on your local machine" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tabular_explainer = TabularExplainer(model, x_train, features=breast_cancer_data.feature_names, classes=classes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions (global explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", - "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", - "global_explanation = tabular_explainer.explain_global(x_test)" - ] - }, - { 
- "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 2. Save Model Explanation With AML Run History" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "from azureml.core import Workspace, Experiment, Run\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer\n", - "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", - "# Check core SDK version number\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'explain_model'\n", - "experiment = Experiment(ws, experiment_name)\n", - "run = experiment.start_logging()\n", - "client = ExplanationClient.from_run(run)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Uploading model explanation data for storage or visualization in webUX\n", - "# The explanation can then be downloaded on any compute\n", - "client.upload_model_explanation(global_explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get model explanation data\n", - "explanation = client.download_model_explanation()\n", - "local_importance_values = explanation.local_importance_values\n", - "expected_values = explanation.expected_values" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the top k (e.g., 4) most important 
features with their importance values\n", - "explanation = client.download_model_explanation(top_k=4)\n", - "global_importance_values = explanation.get_ranked_global_values()\n", - "global_importance_names = explanation.get_ranked_global_names()\n", - "per_class_names = explanation.get_ranked_per_class_names()[0]\n", - "per_class_values = explanation.get_ranked_per_class_values()[0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('per class feature importance values: {}'.format(per_class_values))\n", - "print('per class feature importance names: {}'.format(per_class_names))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dict(zip(per_class_names, per_class_values))" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Breast cancer diagnosis classification with scikit-learn (save model explanations via AML Run History)" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explain a model with the AML explain-model package\n", + "\n", + "1. Train an SVM classification model using Scikit-learn\n", + "2. Run 'explain_model' with AML Run History, which leverages the run history service to store and manage the explanation data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn import svm\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. Run model explainer locally with full data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load the breast cancer diagnosis data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "breast_cancer_data = load_breast_cancer()\n", + "classes = breast_cancer_data.target_names.tolist()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split data into train and test\n", + "from sklearn.model_selection import train_test_split\n", + "x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train an SVM classification model, which you want to explain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "clf = svm.SVC(gamma=0.001, C=100., probability=True)\n", + "model = clf.fit(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain predictions on your local machine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + 
"tabular_explainer = TabularExplainer(model, x_train, features=breast_cancer_data.feature_names, classes=classes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions (global explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", + "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", + "global_explanation = tabular_explainer.explain_global(x_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. Save Model Explanation With AML Run History" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "from azureml.core import Workspace, Experiment, Run\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer\n", + "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", + "# Check core SDK version number\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'explain_model'\n", + "experiment = Experiment(ws, experiment_name)\n", + "run = experiment.start_logging()\n", + "client = ExplanationClient.from_run(run)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + 
"source": [ + "# Uploading model explanation data for storage or visualization in webUX\n", + "# The explanation can then be downloaded on any compute\n", + "client.upload_model_explanation(global_explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get model explanation data\n", + "explanation = client.download_model_explanation()\n", + "local_importance_values = explanation.local_importance_values\n", + "expected_values = explanation.expected_values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the top k (e.g., 4) most important features with their importance values\n", + "explanation = client.download_model_explanation(top_k=4)\n", + "global_importance_values = explanation.get_ranked_global_values()\n", + "global_importance_names = explanation.get_ranked_global_names()\n", + "per_class_names = explanation.get_ranked_per_class_names()[0]\n", + "per_class_values = explanation.get_ranked_per_class_values()[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('per class feature importance values: {}'.format(per_class_values))\n", + "print('per class feature importance names: {}'.format(per_class_names))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dict(zip(per_class_names, per_class_values))" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git 
a/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.ipynb b/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.ipynb index 1fbdb4ea..9ee72cec 100644 --- a/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.ipynb +++ b/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.ipynb @@ -1,276 +1,276 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Boston Housing Price Prediction with scikit-learn (save model explanations via AML Run History)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explain a model with the AML explain-model package\n", - "\n", - "1. Train a GradientBoosting regression model using Scikit-learn\n", - "2. 
Run 'explain_model' with AML Run History, which leverages run history service to store and manage the explanation data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Save Model Explanation With AML Run History" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Import Iris dataset\n", - "from sklearn import datasets\n", - "from sklearn.ensemble import GradientBoostingRegressor\n", - "\n", - "import azureml.core\n", - "from azureml.core import Workspace, Experiment, Run\n", - "from azureml.explain.model.tabular_explainer import TabularExplainer\n", - "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", - "# Check core SDK version number\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'explain_model'\n", - "experiment = Experiment(ws, experiment_name)\n", - "run = experiment.start_logging()\n", - "client = ExplanationClient.from_run(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load the Boston house price data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "boston_data = datasets.load_boston()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Split data into train and test\n", - "from sklearn.model_selection import train_test_split\n", - "x_train, x_test, y_train, y_test = 
train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train a GradientBoosting Regression model, which you want to explain" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "clf = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n", - " learning_rate=0.1, loss='huber',\n", - " random_state=1)\n", - "model = clf.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain predictions on your local machine" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tabular_explainer = TabularExplainer(model, x_train, features=boston_data.feature_names)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain overall model predictions (global explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", - "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", - "global_explanation = tabular_explainer.explain_global(x_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Uploading model explanation data for storage or visualization in webUX\n", - "# The explanation can then be downloaded on any compute\n", - "client.upload_model_explanation(global_explanation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get model explanation data\n", - "explanation = client.download_model_explanation()\n", - "local_importance_values = explanation.local_importance_values\n", - 
"expected_values = explanation.expected_values" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Print the values\n", - "print('expected values: {}'.format(expected_values))\n", - "print('local importance values: {}'.format(local_importance_values))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the top k (e.g., 4) most important features with their importance values\n", - "explanation = client.download_model_explanation(top_k=4)\n", - "global_importance_values = explanation.get_ranked_global_values()\n", - "global_importance_names = explanation.get_ranked_global_names()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('global importance values: {}'.format(global_importance_values))\n", - "print('global importance names: {}'.format(global_importance_names))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explain individual instance predictions (local explanation) ##### needs to get updated with the new build" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_explanation = tabular_explainer.explain_local(x_test[0,:])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# local feature importance information\n", - "local_importance_values = local_explanation.local_importance_values\n", - "print('local importance values: {}'.format(local_importance_values))" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "mesameki" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - 
"nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Boston Housing Price Prediction with scikit-learn (save model explanations via AML Run History)" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explain a model with the AML explain-model package\n", + "\n", + "1. Train a GradientBoosting regression model using Scikit-learn\n", + "2. 
Run 'explain_model' with AML Run History, which leverages the run history service to store and manage the explanation data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Save Model Explanation With AML Run History" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import the Boston housing dataset utilities and the regressor\n", + "from sklearn import datasets\n", + "from sklearn.ensemble import GradientBoostingRegressor\n", + "\n", + "import azureml.core\n", + "from azureml.core import Workspace, Experiment, Run\n", + "from azureml.explain.model.tabular_explainer import TabularExplainer\n", + "from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n", + "# Check core SDK version number\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'explain_model'\n", + "experiment = Experiment(ws, experiment_name)\n", + "run = experiment.start_logging()\n", + "client = ExplanationClient.from_run(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load the Boston house price data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "boston_data = datasets.load_boston()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split data into train and test\n", + "from sklearn.model_selection import train_test_split\n", + "x_train, x_test, y_train, y_test = 
train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train a GradientBoosting Regression model, which you want to explain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "clf = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n", + " learning_rate=0.1, loss='huber',\n", + " random_state=1)\n", + "model = clf.fit(x_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain predictions on your local machine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_explainer = TabularExplainer(model, x_train, features=boston_data.feature_names)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain overall model predictions (global explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n", + "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n", + "global_explanation = tabular_explainer.explain_global(x_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Uploading model explanation data for storage or visualization in webUX\n", + "# The explanation can then be downloaded on any compute\n", + "client.upload_model_explanation(global_explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get model explanation data\n", + "explanation = client.download_model_explanation()\n", + "local_importance_values = explanation.local_importance_values\n", + 
"expected_values = explanation.expected_values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Print the values\n", + "print('expected values: {}'.format(expected_values))\n", + "print('local importance values: {}'.format(local_importance_values))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the top k (e.g., 4) most important features with their importance values\n", + "explanation = client.download_model_explanation(top_k=4)\n", + "global_importance_values = explanation.get_ranked_global_values()\n", + "global_importance_names = explanation.get_ranked_global_names()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('global importance values: {}'.format(global_importance_values))\n", + "print('global importance names: {}'.format(global_importance_names))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explain individual instance predictions (local explanation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_explanation = tabular_explainer.explain_local(x_test[0,:])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# local feature importance information\n", + "local_importance_values = local_explanation.local_importance_values\n", + "print('local importance values: {}'.format(local_importance_values))" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "mesameki" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb index c989192b..656c2827 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb @@ -1,1191 +1,1191 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# NYC Taxi Data Regression Model\n", - "This is an [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) version of two-part tutorial ([Part 1](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep), [Part 2](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models)) available for Azure Machine Learning.\n", - "\n", - "You can combine the two part tutorial into one using AzureML Pipelines as Pipelines provide a way to stitch together various steps involved (like data preparation and training in this case) in a machine learning workflow.\n", - "\n", - "In this notebook, you learn how to prepare data for regression modeling by using the [Azure Machine Learning Data Prep SDK](https://aka.ms/data-prep-sdk) for Python. You run various transformations to filter and combine two different NYC taxi data sets. 
Once you have prepared the NYC taxi data for regression modeling, you will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) available with [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define your machine learning goals and constraints as well as to launch the automated machine learning process. The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n", - "\n", - "After you complete building the model, you can predict the cost of a taxi trip by training a model on data features. These features include the pickup day and time, the number of passengers, and the pickup location.\n", - "\n", - "## Prerequisite\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n", - "\n", - "We will run various transformations to filter and combine two different NYC taxi data sets. We will use the DataPrep SDK to prepare the data. \n", - "\n", - "Perform `pip install azureml-dataprep` if you haven't already done so." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prepare data for regression modeling\n", - "First, we will prepare data for regression modeling. We will leverage the convenience of Azure Open Datasets along with the power of Azure Machine Learning service to create a regression model to predict NYC taxi fare prices. Perform `pip install azureml-contrib-opendatasets` to get the open dataset package. 
The Open Datasets package contains a class representing each data source (NycTlcGreen and NycTlcYellow) to easily filter date parameters before downloading.\n", - "\n", - "\n", - "### Load data\n", - "Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid MemoryError with large datasets. To download a year of taxi data, iteratively fetch one month at a time, and before appending it to green_df_raw, randomly sample a fixed number of records from each month (5,000 in the code below) to avoid bloating the dataframe. Then preview the data. To keep this process short, we sample only one month of data.\n", - "\n", - "Note: Open Datasets has mirroring classes for working in Spark environments where data size and memory aren't a concern." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "# Check core SDK version number\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.opendatasets import NycTlcGreen, NycTlcYellow\n", - "import pandas as pd\n", - "from datetime import datetime\n", - "from dateutil.relativedelta import relativedelta\n", - "\n", - "green_df_raw = pd.DataFrame([])\n", - "start = datetime.strptime(\"1/1/2016\",\"%m/%d/%Y\")\n", - "end = datetime.strptime(\"1/31/2016\",\"%m/%d/%Y\")\n", - "\n", - "number_of_months = 1\n", - "sample_size = 5000\n", - "\n", - "for sample_month in range(number_of_months):\n", - " temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n", - " .to_pandas_dataframe()\n", - " green_df_raw = green_df_raw.append(temp_df_green.sample(sample_size))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - 
"source": [ - "yellow_df_raw = pd.DataFrame([])\n", - "start = datetime.strptime(\"1/1/2016\",\"%m/%d/%Y\")\n", - "end = datetime.strptime(\"1/31/2016\",\"%m/%d/%Y\")\n", - "\n", - "sample_size = 500\n", - "\n", - "for sample_month in range(number_of_months):\n", - " temp_df_yellow = NycTlcYellow(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n", - " .to_pandas_dataframe()\n", - " yellow_df_raw = yellow_df_raw.append(temp_df_yellow.sample(sample_size))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### See the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.dataprep as dprep\n", - "from IPython.display import display\n", - "\n", - "display(green_df_raw.head(5))\n", - "display(yellow_df_raw.head(5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download data locally and then upload to Azure Blob\n", - "This is a one-time process to save the data in the default datastore. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "dataDir = \"data\"\n", - "\n", - "if not os.path.exists(dataDir):\n", - " os.mkdir(dataDir)\n", - "\n", - "greenDir = dataDir + \"/green\"\n", - "yelloDir = dataDir + \"/yellow\"\n", - "\n", - "if not os.path.exists(greenDir):\n", - " os.mkdir(greenDir)\n", - " \n", - "if not os.path.exists(yelloDir):\n", - " os.mkdir(yelloDir)\n", - " \n", - "greenTaxiData = greenDir + \"/part-00000\"\n", - "yellowTaxiData = yelloDir + \"/part-00000\"\n", - "\n", - "green_df_raw.to_csv(greenTaxiData, index=False)\n", - "yellow_df_raw.to_csv(yellowTaxiData, index=False)\n", - "\n", - "print(\"Data written to local folder.\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(\"Workspace: \" + ws.name, \"Region: \" + ws.location, sep = '\\n')\n", - "\n", - "# Default datastore\n", - "default_store = ws.get_default_datastore() \n", - "\n", - "default_store.upload_files([greenTaxiData], \n", - " target_path = 'green', \n", - " overwrite = False, \n", - " show_progress = True)\n", - "\n", - "default_store.upload_files([yellowTaxiData], \n", - " target_path = 'yellow', \n", - " overwrite = False, \n", - " show_progress = True)\n", - "\n", - "print(\"Upload calls completed.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Setup Compute\n", - "#### Create new or use an existing compute" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "aml_compute = ws.get_default_compute_target(\"CPU\")\n", - "aml_compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Define RunConfig for the compute\n", - "We need `azureml-dataprep` SDK for all the steps below. 
We will also use `pandas`, `scikit-learn` and `automl` for the training step. Defining the `runconfig` for that." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# Create a new runconfig object\n", - "aml_run_config = RunConfiguration()\n", - "\n", - "# Use the aml_compute you created above. \n", - "aml_run_config.target = aml_compute\n", - "\n", - "# Enable Docker\n", - "aml_run_config.environment.docker.enabled = True\n", - "\n", - "# Set Docker base image to the default CPU-based image\n", - "aml_run_config.environment.docker.base_image = \"mcr.microsoft.com/azureml/base:0.2.1\"\n", - "\n", - "# Use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", - "aml_run_config.environment.python.user_managed_dependencies = False\n", - "\n", - "# Auto-prepare the Docker image when used for execution (if it is not already prepared)\n", - "aml_run_config.auto_prepare_environment = True\n", - "\n", - "# Specify CondaDependencies obj, add necessary packages\n", - "aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", - " conda_packages=['pandas','scikit-learn'], \n", - " pip_packages=['azureml-sdk', 'azureml-dataprep', 'azureml-train-automl==1.0.33'], \n", - " pin_sdk_version=False)\n", - "\n", - "print (\"Run configuration created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare data\n", - "Now we will prepare for regression modeling by using the `Azure Machine Learning Data Prep SDK for Python`. We run various transformations to filter and combine two different NYC taxi data sets.\n", - "\n", - "We achieve this by creating a separate step for each transformation as this allows us to reuse the steps and saves us from running all over again in case of any change. 
We will keep data preparation scripts in one subfolder and training scripts in another.\n", - "\n", - "> The best practice is to use separate folders for the scripts and their dependent files for each step, and to specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps preserve reuse of the step when there are no changes in the `source_directory` of the step." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Define Useful Columns\n", - "Here we are defining a set of \"useful\" columns for both the Green and Yellow taxi data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "display(green_df_raw.columns)\n", - "display(yellow_df_raw.columns)\n", - "\n", - "# useful columns needed for the Azure Machine Learning NYC Taxi tutorial\n", - "useful_columns = str([\"cost\", \"distance\", \"dropoff_datetime\", \"dropoff_latitude\", \n", - "                      \"dropoff_longitude\", \"passengers\", \"pickup_datetime\", \n", - "                      \"pickup_latitude\", \"pickup_longitude\", \"store_forward\", \"vendor\"]).replace(\",\", \";\")\n", - "\n", - "print(\"Useful columns defined.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Cleanse Green taxi data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.data.data_reference import DataReference \n", - "from azureml.pipeline.core import PipelineData\n", - "from azureml.pipeline.steps import PythonScriptStep\n", - "\n", - "# python scripts folder\n", - "prepare_data_folder = './scripts/prepdata'\n", - "\n", - "blob_green_data = DataReference(\n", - "    datastore=default_store,\n", - "    data_reference_name=\"green_taxi_data\",\n", - "    
path_on_datastore=\"green/part-00000\")\n", - "\n", - "# rename columns as per Azure Machine Learning NYC Taxi tutorial\n", - "green_columns = str({ \n", - " \"vendorID\": \"vendor\",\n", - " \"lpepPickupDatetime\": \"pickup_datetime\",\n", - " \"lpepDropoffDatetime\": \"dropoff_datetime\",\n", - " \"storeAndFwdFlag\": \"store_forward\",\n", - " \"pickupLongitude\": \"pickup_longitude\",\n", - " \"pickupLatitude\": \"pickup_latitude\",\n", - " \"dropoffLongitude\": \"dropoff_longitude\",\n", - " \"dropoffLatitude\": \"dropoff_latitude\",\n", - " \"passengerCount\": \"passengers\",\n", - " \"fareAmount\": \"cost\",\n", - " \"tripDistance\": \"distance\"\n", - "}).replace(\",\", \";\")\n", - "\n", - "# Define output after cleansing step\n", - "cleansed_green_data = PipelineData(\"green_taxi_data\", datastore=default_store)\n", - "\n", - "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", - "\n", - "# cleansing step creation\n", - "# See the cleanse.py for details about input and output\n", - "cleansingStepGreen = PythonScriptStep(\n", - " name=\"Cleanse Green Taxi Data\",\n", - " script_name=\"cleanse.py\", \n", - " arguments=[\"--input_cleanse\", blob_green_data, \n", - " \"--useful_columns\", useful_columns,\n", - " \"--columns\", green_columns,\n", - " \"--output_cleanse\", cleansed_green_data],\n", - " inputs=[blob_green_data],\n", - " outputs=[cleansed_green_data],\n", - " compute_target=aml_compute,\n", - " runconfig=aml_run_config,\n", - " source_directory=prepare_data_folder,\n", - " allow_reuse=True\n", - ")\n", - "\n", - "print(\"cleansingStepGreen created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Cleanse Yellow taxi data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "blob_yellow_data = DataReference(\n", - " datastore=default_store,\n", - " data_reference_name=\"yellow_taxi_data\",\n", - " 
path_on_datastore=\"yellow/part-00000\")\n", - "\n", - "yellow_columns = str({\n", - " \"vendorID\": \"vendor\",\n", - " \"tpepPickupDateTime\": \"pickup_datetime\",\n", - " \"tpepDropoffDateTime\": \"dropoff_datetime\",\n", - " \"storeAndFwdFlag\": \"store_forward\",\n", - " \"startLon\": \"pickup_longitude\",\n", - " \"startLat\": \"pickup_latitude\",\n", - " \"endLon\": \"dropoff_longitude\",\n", - " \"endLat\": \"dropoff_latitude\",\n", - " \"passengerCount\": \"passengers\",\n", - " \"fareAmount\": \"cost\",\n", - " \"tripDistance\": \"distance\"\n", - "}).replace(\",\", \";\")\n", - "\n", - "# Define output after cleansing step\n", - "cleansed_yellow_data = PipelineData(\"yellow_taxi_data\", datastore=default_store)\n", - "\n", - "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", - "\n", - "# cleansing step creation\n", - "# See the cleanse.py for details about input and output\n", - "cleansingStepYellow = PythonScriptStep(\n", - " name=\"Cleanse Yellow Taxi Data\",\n", - " script_name=\"cleanse.py\", \n", - " arguments=[\"--input_cleanse\", blob_yellow_data, \n", - " \"--useful_columns\", useful_columns,\n", - " \"--columns\", yellow_columns,\n", - " \"--output_cleanse\", cleansed_yellow_data],\n", - " inputs=[blob_yellow_data],\n", - " outputs=[cleansed_yellow_data],\n", - " compute_target=aml_compute,\n", - " runconfig=aml_run_config,\n", - " source_directory=prepare_data_folder,\n", - " allow_reuse=True\n", - ")\n", - "\n", - "print(\"cleansingStepYellow created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Merge cleansed Green and Yellow datasets\n", - "We are creating a single data source by merging the cleansed versions of Green and Yellow taxi data." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define output after merging step\n", - "merged_data = PipelineData(\"merged_data\", datastore=default_store)\n", - "\n", - "print('Merge script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", - "\n", - "# merging step creation\n", - "# See the merge.py for details about input and output\n", - "mergingStep = PythonScriptStep(\n", - " name=\"Merge Taxi Data\",\n", - " script_name=\"merge.py\", \n", - " arguments=[\"--input_green_merge\", cleansed_green_data, \n", - " \"--input_yellow_merge\", cleansed_yellow_data,\n", - " \"--output_merge\", merged_data],\n", - " inputs=[cleansed_green_data, cleansed_yellow_data],\n", - " outputs=[merged_data],\n", - " compute_target=aml_compute,\n", - " runconfig=aml_run_config,\n", - " source_directory=prepare_data_folder,\n", - " allow_reuse=True\n", - ")\n", - "\n", - "print(\"mergingStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Filter data\n", - "This step filters out coordinates for locations that are outside the city border. We use a TypeConverter object to change the latitude and longitude fields to decimal type. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define output after filter step\n", - "filtered_data = PipelineData(\"filtered_data\", datastore=default_store)\n", - "\n", - "print('Filter script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", - "\n", - "# filter step creation\n", - "# See the filter.py for details about input and output\n", - "filterStep = PythonScriptStep(\n", - "    name=\"Filter Taxi Data\",\n", - "    script_name=\"filter.py\", \n", - "    arguments=[\"--input_filter\", merged_data, \n", - "               \"--output_filter\", filtered_data],\n", - "    inputs=[merged_data],\n", - "    outputs=[filtered_data],\n", - "    compute_target=aml_compute,\n", - "    runconfig = aml_run_config,\n", - "    source_directory=prepare_data_folder,\n", - "    allow_reuse=True\n", - ")\n", - "\n", - "print(\"filterStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Normalize data\n", - "In this step, we split the pickup and dropoff datetime values into the respective date and time columns and then we rename the columns to use meaningful names."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define output after normalize step\n", - "normalized_data = PipelineData(\"normalized_data\", datastore=default_store)\n", - "\n", - "print('Normalize script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", - "\n", - "# normalize step creation\n", - "# See the normalize.py for details about input and output\n", - "normalizeStep = PythonScriptStep(\n", - "    name=\"Normalize Taxi Data\",\n", - "    script_name=\"normalize.py\", \n", - "    arguments=[\"--input_normalize\", filtered_data, \n", - "               \"--output_normalize\", normalized_data],\n", - "    inputs=[filtered_data],\n", - "    outputs=[normalized_data],\n", - "    compute_target=aml_compute,\n", - "    runconfig = aml_run_config,\n", - "    source_directory=prepare_data_folder,\n", - "    allow_reuse=True\n", - ")\n", - "\n", - "print(\"normalizeStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Transform data\n", - "Transform the normalized taxi data to the final required format. This step does the following:\n", - "\n", - "- Split the pickup and dropoff date further into the day of the week, day of the month, and month values. \n", - "- To get the day of the week value, use the derive_column_by_example() function. The function takes an array parameter of example objects that define the input data and the preferred output. The function automatically determines the preferred transformation. For the pickup and dropoff time columns, split the time into the hour, minute, and second by using the split_column_by_example() function with no example parameter.\n", - "- After new features are generated, use the drop_columns() function to delete the original fields, as the newly generated features are preferred. \n", - "- Rename the rest of the fields to use meaningful descriptions."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define output after transform step\n", - "transformed_data = PipelineData(\"transformed_data\", datastore=default_store)\n", - "\n", - "print('Transform script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", - "\n", - "# transform step creation\n", - "# See the transform.py for details about input and output\n", - "transformStep = PythonScriptStep(\n", - "    name=\"Transform Taxi Data\",\n", - "    script_name=\"transform.py\", \n", - "    arguments=[\"--input_transform\", normalized_data,\n", - "               \"--output_transform\", transformed_data],\n", - "    inputs=[normalized_data],\n", - "    outputs=[transformed_data],\n", - "    compute_target=aml_compute,\n", - "    runconfig = aml_run_config,\n", - "    source_directory=prepare_data_folder,\n", - "    allow_reuse=True\n", - ")\n", - "\n", - "print(\"transformStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Extract features\n", - "Add the following columns as features for model creation. The prediction value will be *cost*."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "feature_columns = str(['pickup_weekday','pickup_hour', 'distance','passengers', 'vendor']).replace(\",\", \";\")\n", - "\n", - "train_model_folder = './scripts/trainmodel'\n", - "\n", - "print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n", - "\n", - "# features data after transform step\n", - "features_data = PipelineData(\"features_data\", datastore=default_store)\n", - "\n", - "# featurization step creation\n", - "# See the featurization.py for details about input and output\n", - "featurizationStep = PythonScriptStep(\n", - " name=\"Extract Features\",\n", - " script_name=\"featurization.py\", \n", - " arguments=[\"--input_featurization\", transformed_data, \n", - " \"--useful_columns\", feature_columns,\n", - " \"--output_featurization\", features_data],\n", - " inputs=[transformed_data],\n", - " outputs=[features_data],\n", - " compute_target=aml_compute,\n", - " runconfig = aml_run_config,\n", - " source_directory=train_model_folder,\n", - " allow_reuse=True\n", - ")\n", - "\n", - "print(\"featurizationStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Extract label" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "label_columns = str(['cost']).replace(\",\", \";\")\n", - "\n", - "# label data after transform step\n", - "label_data = PipelineData(\"label_data\", datastore=default_store)\n", - "\n", - "print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n", - "\n", - "# label step creation\n", - "# See the featurization.py for details about input and output\n", - "labelStep = PythonScriptStep(\n", - " name=\"Extract Labels\",\n", - " script_name=\"featurization.py\", \n", - " arguments=[\"--input_featurization\", transformed_data, \n", - " \"--useful_columns\", label_columns,\n", - 
" \"--output_featurization\", label_data],\n", - "    inputs=[transformed_data],\n", - "    outputs=[label_data],\n", - "    compute_target=aml_compute,\n", - "    runconfig = aml_run_config,\n", - "    source_directory=train_model_folder,\n", - "    allow_reuse=True\n", - ")\n", - "\n", - "print(\"labelStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Split the data into train and test sets\n", - "This step splits the data into the **x** (features) dataset used for model training and the **y** (values to predict) dataset used for testing." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# train and test splits output\n", - "output_split_train_x = PipelineData(\"output_split_train_x\", datastore=default_store)\n", - "output_split_train_y = PipelineData(\"output_split_train_y\", datastore=default_store)\n", - "output_split_test_x = PipelineData(\"output_split_test_x\", datastore=default_store)\n", - "output_split_test_y = PipelineData(\"output_split_test_y\", datastore=default_store)\n", - "\n", - "print('Data split script is in {}.'.format(os.path.realpath(train_model_folder)))\n", - "\n", - "# test train split step creation\n", - "# See the train_test_split.py for details about input and output\n", - "testTrainSplitStep = PythonScriptStep(\n", - "    name=\"Train Test Data Split\",\n", - "    script_name=\"train_test_split.py\", \n", - "    arguments=[\"--input_split_features\", features_data, \n", - "               \"--input_split_labels\", label_data,\n", - "               \"--output_split_train_x\", output_split_train_x,\n", - "               \"--output_split_train_y\", output_split_train_y,\n", - "               \"--output_split_test_x\", output_split_test_x,\n", - "               \"--output_split_test_y\", output_split_test_y],\n", - "    inputs=[features_data, label_data],\n", - "    outputs=[output_split_train_x, output_split_train_y, output_split_test_x, output_split_test_y],\n", - "    compute_target=aml_compute,\n", - "    runconfig = aml_run_config,\n",
- "    source_directory=train_model_folder,\n", - "    allow_reuse=True\n", - ")\n", - "\n", - "print(\"testTrainSplitStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Use automated machine learning to build a regression model\n", - "Now we will use **automated machine learning** to build the regression model. We will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) in AML Pipelines for this part. Automated ML uses various features from the data set to build relationships between the features and the price of a taxi trip." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Automatically train a model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create experiment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment = Experiment(ws, 'NYCTaxi_Tutorial_Pipelines')\n", - "\n", - "print(\"Experiment created\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create get_data script\n", - "\n", - "A script with a `get_data()` function is necessary to fetch the training features (X) and labels (y) from the input data on the remote compute. Here we use the mounted paths of the `train_test_split` step outputs to get the x and y training values.
They are added as environment variables on the compute machine by default.\n", - "\n", - "Note: Every DataReference is added as an environment variable on the compute machine, since the default mode is mount" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('get_data.py will be written to {}.'.format(os.path.realpath(train_model_folder)))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $train_model_folder/get_data.py\n", - "\n", - "import os\n", - "\n", - "import pandas as pd\n", - "\n", - "def get_data():\n", - "    print(\"In get_data\")\n", - "    print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'])\n", - "    X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + \"/part-00000\", header=0)\n", - "    y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + \"/part-00000\", header=0)\n", - "    \n", - "    return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Define settings for autogeneration and tuning\n", - "\n", - "Here we define the experiment parameters and model settings for autogeneration and tuning. We can also specify automl_settings as **kwargs. Note that we have to use a get_data() function for remote executions. See the get_data script for more details.\n", - "\n", - "Use your defined training settings as a parameter to an `AutoMLConfig` object. Additionally, specify your training data and the type of model, which is `regression` in this case.\n", - "\n", - "Note: When using AmlCompute, we can't pass Numpy arrays directly to the fit method."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "from azureml.train.automl import AutoMLConfig\n", - "\n", - "# Change iterations to a reasonable number (50) to get better accuracy\n", - "automl_settings = {\n", - " \"iteration_timeout_minutes\" : 10,\n", - " \"iterations\" : 2,\n", - " \"primary_metric\" : 'spearman_correlation',\n", - " \"preprocess\" : True,\n", - " \"verbosity\" : logging.INFO,\n", - " \"n_cross_validations\": 5\n", - "}\n", - "\n", - "automl_config = AutoMLConfig(task = 'regression',\n", - " debug_log = 'automated_ml_errors.log',\n", - " path = train_model_folder,\n", - " compute_target=aml_compute,\n", - " run_configuration=aml_run_config,\n", - " data_script = train_model_folder + \"/get_data.py\",\n", - " **automl_settings)\n", - " \n", - "print(\"AutoML config created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Define AutoMLStep" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.automl import AutoMLStep\n", - "\n", - "trainWithAutomlStep = AutoMLStep(\n", - " name='AutoML_Regression',\n", - " automl_config=automl_config,\n", - " inputs=[output_split_train_x, output_split_train_y],\n", - " allow_reuse=True,\n", - " hash_paths=[os.path.realpath(train_model_folder)])\n", - "\n", - "print(\"trainWithAutomlStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Build and run the pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core import Pipeline\n", - "from azureml.widgets import RunDetails\n", - "\n", - "pipeline_steps = [trainWithAutomlStep]\n", - "\n", - "pipeline = Pipeline(workspace = ws, steps=pipeline_steps)\n", - "print(\"Pipeline is built.\")\n", - "\n", - "pipeline_run = 
experiment.submit(pipeline, regenerate_outputs=False)\n", - "\n", - "print(\"Pipeline submitted for execution.\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(pipeline_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore the results" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Before we proceed we need to wait for the run to complete.\n", - "pipeline_run.wait_for_completion()\n", - "\n", - "# functions to download output to local and fetch as dataframe\n", - "def get_download_path(download_path, output_name):\n", - " output_folder = os.listdir(download_path + '/azureml')[0]\n", - " path = download_path + '/azureml/' + output_folder + '/' + output_name\n", - " return path\n", - "\n", - "def fetch_df(step, output_name):\n", - " output_data = step.get_output_data(output_name)\n", - " \n", - " download_path = './outputs/' + output_name\n", - " output_data.download(download_path)\n", - " df_path = get_download_path(download_path, output_name) + '/part-00000'\n", - " return dprep.auto_read_file(path=df_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View cleansed taxi data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "green_cleanse_step = pipeline_run.find_step_run(cleansingStepGreen.name)[0]\n", - "yellow_cleanse_step = pipeline_run.find_step_run(cleansingStepYellow.name)[0]\n", - "\n", - "cleansed_green_df = fetch_df(green_cleanse_step, cleansed_green_data.name)\n", - "cleansed_yellow_df = fetch_df(yellow_cleanse_step, cleansed_yellow_data.name)\n", - "\n", - "display(cleansed_green_df.head(5))\n", - "display(cleansed_yellow_df.head(5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View the combined taxi data profile" - ] - }, - 
{ - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "merge_step = pipeline_run.find_step_run(mergingStep.name)[0]\n", - "combined_df = fetch_df(merge_step, merged_data.name)\n", - "\n", - "display(combined_df.get_profile())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View the filtered taxi data profile" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "filter_step = pipeline_run.find_step_run(filterStep.name)[0]\n", - "filtered_df = fetch_df(filter_step, filtered_data.name)\n", - "\n", - "display(filtered_df.get_profile())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View normalized taxi data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "normalize_step = pipeline_run.find_step_run(normalizeStep.name)[0]\n", - "normalized_df = fetch_df(normalize_step, normalized_data.name)\n", - "\n", - "display(normalized_df.head(5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View transformed taxi data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "transform_step = pipeline_run.find_step_run(transformStep.name)[0]\n", - "transformed_df = fetch_df(transform_step, transformed_data.name)\n", - "\n", - "display(transformed_df.get_profile())\n", - "display(transformed_df.head(5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View training data used by AutoML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", - "train_split_x = fetch_df(split_step, output_split_train_x.name)\n", - "train_split_y = fetch_df(split_step, output_split_train_y.name)\n", - "\n", - 
"display_x_train = train_split_x.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"])\n", - "display_y_train = train_split_y.rename_columns(column_pairs={\"Column1\": \"cost\"})\n", - "\n", - "display(display_x_train.get_profile())\n", - "display(display_x_train.head(5))\n", - "display(display_y_train.get_profile())\n", - "display(display_y_train.head(5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### View the details of the AutoML run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.automl.run import AutoMLRun\n", - "#from azureml.widgets import RunDetails\n", - "\n", - "# workaround to get the automl run as its the last step in the pipeline \n", - "# and get_steps() returns the steps from latest to first\n", - "\n", - "for step in pipeline_run.get_steps():\n", - " automl_step_run_id = step.id\n", - " print(step.name)\n", - " print(automl_step_run_id)\n", - " break\n", - "\n", - "automl_run = AutoMLRun(experiment = experiment, run_id=automl_step_run_id)\n", - "#RunDetails(automl_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Retrieve all Child runs\n", - "\n", - "We use SDK methods to fetch all the child runs and see individual metrics that we log." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(automl_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - "    properties = run.get_properties()\n", - "    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", - "    metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the best model\n", - "\n", - "Uncomment the cell below to retrieve the best model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# best_run, fitted_model = automl_run.get_output()\n", - "# print(best_run)\n", - "# print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Get test data\n", - "\n", - "Uncomment the cell below to get the test data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", - "\n", - "# x_test = fetch_df(split_step, output_split_test_x.name)\n", - "# y_test = fetch_df(split_step, output_split_test_y.name)\n", - "\n", - "# display(x_test.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"]).head(5))\n", - "# display(y_test.rename_columns(column_pairs={\"Column1\": \"cost\"}).head(5))\n", - "\n", - "# x_test = x_test.to_pandas_dataframe()\n", - "# y_test = y_test.to_pandas_dataframe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Test the best fitted model\n", - "\n", - "Uncomment the cell below to test the best fitted model." - ] - }, - { - "cell_type": "code", -
"execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# y_predict = fitted_model.predict(x_test.values)\n", - "\n", - "# y_actual = y_test.iloc[:,0].values.tolist()\n", - "\n", - "# display(pd.DataFrame({'Actual':y_actual, 'Predicted':y_predict}).head(5))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import matplotlib.pyplot as plt\n", - "\n", - "# fig = plt.figure(figsize=(14, 10))\n", - "# ax1 = fig.add_subplot(111)\n", - "\n", - "# distance_vals = [x[4] for x in x_test.values]\n", - "\n", - "# ax1.scatter(distance_vals[:100], y_predict[:100], s=18, c='b', marker=\"s\", label='Predicted')\n", - "# ax1.scatter(distance_vals[:100], y_actual[:100], s=18, c='r', marker=\"o\", label='Actual')\n", - "\n", - "# ax1.set_xlabel('distance (mi)')\n", - "# ax1.set_title('Predicted and Actual Cost/Distance')\n", - "# ax1.set_ylabel('Cost ($)')\n", - "\n", - "# plt.legend(loc='upper left', prop={'size': 12})\n", - "# plt.rcParams.update({'font.size': 14})\n", - "# plt.show()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "sanpil" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# NYC Taxi Data Regression Model\n", + "This is an [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) version of two-part tutorial ([Part 1](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep), [Part 2](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models)) available for Azure Machine Learning.\n", + "\n", + "You can combine the 
two-part tutorial into one using Azure ML Pipelines, since Pipelines provide a way to stitch together the various steps involved in a machine learning workflow (data preparation and training, in this case).\n", + "\n", + "In this notebook, you learn how to prepare data for regression modeling by using the [Azure Machine Learning Data Prep SDK](https://aka.ms/data-prep-sdk) for Python. You run various transformations to filter and combine two different NYC taxi data sets. Once you have prepared the NYC taxi data for regression modeling, you will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py), available with [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines), to define your machine learning goals and constraints and to launch the automated machine learning process. The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criteria.\n", + "\n", + "After you finish building the model, you can predict the cost of a taxi trip by training a model on data features. These features include the pickup day and time, the number of passengers, and the pickup location.\n", + "\n", + "## Prerequisite\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n", + "\n", + "We will run various transformations to filter and combine two different NYC taxi data sets. We will use the DataPrep SDK to prepare the data.\n", + "\n", + "Perform `pip install azureml-dataprep` if you haven't already done so."
+ ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare data for regression modeling\n", + "First, we will prepare data for regression modeling. We will leverage the convenience of Azure Open Datasets along with the power of Azure Machine Learning service to create a regression model to predict NYC taxi fare prices. Perform `pip install azureml-contrib-opendatasets` to get the open dataset package. The Open Datasets package contains a class representing each data source (NycTlcGreen and NycTlcYellow) to easily filter date parameters before downloading.\n", + "\n", + "\n", + "### Load data\n", + "Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid MemoryError with large datasets. To download a year of taxi data, iteratively fetch one month at a time, and before appending it to green_df_raw, randomly sample 500 records from each month to avoid bloating the dataframe. Then preview the data. To keep this process short, we are sampling data of only 1 month.\n", + "\n", + "Note: Open Datasets has mirroring classes for working in Spark environments where data size and memory aren't a concern." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "# Check core SDK version number\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.opendatasets import NycTlcGreen, NycTlcYellow\n", + "import pandas as pd\n", + "from datetime import datetime\n", + "from dateutil.relativedelta import relativedelta\n", + "\n", + "green_df_raw = pd.DataFrame([])\n", + "start = datetime.strptime(\"1/1/2016\",\"%m/%d/%Y\")\n", + "end = datetime.strptime(\"1/31/2016\",\"%m/%d/%Y\")\n", + "\n", + "number_of_months = 1\n", + "sample_size = 5000\n", + "\n", + "for sample_month in range(number_of_months):\n", + " temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n", + " .to_pandas_dataframe()\n", + " green_df_raw = green_df_raw.append(temp_df_green.sample(sample_size))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "yellow_df_raw = pd.DataFrame([])\n", + "start = datetime.strptime(\"1/1/2016\",\"%m/%d/%Y\")\n", + "end = datetime.strptime(\"1/31/2016\",\"%m/%d/%Y\")\n", + "\n", + "sample_size = 500\n", + "\n", + "for sample_month in range(number_of_months):\n", + " temp_df_yellow = NycTlcYellow(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n", + " .to_pandas_dataframe()\n", + " yellow_df_raw = yellow_df_raw.append(temp_df_yellow.sample(sample_size))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### See the data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.dataprep as dprep\n", + "from IPython.display import display\n", + "\n", + "display(green_df_raw.head(5))\n", + 
"display(yellow_df_raw.head(5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download data locally and then upload to Azure Blob\n", + "This is a one-time process to save the data in the default datastore. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "dataDir = \"data\"\n", + "\n", + "if not os.path.exists(dataDir):\n", + " os.mkdir(dataDir)\n", + "\n", + "greenDir = dataDir + \"/green\"\n", + "yellowDir = dataDir + \"/yellow\"\n", + "\n", + "if not os.path.exists(greenDir):\n", + " os.mkdir(greenDir)\n", + " \n", + "if not os.path.exists(yellowDir):\n", + " os.mkdir(yellowDir)\n", + " \n", + "greenTaxiData = greenDir + \"/part-00000\"\n", + "yellowTaxiData = yellowDir + \"/part-00000\"\n", + "\n", + "green_df_raw.to_csv(greenTaxiData, index=False)\n", + "yellow_df_raw.to_csv(yellowTaxiData, index=False)\n", + "\n", + "print(\"Data written to local folder.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(\"Workspace: \" + ws.name, \"Region: \" + ws.location, sep = '\\n')\n", + "\n", + "# Default datastore\n", + "default_store = ws.get_default_datastore() \n", + "\n", + "default_store.upload_files([greenTaxiData], \n", + " target_path = 'green', \n", + " overwrite = False, \n", + " show_progress = True)\n", + "\n", + "default_store.upload_files([yellowTaxiData], \n", + " target_path = 'yellow', \n", + " overwrite = False, \n", + " show_progress = True)\n", + "\n", + "print(\"Upload calls completed.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup Compute\n", + "#### Create new or use an existing compute" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "aml_compute = 
ws.get_default_compute_target(\"CPU\")\n", + "aml_compute" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Define RunConfig for the compute\n", + "We need the `azureml-dataprep` SDK for all the steps below. We will also use `pandas`, `scikit-learn`, and `automl` for the training step, so we define the `runconfig` accordingly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# Create a new runconfig object\n", + "aml_run_config = RunConfiguration()\n", + "\n", + "# Use the aml_compute you created above. \n", + "aml_run_config.target = aml_compute\n", + "\n", + "# Enable Docker\n", + "aml_run_config.environment.docker.enabled = True\n", + "\n", + "# Set Docker base image to the default CPU-based image\n", + "aml_run_config.environment.docker.base_image = \"mcr.microsoft.com/azureml/base:0.2.1\"\n", + "\n", + "# Use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", + "aml_run_config.environment.python.user_managed_dependencies = False\n", + "\n", + "# Auto-prepare the Docker image when used for execution (if it is not already prepared)\n", + "aml_run_config.auto_prepare_environment = True\n", + "\n", + "# Specify CondaDependencies obj, add necessary packages\n", + "aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", + " conda_packages=['pandas','scikit-learn'], \n", + " pip_packages=['azureml-sdk', 'azureml-dataprep', 'azureml-train-automl==1.0.33'], \n", + " pin_sdk_version=False)\n", + "\n", + "print(\"Run configuration created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare data\n", + "Now we will prepare for regression modeling by using the `Azure Machine Learning Data Prep SDK for Python`. 
We run various transformations to filter and combine two different NYC taxi data sets.\n", + "\n", + "We achieve this by creating a separate step for each transformation, as this allows us to reuse the steps and saves us from rerunning everything when only one step changes. We will keep data preparation scripts in one subfolder and training scripts in another.\n", + "\n", + "> The best practice is to use separate folders for the scripts and their dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps preserve the reuse of a step when there are no changes in its `source_directory`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Define Useful Columns\n", + "Here we are defining a set of \"useful\" columns for both Green and Yellow taxi data."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(green_df_raw.columns)\n", + "display(yellow_df_raw.columns)\n", + "\n", + "# useful columns needed for the Azure Machine Learning NYC Taxi tutorial\n", + "useful_columns = str([\"cost\", \"distance\", \"dropoff_datetime\", \"dropoff_latitude\", \n", + " \"dropoff_longitude\", \"passengers\", \"pickup_datetime\", \n", + " \"pickup_latitude\", \"pickup_longitude\", \"store_forward\", \"vendor\"]).replace(\",\", \";\")\n", + "\n", + "print(\"Useful columns defined.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleanse Green taxi data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.data.data_reference import DataReference \n", + "from azureml.pipeline.core import PipelineData\n", + "from azureml.pipeline.steps import PythonScriptStep\n", + "\n", + "# python scripts folder\n", + "prepare_data_folder = './scripts/prepdata'\n", + "\n", + "blob_green_data = DataReference(\n", + " datastore=default_store,\n", + " data_reference_name=\"green_taxi_data\",\n", + " path_on_datastore=\"green/part-00000\")\n", + "\n", + "# rename columns as per Azure Machine Learning NYC Taxi tutorial\n", + "green_columns = str({ \n", + " \"vendorID\": \"vendor\",\n", + " \"lpepPickupDatetime\": \"pickup_datetime\",\n", + " \"lpepDropoffDatetime\": \"dropoff_datetime\",\n", + " \"storeAndFwdFlag\": \"store_forward\",\n", + " \"pickupLongitude\": \"pickup_longitude\",\n", + " \"pickupLatitude\": \"pickup_latitude\",\n", + " \"dropoffLongitude\": \"dropoff_longitude\",\n", + " \"dropoffLatitude\": \"dropoff_latitude\",\n", + " \"passengerCount\": \"passengers\",\n", + " \"fareAmount\": \"cost\",\n", + " \"tripDistance\": \"distance\"\n", + "}).replace(\",\", \";\")\n", + "\n", + "# Define output after cleansing step\n", + "cleansed_green_data = 
PipelineData(\"green_taxi_data\", datastore=default_store)\n", + "\n", + "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", + "\n", + "# cleansing step creation\n", + "# See the cleanse.py for details about input and output\n", + "cleansingStepGreen = PythonScriptStep(\n", + " name=\"Cleanse Green Taxi Data\",\n", + " script_name=\"cleanse.py\", \n", + " arguments=[\"--input_cleanse\", blob_green_data, \n", + " \"--useful_columns\", useful_columns,\n", + " \"--columns\", green_columns,\n", + " \"--output_cleanse\", cleansed_green_data],\n", + " inputs=[blob_green_data],\n", + " outputs=[cleansed_green_data],\n", + " compute_target=aml_compute,\n", + " runconfig=aml_run_config,\n", + " source_directory=prepare_data_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"cleansingStepGreen created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleanse Yellow taxi data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "blob_yellow_data = DataReference(\n", + " datastore=default_store,\n", + " data_reference_name=\"yellow_taxi_data\",\n", + " path_on_datastore=\"yellow/part-00000\")\n", + "\n", + "yellow_columns = str({\n", + " \"vendorID\": \"vendor\",\n", + " \"tpepPickupDateTime\": \"pickup_datetime\",\n", + " \"tpepDropoffDateTime\": \"dropoff_datetime\",\n", + " \"storeAndFwdFlag\": \"store_forward\",\n", + " \"startLon\": \"pickup_longitude\",\n", + " \"startLat\": \"pickup_latitude\",\n", + " \"endLon\": \"dropoff_longitude\",\n", + " \"endLat\": \"dropoff_latitude\",\n", + " \"passengerCount\": \"passengers\",\n", + " \"fareAmount\": \"cost\",\n", + " \"tripDistance\": \"distance\"\n", + "}).replace(\",\", \";\")\n", + "\n", + "# Define output after cleansing step\n", + "cleansed_yellow_data = PipelineData(\"yellow_taxi_data\", datastore=default_store)\n", + "\n", + "print('Cleanse script is in 
{}.'.format(os.path.realpath(prepare_data_folder)))\n", + "\n", + "# cleansing step creation\n", + "# See the cleanse.py for details about input and output\n", + "cleansingStepYellow = PythonScriptStep(\n", + " name=\"Cleanse Yellow Taxi Data\",\n", + " script_name=\"cleanse.py\", \n", + " arguments=[\"--input_cleanse\", blob_yellow_data, \n", + " \"--useful_columns\", useful_columns,\n", + " \"--columns\", yellow_columns,\n", + " \"--output_cleanse\", cleansed_yellow_data],\n", + " inputs=[blob_yellow_data],\n", + " outputs=[cleansed_yellow_data],\n", + " compute_target=aml_compute,\n", + " runconfig=aml_run_config,\n", + " source_directory=prepare_data_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"cleansingStepYellow created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Merge cleansed Green and Yellow datasets\n", + "We are creating a single data source by merging the cleansed versions of Green and Yellow taxi data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define output after merging step\n", + "merged_data = PipelineData(\"merged_data\", datastore=default_store)\n", + "\n", + "print('Merge script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", + "\n", + "# merging step creation\n", + "# See the merge.py for details about input and output\n", + "mergingStep = PythonScriptStep(\n", + " name=\"Merge Taxi Data\",\n", + " script_name=\"merge.py\", \n", + " arguments=[\"--input_green_merge\", cleansed_green_data, \n", + " \"--input_yellow_merge\", cleansed_yellow_data,\n", + " \"--output_merge\", merged_data],\n", + " inputs=[cleansed_green_data, cleansed_yellow_data],\n", + " outputs=[merged_data],\n", + " compute_target=aml_compute,\n", + " runconfig=aml_run_config,\n", + " source_directory=prepare_data_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"mergingStep created.\")" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "#### Filter data\n", + "This step filters out coordinates for locations that are outside the city border. We use a TypeConverter object to change the latitude and longitude fields to decimal type. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define output after filter step\n", + "filtered_data = PipelineData(\"filtered_data\", datastore=default_store)\n", + "\n", + "print('Filter script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", + "\n", + "# filter step creation\n", + "# See the filter.py for details about input and output\n", + "filterStep = PythonScriptStep(\n", + " name=\"Filter Taxi Data\",\n", + " script_name=\"filter.py\", \n", + " arguments=[\"--input_filter\", merged_data, \n", + " \"--output_filter\", filtered_data],\n", + " inputs=[merged_data],\n", + " outputs=[filtered_data],\n", + " compute_target=aml_compute,\n", + " runconfig = aml_run_config,\n", + " source_directory=prepare_data_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"FilterStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Normalize data\n", + "In this step, we split the pickup and dropoff datetime values into the respective date and time columns and then we rename the columns to use meaningful names."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define output after normalize step\n", + "normalized_data = PipelineData(\"normalized_data\", datastore=default_store)\n", + "\n", + "print('Normalize script is in {}.'.format(os.path.realpath(train_model_folder)))\n", + "\n", + "# normalize step creation\n", + "# See the normalize.py for details about input and output\n", + "normalizeStep = PythonScriptStep(\n", + " name=\"Normalize Taxi Data\",\n", + " script_name=\"normalize.py\", \n", + " arguments=[\"--input_normalize\", filtered_data, \n", + " \"--output_normalize\", normalized_data],\n", + " inputs=[filtered_data],\n", + " outputs=[normalized_data],\n", + " compute_target=aml_compute,\n", + " runconfig = aml_run_config,\n", + " source_directory=prepare_data_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"normalizeStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Transform data\n", + "Transform the normalized taxi data to the final required format. This step does the following:\n", + "\n", + "- Split the pickup and dropoff date further into the day of the week, day of the month, and month values. \n", + "- To get the day-of-week value, use the derive_column_by_example() function. The function takes an array parameter of example objects that define the input data and the preferred output. The function automatically determines the preferred transformation. For the pickup and dropoff time columns, split the time into the hour, minute, and second by using the split_column_by_example() function with no example parameter.\n", + "- After new features are generated, use the drop_columns() function to delete the original fields, as the newly generated features are preferred. \n", + "- Rename the rest of the fields to use meaningful descriptions."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define output after transform step\n", + "transformed_data = PipelineData(\"transformed_data\", datastore=default_store)\n", + "\n", + "print('Transform script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", + "\n", + "# transform step creation\n", + "# See the transform.py for details about input and output\n", + "transformStep = PythonScriptStep(\n", + " name=\"Transform Taxi Data\",\n", + " script_name=\"transform.py\", \n", + " arguments=[\"--input_transform\", normalized_data,\n", + " \"--output_transform\", transformed_data],\n", + " inputs=[normalized_data],\n", + " outputs=[transformed_data],\n", + " compute_target=aml_compute,\n", + " runconfig = aml_run_config,\n", + " source_directory=prepare_data_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"transformStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Extract features\n", + "Add the following columns as features for our model. The value to predict will be *cost*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "feature_columns = str(['pickup_weekday','pickup_hour', 'distance','passengers', 'vendor']).replace(\",\", \";\")\n", + "\n", + "train_model_folder = './scripts/trainmodel'\n", + "\n", + "print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n", + "\n", + "# features data after transform step\n", + "features_data = PipelineData(\"features_data\", datastore=default_store)\n", + "\n", + "# featurization step creation\n", + "# See the featurization.py for details about input and output\n", + "featurizationStep = PythonScriptStep(\n", + " name=\"Extract Features\",\n", + " script_name=\"featurization.py\", \n", + " arguments=[\"--input_featurization\", transformed_data, \n", + " \"--useful_columns\", feature_columns,\n", + " \"--output_featurization\", features_data],\n", + " inputs=[transformed_data],\n", + " outputs=[features_data],\n", + " compute_target=aml_compute,\n", + " runconfig = aml_run_config,\n", + " source_directory=train_model_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"featurizationStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Extract label" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "label_columns = str(['cost']).replace(\",\", \";\")\n", + "\n", + "# label data after transform step\n", + "label_data = PipelineData(\"label_data\", datastore=default_store)\n", + "\n", + "print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n", + "\n", + "# label step creation\n", + "# See the featurization.py for details about input and output\n", + "labelStep = PythonScriptStep(\n", + " name=\"Extract Labels\",\n", + " script_name=\"featurization.py\", \n", + " arguments=[\"--input_featurization\", transformed_data, \n", + " \"--useful_columns\", label_columns,\n", + 
" \"--output_featurization\", label_data],\n", + " inputs=[transformed_data],\n", + " outputs=[label_data],\n", + " compute_target=aml_compute,\n", + " runconfig = aml_run_config,\n", + " source_directory=train_model_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"labelStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Split the data into train and test sets\n", + "This step splits the data into the **x** (features) dataset for model training and the **y** (values to predict) dataset for testing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# train and test splits output\n", + "output_split_train_x = PipelineData(\"output_split_train_x\", datastore=default_store)\n", + "output_split_train_y = PipelineData(\"output_split_train_y\", datastore=default_store)\n", + "output_split_test_x = PipelineData(\"output_split_test_x\", datastore=default_store)\n", + "output_split_test_y = PipelineData(\"output_split_test_y\", datastore=default_store)\n", + "\n", + "print('Data split script is in {}.'.format(os.path.realpath(train_model_folder)))\n", + "\n", + "# test train split step creation\n", + "# See the train_test_split.py for details about input and output\n", + "testTrainSplitStep = PythonScriptStep(\n", + " name=\"Train Test Data Split\",\n", + " script_name=\"train_test_split.py\", \n", + " arguments=[\"--input_split_features\", features_data, \n", + " \"--input_split_labels\", label_data,\n", + " \"--output_split_train_x\", output_split_train_x,\n", + " \"--output_split_train_y\", output_split_train_y,\n", + " \"--output_split_test_x\", output_split_test_x,\n", + " \"--output_split_test_y\", output_split_test_y],\n", + " inputs=[features_data, label_data],\n", + " outputs=[output_split_train_x, output_split_train_y, output_split_test_x, output_split_test_y],\n", + " compute_target=aml_compute,\n", + " runconfig = aml_run_config,\n", 
+ " source_directory=train_model_folder,\n", + " allow_reuse=True\n", + ")\n", + "\n", + "print(\"testTrainSplitStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use automated machine learning to build regression model\n", + "Now we will use **automated machine learning** to build the regression model. We will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) in AML Pipelines for this part. Automated machine learning uses various features from the data set to build relationships between the features and the price of a taxi trip." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Automatically train a model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create experiment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment = Experiment(ws, 'NYCTaxi_Tutorial_Pipelines')\n", + "\n", + "print(\"Experiment created\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create get_data script\n", + "\n", + "A script with a `get_data()` function is necessary to fetch the training features (X) and labels (y) from the input data on remote compute. Here we use the mounted output paths of the `train_test_split` step to get the x and y training values. 
These paths are added as environment variables on the compute machine by default.\n", + "\n", + "Note: Every DataReference is added as an environment variable on the compute machine since the default mode is mount." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('get_data.py will be written to {}.'.format(os.path.realpath(train_model_folder)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $train_model_folder/get_data.py\n", + "\n", + "# get_data.py runs standalone on the remote compute, so it needs its own imports\n", + "import os\n", + "import pandas as pd\n", + "\n", + "def get_data():\n", + " print(\"In get_data\")\n", + " print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'])\n", + " X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + \"/part-00000\", header=0)\n", + " y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + \"/part-00000\", header=0)\n", + " \n", + " return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Define settings for autogeneration and tuning\n", + "\n", + "Here we define the experiment parameters and model settings for autogeneration and tuning. We can specify automl_settings as **kwargs as well. Also note that we have to use a get_data() function for remote executions. See the get_data script for more details.\n", + "\n", + "Use your defined training settings as a parameter to an `AutoMLConfig` object. Additionally, specify your training data and the type of model, which is `regression` in this case.\n", + "\n", + "Note: When using AmlCompute, we can't pass Numpy arrays directly to the fit method."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "from azureml.train.automl import AutoMLConfig\n", + "\n", + "# Change iterations to a reasonable number (50) to get better accuracy\n", + "automl_settings = {\n", + " \"iteration_timeout_minutes\" : 10,\n", + " \"iterations\" : 2,\n", + " \"primary_metric\" : 'spearman_correlation',\n", + " \"preprocess\" : True,\n", + " \"verbosity\" : logging.INFO,\n", + " \"n_cross_validations\": 5\n", + "}\n", + "\n", + "automl_config = AutoMLConfig(task = 'regression',\n", + " debug_log = 'automated_ml_errors.log',\n", + " path = train_model_folder,\n", + " compute_target=aml_compute,\n", + " run_configuration=aml_run_config,\n", + " data_script = train_model_folder + \"/get_data.py\",\n", + " **automl_settings)\n", + " \n", + "print(\"AutoML config created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Define AutoMLStep" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.automl import AutoMLStep\n", + "\n", + "trainWithAutomlStep = AutoMLStep(\n", + " name='AutoML_Regression',\n", + " automl_config=automl_config,\n", + " inputs=[output_split_train_x, output_split_train_y],\n", + " allow_reuse=True,\n", + " hash_paths=[os.path.realpath(train_model_folder)])\n", + "\n", + "print(\"trainWithAutomlStep created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Build and run the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core import Pipeline\n", + "from azureml.widgets import RunDetails\n", + "\n", + "pipeline_steps = [trainWithAutomlStep]\n", + "\n", + "pipeline = Pipeline(workspace = ws, steps=pipeline_steps)\n", + "print(\"Pipeline is built.\")\n", + "\n", + "pipeline_run = 
experiment.submit(pipeline, regenerate_outputs=False)\n", + "\n", + "print(\"Pipeline submitted for execution.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "RunDetails(pipeline_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Explore the results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Before we proceed we need to wait for the run to complete.\n", + "pipeline_run.wait_for_completion()\n", + "\n", + "# functions to download output to local and fetch as dataframe\n", + "def get_download_path(download_path, output_name):\n", + " output_folder = os.listdir(download_path + '/azureml')[0]\n", + " path = download_path + '/azureml/' + output_folder + '/' + output_name\n", + " return path\n", + "\n", + "def fetch_df(step, output_name):\n", + " output_data = step.get_output_data(output_name)\n", + " \n", + " download_path = './outputs/' + output_name\n", + " output_data.download(download_path)\n", + " df_path = get_download_path(download_path, output_name) + '/part-00000'\n", + " return dprep.auto_read_file(path=df_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View cleansed taxi data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "green_cleanse_step = pipeline_run.find_step_run(cleansingStepGreen.name)[0]\n", + "yellow_cleanse_step = pipeline_run.find_step_run(cleansingStepYellow.name)[0]\n", + "\n", + "cleansed_green_df = fetch_df(green_cleanse_step, cleansed_green_data.name)\n", + "cleansed_yellow_df = fetch_df(yellow_cleanse_step, cleansed_yellow_data.name)\n", + "\n", + "display(cleansed_green_df.head(5))\n", + "display(cleansed_yellow_df.head(5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View the combined taxi data profile" + ] + }, + 
{ + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "merge_step = pipeline_run.find_step_run(mergingStep.name)[0]\n", + "combined_df = fetch_df(merge_step, merged_data.name)\n", + "\n", + "display(combined_df.get_profile())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View the filtered taxi data profile" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "filter_step = pipeline_run.find_step_run(filterStep.name)[0]\n", + "filtered_df = fetch_df(filter_step, filtered_data.name)\n", + "\n", + "display(filtered_df.get_profile())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View normalized taxi data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "normalize_step = pipeline_run.find_step_run(normalizeStep.name)[0]\n", + "normalized_df = fetch_df(normalize_step, normalized_data.name)\n", + "\n", + "display(normalized_df.head(5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View transformed taxi data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "transform_step = pipeline_run.find_step_run(transformStep.name)[0]\n", + "transformed_df = fetch_df(transform_step, transformed_data.name)\n", + "\n", + "display(transformed_df.get_profile())\n", + "display(transformed_df.head(5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View training data used by AutoML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", + "train_split_x = fetch_df(split_step, output_split_train_x.name)\n", + "train_split_y = fetch_df(split_step, output_split_train_y.name)\n", + "\n", + 
"display_x_train = train_split_x.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"])\n", + "display_y_train = train_split_y.rename_columns(column_pairs={\"Column1\": \"cost\"})\n", + "\n", + "display(display_x_train.get_profile())\n", + "display(display_x_train.head(5))\n", + "display(display_y_train.get_profile())\n", + "display(display_y_train.head(5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View the details of the AutoML run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.automl.run import AutoMLRun\n", + "#from azureml.widgets import RunDetails\n", + "\n", + "# workaround to get the automl run as it's the last step in the pipeline \n", + "# and get_steps() returns the steps from latest to first\n", + "\n", + "for step in pipeline_run.get_steps():\n", + " automl_step_run_id = step.id\n", + " print(step.name)\n", + " print(automl_step_run_id)\n", + " break\n", + "\n", + "automl_run = AutoMLRun(experiment = experiment, run_id=automl_step_run_id)\n", + "#RunDetails(automl_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Retrieve all Child runs\n", + "\n", + "We use SDK methods to fetch all the child runs and see the individual metrics that we log."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(automl_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the best model\n", + "\n", + "Uncomment the cell below to retrieve the best model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# best_run, fitted_model = automl_run.get_output()\n", + "# print(best_run)\n", + "# print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Get test data\n", + "\n", + "Uncomment the cell below to get the test data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", + "\n", + "# x_test = fetch_df(split_step, output_split_test_x.name)\n", + "# y_test = fetch_df(split_step, output_split_test_y.name)\n", + "\n", + "# display(x_test.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"]).head(5))\n", + "# display(y_test.rename_columns(column_pairs={\"Column1\": \"cost\"}).head(5))\n", + "\n", + "# x_test = x_test.to_pandas_dataframe()\n", + "# y_test = y_test.to_pandas_dataframe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test the best fitted model\n", + "\n", + "Uncomment the cell below to test the best fitted model." + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# y_predict = fitted_model.predict(x_test.values)\n", + "\n", + "# y_actual = y_test.iloc[:,0].values.tolist()\n", + "\n", + "# display(pd.DataFrame({'Actual':y_actual, 'Predicted':y_predict}).head(5))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# import matplotlib.pyplot as plt\n", + "\n", + "# fig = plt.figure(figsize=(14, 10))\n", + "# ax1 = fig.add_subplot(111)\n", + "\n", + "# distance_vals = [x[4] for x in x_test.values]\n", + "\n", + "# ax1.scatter(distance_vals[:100], y_predict[:100], s=18, c='b', marker=\"s\", label='Predicted')\n", + "# ax1.scatter(distance_vals[:100], y_actual[:100], s=18, c='r', marker=\"o\", label='Actual')\n", + "\n", + "# ax1.set_xlabel('distance (mi)')\n", + "# ax1.set_title('Predicted and Actual Cost/Distance')\n", + "# ax1.set_ylabel('Cost ($)')\n", + "\n", + "# plt.legend(loc='upper left', prop={'size': 12})\n", + "# plt.rcParams.update({'font.size': 14})\n", + "# plt.show()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sanpil" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb index bf2a4dae..51c5dd75 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb +++ 
b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb @@ -1,603 +1,603 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Using Azure Machine Learning Pipelines for batch prediction\n", - "\n", - "In this notebook we will demonstrate how to run a batch scoring job using Azure Machine Learning pipelines. Our example job will be to take an already-trained image classification model, and run that model on some unlabeled images. The image classification model that we'll use is the __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and we'll run this model on unlabeled images from the __[ImageNet](http://image-net.org/)__ dataset. \n", - "\n", - "The outline of this notebook is as follows:\n", - "\n", - "- Register the pretrained inception model into the model registry. \n", - "- Store the dataset images in a blob container.\n", - "- Use the registered model to do batch scoring on the images in the data blob container." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "from azureml.core.compute import AmlCompute, ComputeTarget\n", - "from azureml.core.datastore import Datastore\n", - "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", - "from azureml.data.data_reference import DataReference\n", - "from azureml.pipeline.core import Pipeline, PipelineData\n", - "from azureml.pipeline.steps import PythonScriptStep" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set up machine learning resources" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set up datastores\n", - "First, let\u00e2\u20ac\u2122s access the datastore that has the model, labels, and images. \n", - "\n", - "### Create a datastore that points to a blob container containing sample images\n", - "\n", - "We have created a public blob container `sampledata` on an account named `pipelinedata`, containing images from the ImageNet evaluation set. In the next step, we create a datastore with the name `images_datastore`, which points to this container. In the call to `register_azure_blob_container` below, setting the `overwrite` flag to `True` overwrites any datastore that was created previously with that name. \n", - "\n", - "This step can be changed to point to your blob container by providing your own `datastore_name`, `container_name`, and `account_name`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "account_name = \"pipelinedata\"\n", - "datastore_name=\"images_datastore\"\n", - "container_name=\"sampledata\"\n", - "\n", - "batchscore_blob = Datastore.register_azure_blob_container(ws, \n", - " datastore_name=datastore_name, \n", - " container_name= container_name, \n", - " account_name=account_name, \n", - " overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, let\u00e2\u20ac\u2122s specify the default datastore for the outputs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def_data_store = ws.get_default_datastore()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure data references\n", - "Now you need to add references to the data, as inputs to the appropriate pipeline steps in your pipeline. A data source in a pipeline is represented by a DataReference object. The DataReference object points to data that lives in, or is accessible from, a datastore. We need DataReference objects corresponding to the following: the directory containing the input images, the directory in which the pretrained model is stored, the directory containing the labels, and the output directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "input_images = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_images\",\n", - " path_on_datastore=\"batchscoring/images\",\n", - " mode=\"download\"\n", - " )\n", - "model_dir = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_model\",\n", - " path_on_datastore=\"batchscoring/models\",\n", - " mode=\"download\" \n", - " )\n", - "label_dir = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_labels\",\n", - " path_on_datastore=\"batchscoring/labels\",\n", - " mode=\"download\" \n", - " )\n", - "output_dir = PipelineData(name=\"scores\", \n", - " datastore=def_data_store, \n", - " output_path_on_compute=\"batchscoring/results\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create and attach Compute targets\n", - "Use the below code to create and attach Compute targets. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# choose a name for your cluster\n", - "aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpu-cluster\")\n", - "cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n", - "cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n", - "vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n", - "\n", - "\n", - "if aml_compute_name in ws.compute_targets:\n", - " compute_target = ws.compute_targets[aml_compute_name]\n", - " if compute_target and type(compute_target) is AmlCompute:\n", - " print('found compute target. just use it. 
' + aml_compute_name)\n", - "else:\n", - " print('creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n", - " vm_priority = 'lowpriority', # optional\n", - " min_nodes = cluster_min_nodes, \n", - " max_nodes = cluster_max_nodes)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, aml_compute_name, provisioning_config)\n", - " \n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it will use the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - " \n", - " # For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n", - " print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prepare the Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download the Model\n", - "\n", - "Download and extract the model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to `\"models\"`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create directory for model\n", - "model_dir = 'models'\n", - "if not os.path.isdir(model_dir):\n", - " os.mkdir(model_dir)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import tarfile\n", - "import urllib.request\n", - "\n", - "url=\"http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz\"\n", - "response = urllib.request.urlretrieve(url, \"model.tar.gz\")\n", - "tar = tarfile.open(\"model.tar.gz\", \"r:gz\")\n", - "tar.extractall(model_dir)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register the model with Workspace" - ] - }, - 
{ - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "from azureml.core.model import Model\n", - "\n", - "# register downloaded model \n", - "model = Model.register(model_path = \"models/inception_v3.ckpt\",\n", - " model_name = \"inception\", # this is the name the model is registered as\n", - " tags = {'pretrained': \"inception\"},\n", - " description = \"Imagenet trained tensorflow inception\",\n", - " workspace = ws)\n", - "# remove the downloaded dir after registration if you wish\n", - "shutil.rmtree(\"models\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Write your scoring script" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To do the scoring, we use a batch scoring script `batch_scoring.py`, which is located in the same directory that this notebook is in. You can take a look at this script to see how you might modify it for your custom batch scoring task.\n", - "\n", - "The python script `batch_scoring.py` takes input images, applies the image classification model to these images, and outputs a classification result to a results file.\n", - "\n", - "The script `batch_scoring.py` takes the following parameters:\n", - "\n", - "- `--model_name`: the name of the model being used, which is expected to be in the `model_dir` directory\n", - "- `--label_dir` : the directory holding the `labels.txt` file \n", - "- `--dataset_path`: the directory containing the input images\n", - "- `--output_dir` : the script will run the model on the data and output a `results-label.txt` to this directory\n", - "- `--batch_size` : the batch size used in running the model.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Build and run the batch scoring pipeline\n", - "You have everything you need to build the pipeline. Let\u00e2\u20ac\u2122s put all these together." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify the environment to run the script\n", - "Specify the conda dependencies for your script. You will need this object when you create the pipeline step later on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n", - "\n", - "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n", - "\n", - "# Runconfig\n", - "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", - "amlcompute_run_config.environment.docker.enabled = True\n", - "amlcompute_run_config.environment.docker.gpu_support = True\n", - "amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n", - "amlcompute_run_config.environment.spark.precache_packages = False" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify the parameters for your pipeline\n", - "A subset of the parameters to the python script can be given as input when we re-run a `PublishedPipeline`. In the current example, we define `batch_size` taken by the script as such parameter." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.graph import PipelineParameter\n", - "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create the pipeline step\n", - "Create the pipeline step using the script, environment configuration, and parameters. Specify the compute target you already attached to your workspace as the target of execution of the script. We will use PythonScriptStep to create the pipeline step." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "inception_model_name = \"inception_v3.ckpt\"\n", - "\n", - "batch_score_step = PythonScriptStep(\n", - " name=\"batch_scoring\",\n", - " script_name=\"batch_scoring.py\",\n", - " arguments=[\"--dataset_path\", input_images, \n", - " \"--model_name\", \"inception\",\n", - " \"--label_dir\", label_dir, \n", - " \"--output_dir\", output_dir, \n", - " \"--batch_size\", batch_size_param],\n", - " compute_target=compute_target,\n", - " inputs=[input_images, label_dir],\n", - " outputs=[output_dir],\n", - " runconfig=amlcompute_run_config\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run the pipeline\n", - "At this point you can run the pipeline and examine the output it produced. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", - "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_params={\"param_batch_size\": 20})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(pipeline_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline_run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download and review output" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "step_run = list(pipeline_run.get_children())[0]\n", - "step_run.download_file(\"./outputs/result-labels.txt\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - 
"metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", - "df.columns = [\"Filename\", \"Prediction\"]\n", - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Publish a pipeline and rerun using a REST call" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a published pipeline\n", - "Once you are satisfied with the outcome of the run, you can publish the pipeline to run it with different input values later. When you publish a pipeline, you will get a REST endpoint that accepts invoking of the pipeline with the set of parameters you have already incorporated above using PipelineParameter." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "published_pipeline = pipeline_run.publish_pipeline(\n", - " name=\"Inception_v3_scoring\", description=\"Batch scoring using Inception v3 model\", version=\"1.0\")\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get published pipeline\n", - "\n", - "You can get the published pipeline using **pipeline id**.\n", - "\n", - "To get all the published pipelines for a given workspace(ws): \n", - "```css\n", - "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core import PublishedPipeline\n", - "\n", - "pipeline_id = published_pipeline.id # use your published pipeline id\n", - "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Rerun the pipeline using the REST endpoint" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get AAD token\n", - "[This 
notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to AML workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.authentication import InteractiveLoginAuthentication\n", - "import requests\n", - "\n", - "auth = InteractiveLoginAuthentication()\n", - "aad_token = auth.get_authentication_header()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run published pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "rest_endpoint = published_pipeline.endpoint\n", - "# specify batch size when running the pipeline\n", - "response = requests.post(rest_endpoint, \n", - " headers=aad_token, \n", - " json={\"ExperimentName\": \"batch_scoring\",\n", - " \"ParameterAssignments\": {\"param_batch_size\": 50}})\n", - "run_id = response.json()[\"Id\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the new run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.run import PipelineRun\n", - "published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n", - "\n", - "RunDetails(published_pipeline_run).show()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "sanpil" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Licensed under the MIT License." 
+ ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using Azure Machine Learning Pipelines for batch prediction\n", + "\n", + "In this notebook we will demonstrate how to run a batch scoring job using Azure Machine Learning pipelines. Our example job will be to take an already-trained image classification model, and run that model on some unlabeled images. The image classification model that we'll use is the __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and we'll run this model on unlabeled images from the __[ImageNet](http://image-net.org/)__ dataset. \n", + "\n", + "The outline of this notebook is as follows:\n", + "\n", + "- Register the pretrained inception model into the model registry. \n", + "- Store the dataset images in a blob container.\n", + "- Use the registered model to do batch scoring on the images in the data blob container." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "from azureml.core.datastore import Datastore\n", + "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", + "from azureml.data.data_reference import DataReference\n", + "from azureml.pipeline.core import Pipeline, PipelineData\n", + "from azureml.pipeline.steps import PythonScriptStep" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up machine learning resources" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set up datastores\n", + "First, let’s access the datastore that has the model, labels, and images. \n", + "\n", + "### Create a datastore that points to a blob container containing sample images\n", + "\n", + "We have created a public blob container `sampledata` on an account named `pipelinedata`, containing images from the ImageNet evaluation set. In the next step, we create a datastore with the name `images_datastore`, which points to this container. In the call to `register_azure_blob_container` below, setting the `overwrite` flag to `True` overwrites any datastore that was created previously with that name. \n", + "\n", + "This step can be changed to point to your blob container by providing your own `datastore_name`, `container_name`, and `account_name`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "account_name = \"pipelinedata\"\n", + "datastore_name=\"images_datastore\"\n", + "container_name=\"sampledata\"\n", + "\n", + "batchscore_blob = Datastore.register_azure_blob_container(ws, \n", + " datastore_name=datastore_name, \n", + " container_name= container_name, \n", + " account_name=account_name, \n", + " overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, let’s specify the default datastore for the outputs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def_data_store = ws.get_default_datastore()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure data references\n", + "Now you need to add references to the data, as inputs to the appropriate pipeline steps in your pipeline. A data source in a pipeline is represented by a DataReference object. The DataReference object points to data that lives in, or is accessible from, a datastore. We need DataReference objects corresponding to the following: the directory containing the input images, the directory in which the pretrained model is stored, the directory containing the labels, and the output directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "input_images = DataReference(datastore=batchscore_blob, \n", + " data_reference_name=\"input_images\",\n", + " path_on_datastore=\"batchscoring/images\",\n", + " mode=\"download\"\n", + " )\n", + "model_dir = DataReference(datastore=batchscore_blob, \n", + " data_reference_name=\"input_model\",\n", + " path_on_datastore=\"batchscoring/models\",\n", + " mode=\"download\" \n", + " )\n", + "label_dir = DataReference(datastore=batchscore_blob, \n", + " data_reference_name=\"input_labels\",\n", + " path_on_datastore=\"batchscoring/labels\",\n", + " mode=\"download\" \n", + " )\n", + "output_dir = PipelineData(name=\"scores\", \n", + " datastore=def_data_store, \n", + " output_path_on_compute=\"batchscoring/results\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create and attach Compute targets\n", + "Use the below code to create and attach Compute targets. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# choose a name for your cluster\n", + "aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpu-cluster\")\n", + "cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n", + "cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n", + "vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n", + "\n", + "\n", + "if aml_compute_name in ws.compute_targets:\n", + " compute_target = ws.compute_targets[aml_compute_name]\n", + " if compute_target and type(compute_target) is AmlCompute:\n", + " print('found compute target. just use it. 
' + aml_compute_name)\n", + "else:\n", + " print('creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n", + " vm_priority = 'lowpriority', # optional\n", + " min_nodes = cluster_min_nodes, \n", + " max_nodes = cluster_max_nodes)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, aml_compute_name, provisioning_config)\n", + " \n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it will use the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + " \n", + " # For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n", + " print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download the Model\n", + "\n", + "Download and extract the model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to `\"models\"`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create directory for model\n", + "model_dir = 'models'\n", + "if not os.path.isdir(model_dir):\n", + " os.mkdir(model_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import tarfile\n", + "import urllib.request\n", + "\n", + "url=\"http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz\"\n", + "response = urllib.request.urlretrieve(url, \"model.tar.gz\")\n", + "tar = tarfile.open(\"model.tar.gz\", \"r:gz\")\n", + "tar.extractall(model_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the model with Workspace" + ] + }, + 
{ + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "from azureml.core.model import Model\n", + "\n", + "# register downloaded model \n", + "model = Model.register(model_path = \"models/inception_v3.ckpt\",\n", + " model_name = \"inception\", # this is the name the model is registered as\n", + " tags = {'pretrained': \"inception\"},\n", + " description = \"Imagenet trained tensorflow inception\",\n", + " workspace = ws)\n", + "# remove the downloaded dir after registration if you wish\n", + "shutil.rmtree(\"models\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Write your scoring script" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To do the scoring, we use a batch scoring script `batch_scoring.py`, which is located in the same directory that this notebook is in. You can take a look at this script to see how you might modify it for your custom batch scoring task.\n", + "\n", + "The python script `batch_scoring.py` takes input images, applies the image classification model to these images, and outputs a classification result to a results file.\n", + "\n", + "The script `batch_scoring.py` takes the following parameters:\n", + "\n", + "- `--model_name`: the name of the model being used, which is expected to be in the `model_dir` directory\n", + "- `--label_dir` : the directory holding the `labels.txt` file \n", + "- `--dataset_path`: the directory containing the input images\n", + "- `--output_dir` : the script will run the model on the data and output a `results-label.txt` to this directory\n", + "- `--batch_size` : the batch size used in running the model.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Build and run the batch scoring pipeline\n", + "You have everything you need to build the pipeline. Let’s put all these together." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify the environment to run the script\n", + "Specify the conda dependencies for your script. You will need this object when you create the pipeline step later on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n", + "\n", + "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n", + "\n", + "# Runconfig\n", + "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", + "amlcompute_run_config.environment.docker.enabled = True\n", + "amlcompute_run_config.environment.docker.gpu_support = True\n", + "amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n", + "amlcompute_run_config.environment.spark.precache_packages = False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify the parameters for your pipeline\n", + "A subset of the parameters to the python script can be given as input when we re-run a `PublishedPipeline`. In the current example, we define `batch_size` taken by the script as such parameter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core.graph import PipelineParameter\n", + "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the pipeline step\n", + "Create the pipeline step using the script, environment configuration, and parameters. Specify the compute target you already attached to your workspace as the target of execution of the script. We will use PythonScriptStep to create the pipeline step." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "inception_model_name = \"inception_v3.ckpt\"\n", + "\n", + "batch_score_step = PythonScriptStep(\n", + " name=\"batch_scoring\",\n", + " script_name=\"batch_scoring.py\",\n", + " arguments=[\"--dataset_path\", input_images, \n", + " \"--model_name\", \"inception\",\n", + " \"--label_dir\", label_dir, \n", + " \"--output_dir\", output_dir, \n", + " \"--batch_size\", batch_size_param],\n", + " compute_target=compute_target,\n", + " inputs=[input_images, label_dir],\n", + " outputs=[output_dir],\n", + " runconfig=amlcompute_run_config\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run the pipeline\n", + "At this point you can run the pipeline and examine the output it produced. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", + "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_params={\"param_batch_size\": 20})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor the run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(pipeline_run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download and review output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "step_run = list(pipeline_run.get_children())[0]\n", + "step_run.download_file(\"./outputs/result-labels.txt\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", + "df.columns = [\"Filename\", \"Prediction\"]\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Publish a pipeline and rerun using a REST call" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a published pipeline\n", + "Once you are satisfied with the outcome of the run, you can publish the pipeline so you can run it with different input values later. When you publish a pipeline, you get a REST endpoint that lets you invoke the pipeline with the set of parameters you exposed above using `PipelineParameter`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "published_pipeline = pipeline_run.publish_pipeline(\n", + "    name=\"Inception_v3_scoring\", description=\"Batch scoring using Inception v3 model\", version=\"1.0\")\n", + "\n", + "published_pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get published pipeline\n", + "\n", + "You can retrieve the published pipeline using its **pipeline id**.\n", + "\n", + "To get all the published pipelines for a given workspace (`ws`):\n", + "```python\n", + "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core import PublishedPipeline\n", + "\n", + "pipeline_id = published_pipeline.id # use your published pipeline id\n", + "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n", + "\n", + "published_pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Rerun the pipeline using the REST endpoint" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get AAD token\n", + "[This 
notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to AML workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.authentication import InteractiveLoginAuthentication\n", + "import requests\n", + "\n", + "auth = InteractiveLoginAuthentication()\n", + "aad_token = auth.get_authentication_header()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run published pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rest_endpoint = published_pipeline.endpoint\n", + "# specify batch size when running the pipeline\n", + "response = requests.post(rest_endpoint, \n", + " headers=aad_token, \n", + " json={\"ExperimentName\": \"batch_scoring\",\n", + " \"ParameterAssignments\": {\"param_batch_size\": 50}})\n", + "run_id = response.json()[\"Id\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor the new run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core.run import PipelineRun\n", + "published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n", + "\n", + "RunDetails(published_pipeline_run).show()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sanpil" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb 
b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb index b0deef24..33abfc1b 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb @@ -1,652 +1,652 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Neural style transfer on video\n", - "Using modified code from `pytorch`'s neural style [example](https://pytorch.org/tutorials/advanced/neural_style_tutorial.html), we show how to setup a pipeline for doing style transfer on video. The pipeline has following steps:\n", - "1. Split a video into images\n", - "2. Run neural style on each image using one of the provided models (from `pytorch` pretrained models for this example).\n", - "3. Stitch the image back into a video." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from azureml.core import Workspace, Experiment\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", - "\n", - "scripts_folder = \"scripts_folder\"\n", - "\n", - "if not os.path.isdir(scripts_folder):\n", - " os.mkdir(scripts_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import AmlCompute, ComputeTarget\n", - "from azureml.core.datastore import Datastore\n", - "from azureml.data.data_reference import DataReference\n", - "from azureml.pipeline.core import Pipeline, PipelineData\n", - "from azureml.pipeline.steps import PythonScriptStep, MpiStep\n", - "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", - "from azureml.core.compute_target import ComputeTargetException" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Create or use existing compute" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# AmlCompute\n", - "cpu_cluster_name = \"cpu-cluster\"\n", - "try:\n", - " cpu_cluster = AmlCompute(ws, cpu_cluster_name)\n", - " print(\"found existing cluster.\")\n", - "except ComputeTargetException:\n", - " print(\"creating new cluster\")\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_v2\",\n", - " max_nodes = 1)\n", - "\n", - " # create the cluster\n", - " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, provisioning_config)\n", - " cpu_cluster.wait_for_completion(show_output=True)\n", - " \n", - "# AmlCompute\n", - "gpu_cluster_name = \"gpu-cluster\"\n", - "try:\n", - " gpu_cluster = 
AmlCompute(ws, gpu_cluster_name)\n", - " print(\"found existing cluster.\")\n", - "except ComputeTargetException:\n", - " print(\"creating new cluster\")\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n", - " max_nodes = 3)\n", - "\n", - " # create the cluster\n", - " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n", - " gpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Python Scripts\n", - "We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n", - "\n", - "We install `ffmpeg` through conda dependencies." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy(\"neural_style_mpi.py\", scripts_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $scripts_folder/process_video.py\n", - "import argparse\n", - "import glob\n", - "import os\n", - "import subprocess\n", - "\n", - "parser = argparse.ArgumentParser(description=\"Process input video\")\n", - "parser.add_argument('--input_video', required=True)\n", - "parser.add_argument('--output_audio', required=True)\n", - "parser.add_argument('--output_images', required=True)\n", - "\n", - "args = parser.parse_args()\n", - "\n", - "os.makedirs(args.output_audio, exist_ok=True)\n", - "os.makedirs(args.output_images, exist_ok=True)\n", - "\n", - "subprocess.run(\"ffmpeg -i {} {}/video.aac\"\n", - " .format(args.input_video, args.output_audio),\n", - " shell=True, check=True\n", - " )\n", - "\n", - "subprocess.run(\"ffmpeg -i {} {}/%05d_video.jpg -hide_banner\"\n", - " .format(args.input_video, 
args.output_images),\n", - " shell=True, check=True\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $scripts_folder/stitch_video.py\n", - "import argparse\n", - "import os\n", - "import subprocess\n", - "\n", - "parser = argparse.ArgumentParser(description=\"Process input video\")\n", - "parser.add_argument('--images_dir', required=True)\n", - "parser.add_argument('--input_audio', required=True)\n", - "parser.add_argument('--output_dir', required=True)\n", - "\n", - "args = parser.parse_args()\n", - "\n", - "os.makedirs(args.output_dir, exist_ok=True)\n", - "\n", - "subprocess.run(\"ffmpeg -framerate 30 -i {}/%05d_video.jpg -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p \"\n", - " \"-y {}/video_without_audio.mp4\"\n", - " .format(args.images_dir, args.output_dir),\n", - " shell=True, check=True\n", - " )\n", - "\n", - "subprocess.run(\"ffmpeg -i {}/video_without_audio.mp4 -i {}/video.aac -map 0:0 -map 1:0 -vcodec \"\n", - " \"copy -acodec copy -y {}/video_with_audio.mp4\"\n", - " .format(args.output_dir, args.input_audio, args.output_dir),\n", - " shell=True, check=True\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The sample video **organutan.mp4** is stored at a publicly shared datastore. We are registering the datastore below. If you want to take a look at the original video, click here. 
(https://pipelinedata.blob.core.windows.net/sample-videos/orangutan.mp4)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# datastore for input video\n", - "account_name = \"pipelinedata\"\n", - "video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"sample-videos\",\n", - " account_name=account_name, overwrite=True)\n", - "\n", - "# datastore for models\n", - "models_ds = Datastore.register_azure_blob_container(ws, \"models\", \"styletransfer\", \n", - " account_name=\"pipelinedata\", \n", - " overwrite=True)\n", - " \n", - "# downloaded models from https://pytorch.org/tutorials/advanced/neural_style_tutorial.html are kept here\n", - "models_dir = DataReference(data_reference_name=\"models\", datastore=models_ds, \n", - " path_on_datastore=\"saved_models\", mode=\"download\")\n", - "\n", - "# the default blob store attached to a workspace\n", - "default_datastore = ws.get_default_datastore()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Sample video" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "video_name=os.getenv(\"STYLE_TRANSFER_VIDEO_NAME\", \"orangutan.mp4\") \n", - "orangutan_video = DataReference(datastore=video_ds,\n", - " data_reference_name=\"video\",\n", - " path_on_datastore=video_name, mode=\"download\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cd = CondaDependencies()\n", - "\n", - "cd.add_channel(\"conda-forge\")\n", - "cd.add_conda_package(\"ffmpeg\")\n", - "\n", - "cd.add_channel(\"pytorch\")\n", - "cd.add_conda_package(\"pytorch\")\n", - "cd.add_conda_package(\"torchvision\")\n", - "\n", - "# Runconfig\n", - "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", - "amlcompute_run_config.environment.docker.enabled = True\n", - "amlcompute_run_config.environment.docker.gpu_support 
= True\n", - "amlcompute_run_config.environment.docker.base_image = \"pytorch/pytorch\"\n", - "amlcompute_run_config.environment.spark.precache_packages = False" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ffmpeg_audio = PipelineData(name=\"ffmpeg_audio\", datastore=default_datastore)\n", - "ffmpeg_images = PipelineData(name=\"ffmpeg_images\", datastore=default_datastore)\n", - "processed_images = PipelineData(name=\"processed_images\", datastore=default_datastore)\n", - "output_video = PipelineData(name=\"output_video\", datastore=default_datastore)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Define tweakable parameters to pipeline\n", - "These parameters can be changed when the pipeline is published and rerun from a REST call" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.graph import PipelineParameter\n", - "# create a parameter for style (one of \"candy\", \"mosaic\", \"rain_princess\", \"udnie\") to transfer the images to\n", - "style_param = PipelineParameter(name=\"style\", default_value=\"mosaic\")\n", - "# create a parameter for the number of nodes to use in step no. 
2 (style transfer)\n", - "nodecount_param = PipelineParameter(name=\"nodecount\", default_value=1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "split_video_step = PythonScriptStep(\n", - " name=\"split video\",\n", - " script_name=\"process_video.py\",\n", - " arguments=[\"--input_video\", orangutan_video,\n", - " \"--output_audio\", ffmpeg_audio,\n", - " \"--output_images\", ffmpeg_images,\n", - " ],\n", - " compute_target=cpu_cluster,\n", - " inputs=[orangutan_video],\n", - " outputs=[ffmpeg_images, ffmpeg_audio],\n", - " runconfig=amlcompute_run_config,\n", - " source_directory=scripts_folder\n", - ")\n", - "\n", - "# create a MPI step for distributing style transfer step across multiple nodes in AmlCompute \n", - "# using 'nodecount_param' PipelineParameter\n", - "distributed_style_transfer_step = MpiStep(\n", - " name=\"mpi style transfer\",\n", - " script_name=\"neural_style_mpi.py\",\n", - " arguments=[\"--content-dir\", ffmpeg_images,\n", - " \"--output-dir\", processed_images,\n", - " \"--model-dir\", models_dir,\n", - " \"--style\", style_param,\n", - " \"--cuda\", 1\n", - " ],\n", - " compute_target=gpu_cluster,\n", - " node_count=nodecount_param, \n", - " process_count_per_node=1,\n", - " inputs=[models_dir, ffmpeg_images],\n", - " outputs=[processed_images],\n", - " pip_packages=[\"mpi4py\", \"torch\", \"torchvision\"],\n", - " use_gpu=True,\n", - " source_directory=scripts_folder\n", - ")\n", - "\n", - "stitch_video_step = PythonScriptStep(\n", - " name=\"stitch\",\n", - " script_name=\"stitch_video.py\",\n", - " arguments=[\"--images_dir\", processed_images, \n", - " \"--input_audio\", ffmpeg_audio, \n", - " \"--output_dir\", output_video],\n", - " compute_target=cpu_cluster,\n", - " inputs=[processed_images, ffmpeg_audio],\n", - " outputs=[output_video],\n", - " runconfig=amlcompute_run_config,\n", - " source_directory=scripts_folder\n", - ")" - ] - }, - { - "cell_type": 
"markdown", - "metadata": {}, - "source": [ - "# Run the pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n", - "# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n", - "pipeline_run = Experiment(ws, 'style_transfer').submit(pipeline, pipeline_params={\"style\": \"mosaic\", \"nodecount\": 3})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Monitor using widget" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(pipeline_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Downloads the video in `output_video` folder" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Download output video" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def download_video(run, target_dir=None):\n", - " stitch_run = run.find_step_run(\"stitch\")[0]\n", - " port_data = stitch_run.get_output_data(\"output_video\")\n", - " port_data.download(target_dir, show_progress=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline_run.wait_for_completion()\n", - "download_video(pipeline_run, \"output_video_mosaic\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Publish pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "published_pipeline = pipeline_run.publish_pipeline(\n", - " name=\"batch score style transfer\", description=\"style transfer\", version=\"1.0\")\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - 
"## Get published pipeline\n", - "\n", - "You can get the published pipeline using **pipeline id**.\n", - "\n", - "To get all the published pipelines for a given workspace(ws): \n", - "```css\n", - "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core import PublishedPipeline\n", - "\n", - "pipeline_id = published_pipeline.id # use your published pipeline id\n", - "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Re-run pipeline through REST calls for other styles" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get AAD token\n", - "[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to AML workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.authentication import InteractiveLoginAuthentication\n", - "import requests\n", - "\n", - "auth = InteractiveLoginAuthentication()\n", - "aad_token = auth.get_authentication_header()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get endpoint URL" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "rest_endpoint = published_pipeline.endpoint" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Send request and monitor" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# run the pipeline using PipelineParameter values style='candy' and nodecount=2\n", - "response = requests.post(rest_endpoint, \n", - " headers=aad_token,\n", - " json={\"ExperimentName\": \"style_transfer\",\n", - " \"ParameterAssignments\": {\"style\": \"candy\", 
\"nodecount\": 2}}) \n", - "run_id = response.json()[\"Id\"]\n", - "\n", - "from azureml.pipeline.core.run import PipelineRun\n", - "published_pipeline_run_candy = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", - "\n", - "RunDetails(published_pipeline_run_candy).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# run the pipeline using PipelineParameter values style='rain_princess' and nodecount=3\n", - "response = requests.post(rest_endpoint, \n", - " headers=aad_token,\n", - " json={\"ExperimentName\": \"style_transfer\",\n", - " \"ParameterAssignments\": {\"style\": \"rain_princess\", \"nodecount\": 3}}) \n", - "run_id = response.json()[\"Id\"]\n", - "\n", - "published_pipeline_run_rain = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", - "\n", - "RunDetails(published_pipeline_run_rain).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# run the pipeline using PipelineParameter values style='udnie' and nodecount=4\n", - "response = requests.post(rest_endpoint, \n", - " headers=aad_token,\n", - " json={\"ExperimentName\": \"style_transfer\",\n", - " \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 3}}) \n", - "run_id = response.json()[\"Id\"]\n", - "\n", - "published_pipeline_run_udnie = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", - "\n", - "RunDetails(published_pipeline_run_udnie).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download output from re-run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "published_pipeline_run_candy.wait_for_completion()\n", - "published_pipeline_run_rain.wait_for_completion()\n", - "published_pipeline_run_udnie.wait_for_completion()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": 
[], - "source": [ - "download_video(published_pipeline_run_candy, target_dir=\"output_video_candy\")\n", - "download_video(published_pipeline_run_rain, target_dir=\"output_video_rain_princess\")\n", - "download_video(published_pipeline_run_udnie, target_dir=\"output_video_udnie\")" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "sanpil" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Neural style transfer on video\n", + "Using modified code from `pytorch`'s neural style [example](https://pytorch.org/tutorials/advanced/neural_style_tutorial.html), we show how to setup a pipeline for doing style transfer on video. The pipeline has following steps:\n", + "1. Split a video into images\n", + "2. Run neural style on each image using one of the provided models (from `pytorch` pretrained models for this example).\n", + "3. Stitch the image back into a video." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azureml.core import Workspace, Experiment\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", + "\n", + "scripts_folder = \"scripts_folder\"\n", + "\n", + "if not os.path.isdir(scripts_folder):\n", + " os.mkdir(scripts_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "from azureml.core.datastore import Datastore\n", + "from azureml.data.data_reference import DataReference\n", + "from azureml.pipeline.core import Pipeline, PipelineData\n", + "from azureml.pipeline.steps import PythonScriptStep, MpiStep\n", + "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", + "from azureml.core.compute_target import ComputeTargetException" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create or use existing compute" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# AmlCompute\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "try:\n", + " cpu_cluster = AmlCompute(ws, cpu_cluster_name)\n", + " print(\"found existing cluster.\")\n", + "except ComputeTargetException:\n", + " 
print(\"creating new cluster\")\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_v2\",\n", + " max_nodes = 1)\n", + "\n", + " # create the cluster\n", + " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, provisioning_config)\n", + " cpu_cluster.wait_for_completion(show_output=True)\n", + " \n", + "# AmlCompute\n", + "gpu_cluster_name = \"gpu-cluster\"\n", + "try:\n", + " gpu_cluster = AmlCompute(ws, gpu_cluster_name)\n", + " print(\"found existing cluster.\")\n", + "except ComputeTargetException:\n", + " print(\"creating new cluster\")\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n", + " max_nodes = 3)\n", + "\n", + " # create the cluster\n", + " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n", + " gpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python Scripts\n", + "We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n", + "\n", + "We install `ffmpeg` through conda dependencies." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy(\"neural_style_mpi.py\", scripts_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $scripts_folder/process_video.py\n", + "import argparse\n", + "import glob\n", + "import os\n", + "import subprocess\n", + "\n", + "parser = argparse.ArgumentParser(description=\"Process input video\")\n", + "parser.add_argument('--input_video', required=True)\n", + "parser.add_argument('--output_audio', required=True)\n", + "parser.add_argument('--output_images', required=True)\n", + "\n", + "args = parser.parse_args()\n", + "\n", + "os.makedirs(args.output_audio, exist_ok=True)\n", + "os.makedirs(args.output_images, exist_ok=True)\n", + "\n", + "subprocess.run(\"ffmpeg -i {} {}/video.aac\"\n", + " .format(args.input_video, args.output_audio),\n", + " shell=True, check=True\n", + " )\n", + "\n", + "subprocess.run(\"ffmpeg -i {} {}/%05d_video.jpg -hide_banner\"\n", + " .format(args.input_video, args.output_images),\n", + " shell=True, check=True\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $scripts_folder/stitch_video.py\n", + "import argparse\n", + "import os\n", + "import subprocess\n", + "\n", + "parser = argparse.ArgumentParser(description=\"Process input video\")\n", + "parser.add_argument('--images_dir', required=True)\n", + "parser.add_argument('--input_audio', required=True)\n", + "parser.add_argument('--output_dir', required=True)\n", + "\n", + "args = parser.parse_args()\n", + "\n", + "os.makedirs(args.output_dir, exist_ok=True)\n", + "\n", + "subprocess.run(\"ffmpeg -framerate 30 -i {}/%05d_video.jpg -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p \"\n", + " \"-y {}/video_without_audio.mp4\"\n", + " .format(args.images_dir, args.output_dir),\n", 
+ "               shell=True, check=True\n", + "              )\n", + "\n", + "subprocess.run(\"ffmpeg -i {}/video_without_audio.mp4 -i {}/video.aac -map 0:0 -map 1:0 -vcodec \"\n", + "               \"copy -acodec copy -y {}/video_with_audio.mp4\"\n", + "               .format(args.output_dir, args.input_audio, args.output_dir),\n", + "               shell=True, check=True\n", + "              )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The sample video **orangutan.mp4** is stored in a publicly shared datastore, which we register below. If you want to take a look at the original video, see https://pipelinedata.blob.core.windows.net/sample-videos/orangutan.mp4." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# datastore for input video\n", + "account_name = \"pipelinedata\"\n", + "video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"sample-videos\",\n", + "                                                   account_name=account_name, overwrite=True)\n", + "\n", + "# datastore for models\n", + "models_ds = Datastore.register_azure_blob_container(ws, \"models\", \"styletransfer\", \n", + "                                                    account_name=\"pipelinedata\", \n", + "                                                    overwrite=True)\n", + "\n", + "# downloaded models from https://pytorch.org/tutorials/advanced/neural_style_tutorial.html are kept here\n", + "models_dir = DataReference(data_reference_name=\"models\", datastore=models_ds, \n", + "                           path_on_datastore=\"saved_models\", mode=\"download\")\n", + "\n", + "# the default blob store attached to a workspace\n", + "default_datastore = ws.get_default_datastore()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Sample video" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "video_name = os.getenv(\"STYLE_TRANSFER_VIDEO_NAME\", \"orangutan.mp4\")\n", + "orangutan_video = DataReference(datastore=video_ds,\n", + "                                data_reference_name=\"video\",\n", + "                                path_on_datastore=video_name, mode=\"download\")" + ]
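+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an optional sanity check (not required for the pipeline), you can confirm that the datastores registered above now appear in the workspace:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# the names used above (\"videos\" and \"models\") should appear in this list\n",
+ "sorted(ws.datastores.keys())"
+ ]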
+ }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cd = CondaDependencies()\n", + "\n", + "cd.add_channel(\"conda-forge\")\n", + "cd.add_conda_package(\"ffmpeg\")\n", + "\n", + "cd.add_channel(\"pytorch\")\n", + "cd.add_conda_package(\"pytorch\")\n", + "cd.add_conda_package(\"torchvision\")\n", + "\n", + "# Runconfig\n", + "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", + "amlcompute_run_config.environment.docker.enabled = True\n", + "amlcompute_run_config.environment.docker.gpu_support = True\n", + "amlcompute_run_config.environment.docker.base_image = \"pytorch/pytorch\"\n", + "amlcompute_run_config.environment.spark.precache_packages = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ffmpeg_audio = PipelineData(name=\"ffmpeg_audio\", datastore=default_datastore)\n", + "ffmpeg_images = PipelineData(name=\"ffmpeg_images\", datastore=default_datastore)\n", + "processed_images = PipelineData(name=\"processed_images\", datastore=default_datastore)\n", + "output_video = PipelineData(name=\"output_video\", datastore=default_datastore)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Define tweakable parameters to pipeline\n", + "These parameters can be changed when the pipeline is published and rerun from a REST call" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core.graph import PipelineParameter\n", + "# create a parameter for style (one of \"candy\", \"mosaic\", \"rain_princess\", \"udnie\") to transfer the images to\n", + "style_param = PipelineParameter(name=\"style\", default_value=\"mosaic\")\n", + "# create a parameter for the number of nodes to use in step no. 
2 (style transfer)\n", + "nodecount_param = PipelineParameter(name=\"nodecount\", default_value=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "split_video_step = PythonScriptStep(\n", + " name=\"split video\",\n", + " script_name=\"process_video.py\",\n", + " arguments=[\"--input_video\", orangutan_video,\n", + " \"--output_audio\", ffmpeg_audio,\n", + " \"--output_images\", ffmpeg_images,\n", + " ],\n", + " compute_target=cpu_cluster,\n", + " inputs=[orangutan_video],\n", + " outputs=[ffmpeg_images, ffmpeg_audio],\n", + " runconfig=amlcompute_run_config,\n", + " source_directory=scripts_folder\n", + ")\n", + "\n", + "# create a MPI step for distributing style transfer step across multiple nodes in AmlCompute \n", + "# using 'nodecount_param' PipelineParameter\n", + "distributed_style_transfer_step = MpiStep(\n", + " name=\"mpi style transfer\",\n", + " script_name=\"neural_style_mpi.py\",\n", + " arguments=[\"--content-dir\", ffmpeg_images,\n", + " \"--output-dir\", processed_images,\n", + " \"--model-dir\", models_dir,\n", + " \"--style\", style_param,\n", + " \"--cuda\", 1\n", + " ],\n", + " compute_target=gpu_cluster,\n", + " node_count=nodecount_param, \n", + " process_count_per_node=1,\n", + " inputs=[models_dir, ffmpeg_images],\n", + " outputs=[processed_images],\n", + " pip_packages=[\"mpi4py\", \"torch\", \"torchvision\"],\n", + " use_gpu=True,\n", + " source_directory=scripts_folder\n", + ")\n", + "\n", + "stitch_video_step = PythonScriptStep(\n", + " name=\"stitch\",\n", + " script_name=\"stitch_video.py\",\n", + " arguments=[\"--images_dir\", processed_images, \n", + " \"--input_audio\", ffmpeg_audio, \n", + " \"--output_dir\", output_video],\n", + " compute_target=cpu_cluster,\n", + " inputs=[processed_images, ffmpeg_audio],\n", + " outputs=[output_video],\n", + " runconfig=amlcompute_run_config,\n", + " source_directory=scripts_folder\n", + ")" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "# Run the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n", + "# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n", + "pipeline_run = Experiment(ws, 'style_transfer').submit(pipeline, pipeline_params={\"style\": \"mosaic\", \"nodecount\": 3})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Monitor using widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(pipeline_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Downloads the video in `output_video` folder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Download output video" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def download_video(run, target_dir=None):\n", + " stitch_run = run.find_step_run(\"stitch\")[0]\n", + " port_data = stitch_run.get_output_data(\"output_video\")\n", + " port_data.download(target_dir, show_progress=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_run.wait_for_completion()\n", + "download_video(pipeline_run, \"output_video_mosaic\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Publish pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "published_pipeline = pipeline_run.publish_pipeline(\n", + " name=\"batch score style transfer\", description=\"style transfer\", version=\"1.0\")\n", + "\n", + "published_pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"## Get published pipeline\n", + "\n", + "You can get the published pipeline using its **pipeline id**.\n", + "\n", + "To get all the published pipelines for a given workspace (ws):\n", + "```python\n", + "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core import PublishedPipeline\n", + "\n", + "pipeline_id = published_pipeline.id # use your published pipeline id\n", + "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n", + "\n", + "published_pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Re-run pipeline through REST calls for other styles" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get AAD token\n", + "[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to the AML workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.authentication import InteractiveLoginAuthentication\n", + "import requests\n", + "\n", + "auth = InteractiveLoginAuthentication()\n", + "aad_token = auth.get_authentication_header()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get endpoint URL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rest_endpoint = published_pipeline.endpoint" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Send request and monitor" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# run the pipeline using PipelineParameter values style='candy' and nodecount=2\n", + "response = requests.post(rest_endpoint, \n", + " headers=aad_token,\n", + " json={\"ExperimentName\": \"style_transfer\",\n", + " \"ParameterAssignments\": {\"style\": \"candy\", 
\"nodecount\": 2}}) \n", + "run_id = response.json()[\"Id\"]\n", + "\n", + "from azureml.pipeline.core.run import PipelineRun\n", + "published_pipeline_run_candy = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", + "\n", + "RunDetails(published_pipeline_run_candy).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# run the pipeline using PipelineParameter values style='rain_princess' and nodecount=3\n", + "response = requests.post(rest_endpoint, \n", + " headers=aad_token,\n", + " json={\"ExperimentName\": \"style_transfer\",\n", + " \"ParameterAssignments\": {\"style\": \"rain_princess\", \"nodecount\": 3}}) \n", + "run_id = response.json()[\"Id\"]\n", + "\n", + "published_pipeline_run_rain = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", + "\n", + "RunDetails(published_pipeline_run_rain).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# run the pipeline using PipelineParameter values style='udnie' and nodecount=3\n", + "response = requests.post(rest_endpoint, \n", + " headers=aad_token,\n", + " json={\"ExperimentName\": \"style_transfer\",\n", + " \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 3}}) \n", + "run_id = response.json()[\"Id\"]\n", + "\n", + "published_pipeline_run_udnie = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", + "\n", + "RunDetails(published_pipeline_run_udnie).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download output from re-run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "published_pipeline_run_candy.wait_for_completion()\n", + "published_pipeline_run_rain.wait_for_completion()\n", + "published_pipeline_run_udnie.wait_for_completion()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": 
[], + "source": [ + "download_video(published_pipeline_run_candy, target_dir=\"output_video_candy\")\n", + "download_video(published_pipeline_run_rain, target_dir=\"output_video_rain_princess\")\n", + "download_video(published_pipeline_run_udnie, target_dir=\"output_video_udnie\")" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sanpil" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb index 4d577c75..efb53b82 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb @@ -1,322 +1,322 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed Chainer\n", - "In this tutorial, you will run a Chainer training example on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using ChainerMN distributed training across a GPU cluster." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current AmlCompute. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the AmlCompute ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './chainer-distr'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `train_mnist.py`. In practice, you should be able to take any custom Chainer training script as is and run it with Azure ML without having to modify your code." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `train_mnist.py` into the project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('train_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed Chainer tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'chainer-distr'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Chainer estimator\n", - "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", - "from azureml.train.dnn import Chainer\n", - "\n", - "estimator = Chainer(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='train_mnist.py',\n", - " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "minxia" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distributed Chainer\n", + "In this tutorial, you will run a Chainer training example on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using ChainerMN distributed training across a GPU cluster." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. 
In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the code below creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current AmlCompute. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates GPU compute. 
If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the AmlCompute ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './chainer-distr'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare training script\n", + "Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `train_mnist.py`. In practice, you should be able to take any custom Chainer training script as is and run it with Azure ML without having to modify your code." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once your script is ready, copy the training script `train_mnist.py` into the project directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('train_mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed Chainer tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'chainer-distr'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Chainer estimator\n", + "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import MpiConfiguration\n", + "from azureml.train.dnn import Chainer\n", + "\n", + "estimator = Chainer(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " entry_script='train_mnist.py',\n", + " node_count=2,\n", + " distributed_training=MpiConfiguration(),\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. 
However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "minxia" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb 
b/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb index 523f58ed..71648ff5 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb @@ -1,401 +1,401 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed CNTK using custom docker images\n", - "In this tutorial, you will train a CNTK model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using a custom docker image and distributed training." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name,\n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current AmlCompute\n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload training data\n", - "For this tutorial, we will be using the MNIST dataset.\n", - "\n", - "First, let's download the dataset. We've included the `install_mnist.py` script to download the data and convert it to a CNTK-supported format. Our data files will get written to a directory named `'mnist'`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import install_mnist\n", - "\n", - "install_mnist.main('mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). 
The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. \n", - "\n", - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore, which we will then mount on the remote compute for training in the next section." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code will upload the training data to the path `./mnist` on the default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./mnist', target_path='./mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_on_datastore = 'mnist'\n", - "ds_data = ds.path(path_on_datastore)\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the cluster ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './cntk-distr'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `cntk_distr_mnist.py` into this project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('cntk_distr_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed CNTK tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'cntk-distr'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an Estimator\n", - "The AML SDK's base Estimator enables you to easily submit custom scripts for both single-node and distributed runs. You should this generic estimator for training code using frameworks such as sklearn or CNTK that don't have corresponding custom estimators. For more information on using the generic estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.estimator import Estimator\n", - "\n", - "script_params = {\n", - " '--num_epochs': 20,\n", - " '--data_dir': ds_data.as_mount(),\n", - " '--output_dir': './outputs'\n", - "}\n", - "\n", - "estimator = Estimator(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='cntk_distr_mnist.py',\n", - " script_params=script_params,\n", - " node_count=2,\n", - " process_count_per_node=1,\n", - " distributed_backend='mpi',\n", - " pip_packages=['cntk-gpu==2.6'],\n", - " custom_docker_image='microsoft/mmlspark:gpu-0.12',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_image`. Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n", - "\n", - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "minxia" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distributed CNTK using custom docker images\n", + "In this tutorial, you will train a CNTK model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using a custom docker image and distributed training." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name,\n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current AmlCompute\n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload training data\n", + "For this tutorial, we will be using the MNIST dataset.\n", + "\n", + "First, let's download the dataset. We've included the `install_mnist.py` script to download the data and convert it to a CNTK-supported format. Our data files will get written to a directory named `'mnist'`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import install_mnist\n", + "\n", + "install_mnist.main('mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). 
The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. \n", + "\n", + "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore, which we will then mount on the remote compute for training in the next section." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code will upload the training data to the path `./mnist` on the default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload(src_dir='./mnist', target_path='./mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "path_on_datastore = 'mnist'\n", + "ds_data = ds.path(path_on_datastore)\n", + "print(ds_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the cluster ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './cntk-distr'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `cntk_distr_mnist.py` into this project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('cntk_distr_mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed CNTK tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'cntk-distr'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an Estimator\n", + "The AML SDK's base Estimator enables you to easily submit custom scripts for both single-node and distributed runs. You should use this generic estimator for training code that uses frameworks such as sklearn or CNTK that don't have corresponding custom estimators. For more information on using the generic estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.estimator import Estimator\n", + "\n", + "script_params = {\n", + " '--num_epochs': 20,\n", + " '--data_dir': ds_data.as_mount(),\n", + " '--output_dir': './outputs'\n", + "}\n", + "\n", + "estimator = Estimator(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " entry_script='cntk_distr_mnist.py',\n", + " script_params=script_params,\n", + " node_count=2,\n", + " process_count_per_node=1,\n", + " distributed_backend='mpi',\n", + " pip_packages=['cntk-gpu==2.6'],\n", + " custom_docker_image='microsoft/mmlspark:gpu-0.12',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_image`. Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n", + "\n", + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "minxia" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb index 83b4de13..f68cd387 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb @@ -1,342 +1,342 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed PyTorch with Horovod\n", - "In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod) across a GPU cluster." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) on single-node PyTorch training using Azure Machine Learning" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. 
AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current AmlCompute. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the AmlCompute ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. 
This includes the training script and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './pytorch-distr-hvd'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `pytorch_horovod_mnist.py`. In practice, you should be able to take any custom PyTorch training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [metric logging](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#logging) capabilities, you will have to add a small amount of Azure ML logic inside your training script. In this example, at each logging interval, we will log the loss for that minibatch to our Azure ML run.\n", - "\n", - "To do so, in `pytorch_horovod_mnist.py`, we will first access the Azure ML `Run` object within the script:\n", - "```Python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "Later within the script, we log the loss metric to our run:\n", - "```Python\n", - "run.log('loss', loss.item())\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `pytorch_horovod_mnist.py` into the project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('pytorch_horovod_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'pytorch-distr-hvd'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a PyTorch estimator\n", - "The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", - "from azureml.train.dnn import PyTorch\n", - "\n", - "estimator = PyTorch(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='pytorch_horovod_mnist.py',\n", - " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. 
Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True) # this provides a verbose log" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "minxia" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distributed PyTorch with Horovod\n", + "In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod) across a GPU cluster." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`\n", + "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) on single-node PyTorch training using Azure Machine Learning" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the code below creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current AmlCompute. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the AmlCompute ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './pytorch-distr-hvd'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare training script\n", + "Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `pytorch_horovod_mnist.py`. In practice, you should be able to take any custom PyTorch training script as is and run it with Azure ML without having to modify your code.\n", + "\n", + "However, if you would like to use Azure ML's [metric logging](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#logging) capabilities, you will have to add a small amount of Azure ML logic inside your training script. In this example, at each logging interval, we will log the loss for that minibatch to our Azure ML run.\n", + "\n", + "To do so, in `pytorch_horovod_mnist.py`, we will first access the Azure ML `Run` object within the script:\n", + "```Python\n", + "from azureml.core.run import Run\n", + "run = Run.get_context()\n", + "```\n", + "Later within the script, we log the loss metric to our run:\n", + "```Python\n", + "run.log('loss', loss.item())\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once your script is ready, copy the training script `pytorch_horovod_mnist.py` into the project directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('pytorch_horovod_mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'pytorch-distr-hvd'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a PyTorch estimator\n", + "The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, see [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import MpiConfiguration\n", + "from azureml.train.dnn import PyTorch\n", + "\n", + "estimator = PyTorch(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " entry_script='pytorch_horovod_mnist.py',\n", + " node_count=2,\n", + " distributed_training=MpiConfiguration(),\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. 
Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True) # this provides a verbose log" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "minxia" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb index 41eae095..c8babdb1 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb @@ -1,411 +1,411 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/manage-runs/manage-runs.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed Tensorflow with Horovod\n", - "In this tutorial, you will train a word2vec model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload data to datastore\n", - "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n", - "\n", - "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. 
For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, download the training data from [here](http://mattmahoney.net/dc/text8.zip) to your local machine:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib\n", - "\n", - "os.makedirs('./data', exist_ok=True)\n", - "download_url = 'http://mattmahoney.net/dc/text8.zip'\n", - "urllib.request.urlretrieve(download_url, filename='./data/text8.zip')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Upload the contents of the data directory to the path `./data` on the default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For convenience, let's get a reference to the path on the datastore with the zip file of training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--input_data` argument. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_on_datastore = 'data/text8.zip'\n", - "ds_data = ds.path(path_on_datastore)\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "project_folder = './tf-distr-hvd'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_horovod_word2vec.py` into this project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('tf_horovod_word2vec.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-distr-hvd'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow).\n", - "\n", - "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params={\n", - " '--input_data': ds_data\n", - "}\n", - "\n", - "estimator= TensorFlow(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_horovod_word2vec.py',\n", - " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", - " framework_version='1.13')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. 
In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", - "\n", - "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/manage-runs/manage-runs.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distributed Tensorflow with Horovod\n", + "In this tutorial, you will train a word2vec model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)\n", + "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload data to datastore\n", + "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n", + "\n", + "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. 
For this tutorial, although you could download the data directly in your training script, we will instead upload the training data to a datastore and access it during training to illustrate the datastore functionality." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, download the training data from [here](http://mattmahoney.net/dc/text8.zip) to your local machine:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import urllib.request\n", + "\n", + "os.makedirs('./data', exist_ok=True)\n", + "download_url = 'http://mattmahoney.net/dc/text8.zip'\n", + "urllib.request.urlretrieve(download_url, filename='./data/text8.zip')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Upload the contents of the data directory to the path `./data` on the default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For convenience, let's get a reference to the path of the training data zip file on the datastore, using the datastore's `path` method. In the next section, we will pass this reference to our training script's `--input_data` argument. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "path_on_datastore = 'data/text8.zip'\n", + "ds_data = ds.path(path_on_datastore)\n", + "print(ds_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "project_folder = './tf-distr-hvd'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `tf_horovod_word2vec.py` into this project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('tf_horovod_word2vec.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'tf-distr-hvd'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a TensorFlow estimator\n", + "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow).\n", + "\n", + "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import MpiConfiguration\n", + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params={\n", + " '--input_data': ds_data\n", + "}\n", + "\n", + "estimator= TensorFlow(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " script_params=script_params,\n", + " entry_script='tf_horovod_word2vec.py',\n", + " node_count=2,\n", + " distributed_training=MpiConfiguration(),\n", + " framework_version='1.13')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. 
In order to execute a distributed run using MPI/Horovod, you must pass an `MpiConfiguration` object to the estimator's `distributed_training` parameter. Using this estimator with these settings, TensorFlow, Horovod, and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", + "\n", + "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb index 56588388..f1ab95cc 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb @@ -1,326 +1,326 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed TensorFlow with parameter server\n", - "In this tutorial, you will train a TensorFlow model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using native [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the cluster ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './tf-distr-ps'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_mnist_replica.py` into this project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('tf_mnist_replica.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-distr-ps'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import TensorflowConfiguration\n", - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params={\n", - " '--num_gpus': 1,\n", - " '--train_steps': 500\n", - "}\n", - "\n", - "distributed_training = TensorflowConfiguration()\n", - "distributed_training.worker_count = 2\n", - "\n", - "estimator = TensorFlow(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_mnist_replica.py',\n", - " node_count=2,\n", - " distributed_training=distributed_training,\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_backend='ps'`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True) # this provides a verbose log" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "minxia" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distributed TensorFlow with parameter server\n", + "In this tutorial, you will train a TensorFlow model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using native [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)\n", + "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the cluster ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './tf-distr-ps'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `tf_mnist_replica.py` into this project directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('tf_mnist_replica.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'tf-distr-ps'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a TensorFlow estimator\n", + "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import TensorflowConfiguration\n", + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params = {\n", + " '--num_gpus': 1,\n", + " '--train_steps': 500\n", + "}\n", + "\n", + "distributed_training = TensorflowConfiguration()\n", + "distributed_training.worker_count = 2\n", + "\n", + "estimator = TensorFlow(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " script_params=script_params,\n", + " entry_script='tf_mnist_replica.py',\n", + " node_count=2,\n", + " distributed_training=distributed_training,\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. In order to execute a native distributed TensorFlow run, you must pass a `TensorflowConfiguration` object to the estimator's `distributed_training` parameter. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True) # this provides a verbose log" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "minxia" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb b/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb index 6d15a48c..b21b8b60 100644 --- a/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb @@ -1,256 +1,256 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Export Run History as Tensorboard logs\n", - "\n", - "1. Run some training and log some metrics into Run History\n", - "2. Export the run history to some directory as Tensorboard logs\n", - "3. Launch a local Tensorboard to view the run history" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace, Experiment\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set experiment name and start the run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'export-to-tensorboard'\n", - "exp = Experiment(ws, experiment_name)\n", - "root_run = exp.start_logging()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n", - "from sklearn.datasets import load_diabetes\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.metrics import mean_squared_error\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "X, y = load_diabetes(return_X_y=True)\n", - "\n", - "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", - "\n", - "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", - "data = {\n", - " \"train\":{\"x\":x_train, \"y\":y_train}, \n", - " \"test\":{\"x\":x_test, \"y\":y_test}\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Example experiment\n", - "from tqdm import tqdm\n", - "\n", - "alphas = [.1, .2, .3, .4, .5, .6 , .7]\n", - "\n", - "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", - "for alpha in tqdm(alphas):\n", - " # create a bunch of child runs\n", - " with root_run.child_run(\"alpha\" + str(alpha)) as run:\n", - " # More 
data science stuff\n", - " reg = Ridge(alpha=alpha)\n", - " reg.fit(data[\"train\"][\"x\"], data[\"train\"][\"y\"])\n", - " \n", - " preds = reg.predict(data[\"test\"][\"x\"])\n", - " mse = mean_squared_error(preds, data[\"test\"][\"y\"])\n", - " # End train and eval\n", - "\n", - " # log alpha, mean_squared_error and feature names in run history\n", - " root_run.log(\"alpha\", alpha)\n", - " root_run.log(\"mse\", mse)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Export Run History to Tensorboard logs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Export Run History to Tensorboard logs\n", - "from azureml.tensorboard.export import export_to_tensorboard\n", - "import os\n", - "\n", - "logdir = 'exportedTBlogs'\n", - "log_path = os.path.join(os.getcwd(), logdir)\n", - "try:\n", - " os.stat(log_path)\n", - "except os.error:\n", - " os.mkdir(log_path)\n", - "print(logdir)\n", - "\n", - "# export run history for the project\n", - "export_to_tensorboard(root_run, logdir)\n", - "\n", - "# or export a particular run\n", - "# export_to_tensorboard(run, logdir)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "root_run.complete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard\n", - "\n", - "Or you can start the Tensorboard outside this notebook to view the result" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.tensorboard import Tensorboard\n", - "\n", - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([], local_root=logdir, port=6006)\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": 
{}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Export Run History as Tensorboard logs\n", + "\n", + "1. Run some training and log some metrics into Run History\n", + "2. Export the run history to some directory as Tensorboard logs\n", + "3. Launch a local Tensorboard to view the run history" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + "    * install the AML SDK\n", + "    * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace, Experiment\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + "      'Azure region: ' + ws.location, \n", + "      'Subscription id: ' + ws.subscription_id, \n", + "      'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set experiment name and start the run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'export-to-tensorboard'\n", + "exp = Experiment(ws, experiment_name)\n", + "root_run = exp.start_logging()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n", + "from sklearn.datasets import load_diabetes\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True)\n", + "\n", + "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", + "\n", + "x_train, x_test, y_train, y_test = train_test_split(X, 
y, test_size=0.2, random_state=0)\n", + "data = {\n", + " \"train\":{\"x\":x_train, \"y\":y_train}, \n", + " \"test\":{\"x\":x_test, \"y\":y_test}\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example experiment\n", + "from tqdm import tqdm\n", + "\n", + "alphas = [.1, .2, .3, .4, .5, .6 , .7]\n", + "\n", + "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", + "for alpha in tqdm(alphas):\n", + " # create a bunch of child runs\n", + " with root_run.child_run(\"alpha\" + str(alpha)) as run:\n", + " # More data science stuff\n", + " reg = Ridge(alpha=alpha)\n", + " reg.fit(data[\"train\"][\"x\"], data[\"train\"][\"y\"])\n", + " \n", + " preds = reg.predict(data[\"test\"][\"x\"])\n", + " mse = mean_squared_error(preds, data[\"test\"][\"y\"])\n", + " # End train and eval\n", + "\n", + " # log alpha, mean_squared_error and feature names in run history\n", + " root_run.log(\"alpha\", alpha)\n", + " root_run.log(\"mse\", mse)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Export Run History to Tensorboard logs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Export Run History to Tensorboard logs\n", + "from azureml.tensorboard.export import export_to_tensorboard\n", + "import os\n", + "\n", + "logdir = 'exportedTBlogs'\n", + "log_path = os.path.join(os.getcwd(), logdir)\n", + "try:\n", + " os.stat(log_path)\n", + "except os.error:\n", + " os.mkdir(log_path)\n", + "print(logdir)\n", + "\n", + "# export run history for the project\n", + "export_to_tensorboard(root_run, logdir)\n", + "\n", + "# or export a particular run\n", + "# export_to_tensorboard(run, logdir)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "root_run.complete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 
Start Tensorboard\n", + "\n", + "Or you can start the Tensorboard outside this notebook to view the result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.tensorboard import Tensorboard\n", + "\n", + "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", + "tb = Tensorboard([], local_root=logdir, port=6006)\n", + "\n", + "# If successful, start() returns a string with the URI of the instance.\n", + "tb.start()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stop Tensorboard\n", + "\n", + "When you're done, make sure to call the `stop()` method of the Tensorboard object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb index 6070cb56..db95bd70 100644 --- a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb @@ -1,542 +1,542 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. 
All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" - } - }, - "source": [ - "# How to use Estimator in Azure ML\n", - "\n", - "## Introduction\n", - "This tutorial shows how to use the Estimator pattern in Azure Machine Learning SDK. Estimator is a convenient object in Azure Machine Learning that wraps run configuration information to help simplify the tasks of specifying how a script is executed.\n", - "\n", - "\n", - "## Prerequisite:\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's get started. First let's import some Python libraries." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" - } - }, - "outputs": [], - "source": [ - "import azureml.core\n", - "from azureml.core import Workspace\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" - } - }, - "source": [ - "## Create an Azure ML experiment\n", - "Let's create an experiment named \"estimator-test\". The script runs will be recorded under this experiment in Azure." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" - } - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "exp = Experiment(workspace=ws, name='estimator-test')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If we cannot find a cluster with the given name, we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", - "1. create the configuration (this step is local and only takes a second)\n", - "2. create the cluster (this step will take about **20 seconds**)\n", - "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"cpu-cluster\"\n", - "\n", - "try:\n", - " cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n", - "\n", - " # create the cluster\n", - " cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(cpu_cluster.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compute_targets = ws.compute_targets\n", - "for name, ct in compute_targets.items():\n", - " print(name, ct.type, ct.provisioning_state)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" - } - }, - "source": [ - "## Use a simple script\n", - "We have already created a simple \"hello world\" script. 
This is the script that we will submit through the estimator pattern. It prints a hello-world message, and if Azure ML SDK is installed, it will also log an array of values ([Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number)). The script takes as input the number of Fibonacci numbers in the sequence to log." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./dummy_train.py', 'r') as f:\n", - "    print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a Generic Estimator" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First we import the Estimator class and also a widget to visualize a run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.estimator import Estimator\n", - "from azureml.widgets import RunDetails" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The simplest estimator is to submit the current folder to the local computer. Estimator by default will attempt to use Docker-based execution. Let's turn that off for now. It then builds a conda environment locally, installs Azure ML SDK in it, and runs your script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# use a conda environment, don't use Docker, on local computer\n", - "# Let's see how you can pass bool arguments in the script_params. 
Passing `'--my_bool_var': ''` will set my_bool_var as True and\n", - "# if you want it to be False, just do not pass it in the script_params.\n", - "script_params = {\n", - "    '--numbers-in-sequence': 10,\n", - "    '--my_bool_var': ''\n", - "}\n", - "est = Estimator(source_directory='.', script_params=script_params, compute_target='local', entry_script='dummy_train.py', use_docker=False)\n", - "run = exp.submit(est)\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also enable Docker and let estimator pick the default CPU image supplied by Azure ML for execution. You can target an AmlCompute cluster (or any other supported compute target type)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# use a conda environment on default Docker image in an AmlCompute cluster\n", - "script_params = {\n", - "    '--numbers-in-sequence': 10\n", - "}\n", - "est = Estimator(source_directory='.', script_params=script_params, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)\n", - "run = exp.submit(est)\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can customize the conda environment by adding conda and/or pip packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# add a conda package\n", - "script_params = {\n", - "    '--numbers-in-sequence': 10\n", - "}\n", - "est = Estimator(source_directory='.', \n", - "                script_params=script_params, \n", - "                compute_target='local', \n", - "                entry_script='dummy_train.py', \n", - "                use_docker=False, \n", - "                conda_packages=['scikit-learn'])\n", - "run = exp.submit(est)\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also specify a custom Docker image for execution. 
In this case, you probably want to tell the system not to build a new conda environment for you. Instead, you can specify the path to an existing Python environment in the custom Docker image.\n", - "\n", - "**Note**: since the below example points to the preinstalled Python environment in the miniconda3 image maintained by continuum.io on Docker Hub where Azure ML SDK is not present, the logging metric code is not triggered. But a run history record is still recorded. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# use a custom Docker image\n", - "from azureml.core.container_registry import ContainerRegistry\n", - "\n", - "# this is an image available in Docker Hub\n", - "image_name = 'continuumio/miniconda3'\n", - "\n", - "# you can also point to an image in a private ACR\n", - "image_registry_details = ContainerRegistry()\n", - "image_registry_details.address = \"myregistry.azurecr.io\"\n", - "image_registry_details.username = \"username\"\n", - "image_registry_details.password = \"password\"\n", - "\n", - "# don't let the system build a new conda environment\n", - "user_managed_dependencies = True\n", - "\n", - "# submit to a local Docker container. 
if you don't have Docker engine running locally, you can set compute_target to cpu_cluster.\n", - "script_params = {\n", - "    '--numbers-in-sequence': 10\n", - "}\n", - "est = Estimator(source_directory='.', \n", - "                script_params=script_params, \n", - "                compute_target='local', \n", - "                entry_script='dummy_train.py',\n", - "                custom_docker_image=image_name,\n", - "                # uncomment below line to use your private ACR\n", - "                #image_registry_details=image_registry_details,\n", - "                user_managed=user_managed_dependencies\n", - "                )\n", - "\n", - "run = exp.submit(est)\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In addition to passing in a Python file, you can also pass in a Jupyter notebook as the `entry_script`. [notebook_example.ipynb](notebook_example.ipynb) uses pm.record() to log key-value pairs which will appear in the Azure portal and be shown in the widget below.\n", - "\n", - "In order to run the cell below, make sure the `azureml-contrib-notebook` package is installed in the current environment with `pip install azureml-contrib-notebook`.\n", - "\n", - "This code snippet specifies the following parameters to the `Estimator` constructor. For more information on `Estimator`, please see [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) or [API doc](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py).\n", - "\n", - "| Parameter | Description |\n", - "| ------------- | ------------- |\n", - "| source_directory | (str) Local directory that contains all of your code needed for the training job. 
This folder gets copied from your local machine to the remote compute |\n", - "| compute_target | ([AbstractComputeTarget](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute_target.abstractcomputetarget?view=azure-ml-py) or str) Remote compute target that your training script will run on, in this case a previously created persistent compute cluster (cpu_cluster) |\n", - "| entry_script | (str) Filepath (relative to the source_directory) of the training script/notebook to be run on the remote compute. This file, and any additional files it depends on, should be located in this folder |\n", - "| script_params | (dict) A dictionary containing parameters to the `entry_script`. This is useful for passing datastore reference, for example, see [train-hyperparameter-tune-deploy-with-tensorflow.ipynb](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) |" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "est = Estimator(source_directory='.', compute_target=cpu_cluster, entry_script='notebook_example.ipynb', pip_packages=['nteract-scrapbook', 'azureml-contrib-notebook'])\n", - "run = exp.submit(est)\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Intelligent hyperparameter tuning\n", - "\n", - "The simple \"hello world\" script above lets the user fix the value of a parameter for the number of Fibonacci numbers in the sequence to log. Similarly, when training models, you can fix values of parameters of the training algorithm itself. E.g. the learning rate, the number of layers, the number of nodes in each layer in a neural network, etc. 
These adjustable parameters that govern the training process are referred to as the hyperparameters of the model. The goal of hyperparameter tuning is to search across various hyperparameter configurations and find the configuration that results in the best performance.\n", - "\n", - "\n", - "To demonstrate how Azure Machine Learning can help you automate the process of hyperparameter tuning, we will launch multiple runs with different values for numbers in the sequence. First let's define the parameter space using random sampling." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", - "from azureml.train.hyperdrive import choice\n", - "\n", - "ps = RandomParameterSampling(\n", - "    {\n", - "        '--numbers-in-sequence': choice(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)\n", - "    }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we will create a new estimator without the above numbers-in-sequence parameter since that will be passed in later. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "est = Estimator(source_directory='.', script_params={}, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we will look at training metrics and early termination policies. When training a model, users are interested in logging and optimizing certain metrics of the model, e.g. maximizing the accuracy of the model or minimizing loss. This metric is logged by the training script for each run. In our simple script above, we are logging Fibonacci numbers in a sequence. 
But a training script could just as easily log other metrics like accuracy or loss, which can be used to evaluate the performance of a given training run.\n", - "\n", - "The intelligent hyperparameter tuning capability in Azure Machine Learning automatically terminates poorly performing runs using an early termination policy. Early termination reduces wastage of compute resources and instead uses these resources for exploring other hyperparameter configurations. In this example, we use the BanditPolicy. This policy checks the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML will terminate the training run. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we are ready to configure a run configuration object for hyperparameter tuning. We need to call out the primary metric that we want the experiment to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script and we specify that we are looking to maximize this value. Next, we control the resource budget for the experiment by setting the maximum total number of training runs to 10. We also set the maximum number of training runs to run concurrently at 4, which is the same as the number of nodes in our compute cluster." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hdc = HyperDriveConfig(estimator=est, \n", - "                       hyperparameter_sampling=ps, \n", - "                       policy=policy, \n", - "                       primary_metric_name='Fibonacci numbers', \n", - "                       primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", - "                       max_total_runs=10,\n", - "                       max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, let's launch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hdr = exp.submit(config=hdc)\n", - "RunDetails(hdr).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When all the runs complete, we can find the run with the best performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = hdr.get_best_run_by_primary_metric()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can register the model from the best run and use it to deploy a web service that can be used for inferencing. Details on how you can do this can be found in the sample folders for the other types of estimators.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next Steps\n", - "Now you can proceed to explore the other types of estimators, such as TensorFlow estimator, PyTorch estimator, etc., in the sample folder." 
- ] - } - ], - "metadata": { - "authors": [ - { - "name": "maxluk" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - }, - "msauthor": "jingywa" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" + } + }, + "source": [ + "# How to use Estimator in Azure ML\n", + "\n", + "## Introduction\n", + "This tutorial shows how to use the Estimator pattern in Azure Machine Learning SDK. Estimator is a convenient object in Azure Machine Learning that wraps run configuration information to help simplify the tasks of specifying how a script is executed.\n", + "\n", + "\n", + "## Prerequisite:\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's get started. First let's import some Python libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" + } + }, + "outputs": [], + "source": [ + "import azureml.core\n", + "from azureml.core import Workspace\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" + } + }, + "source": [ + "## Create an Azure ML experiment\n", + "Let's create an experiment named \"estimator-test\". The script runs will be recorded under this experiment in Azure." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" + } + }, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "exp = Experiment(workspace=ws, name='estimator-test')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"cpu-cluster\"\n", + "\n", + "try:\n", + " cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n", + "\n", + " # create the cluster\n", + " cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(cpu_cluster.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compute_targets = ws.compute_targets\n", + "for name, ct in compute_targets.items():\n", + " print(name, ct.type, ct.provisioning_state)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" + } + }, + "source": [ + "## Use a simple script\n", + "We have already created a simple \"hello world\" script. 
This is the script that we will submit through the estimator pattern. It prints a hello-world message, and if Azure ML SDK is installed, it will also log an array of values ([Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number)). The script takes as input the number of Fibonacci numbers in the sequence to log." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./dummy_train.py', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create A Generic Estimator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we import the Estimator class and also a widget to visualize a run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.estimator import Estimator\n", + "from azureml.widgets import RunDetails" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The simplest estimator submits the current folder for execution on the local computer. By default, Estimator will attempt to use Docker-based execution; let's turn that off for now. It then builds a conda environment locally, installs Azure ML SDK in it, and runs your script." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# use a conda environment, don't use Docker, on local computer\n", + "# Let's see how you can pass bool arguments in the script_params. 
Passing `'--my_bool_var': ''` will set my_bool_var as True and\n", + "# if you want it to be False, just do not pass it in the script_params.\n", + "script_params = {\n", + " '--numbers-in-sequence': 10,\n", + " '--my_bool_var': ''\n", + "}\n", + "est = Estimator(source_directory='.', script_params=script_params, compute_target='local', entry_script='dummy_train.py', use_docker=False)\n", + "run = exp.submit(est)\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also enable Docker and let estimator pick the default CPU image supplied by Azure ML for execution. You can target an AmlCompute cluster (or any other supported compute target types)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# use a conda environment on default Docker image in an AmlCompute cluster\n", + "script_params = {\n", + " '--numbers-in-sequence': 10\n", + "}\n", + "est = Estimator(source_directory='.', script_params=script_params, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)\n", + "run = exp.submit(est)\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can customize the conda environment by adding conda and/or pip packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# add a conda package\n", + "script_params = {\n", + " '--numbers-in-sequence': 10\n", + "}\n", + "est = Estimator(source_directory='.', \n", + " script_params=script_params, \n", + " compute_target='local', \n", + " entry_script='dummy_train.py', \n", + " use_docker=False, \n", + " conda_packages=['scikit-learn'])\n", + "run = exp.submit(est)\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also specify a custom Docker image for execution. 
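The boolean-flag convention described above (pass `'--my_bool_var': ''` for True, omit it for False) maps naturally onto `argparse`'s `store_true` action. The exact contents of `dummy_train.py` are not reproduced in this diff, so the following is only a minimal sketch of how such a script might parse these `script_params`; the function names here are illustrative, not the script's actual ones.

```python
import argparse


def fibonacci(n):
    """Return the first n Fibonacci numbers."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # an int argument, passed as {'--numbers-in-sequence': 10}
    parser.add_argument('--numbers-in-sequence', type=int, default=10)
    # store_true flags default to False; passing '--my_bool_var': '' in
    # script_params emits the bare flag, which flips this to True
    parser.add_argument('--my_bool_var', action='store_true')
    return parser.parse_args(argv)


# simulate the arguments the Estimator would pass on the command line
args = parse_args(['--numbers-in-sequence', '10', '--my_bool_var'])
print('my_bool_var =', args.my_bool_var)   # my_bool_var = True
print(fibonacci(args.numbers_in_sequence)) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Note that `argparse` converts dashes in option names to underscores on the namespace object, so `--numbers-in-sequence` becomes `args.numbers_in_sequence`.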
In this case, you probably want to tell the system not to build a new conda environment for you. Instead, you can specify the path to an existing Python environment in the custom Docker image. If custom Docker image information is not specified, Azure ML uses the default Docker image to run your training. For more information about Docker containers used in Azure ML training, please see [Azure ML Containers repository](https://github.com/Azure/AzureML-Containers).\n", + "\n", + "**Note**: since the below example points to the preinstalled Python environment in the miniconda3 image maintained by continuum.io on Docker Hub where Azure ML SDK is not present, the logging metric code is not triggered. But a run history record is still recorded. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# use a custom Docker image\n", + "from azureml.core.container_registry import ContainerRegistry\n", + "\n", + "# this is an image available in Docker Hub\n", + "image_name = 'continuumio/miniconda3'\n", + "\n", + "# you can also point to an image in a private ACR\n", + "image_registry_details = ContainerRegistry()\n", + "image_registry_details.address = \"myregistry.azurecr.io\"\n", + "image_registry_details.username = \"username\"\n", + "image_registry_details.password = \"password\"\n", + "\n", + "# don't let the system build a new conda environment\n", + "user_managed_dependencies = True\n", + "\n", + "# submit to a local Docker container. 
If you don't have Docker engine running locally, you can set compute_target to cpu_cluster.\n", + "script_params = {\n", + " '--numbers-in-sequence': 10\n", + "}\n", + "est = Estimator(source_directory='.', \n", + " script_params=script_params, \n", + " compute_target='local', \n", + " entry_script='dummy_train.py',\n", + " custom_docker_image=image_name,\n", + " # uncomment below line to use your private ACR\n", + " #image_registry_details=image_registry_details,\n", + " user_managed=user_managed_dependencies\n", + " )\n", + "\n", + "run = exp.submit(est)\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In addition to passing in a Python file, you can also pass in a Jupyter notebook as the `entry_script`. [notebook_example.ipynb](notebook_example.ipynb) uses `sb.glue()` from the scrapbook package to log key-value pairs, which will appear in the Azure portal and in the widget below.\n", + "\n", + "Before running the cell below, make sure the `azureml-contrib-notebook` package is installed in the current environment with `pip install azureml-contrib-notebook`.\n", + "\n", + "This code snippet specifies the following parameters to the `Estimator` constructor. For more information on `Estimator`, please see the [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) or [API doc](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py).\n", + "\n", + "| Parameter | Description |\n", + "| ------------- | ------------- |\n", + "| source_directory | (str) Local directory that contains all of your code needed for the training job. 
This folder gets copied from your local machine to the remote compute |\n", + "| compute_target | ([AbstractComputeTarget](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute_target.abstractcomputetarget?view=azure-ml-py) or str) Remote compute target that your training script will run on, in this case a previously created persistent compute cluster (cpu_cluster) |\n", + "| entry_script | (str) Filepath (relative to the source_directory) of the training script/notebook to be run on the remote compute. This file, and any additional files it depends on, should be located in this folder |\n", + "| script_params | (dict) A dictionary containing parameters to the `entry_script`. This is useful for passing datastore reference, for example, see [train-hyperparameter-tune-deploy-with-tensorflow.ipynb](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "est = Estimator(source_directory='.', compute_target=cpu_cluster, entry_script='notebook_example.ipynb', pip_packages=['nteract-scrapbook', 'azureml-contrib-notebook'])\n", + "run = exp.submit(est)\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Intelligent hyperparameter tuning\n", + "\n", + "The simple \"hello world\" script above lets the user fix the value of a parameter for the number of Fibonacci numbers in the sequence to log. Similarly, when training models, you can fix values of parameters of the training algorithm itself. E.g. the learning rate, the number of layers, the number of nodes in each layer in a neural network, etc. 
These adjustable parameters that govern the training process are referred to as the hyperparameters of the model. The goal of hyperparameter tuning is to search across various hyperparameter configurations and find the configuration that results in the best performance.\n", + "\n", + "\n", + "To demonstrate how Azure Machine Learning can help you automate the process of hyperparameter tuning, we will launch multiple runs with different values for numbers in the sequence. First let's define the parameter space using random sampling." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", + "from azureml.train.hyperdrive import choice\n", + "\n", + "ps = RandomParameterSampling(\n", + " {\n", + " '--numbers-in-sequence': choice(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will create a new estimator without the above numbers-in-sequence parameter, since that will be passed in later. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "est = Estimator(source_directory='.', script_params={}, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will look at training metrics and early termination policies. When training a model, users are interested in logging and optimizing certain metrics of the model, e.g. maximizing the accuracy of the model or minimizing loss. This metric is logged by the training script for each run. In our simple script above, we are logging Fibonacci numbers in a sequence. 
But a training script could just as easily log other metrics like accuracy or loss, which can be used to evaluate the performance of a given training run.\n", + "\n", + "The intelligent hyperparameter tuning capability in Azure Machine Learning automatically terminates poorly performing runs using an early termination policy. Early termination reduces waste of compute resources and instead uses these resources for exploring other hyperparameter configurations. In this example, we use the BanditPolicy, which checks the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML will terminate the training run. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we are ready to configure a run configuration object for hyperparameter tuning. We need to call out the primary metric that we want the experiment to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script, and we specify that we are looking to maximize this value. Next, we control the resource budget for the experiment by setting the maximum total number of training runs to 10. We also set the maximum number of training runs to run concurrently at 4, which is the same as the number of nodes in our compute cluster." 
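To make the `slack_factor` behavior concrete, here is a plain-Python sketch of the documented bandit rule (this is an illustration, not Azure ML code): for a maximized primary metric, a run is cancelled at an evaluation interval when its metric falls below best_so_far / (1 + slack_factor).

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor=0.1):
    # For a maximized primary metric, the bandit rule cancels a run whose
    # reported metric drops below best_so_far / (1 + slack_factor).
    return run_metric < best_metric / (1 + slack_factor)


# With slack_factor=0.1 and a best metric of 100, the cutoff is ~90.9:
print(bandit_should_terminate(92, 100))  # False -> within slack, keep running
print(bandit_should_terminate(88, 100))  # True  -> terminate early
```

With `evaluation_interval=2`, this check is applied every 2 reported intervals, which is what "check the job every 2 iterations" means above.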
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hdc = HyperDriveConfig(estimator=est, \n", + " hyperparameter_sampling=ps, \n", + " policy=policy, \n", + " primary_metric_name='Fibonacci numbers', \n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", + " max_total_runs=10,\n", + " max_concurrent_runs=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's launch the hyperparameter tuning job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hdr = exp.submit(config=hdc)\n", + "RunDetails(hdr).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When all the runs complete, we can find the run with the best performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run = hdr.get_best_run_by_primary_metric()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can register the model from the best run and use it to deploy a web service that can be used for inferencing. Details on how you can do this can be found in the sample folders for the other types of estimators.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next Steps\n", + "Now you can proceed to explore the other types of estimators, such as the TensorFlow estimator, PyTorch estimator, etc., in the sample folder." 
+ ] + } + ], + "metadata": { + "authors": [ + { + "name": "maxluk" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + }, + "msauthor": "jingywa" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb index 04fb0170..dbd2e274 100644 --- a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb @@ -1,57 +1,57 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import scrapbook as sb\n", - "sb.glue('Fibonacci numbers', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "jingywa" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - }, - "msauthor": "jingywa" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import scrapbook as sb\n", + "sb.glue('Fibonacci numbers', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "authors": [ + { + "name": "jingywa" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + }, + "msauthor": "jingywa" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb 
b/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb index 751f6671..79c2f0e2 100644 --- a/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb @@ -1,558 +1,558 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tensorboard Integration with Run History\n", - "\n", - "1. Run a Tensorflow job locally and view its TB output live.\n", - "2. The same, for a DSVM.\n", - "3. And once more, with an AmlCompute cluster.\n", - "4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set experiment name and create project\n", - "Choose a name for your run history container in the workspace, and create a folder for the project." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from os import path, makedirs\n", - "experiment_name = 'tensorboard-demo'\n", - "\n", - "# experiment folder\n", - "exp_dir = './sample_projects/' + experiment_name\n", - "\n", - "if not path.exists(exp_dir):\n", - " makedirs(exp_dir)\n", - "\n", - "# runs we started in this session, for the finale\n", - "runs = []" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download Tensorflow Tensorboard demo code\n", - "\n", - "Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. We'll use it here for our purposes.\n", - "\n", - "Note that we don't need to make any code changes at all - the code works without modification from the Tensorflow repository." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "import os\n", - "\n", - "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", - "with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n", - " file.write(tf_code.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure and run locally\n", - "\n", - "We'll start by running this locally. While it might not initially seem that useful to use this for a local run - why not just run TB against the files generated locally? - even in this case there is some value to using this feature. Your local run will be registered in the run history, and your Tensorboard logs will be uploaded to the artifact store associated with this run. Later, you'll be able to restore the logs from any run, regardless of where it happened.\n", - "\n", - "Note that for this run, you will need to install Tensorflow on your local machine by yourself. 
Further, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, as the local machine is what runs Tensorboard." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "\n", - "# Create a run configuration.\n", - "run_config = RunConfiguration()\n", - "run_config.environment.python.user_managed_dependencies = True\n", - "\n", - "# You can choose a specific Python environment by pointing to a Python path \n", - "#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "from azureml.core.script_run_config import ScriptRunConfig\n", - "\n", - "logs_dir = os.path.join(os.curdir, \"logs\")\n", - "data_dir = os.path.abspath(os.path.join(os.curdir, \"mnist_data\"))\n", - "\n", - "if not path.exists(data_dir):\n", - " makedirs(data_dir)\n", - "\n", - "os.environ[\"TEST_TMPDIR\"] = data_dir\n", - "\n", - "# Writing logs to ./logs results in their being uploaded to Artifact Service,\n", - "# and thus, made accessible to our Tensorboard instance.\n", - "arguments_list = [\"--log_dir\", logs_dir]\n", - "\n", - "# Create an experiment\n", - "exp = Experiment(ws, experiment_name)\n", - "\n", - "# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n", - "# arguments_list += [\"--max_steps\", \"5000\"]\n", - "\n", - "script = ScriptRunConfig(exp_dir,\n", - " script=\"mnist_with_summaries.py\",\n", - " run_config=run_config,\n", - " arguments=arguments_list)\n", - "\n", - "run = exp.submit(script)\n", - "# You can also wait for the run to complete\n", - "# run.wait_for_completion(show_output=True)\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, 
- "source": [ - "## Start Tensorboard\n", - "\n", - "Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin streaming logs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.tensorboard import Tensorboard\n", - "\n", - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Now, with a DSVM\n", - "\n", - "Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.\n", - "Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n", - "\n", - "If you are unfamiliar with DSVM configuration, check [Train in a remote VM](../../training/train-on-remote-vm/train-on-remote-vm.ipynb) for a more detailed breakdown.\n", - "\n", - "**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single to multi-node `AmlCompute`. The `DSVMCompute` class will be deprecated in a later release, but the DSVM can be created using the below single line command and then attached(like any VM) using the sample code below. 
Also note, that we only support Linux VMs for remote execution from AML and the commands below will spin a Linux VM only.\n", - "\n", - "```shell\n", - "# create a DSVM in your resource group\n", - "# note you need to be at least a contributor to the resource group in order to execute this command successfully.\n", - "(myenv) $ az vm create --resource-group --name --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username --admin-password --generate-ssh-keys --authentication-type password\n", - "```\n", - "You can also use [this url](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, RemoteCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "username = os.getenv('AZUREML_DSVM_USERNAME', default='')\n", - "address = os.getenv('AZUREML_DSVM_ADDRESS', default='')\n", - "\n", - "compute_target_name = 'cpudsvm'\n", - "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n", - "try:\n", - " attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n", - " print('found existing:', attached_dsvm_compute.name)\n", - "except ComputeTargetException:\n", - " config = RemoteCompute.attach_configuration(username=username,\n", - " address=address,\n", - " ssh_port=22,\n", - " private_key_file='./.ssh/id_rsa')\n", - " attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)\n", - " \n", - " attached_dsvm_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using TensorFlow estimator\n", - "\n", - "Instead of manually configuring the DSVM environment, we can 
use the TensorFlow estimator and everything is set up automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\"--log_dir\": \"./logs\"}\n", - "\n", - "# If you want the run to go longer, set --max-steps to a higher number.\n", - "# script_params[\"--max_steps\"] = \"5000\"\n", - "\n", - "tf_estimator = TensorFlow(source_directory=exp_dir,\n", - " compute_target=attached_dsvm_compute,\n", - " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", - "\n", - "run = exp.submit(tf_estimator)\n", - "\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard with this run\n", - "\n", - "Just like before." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Once more, with an AmlCompute cluster\n", - "\n", - "Just to prove we can, let's create an AmlCompute CPU cluster, and run our demo there, as well." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"cpucluster\"\n", - "\n", - "cts = ws.compute_targets\n", - "found = False\n", - "if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[cluster_name]\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - "compute_target.wait_for_completion(show_output=True, min_node_count=None)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "# print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using TensorFlow estimator\n", - "\n", - "Again, we can use the TensorFlow estimator and everything is set up automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "script_params = {\"--log_dir\": \"./logs\"}\n", - "\n", - "# If you want the run to go longer, set --max-steps to a higher number.\n", - "# script_params[\"--max_steps\"] = \"5000\"\n", - "\n", - "tf_estimator = TensorFlow(source_directory=exp_dir,\n", - " compute_target=compute_target,\n", - " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", - "\n", - "run = exp.submit(tf_estimator)\n", - "\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard with this run\n", - "\n", - "Once more..." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Finale\n", - "\n", - "If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along. We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs...\n", - "# and it turns out that we have been building one of those all along.\n", - "tb = Tensorboard(runs)\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tensorboard Integration with Run History\n", + "\n", + "1. Run a Tensorflow job locally and view its TB output live.\n", + "2. The same, for a DSVM.\n", + "3. And once more, with an AmlCompute cluster.\n", + "4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", +                    "    * install the AML SDK\n", +                    "    * create a workspace and its configuration file (`config.json`)" +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "# Check core SDK version number\n", +                "import azureml.core\n", +                "\n", +                "print(\"SDK version:\", azureml.core.VERSION)" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Diagnostics\n", +                "Opt in to diagnostics to help improve the experience, quality, and security of future releases." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": { +                "tags": [ +                    "Diagnostics" +                ] +            }, +            "outputs": [], +            "source": [ +                "from azureml.telemetry import set_diagnostics_collection\n", +                "\n", +                "set_diagnostics_collection(send_diagnostics=True)" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Initialize Workspace\n", +                "\n", +                "Initialize a workspace object from the persisted configuration." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "from azureml.core import Workspace\n", +                "\n", +                "ws = Workspace.from_config()\n", +                "print('Workspace name: ' + ws.name, \n", +                "      'Azure region: ' + ws.location, \n", +                "      'Subscription id: ' + ws.subscription_id, \n", +                "      'Resource group: ' + ws.resource_group, sep='\\n')" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Set experiment name and create project\n", +                "Choose a name for your run history container in the workspace, and create a folder for the project."
+            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "from os import path, makedirs\n", +                "experiment_name = 'tensorboard-demo'\n", +                "\n", +                "# experiment folder\n", +                "exp_dir = './sample_projects/' + experiment_name\n", +                "\n", +                "if not path.exists(exp_dir):\n", +                "    makedirs(exp_dir)\n", +                "\n", +                "# runs we started in this session, for the finale\n", +                "runs = []" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Download Tensorflow Tensorboard demo code\n", +                "\n", +                "Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. We'll use it here.\n", +                "\n", +                "Note that we don't need to make any code changes at all: the code works unmodified, straight from the Tensorflow repository." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "import requests\n", +                "import os\n", +                "\n", +                "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", +                "with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n", +                "    file.write(tf_code.text)" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Configure and run locally\n", +                "\n", +                "We'll start by running this locally. Using this feature for a local run might not seem that useful at first (why not just point Tensorboard at the locally generated files?), but it still adds value: your local run is registered in the run history, and its Tensorboard logs are uploaded to the artifact store associated with the run. Later, you'll be able to restore the logs from any run, regardless of where it happened.\n", +                "\n", +                "Note that for this run, you will need to install Tensorflow on your local machine yourself.
Further, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, as the local machine is what runs Tensorboard." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "\n", + "# Create a run configuration.\n", + "run_config = RunConfiguration()\n", + "run_config.environment.python.user_managed_dependencies = True\n", + "\n", + "# You can choose a specific Python environment by pointing to a Python path \n", + "#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "from azureml.core.script_run_config import ScriptRunConfig\n", + "\n", + "logs_dir = os.path.join(os.curdir, \"logs\")\n", + "data_dir = os.path.abspath(os.path.join(os.curdir, \"mnist_data\"))\n", + "\n", + "if not path.exists(data_dir):\n", + " makedirs(data_dir)\n", + "\n", + "os.environ[\"TEST_TMPDIR\"] = data_dir\n", + "\n", + "# Writing logs to ./logs results in their being uploaded to Artifact Service,\n", + "# and thus, made accessible to our Tensorboard instance.\n", + "arguments_list = [\"--log_dir\", logs_dir]\n", + "\n", + "# Create an experiment\n", + "exp = Experiment(ws, experiment_name)\n", + "\n", + "# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n", + "# arguments_list += [\"--max_steps\", \"5000\"]\n", + "\n", + "script = ScriptRunConfig(exp_dir,\n", + " script=\"mnist_with_summaries.py\",\n", + " run_config=run_config,\n", + " arguments=arguments_list)\n", + "\n", + "run = exp.submit(script)\n", + "# You can also wait for the run to complete\n", + "# run.wait_for_completion(show_output=True)\n", + "runs.append(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, 
+            "source": [ +                "## Start Tensorboard\n", +                "\n", +                "Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin streaming logs." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "from azureml.tensorboard import Tensorboard\n", +                "\n", +                "# The Tensorboard constructor takes an array of runs, so be sure to pass it in as a single-element array here\n", +                "tb = Tensorboard([run])\n", +                "\n", +                "# If successful, start() returns a string with the URI of the instance.\n", +                "tb.start()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Stop Tensorboard\n", +                "\n", +                "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "tb.stop()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Now, with a DSVM\n", +                "\n", +                "Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.\n", +                "Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n", +                "\n", +                "If you are unfamiliar with DSVM configuration, check [Train in a remote VM](../../training/train-on-remote-vm/train-on-remote-vm.ipynb) for a more detailed breakdown.\n", +                "\n", +                "**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single to multi-node `AmlCompute`. The `DSVMCompute` class will be deprecated in a later release, but the DSVM can be created using the single-line command below and then attached (like any VM) using the sample code below. 
Also note that only Linux VMs are supported for remote execution from AML, and the command below will spin up a Linux VM.\n", +                "\n", +                "```shell\n", +                "# create a DSVM in your resource group\n", +                "# note: you need to be at least a contributor to the resource group in order to execute this command successfully.\n", +                "(myenv) $ az vm create --resource-group --name --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username --admin-password --generate-ssh-keys --authentication-type password\n", +                "```\n", +                "You can also use [this URL](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "from azureml.core.compute import ComputeTarget, RemoteCompute\n", +                "from azureml.core.compute_target import ComputeTargetException\n", +                "\n", +                "username = os.getenv('AZUREML_DSVM_USERNAME', default='')\n", +                "address = os.getenv('AZUREML_DSVM_ADDRESS', default='')\n", +                "\n", +                "compute_target_name = 'cpudsvm'\n", +                "# if you want to connect using an SSH key instead of username/password, you can provide the parameters private_key_file and private_key_passphrase \n", +                "try:\n", +                "    attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n", +                "    print('found existing:', attached_dsvm_compute.name)\n", +                "except ComputeTargetException:\n", +                "    config = RemoteCompute.attach_configuration(username=username,\n", +                "                                                address=address,\n", +                "                                                ssh_port=22,\n", +                "                                                private_key_file='./.ssh/id_rsa')\n", +                "    attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)\n", +                "    \n", +                "    attached_dsvm_compute.wait_for_completion(show_output=True)" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Submit run using TensorFlow estimator\n", +                "\n", +                "Instead of manually configuring the DSVM environment, we can 
use the TensorFlow estimator and everything is set up automatically." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "from azureml.train.dnn import TensorFlow\n", +                "\n", +                "script_params = {\"--log_dir\": \"./logs\"}\n", +                "\n", +                "# If you want the run to go longer, set --max_steps to a higher number.\n", +                "# script_params[\"--max_steps\"] = \"5000\"\n", +                "\n", +                "tf_estimator = TensorFlow(source_directory=exp_dir,\n", +                "                          compute_target=attached_dsvm_compute,\n", +                "                          entry_script='mnist_with_summaries.py',\n", +                "                          script_params=script_params)\n", +                "\n", +                "run = exp.submit(tf_estimator)\n", +                "\n", +                "runs.append(run)" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Start Tensorboard with this run\n", +                "\n", +                "Just like before." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "# The Tensorboard constructor takes an array of runs, so be sure to pass it in as a single-element array here\n", +                "tb = Tensorboard([run])\n", +                "\n", +                "# If successful, start() returns a string with the URI of the instance.\n", +                "tb.start()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Stop Tensorboard\n", +                "\n", +                "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "tb.stop()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Once more, with an AmlCompute cluster\n", +                "\n", +                "Just to prove we can, let's create an AmlCompute CPU cluster, and run our demo there, as well." 
+            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "from azureml.core.compute import ComputeTarget, AmlCompute\n", +                "\n", +                "# choose a name for your cluster\n", +                "cluster_name = \"cpucluster\"\n", +                "\n", +                "cts = ws.compute_targets\n", +                "found = False\n", +                "if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n", +                "    found = True\n", +                "    print('Found existing compute target.')\n", +                "    compute_target = cts[cluster_name]\n", +                "if not found:\n", +                "    print('Creating a new compute target...')\n", +                "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n", +                "                                                           max_nodes=4)\n", +                "\n", +                "    # create the cluster\n", +                "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", +                "\n", +                "compute_target.wait_for_completion(show_output=True, min_node_count=None)\n", +                "\n", +                "# use get_status() to get a detailed status for the current cluster. \n", +                "# print(compute_target.get_status().serialize())" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Submit run using TensorFlow estimator\n", +                "\n", +                "Again, we can use the TensorFlow estimator and everything is set up automatically." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "script_params = {\"--log_dir\": \"./logs\"}\n", +                "\n", +                "# If you want the run to go longer, set --max_steps to a higher number.\n", +                "# script_params[\"--max_steps\"] = \"5000\"\n", +                "\n", +                "tf_estimator = TensorFlow(source_directory=exp_dir,\n", +                "                          compute_target=compute_target,\n", +                "                          entry_script='mnist_with_summaries.py',\n", +                "                          script_params=script_params)\n", +                "\n", +                "run = exp.submit(tf_estimator)\n", +                "\n", +                "runs.append(run)" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Start Tensorboard with this run\n", +                "\n", +                "Once more..." 
+            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "# The Tensorboard constructor takes an array of runs, so be sure to pass it in as a single-element array here\n", +                "tb = Tensorboard([run])\n", +                "\n", +                "# If successful, start() returns a string with the URI of the instance.\n", +                "tb.start()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Stop Tensorboard\n", +                "\n", +                "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "tb.stop()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Finale\n", +                "\n", +                "If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along. We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well." +            ] +        }, +        { +            "cell_type": "code", +            "execution_count": null, +            "metadata": {}, +            "outputs": [], +            "source": [ +                "# The Tensorboard constructor takes an array of runs...\n", +                "# and it turns out that we have been building one of those all along.\n", +                "tb = Tensorboard(runs)\n", +                "\n", +                "# If successful, start() returns a string with the URI of the instance.\n", +                "tb.start()" +            ] +        }, +        { +            "cell_type": "markdown", +            "metadata": {}, +            "source": [ +                "## Stop Tensorboard\n", +                "\n", +                "As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb index 31282d86..a03057a2 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb @@ -1,432 +1,432 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train and hyperparameter tune with Chainer\n", - "\n", - "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a Convolutional Neural Network (CNN) on a single-node GPU with Chainer to perform handwritten digit recognition on the popular MNIST dataset. We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. 
Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. 
This includes the training script and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './chainer-mnist'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `chainer_mnist.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n", - "\n", - "In `chainer_mnist.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML `Run` object within the script:\n", - "```Python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "Further within `chainer_mnist.py`, we log the batchsize and epochs parameters, and the highest accuracy the model achieves:\n", - "```Python\n", - "run.log('Batch size', np.int(args.batchsize))\n", - "run.log('Epochs', np.int(args.epochs))\n", - "\n", - "run.log('Accuracy', np.float(val_accuracy))\n", - "```\n", - "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `chainer_mnist.py` into your project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('chainer_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Chainer tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'chainer-mnist'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Chainer estimator\n", - "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs. The following code will define a single-node Chainer job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import Chainer\n", - "\n", - "script_params = {\n", - " '--epochs': 10,\n", - " '--batchsize': 128,\n", - " '--output_dir': './outputs'\n", - "}\n", - "\n", - "estimator = Chainer(source_directory=project_folder, \n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " pip_packages=['numpy', 'pytest'],\n", - " entry_script='chainer_mnist.py',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. To leverage the Azure VM's GPU for training, we set `use_gpu=True`." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to get more details of your run\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tune model hyperparameters\n", - "Now that we've seen how to do a simple Chainer training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a hyperparameter sweep\n", - "First, we will define the hyperparameter space to sweep over. Let's tune the batch size and epochs parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, accuracy.\n", - "\n", - "Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. 
In this tutorial, we will apply this policy every epoch (since we report our `Accuracy` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `3` epochs (`delay_evaluation=3`).\n", - "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive.runconfig import HyperDriveConfig\n", - "from azureml.train.hyperdrive.sampling import RandomParameterSampling\n", - "from azureml.train.hyperdrive.policy import BanditPolicy\n", - "from azureml.train.hyperdrive.run import PrimaryMetricGoal\n", - "from azureml.train.hyperdrive.parameter_expressions import choice\n", - " \n", - "\n", - "param_sampling = RandomParameterSampling( {\n", - " \"--batchsize\": choice(128, 256),\n", - " \"--epochs\": choice(5, 10, 20, 40)\n", - " }\n", - ")\n", - "\n", - "hyperdrive_config = HyperDriveConfig(estimator=estimator,\n", - " hyperparameter_sampling=param_sampling, \n", - " primary_metric_name='Accuracy',\n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", - " max_total_runs=8,\n", - " max_concurrent_runs=4)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, lauch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the HyperDrive run\n", - "hyperdrive_run = experiment.submit(hyperdrive_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor HyperDrive runs\n", - "You can monitor the progress of the runs with the following Jupyter widget. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(hyperdrive_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "minxia" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train and hyperparameter tune with Chainer\n", + "\n", + "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a Convolutional Neural Network (CNN) on a single-node GPU with Chainer to perform handwritten digit recognition on the popular MNIST dataset. We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. 
In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." 
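The CPU alternative mentioned above uses the same `provisioning_configuration` call with a CPU VM size — a sketch (the SKU and `max_nodes` values here are illustrative choices, subject to your subscription's quota):

```python
from azureml.core.compute import AmlCompute

# CPU counterpart of the GPU configuration above; swap in any CPU SKU you have quota for.
cpu_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                   max_nodes=4)
```

The resulting configuration object is passed to `ComputeTarget.create` exactly as in the cell above.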
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './chainer-mnist'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare training script\n", + "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `chainer_mnist.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", + "\n", + "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n", + "\n", + "In `chainer_mnist.py`, we will log some metrics to our Azure ML run. 
To do so, we will access the Azure ML `Run` object within the script:\n", + "```Python\n", + "from azureml.core.run import Run\n", + "run = Run.get_context()\n", + "```\n", + "Further within `chainer_mnist.py`, we log the batchsize and epochs parameters, and the highest accuracy the model achieves:\n", + "```Python\n", + "run.log('Batch size', np.int(args.batchsize))\n", + "run.log('Epochs', np.int(args.epochs))\n", + "\n", + "run.log('Accuracy', np.float(val_accuracy))\n", + "```\n", + "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once your script is ready, copy the training script `chainer_mnist.py` into your project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('chainer_mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Chainer tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'chainer-mnist'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Chainer estimator\n", + "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs. The following code will define a single-node Chainer job." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import Chainer\n", + "\n", + "script_params = {\n", + " '--epochs': 10,\n", + " '--batchsize': 128,\n", + " '--output_dir': './outputs'\n", + "}\n", + "\n", + "estimator = Chainer(source_directory=project_folder, \n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " pip_packages=['numpy', 'pytest'],\n", + " entry_script='chainer_mnist.py',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. To leverage the Azure VM's GPU for training, we set `use_gpu=True`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
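On the script side, the `script_params` entries arrive as ordinary command-line arguments. A minimal sketch of how `chainer_mnist.py` might parse them (argument names taken from the `script_params` dictionary above; the defaults here are illustrative assumptions):

```python
import argparse

# Argument names must match the keys of script_params.
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--batchsize', type=int, default=128)
parser.add_argument('--output_dir', type=str, default='./outputs')

# Azure ML effectively invokes:
#   python chainer_mnist.py --epochs 10 --batchsize 128 --output_dir ./outputs
args = parser.parse_args(['--epochs', '10', '--batchsize', '128',
                          '--output_dir', './outputs'])
print(args.epochs, args.batchsize, args.output_dir)
```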
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# to get more details of your run\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tune model hyperparameters\n", + "Now that we've seen how to do a simple Chainer training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Start a hyperparameter sweep\n", + "First, we will define the hyperparameter space to sweep over. Let's tune the batch size and epochs parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, accuracy.\n", + "\n", + "Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `Accuracy` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `3` epochs (`delay_evaluation=3`).\n", + "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive.runconfig import HyperDriveConfig\n", + "from azureml.train.hyperdrive.sampling import RandomParameterSampling\n", + "from azureml.train.hyperdrive.policy import BanditPolicy\n", + "from azureml.train.hyperdrive.run import PrimaryMetricGoal\n", + "from azureml.train.hyperdrive.parameter_expressions import choice\n", + " \n", + "\n", + "param_sampling = RandomParameterSampling( {\n", + " \"--batchsize\": choice(128, 256),\n", + " \"--epochs\": choice(5, 10, 20, 40)\n", + " }\n", + ")\n", + "\n", + "hyperdrive_config = HyperDriveConfig(estimator=estimator,\n", + " hyperparameter_sampling=param_sampling, \n", + " primary_metric_name='Accuracy',\n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", + " max_total_runs=8,\n", + " max_concurrent_runs=4)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, launch the hyperparameter tuning job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# start the HyperDrive run\n", + "hyperdrive_run = experiment.submit(hyperdrive_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor HyperDrive runs\n", + "You can monitor the progress of the runs with the following Jupyter widget. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "RunDetails(hyperdrive_run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "minxia" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb index afe3a2f3..42e0f230 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb @@ -1,1178 +1,1178 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" - } - }, - "source": [ - "# Train, hyperparameter tune, and deploy with Keras\n", - "\n", - "## Introduction\n", - "This tutorial shows how to train a simple deep neural network using the MNIST dataset and Keras on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", - "\n", - "For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", - "\n", - "## Prerequisites:\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* For the local scoring test, you will also need to have `tensorflow` and `keras` installed in the current Jupyter kernel." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's get started. First let's import some Python libraries." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" - } - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import os\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" - } - }, - "outputs": [], - "source": [ - "import azureml\n", - "from azureml.core import Workspace\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" - } - }, - "source": [ - "## Create an Azure ML experiment\n", - "Let's create an experiment named \"keras-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" - } - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "script_folder = './keras-mnist'\n", - "os.makedirs(script_folder, exist_ok=True)\n", - "\n", - "exp = Experiment(workspace=ws, name='keras-mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "defe921f-8097-44c3-8336-8af6700804a7" - } - }, - "source": [ - "## Download MNIST dataset\n", - "In order to train on the MNIST dataset we will first need to download it from Yann LeCun's web site directly and save it in a `data` folder locally." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import urllib.request\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/mnist/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "## Show some sample images\n", - "Let's load the downloaded compressed files into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "from utils import load_data, one_hot_encode\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", - "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", - "\n", - "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", - "\n", - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload MNIST dataset to default datastore \n", - "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this next step, we will upload the training and test set into the workspace's default datastore, which we will later mount on an `AmlCompute` cluster for training." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If we cannot find a cluster with the given name, we will create a new one here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", - "1. create the configuration (this step is local and only takes a second)\n", - "2. create the cluster (this step will take about **20 seconds**)\n", - "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named \"gpu-cluster\" of type `AmlCompute`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compute_targets = ws.compute_targets\n", - "for name, ct in compute_targets.items():\n", - " print(name, ct.type, ct.provisioning_state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Copy the training files into the script folder\n", - "The Keras training script is already created for you. 
You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "# the training logic is in the keras_mnist.py file.\n", - "shutil.copy('./keras_mnist.py', script_folder)\n", - "\n", - "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n", - "shutil.copy('./utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" - } - }, - "source": [ - "## Construct neural network in Keras\n", - "The training script `keras_mnist.py` creates a very simple DNN (deep neural network), with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a targeted label from 0 to 9.\n", - "\n", - "![DNN](nn.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Azure ML concepts \n", - "Please note the following three things in the code below:\n", - "1. The script accepts arguments using the argparse package. In this case there is one argument `--data_folder` which specifies the file system folder in which the script can find the MNIST data\n", - "```\n", - " parser = argparse.ArgumentParser()\n", - " parser.add_argument('--data_folder')\n", - "```\n", - "2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses the `run` object to report the loss and accuracy at the end of each epoch via a callback.\n", - "```\n", - " run.log('Loss', log['loss'])\n", - " run.log('Accuracy', log['acc'])\n", - "```\n", - "3. 
When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. This folder is specially tracked by Azure ML in the sense that any files written to that folder during script execution on the remote target will be picked up by Run History; these files (known as artifacts) will be available as part of the run history record." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The next cell will print out the training code for you to inspect." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open(os.path.join(script_folder, './keras_mnist.py'), 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create TensorFlow estimator & add Keras\n", - "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `gpu-cluster` as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", - "The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed. In this case, we add the `keras` package (for the Keras framework obviously), and the `matplotlib` package for plotting a \"Loss vs. Accuracy\" chart and recording it in run history." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\n", - " '--data-folder': ds.path('mnist').as_mount(),\n", - " '--batch-size': 50,\n", - " '--first-layer-neurons': 300,\n", - " '--second-layer-neurons': 100,\n", - " '--learning-rate': 0.001\n", - "}\n", - "\n", - "est = TensorFlow(source_directory=script_folder,\n", - " script_params=script_params,\n", - " compute_target=compute_target, \n", - " pip_packages=['keras', 'matplotlib'],\n", - " entry_script='keras_mnist.py', \n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And if you are curious, this is what the mounting point looks like:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(ds.path('mnist').as_mount())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit job to run\n", - "Submit the estimator to the Azure ML experiment to kick off the execution." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(est)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the Run\n", - "As the Run is executed, it will go through the following stages:\n", - "1. Preparing: A docker image is created matching the Python environment specified by the TensorFlow estimator and it will be uploaded to the workspace's Azure Container Registry. This step will only happen once for each Python environment -- the container will then be cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", - "\n", - "2. Scaling: If the compute needs to be scaled up (i.e. 
the AmlCompute cluster requires more nodes to execute the run than currently available), the cluster will attempt to scale up in order to make the required number of nodes available. Scaling typically takes about **5 minutes**.\n", - "\n", - "3. Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted/copied and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", - "\n", - "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", - "\n", - "There are multiple ways to check the progress of a running job. We can use a Jupyter notebook widget. \n", - "\n", - "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The training script prints out the Keras version number in its outputs. Please make a note of it."
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The Run object\n", - "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of interesting features, for instance:\n", - "* `run.get_details()`: Provides a rich set of properties of the run\n", - "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", - "* `run.get_file_names()`: Lists all the files that were uploaded to the run history for this Run. This will include the `outputs` and `logs` folders, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`\n", - "\n", - "Below are some examples -- please run through them and inspect their output. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_details()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download the saved model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the training script, the Keras model is saved into two files, `model.json` and `model.h5`, in the `outputs/model` folder on the gpu-cluster AmlCompute node. Azure ML automatically uploads anything written to the `./outputs` folder to the run history file store. Subsequently, we can use the `run` object to download the model files. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a model folder in the current directory\n", - "os.makedirs('./model', exist_ok=True)\n", - "\n", - "for f in run.get_file_names():\n", - " if f.startswith('outputs/model'):\n", - " output_file_path = os.path.join('./model', f.split('/')[-1])\n", - " print('Downloading from {} to {} ...'.format(f, output_file_path))\n", - " run.download_file(name=f, output_file_path=output_file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Predict on the test set\n", - "Let's check the version of the local Keras. Make sure it matches with the version number printed out in the training script. Otherwise you might not be able to load the model properly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import keras\n", - "import tensorflow as tf\n", - "\n", - "print(\"Keras version:\", keras.__version__)\n", - "print(\"Tensorflow version:\", tf.__version__)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's load the downloaded model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from keras.models import model_from_json\n", - "\n", - "# load json and create model\n", - "json_file = open('model/model.json', 'r')\n", - "loaded_model_json = json_file.read()\n", - "json_file.close()\n", - "loaded_model = model_from_json(loaded_model_json)\n", - "# load weights into new model\n", - "loaded_model.load_weights(\"model/model.h5\")\n", - "print(\"Model loaded from disk.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feed test dataset to the persisted model to get predictions." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# evaluate loaded model on test data\n", - "loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])\n", - "y_test_ohe = one_hot_encode(y_test, 10)\n", - "y_hat = np.argmax(loaded_model.predict(X_test), axis=1)\n", - "\n", - "# print the first 30 labels and predictions\n", - "print('labels: \\t', y_test[:30])\n", - "print('predictions:\\t', y_hat[:30])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calculate the overall accuracy by comparing the predicted values against the test set." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Intelligent hyperparameter tuning\n", - "We have trained the model with one set of hyperparameters, now let's see how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", - "from azureml.train.hyperdrive import choice, loguniform\n", - "\n", - "ps = RandomParameterSampling(\n", - " {\n", - " '--batch-size': choice(25, 50, 100),\n", - " '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", - " '--second-layer-neurons': choice(10, 50, 200, 500),\n", - " '--learning-rate': loguniform(-6, -1)\n", - " }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we will create a new estimator without the above parameters since they will be passed in later by Hyperdrive configuration. 
Note we still need to keep the `data-folder` parameter since that's not a hyperparameter we will sweep." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "est = TensorFlow(source_directory=script_folder,\n", - " script_params={'--data-folder': ds.path('mnist').as_mount()},\n", - " compute_target=compute_target,\n", - " conda_packages=['keras', 'matplotlib'],\n", - " entry_script='keras_mnist.py', \n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we will define an early termination policy. The `BanditPolicy` checks the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we are ready to configure a run configuration object, and specify the primary metric `Accuracy` that's recorded in your training runs. If you go back to the training script, you will notice that this value is logged after every epoch (a full pass through the training data). We also want to tell the service that we are looking to maximize this value. We also set the number of samples to 20, and the maximum number of concurrent jobs to 4, which is the same as the number of nodes in our compute cluster."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hdc = HyperDriveConfig(estimator=est, \n", - " hyperparameter_sampling=ps, \n", - " policy=policy, \n", - " primary_metric_name='Accuracy', \n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", - " max_total_runs=20,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, let's launch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hdr = exp.submit(config=hdc)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can use a run history widget to show the progress. Be patient as this might take a while to complete." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(hdr).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hdr.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Find and register best model\n", - "When all the jobs finish, we can find out the one that has the highest accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = hdr.get_best_run_by_primary_metric()\n", - "print(best_run.get_details()['runDefinition']['arguments'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's list the model files uploaded during the run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(best_run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then register the folder (and all files in it) as a model named `keras-mlp-mnist` under the workspace for deployment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='keras-mlp-mnist', model_path='outputs/model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the model in ACI\n", - "Now we are ready to deploy the model as a web service running in Azure Container Instances [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", - "### Create score.py\n", - "First, we will create a scoring script that will be invoked by the web service call. \n", - "\n", - "* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. \n", - " * In the `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n", - " * In the `run(input_data)` function, the model is used to predict a value based on the input data. The input and output of `run` typically use JSON for serialization and deserialization, but you are not limited to that."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "from keras.models import model_from_json\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " \n", - " model_root = Model.get_model_path('keras-mlp-mnist')\n", - " # load json and create model\n", - " json_file = open(os.path.join(model_root, 'model.json'), 'r')\n", - " model_json = json_file.read()\n", - " json_file.close()\n", - " model = model_from_json(model_json)\n", - " # load weights into new model\n", - " model.load_weights(os.path.join(model_root, \"model.h5\")) \n", - " model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])\n", - " \n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " y_hat = np.argmax(model.predict(data), axis=1)\n", - " return y_hat.tolist()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create myenv.yml\n", - "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `tensorflow` and `keras`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import CondaDependencies\n", - "\n", - "cd = CondaDependencies.create()\n", - "cd.add_conda_package('tensorflow')\n", - "cd.add_conda_package('keras')\n", - "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", - "\n", - "print(cd.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy to ACI\n", - "We are almost ready to deploy. 
Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " auth_enabled=True, # this flag generates API keys to secure access\n", - " memory_gb=1, \n", - " tags={'name':'mnist', 'framework': 'Keras'},\n", - " description='Keras MLP on MNIST')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Deployment Process\n", - "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it will do the following:\n", - "1. **Build Docker image** \n", - "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the `model` object. \n", - "2. **Register image** \n", - "Register that image under the workspace. \n", - "3. **Ship to ACI** \n", - "Finally, ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='keras-mnist-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=imgconfig)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.get_logs())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the scoring web service endpoint:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the deployed model\n", - "Let's test the deployed model. Pick 30 random samples from the test set, and send them to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.\n", - "\n", - "After the invocation, we print the returned predictions and plot them along with the input images. We use a red font color and an inverted image (white on black) to highlight the misclassified samples. 
Note that since the model accuracy is pretty high, you might have to run the cell below a few times before you can see a misclassified sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", - "\n", - "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", - "test_samples = bytes(test_samples, encoding='utf8')\n", - "\n", - "# predict using the deployed model\n", - "result = service.run(input_data=test_samples)\n", - "\n", - "# compare actual value vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - " plt.subplot(1, n, i + 1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " \n", - " # use different color for misclassified sample\n", - " font_color = 'red' if y_test[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", - " \n", - " plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n", - " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", - " \n", - " i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can retrieve the API keys used for accessing the HTTP endpoint." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# retrieve the API keys. Two keys were generated.\n", - "key1, key2 = service.get_keys()\n", - "print(key1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can now construct a raw HTTP request and send it to the service. Don't forget to add the key to the HTTP header."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test)-1)\n", - "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", - "\n", - "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"POST to url\", service.scoring_uri)\n", - "#print(\"input data:\", input_data)\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at the workspace after the web service was deployed. You should see \n", - "* a registered model named 'keras-mlp-mnist' and with the id 'model:1'\n", - "* an image called 'keras-mnist-svc' and with a docker image location pointing to your workspace's Azure Container Registry (ACR) \n", - "* a webservice called 'keras-mnist-svc' with some scoring URL" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - " print(\"Model: {}, ID: {}\".format(name, model.id))\n", - " \n", - "images = ws.images\n", - "for name, image in images.items():\n", - " print(\"Image: {}, location: {}\".format(name, image.image_location))\n", - " \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up\n", - "You can delete the ACI deployment with a simple delete API call." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "maxluk" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - }, - "msauthor": "maxluk" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" + } + }, + "source": [ + "# Training, hyperparameter tune, and deploy with Keras\n", + "\n", + "## Introduction\n", + "This tutorial shows how to train a simple deep neural network using the MNIST dataset and Keras on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing number from 0 to 9. 
The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", + "\n", + "For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", + "\n", + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)\n", + "* For the local scoring test, you will also need to have `tensorflow` and `keras` installed in the current Jupyter kernel." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's get started. First let's import some Python libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import numpy as np\n", + "import os\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" + } + }, + "outputs": [], + "source": [ + "import azureml\n", + "from azureml.core import Workspace\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. 
`Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" + } + }, + "source": [ + "## Create an Azure ML experiment\n", + "Let's create an experiment named \"keras-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" + } + }, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "script_folder = './keras-mnist'\n", + "os.makedirs(script_folder, exist_ok=True)\n", + "\n", + "exp = Experiment(workspace=ws, name='keras-mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "defe921f-8097-44c3-8336-8af6700804a7" + } + }, + "source": [ + "## Download MNIST dataset\n", + "In order to train on the MNIST dataset we will first need to download it directly from Yann LeCun's website and save the files in a `data` folder locally." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib\n", + "\n", + "os.makedirs('./data/mnist', exist_ok=True)\n", + "\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/mnist/train-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/mnist/train-labels.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/mnist/test-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/mnist/test-labels.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "## Show some sample images\n", + "Let's load the downloaded compressed file into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "from utils import load_data, one_hot_encode\n", + "\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1. 
This helps the neural network converge faster.\n", + "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", + "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", + "\n", + "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", + "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", + "\n", + "count = 0\n", + "sample_size = 30\n", + "plt.figure(figsize = (16, 6))\n", + "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", + " count = count + 1\n", + " plt.subplot(1, sample_size, count)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", + " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload MNIST dataset to default datastore \n", + "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this next step, we will upload the training and test set into the workspace's default datastore, which we will later mount on an `AmlCompute` cluster for training." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named \"gpu-cluster\" of type `AmlCompute`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compute_targets = ws.compute_targets\n", + "for name, ct in compute_targets.items():\n", + " print(name, ct.type, ct.provisioning_state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Copy the training files into the script folder\n", + "The Keras training script is already created for you. 
You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "# the training logic is in the keras_mnist.py file.\n", + "shutil.copy('./keras_mnist.py', script_folder)\n", + "\n", + "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n", + "shutil.copy('./utils.py', script_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" + } + }, + "source": [ + "## Construct neural network in Keras\n", + "The training script `keras_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a targeted label from 0 to 9.\n", + "\n", + "![DNN](nn.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Azure ML concepts \n", + "Please note the following three things in the code below:\n", + "1. The script accepts arguments using the argparse package. In this case there is one argument, `--data_folder`, which specifies the file system folder in which the script can find the MNIST data.\n", + "```\n", + " parser = argparse.ArgumentParser()\n", + " parser.add_argument('--data_folder')\n", + "```\n", + "2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses `run` to report the loss and accuracy at the end of each epoch via a callback.\n", + "```\n", + " run.log('Loss', log['loss'])\n", + " run.log('Accuracy', log['acc'])\n", + "```\n", + "3. 
When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. This folder is specially tracked by Azure ML in the sense that any files written to that folder during script execution on the remote target will be picked up by Run History; these files (known as artifacts) will be available as part of the run history record." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next cell will print out the training code for you to inspect." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(os.path.join(script_folder, './keras_mnist.py'), 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create TensorFlow estimator & add Keras\n", + "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `gpu-cluster` as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", + "The TensorFlow estimator provides a simple way to launch a TensorFlow training job on a compute target. It will automatically provide a Docker image that has TensorFlow installed. In this case, we add the `keras` package (for the Keras framework) and the `matplotlib` package to plot a \"Loss vs. Accuracy\" chart and record it in run history." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params = {\n", + " '--data-folder': ds.path('mnist').as_mount(),\n", + " '--batch-size': 50,\n", + " '--first-layer-neurons': 300,\n", + " '--second-layer-neurons': 100,\n", + " '--learning-rate': 0.001\n", + "}\n", + "\n", + "est = TensorFlow(source_directory=script_folder,\n", + " script_params=script_params,\n", + " compute_target=compute_target, \n", + " pip_packages=['keras', 'matplotlib'],\n", + " entry_script='keras_mnist.py', \n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And if you are curious, this is what the mounting point looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(ds.path('mnist').as_mount())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit job to run\n", + "Submit the estimator to the Azure ML experiment to kick off the execution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(est)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor the Run\n", + "As the Run is executed, it will go through the following stages:\n", + "1. Preparing: A docker image is created matching the Python environment specified by the TensorFlow estimator and it will be uploaded to the workspace's Azure Container Registry. This step will only happen once for each Python environment -- the container will then be cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", + "\n", + "2. Scaling: If the compute needs to be scaled up (i.e. 
the AmlCompute cluster requires more nodes to execute the run than currently available), the cluster will attempt to scale up in order to make the required number of nodes available. Scaling typically takes about **5 minutes**.\n", + "\n", + "3. Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted/copied and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", + "\n", + "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", + "\n", + "There are multiple ways to check the progress of a running job. We can use a Jupyter notebook widget. \n", + "\n", + "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The training script prints out the Keras version number in its output. Please make a note of it." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The Run object\n", + "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of interesting features, for instance:\n", + "* `run.get_details()`: Provides a rich set of properties of the run\n", + "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", + "* `run.get_file_names()`: List all the files that were uploaded to the run history for this Run. This will include the `outputs` and `logs` folder, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`\n", + "\n", + "Below are some examples -- please run through them and inspect their output. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_details()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download the saved model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the training script, the Keras model is saved into two files, `model.json` and `model.h5`, in the `outputs/model` folder on the gpu-cluster AmlCompute node. Azure ML automatically uploads anything written to the `./outputs` folder to the run history file store. Subsequently, we can use the `run` object to download the model files. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create a model folder in the current directory\n", + "os.makedirs('./model', exist_ok=True)\n", + "\n", + "for f in run.get_file_names():\n", + " if f.startswith('outputs/model'):\n", + " output_file_path = os.path.join('./model', f.split('/')[-1])\n", + " print('Downloading from {} to {} ...'.format(f, output_file_path))\n", + " run.download_file(name=f, output_file_path=output_file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Predict on the test set\n", + "Let's check the version of Keras installed locally. Make sure it matches the version number printed out in the training script. Otherwise you might not be able to load the model properly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import keras\n", + "import tensorflow as tf\n", + "\n", + "print(\"Keras version:\", keras.__version__)\n", + "print(\"Tensorflow version:\", tf.__version__)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's load the downloaded model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from keras.models import model_from_json\n", + "\n", + "# load json and create model\n", + "json_file = open('model/model.json', 'r')\n", + "loaded_model_json = json_file.read()\n", + "json_file.close()\n", + "loaded_model = model_from_json(loaded_model_json)\n", + "# load weights into new model\n", + "loaded_model.load_weights(\"model/model.h5\")\n", + "print(\"Model loaded from disk.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feed the test dataset to the persisted model to get predictions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# evaluate loaded model on test data\n", + "loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])\n", + "y_hat = np.argmax(loaded_model.predict(X_test), axis=1)\n", + "\n", + "# print the first 30 labels and predictions\n", + "print('labels: \\t', y_test[:30])\n", + "print('predictions:\\t', y_hat[:30])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calculate the overall accuracy by comparing the predicted values against the test set labels." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Intelligent hyperparameter tuning\n", + "We have trained the model with one set of hyperparameters. Now let's see how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", + "from azureml.train.hyperdrive import choice, loguniform\n", + "\n", + "ps = RandomParameterSampling(\n", + " {\n", + " '--batch-size': choice(25, 50, 100),\n", + " '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", + " '--second-layer-neurons': choice(10, 50, 200, 500),\n", + " '--learning-rate': loguniform(-6, -1)\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will create a new estimator without the above parameters since they will be passed in later by the Hyperdrive configuration. 
Note we still need to keep the `data-folder` parameter since that's not a hyperparameter we will sweep." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "est = TensorFlow(source_directory=script_folder,\n", + " script_params={'--data-folder': ds.path('mnist').as_mount()},\n", + " compute_target=compute_target,\n", + " conda_packages=['keras', 'matplotlib'],\n", + " entry_script='keras_mnist.py', \n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we will define an early termination policy. The `BanditPolicy` checks the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we are ready to configure a run configuration object, and specify the primary metric `Accuracy` that's recorded in your training runs. If you go back to visit the training script, you will notice that this value is being logged after every epoch (a full pass over the training data). We also want to tell the service that we are looking to maximize this value. We also set the number of samples to 20, and the maximum number of concurrent jobs to 4, which is the same as the number of nodes in our compute cluster." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hdc = HyperDriveConfig(estimator=est, \n", + " hyperparameter_sampling=ps, \n", + " policy=policy, \n", + " primary_metric_name='Accuracy', \n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", + " max_total_runs=20,\n", + " max_concurrent_runs=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's launch the hyperparameter tuning job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hdr = exp.submit(config=hdc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use a run history widget to show the progress. Be patient as this might take a while to complete." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "RunDetails(hdr).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hdr.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Find and register best model\n", + "When all the jobs finish, we can find the one with the highest accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run = hdr.get_best_run_by_primary_metric()\n", + "print(best_run.get_details()['runDefinition']['arguments'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's list the model files uploaded during the run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(best_run.get_file_names())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then register the folder (and all files in it) as a model named `keras-mlp-mnist` under the workspace for deployment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = best_run.register_model(model_name='keras-mlp-mnist', model_path='outputs/model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy the model in ACI\n", + "Now we are ready to deploy the model as a web service running in Azure Container Instances ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", + "### Create score.py\n", + "First, we will create a scoring script that will be invoked by the web service call. \n", + "\n", + "* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. \n", + " * In the `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n", + " * In the `run(input_data)` function, the model is used to predict a value based on the input data. The input and output of `run` typically use JSON as the serialization and de-serialization format, but you are not limited to that." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import os\n", + "from keras.models import model_from_json\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global model\n", + " \n", + " model_root = Model.get_model_path('keras-mlp-mnist')\n", + " # load json and create model\n", + " json_file = open(os.path.join(model_root, 'model.json'), 'r')\n", + " model_json = json_file.read()\n", + " json_file.close()\n", + " model = model_from_json(model_json)\n", + " # load weights into new model\n", + " model.load_weights(os.path.join(model_root, \"model.h5\")) \n", + " model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])\n", + " \n", + "def run(raw_data):\n", + " data = np.array(json.loads(raw_data)['data'])\n", + " # make prediction\n", + " y_hat = np.argmax(model.predict(data), axis=1)\n", + " return y_hat.tolist()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create myenv.yml\n", + "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `tensorflow` and `keras`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import CondaDependencies\n", + "\n", + "cd = CondaDependencies.create()\n", + "cd.add_conda_package('tensorflow')\n", + "cd.add_conda_package('keras')\n", + "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", + "\n", + "print(cd.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy to ACI\n", + "We are almost ready to deploy. 
Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " auth_enabled=True, # this flag generates API keys to secure access\n", + " memory_gb=1, \n", + " tags={'name':'mnist', 'framework': 'Keras'},\n", + " description='Keras MLP on MNIST')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Deployment Process\n", + "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it will do the following:\n", + "1. **Build Docker image** \n", + "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the `model` object. \n", + "2. **Register image** \n", + "Register that image under the workspace. \n", + "3. **Ship to ACI** \n", + "Finally, ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", + " runtime=\"python\", \n", + " conda_file=\"myenv.yml\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "\n", + "service = Webservice.deploy_from_model(workspace=ws,\n", + " name='keras-mnist-svc',\n", + " deployment_config=aciconfig,\n", + " models=[model],\n", + " image_config=imgconfig)\n", + "\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.get_logs())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the scoring web service endpoint:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the deployed model\n", + "Let's test the deployed model. Pick 30 random samples from the test set, and send them to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.\n", + "\n", + "After the invocation, we print the returned predictions and plot them along with the input images. We use a red font color and an inverted image (white on black) to highlight the misclassified samples. 
Note since the model accuracy is pretty high, you might have to run the cell below a few times before you can see a misclassified sample." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "# find 30 random samples from test set\n", + "n = 30\n", + "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", + "\n", + "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", + "test_samples = bytes(test_samples, encoding='utf8')\n", + "\n", + "# predict using the deployed model\n", + "result = service.run(input_data=test_samples)\n", + "\n", + "# compare actual value vs. the predicted values:\n", + "i = 0\n", + "plt.figure(figsize = (20, 1))\n", + "\n", + "for s in sample_indices:\n", + " plt.subplot(1, n, i + 1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " \n", + " # use different color for misclassified sample\n", + " font_color = 'red' if y_test[s] != result[i] else 'black'\n", + " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", + " \n", + " # show the prediction returned by the web service\n", + " plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n", + " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", + " \n", + " i = i + 1\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can retrieve the API keys used for accessing the HTTP endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# retrieve the API keys; two keys were generated\n", + "key1, key2 = service.get_keys()\n", + "print(key1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now construct a raw HTTP request and send it to the service. Don't forget to add the key to the HTTP header." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "\n", + "# send a random row from the test set to score\n", + "random_index = np.random.randint(0, len(X_test)-1)\n", + "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", + "\n", + "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", + "\n", + "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", + "\n", + "print(\"POST to url\", service.scoring_uri)\n", + "#print(\"input data:\", input_data)\n", + "print(\"label:\", y_test[random_index])\n", + "print(\"prediction:\", resp.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at the workspace after the web service was deployed. You should see \n", + "* a registered model named 'keras-mlp-mnist' with the id 'model:1'\n", + "* an image called 'keras-mnist-svc' with a Docker image location pointing to your workspace's Azure Container Registry (ACR) \n", + "* a webservice called 'keras-mnist-svc' with some scoring URL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, model in models.items():\n", + " print(\"Model: {}, ID: {}\".format(name, model.id))\n", + " \n", + "images = ws.images\n", + "for name, image in images.items():\n", + " print(\"Image: {}, location: {}\".format(name, image.image_location))\n", + " \n", + "webservices = ws.webservices\n", + "for name, webservice in webservices.items():\n", + " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up\n", + "You can delete the ACI deployment with a simple delete API call." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "maxluk" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + }, + "msauthor": "maxluk" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb index da497aae..8aa00f03 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb @@ -1,1178 +1,1178 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" - } - }, - "source": [ - "# Train, hyperparameter tune, and deploy with TensorFlow\n", - "\n", - "## Introduction\n", - "This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", - "\n", - "For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", - "\n", - "## Prerequisites:\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's get started. First let's import some Python libraries." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" - } - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import os\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" - } - }, - "outputs": [], - "source": [ - "import azureml\n", - "from azureml.core import Workspace\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" - } - }, - "source": [ - "## Create an Azure ML experiment\n", - "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" - } - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "script_folder = './tf-mnist'\n", - "os.makedirs(script_folder, exist_ok=True)\n", - "\n", - "exp = Experiment(workspace=ws, name='tf-mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "defe921f-8097-44c3-8336-8af6700804a7" - } - }, - "source": [ - "## Download MNIST dataset\n", - "In order to train on the MNIST dataset we will first need to download it directly from Yann LeCun's website and save the files in a `data` folder locally." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import urllib\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "## Show some sample images\n", - "Let's load the downloaded compressed file into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "from utils import load_data\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. 
This helps the neural network converge faster.\n", - "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", - "\n", - "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", - "\n", - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload MNIST dataset to default datastore \n", - "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored and then made accessible to a Run, either by mounting or by copying the data to the compute target. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used in case the data is not already in Blob Storage or a File Share." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this next step, we will upload the training and test sets into the workspace's default datastore, which we will later mount on an `AmlCompute` cluster for training."
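- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As a sketch of how the mounted datastore reaches the training script: the mount point is passed in as the `--data-folder` script argument, and files uploaded under the `mnist` target path are read relative to it. This is only an illustrative example; the actual `tf_mnist.py` in this folder may differ slightly:\n", - "\n", - "```python\n", - "import argparse\n", - "import os\n", - "\n", - "parser = argparse.ArgumentParser()\n", - "parser.add_argument('--data-folder', type=str, help='mounted datastore root')\n", - "args = parser.parse_args()\n", - "\n", - "# files uploaded with target_path='mnist' appear under <mount>/mnist\n", - "data_folder = os.path.join(args.data_folder, 'mnist')\n", - "print(os.listdir(data_folder))\n", - "```"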
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If a cluster with the given name cannot be found, we will create a new one here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", - "1. create the configuration (this step is local and only takes a second)\n", - "2. create the cluster (this step will take about **20 seconds**)\n", - "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpu-cluster' of type `AmlCompute`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compute_targets = ws.compute_targets\n", - "for name, ct in compute_targets.items():\n", - " print(name, ct.type, ct.provisioning_state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Copy the training files into the script folder\n", - "The TensorFlow training script is already created for you. 
You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "# the training logic is in the tf_mnist.py file.\n", - "shutil.copy('./tf_mnist.py', script_folder)\n", - "\n", - "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n", - "shutil.copy('./utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" - } - }, - "source": [ - "## Construct neural network in TensorFlow\n", - "The training script `tf_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a targeted label from 0 to 9.\n", - "\n", - "![DNN](nn.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Azure ML concepts \n", - "Please note the following three things in the code below:\n", - "1. The script accepts arguments using the argparse package. In this case there is one argument `--data_folder` which specifies the file system folder in which the script can find the MNIST data\n", - "```\n", - " parser = argparse.ArgumentParser()\n", - " parser.add_argument('--data_folder')\n", - "```\n", - "2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n", - "```\n", - " run.log('training_acc', np.float(acc_train))\n", - " run.log('validation_acc', np.float(acc_val))\n", - "```\n", - "3. 
When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. This folder is specially tracked by Azure ML in the sense that any files written to that folder during script execution on the remote target will be picked up by Run History; these files (known as artifacts) will be available as part of the run history record." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The next cell will print out the training code for you to inspect it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open(os.path.join(script_folder, './tf_mnist.py'), 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create TensorFlow estimator\n", - "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the Batch AI cluster as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", - "\n", - "The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker.\n", - "\n", - "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\n", - " '--data-folder': ws.get_default_datastore().as_mount(),\n", - " '--batch-size': 50,\n", - " '--first-layer-neurons': 300,\n", - " '--second-layer-neurons': 100,\n", - " '--learning-rate': 0.01\n", - "}\n", - "\n", - "est = TensorFlow(source_directory=script_folder,\n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='tf_mnist.py', \n", - " use_gpu=True, \n", - " framework_version='1.13')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit job to run\n", - "Submit the estimator to an Azure ML experiment to kick off the execution." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(est)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the Run \n", - "As the Run is executed, it will go through the following stages:\n", - "1. Preparing: A docker image is created matching the Python environment specified by the TensorFlow estimator and it will be uploaded to the workspace's Azure Container Registry. This step will only happen once for each Python environment -- the container will then be cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", - "\n", - "2. Scaling: If the compute needs to be scaled up (i.e. the Batch AI cluster requires more nodes to execute the run than currently available), the cluster will attempt to scale up in order to make the required amount of nodes available. Scaling typically takes about **5 minutes**.\n", - "\n", - "3. 
Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted/copied, and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", - "\n", - "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", - "\n", - "There are multiple ways to check the progress of a running job. We can use a Jupyter notebook widget. \n", - "\n", - "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The Run object \n", - "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of interesting features, for instance:\n", - "* `run.get_details()`: Provides a rich set of properties of the run\n", - "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", - "* `run.get_file_names()`: List all the files that were uploaded to the run history for this Run. 
This will include the `outputs` and `logs` folder, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`\n", - "\n", - "Below are some examples -- please run through them and inspect their output. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_details()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Plot accuracy over epochs\n", - "Since we can retrieve the metrics from the run, we can easily make plots using `matplotlib` in the notebook. Then we can add the plotted image to the run using `run.log_image()`, so all information about the run is kept together." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "os.makedirs('./imgs', exist_ok=True)\n", - "metrics = run.get_metrics()\n", - "\n", - "plt.figure(figsize = (13,5))\n", - "plt.plot(metrics['validation_acc'], 'r-', lw=4, alpha=.6)\n", - "plt.plot(metrics['training_acc'], 'b--', alpha=0.5)\n", - "plt.legend(['Full evaluation set', 'Training set mini-batch'])\n", - "plt.xlabel('epochs', fontsize=14)\n", - "plt.ylabel('accuracy', fontsize=14)\n", - "plt.title('Accuracy over Epochs', fontsize=16)\n", - "run.log_image(name='acc_over_epochs.png', plot=plt)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download the saved model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the training script, a TensorFlow `saver` object is used to persist the model in a local folder (local to the compute target). 
The model was saved to the `./outputs` folder on the disk of the Batch AI cluster node where the job ran. Azure ML automatically uploaded everything written to the `./outputs` folder into the run history file store. Subsequently, we can use the `Run` object to download the model files the `saver` object saved. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`. Note the TensorFlow model consists of four files in binary format and they are not human-readable." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a model folder in the current directory\n", - "os.makedirs('./model', exist_ok=True)\n", - "\n", - "for f in run.get_file_names():\n", - " if f.startswith('outputs/model'):\n", - " output_file_path = os.path.join('./model', f.split('/')[-1])\n", - " print('Downloading from {} to {} ...'.format(f, output_file_path))\n", - " run.download_file(name=f, output_file_path=output_file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Predict on the test set\n", - "Now load the saved TensorFlow graph, and list all operations under the `network` scope. This way we can discover the input tensor `network/X:0` and the output tensor `network/output/MatMul:0`, and use them in the scoring script in the next step.\n", - "\n", - "Note: if your local TensorFlow version is different from the version running in the cluster where the model was trained, you might see a \"compiletime version mismatch\" warning. You can ignore it."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "\n", - "tf.reset_default_graph()\n", - "\n", - "saver = tf.train.import_meta_graph(\"./model/mnist-tf.model.meta\")\n", - "graph = tf.get_default_graph()\n", - "\n", - "for op in graph.get_operations():\n", - " if op.name.startswith('network'):\n", - " print(op.name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feed test dataset to the persisted model to get predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# input tensor. this is an array of 784 elements, each representing the intensity of a pixel in the digit image.\n", - "X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", - "# output tensor. this is an array of 10 elements, each representing the probability of predicted value of the digit.\n", - "output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, './model/mnist-tf.model')\n", - " k = output.eval(feed_dict={X : X_test})\n", - "# get the prediction, which is the index of the element that has the largest probability value.\n", - "y_hat = np.argmax(k, axis=1)\n", - "\n", - "# print the first 30 labels and predictions\n", - "print('labels: \\t', y_test[:30])\n", - "print('predictions:\\t', y_hat[:30])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calculate the overall accuracy by comparing the predicted value against the test set." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Intelligent hyperparameter tuning\n", - "We have trained the model with one set of hyperparameters; now let's see how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", - "from azureml.train.hyperdrive import choice, loguniform\n", - "\n", - "ps = RandomParameterSampling(\n", - " {\n", - " '--batch-size': choice(25, 50, 100),\n", - " '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", - " '--second-layer-neurons': choice(10, 50, 200, 500),\n", - " '--learning-rate': loguniform(-6, -1)\n", - " }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we will create a new estimator without the above parameters since they will be passed in later. Note we still need to keep the `data-folder` parameter since that's not a hyperparameter we will sweep." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "est = TensorFlow(source_directory=script_folder,\n", - " script_params={'--data-folder': ws.get_default_datastore().as_mount()},\n", - " compute_target=compute_target,\n", - " entry_script='tf_mnist.py', \n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we will define an early termination policy. The `BanditPolicy` basically states to check the job every 2 iterations. 
If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we are ready to configure a run configuration object, and specify the primary metric `validation_acc` that's recorded in your training runs. If you go back and look at the training script, you will notice that this value is logged after every epoch (a full batch set). We also want to tell the service that we are looking to maximize this value. We also set the total number of runs to 8, and the maximum number of concurrent runs to 4, which is the same as the number of nodes in our compute cluster." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htc = HyperDriveConfig(estimator=est, \n", - " hyperparameter_sampling=ps, \n", - " policy=policy, \n", - " primary_metric_name='validation_acc', \n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", - " max_total_runs=8,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, let's launch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htr = exp.submit(config=htc)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can use a run history widget to show the progress. Be patient as this might take a while to complete."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(htr).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htr.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Find and register best model \n", - "When all the jobs finish, we can find out the one that has the highest accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = htr.get_best_run_by_primary_metric()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's list the model files uploaded during the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(best_run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then register the folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the model in ACI\n", - "Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", - "### Create score.py\n", - "First, we will create a scoring script that will be invoked by the web service call. \n", - "\n", - "* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. 
\n", - " * In `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n", - " * In `run(input_data)` function, the model is used to predict a value based on the input data. The input and output to `run` typically use JSON as serialization and de-serialization format but you are not limited to that." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "import tensorflow as tf\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global X, output, sess\n", - " tf.reset_default_graph()\n", - " model_root = Model.get_model_path('tf-dnn-mnist')\n", - " saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n", - " X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", - " output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - " \n", - " sess = tf.Session()\n", - " saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n", - "\n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " out = output.eval(session=sess, feed_dict={X: data})\n", - " y_hat = np.argmax(out, axis=1)\n", - " return y_hat.tolist()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create myenv.yml\n", - "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import CondaDependencies\n", - "\n", - "cd = CondaDependencies.create()\n", - "cd.add_conda_package('numpy')\n", - "cd.add_tensorflow_conda_package()\n", - "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", - "\n", - "print(cd.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy to ACI\n", - "We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", - " description='Tensorflow DNN on MNIST')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Deployment Process\n", - "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it will do the following:\n", - "1. **Register model** \n", - "Use the model we registered earlier (named `tf-dnn-mnist`), which we pass to the `models` parameter of the `Webservice.deploy_from_model` call.\n", - "2. **Build Docker image** \n", - "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the TensorFlow model files. \n", - "3. **Register image** \n", - "Register that image under the workspace. \n", - "4. 
**Ship to ACI** \n", - "And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='tf-mnist-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=imgconfig)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.get_logs())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the scoring web service endpoint:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the deployed model\n", - "Let's test the deployed model. Pick 30 random samples from the test set, and send it to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. 
You can also make raw HTTP calls using any HTTP tool such as curl.\n", - "\n", - "After the invocation, we print the returned predictions and plot them along with the input images. We use a red font color and an inverted image (white on black) to highlight the misclassified samples. Note that since the model accuracy is pretty high, you might have to run the cell below a few times before you can see a misclassified sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", - "\n", - "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", - "test_samples = bytes(test_samples, encoding='utf8')\n", - "\n", - "# predict using the deployed model\n", - "result = service.run(input_data=test_samples)\n", - "\n", - "# compare actual value vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - " plt.subplot(1, n, i + 1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " \n", - " # use different color for misclassified sample\n", - " font_color = 'red' if y_test[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", - " \n", - " # show the prediction returned by the web service\n", - " plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n", - " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", - " \n", - " i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also send a raw HTTP request to the service."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test)-1)\n", - "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", - "\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"POST to url\", service.scoring_uri)\n", - "#print(\"input data:\", input_data)\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at the workspace after the web service was deployed. You should see \n", - "* a registered model named 'model' and with the id 'model:1'\n", - "* an image called 'tf-mnist' and with a docker image location pointing to your workspace's Azure Container Registry (ACR) \n", - "* a webservice called 'tf-mnist' with some scoring URL" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - " print(\"Model: {}, ID: {}\".format(name, model.id))\n", - " \n", - "images = ws.images\n", - "for name, image in images.items():\n", - " print(\"Image: {}, location: {}\".format(name, image.image_location))\n", - " \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up\n", - "You can delete the ACI deployment with a simple delete API call." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "minxia" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - }, - "msauthor": "minxia" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" + } + }, + "source": [ + "# Training, hyperparameter tune, and deploy with TensorFlow\n", + "\n", + "## Introduction\n", + "This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing number from 0 to 9. 
The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", + "\n", + "For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", + "\n", + "## Prerequisites:\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's get started. First, let's import some Python libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import numpy as np\n", + "import os\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" + } + }, + "outputs": [], + "source": [ + "import azureml\n", + "from azureml.core import Workspace\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "Opt-in diagnostics for better experience, quality, and security of future releases."
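The shape conventions described above — each `28x28` grayscale image flattened into a 784-element feature vector, and each digit 0-9 mapped to one of 10 output classes — can be sketched with synthetic data (random pixels standing in for a real MNIST image):

```python
import numpy as np

# Synthetic stand-in for one MNIST example: a 28x28 grayscale image
# (real pixel intensities are 0-255; here we use random values).
image = np.random.randint(0, 256, size=(28, 28))

# The network in this tutorial consumes each image as a flat vector of
# 28 * 28 = 784 intensity values, scaled down to the range [0, 1].
features = image.reshape(-1) / 255.0
print(features.shape)  # (784,)

# Each label 0-9 corresponds to one of the 10 output neurons; a one-hot
# encoding makes that correspondence explicit.
label = 7
one_hot = np.eye(10)[label]
print(int(one_hot.argmax()))  # 7
```

This is only an illustration of the data layout; the actual loading and scaling is done by `utils.py` later in the notebook.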
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "Diagnostics" + ] + }, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" + } + }, + "source": [ + "## Create an Azure ML experiment\n", + "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" + } + }, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "script_folder = './tf-mnist'\n", + "os.makedirs(script_folder, exist_ok=True)\n", + "\n", + "exp = Experiment(workspace=ws, name='tf-mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "defe921f-8097-44c3-8336-8af6700804a7" + } + }, + "source": [ + "## Download MNIST dataset\n", + "In order to train on the MNIST dataset, we will first need to download the files directly from Yann LeCun's website and save them in a `data` folder locally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib\n", + "\n", + "os.makedirs('./data/mnist', exist_ok=True)\n", + "\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "## Show some sample images\n", + "Let's load the downloaded compressed files into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "from utils import load_data\n", + "\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", + "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", + "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", + "\n", + "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", + "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", + "\n", + "count = 0\n", + "sample_size = 30\n", + "plt.figure(figsize = (16, 6))\n", + "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", + " count = count + 1\n", + " plt.subplot(1, sample_size, count)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", + " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload MNIST dataset to default datastore \n", + "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this next step, we will upload the training and test set into the workspace's default datastore, which we will later mount on an `AmlCompute` cluster for training." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we cannot find a cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpu-cluster' of type `AmlCompute`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compute_targets = ws.compute_targets\n", + "for name, ct in compute_targets.items():\n", + " print(name, ct.type, ct.provisioning_state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Copy the training files into the script folder\n", + "The TensorFlow training script is already created for you. 
You can simply copy it into the script folder, together with the utility library used to load compressed data files into numpy arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "# the training logic is in the tf_mnist.py file.\n", + "shutil.copy('./tf_mnist.py', script_folder)\n", + "\n", + "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n", + "shutil.copy('./utils.py', script_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" + } + }, + "source": [ + "## Construct neural network in TensorFlow\n", + "The training script `tf_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a targeted label from 0 to 9.\n", + "\n", + "![DNN](nn.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Azure ML concepts \n", + "Please note the following three things in the code below:\n", + "1. The script accepts arguments using the argparse package. In this case there is one argument `--data_folder` which specifies the file system folder in which the script can find the MNIST data\n", + "```\n", + " parser = argparse.ArgumentParser()\n", + " parser.add_argument('--data_folder')\n", + "```\n", + "2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n", + "```\n", + " run.log('training_acc', np.float(acc_train))\n", + " run.log('validation_acc', np.float(acc_val))\n", + "```\n", + "3. 
When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. This folder is specially tracked by Azure ML in the sense that any files written to that folder during script execution on the remote target will be picked up by Run History; these files (known as artifacts) will be available as part of the run history record." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next cell will print out the training code for you to inspect it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(os.path.join(script_folder, './tf_mnist.py'), 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create TensorFlow estimator\n", + "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `AmlCompute` cluster as the compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", + "\n", + "The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker image.\n", + "\n", + "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release."
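The argparse pattern described in the "Azure ML concepts" section — each key of the estimator's `script_params` dict arriving in the training script as a command-line flag — can be sketched in isolation. The values below are hypothetical; only the flag names mirror the ones used in this notebook:

```python
import argparse

# A minimal sketch of the argument-parsing pattern used by tf_mnist.py:
# each key in the estimator's script_params dict becomes a CLI flag.
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', dest='data_folder')
parser.add_argument('--batch-size', type=int, default=50)
parser.add_argument('--learning-rate', type=float, default=0.01)

# Simulate the command line that the run would receive; argparse maps
# '--batch-size' to the attribute 'batch_size' automatically.
args = parser.parse_args(['--data-folder', '/mnt/mnist', '--batch-size', '100'])
print(args.data_folder, args.batch_size, args.learning_rate)
```

Inside the real script, `args.data_folder` then points at the mounted datastore path that `ws.get_default_datastore().as_mount()` resolves to at run time.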
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params = {\n", + " '--data-folder': ws.get_default_datastore().as_mount(),\n", + " '--batch-size': 50,\n", + " '--first-layer-neurons': 300,\n", + " '--second-layer-neurons': 100,\n", + " '--learning-rate': 0.01\n", + "}\n", + "\n", + "est = TensorFlow(source_directory=script_folder,\n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " entry_script='tf_mnist.py', \n", + " use_gpu=True, \n", + " framework_version='1.13')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit job to run\n", + "Submit the estimator to an Azure ML experiment to kick off the execution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(est)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor the Run \n", + "As the Run is executed, it will go through the following stages:\n", + "1. Preparing: A docker image is created matching the Python environment specified by the TensorFlow estimator and it will be uploaded to the workspace's Azure Container Registry. This step will only happen once for each Python environment -- the container will then be cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", + "\n", + "2. Scaling: If the compute needs to be scaled up (i.e. the AmlCompute cluster requires more nodes to execute the run than currently available), the cluster will attempt to scale up in order to make the required amount of nodes available. Scaling typically takes about **5 minutes**.\n", + "\n", + "3. 
Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted/copied and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", + "\n", + "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", + "\n", + "There are multiple ways to check the progress of a running job. We can use a Jupyter notebook widget. \n", + "\n", + "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The Run object \n", + "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of interesting features, for instance:\n", + "* `run.get_details()`: Provides a rich set of properties of the run\n", + "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", + "* `run.get_file_names()`: List all the files that were uploaded to the run history for this Run. 
This will include the `outputs` and `logs` folder, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`\n", + "\n", + "Below are some examples -- please run through them and inspect their output. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_details()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Plot accuracy over epochs\n", + "Since we can retrieve the metrics from the run, we can easily make plots using `matplotlib` in the notebook. Then we can add the plotted image to the run using `run.log_image()`, so all information about the run is kept together." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "os.makedirs('./imgs', exist_ok=True)\n", + "metrics = run.get_metrics()\n", + "\n", + "plt.figure(figsize = (13,5))\n", + "plt.plot(metrics['validation_acc'], 'r-', lw=4, alpha=.6)\n", + "plt.plot(metrics['training_acc'], 'b--', alpha=0.5)\n", + "plt.legend(['Full evaluation set', 'Training set mini-batch'])\n", + "plt.xlabel('epochs', fontsize=14)\n", + "plt.ylabel('accuracy', fontsize=14)\n", + "plt.title('Accuracy over Epochs', fontsize=16)\n", + "run.log_image(name='acc_over_epochs.png', plot=plt)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download the saved model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the training script, a TensorFlow `saver` object is used to persist the model in a local folder (local to the compute target). 
The model was saved to the `./outputs` folder on the disk of the AmlCompute cluster node where the job ran. Azure ML automatically uploads anything written in the `./outputs` folder into the run history file store. Subsequently, we can use the `Run` object to download the model files the `saver` object saved. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`. Note the TensorFlow model consists of four files in binary format and they are not human-readable." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create a model folder in the current directory\n", + "os.makedirs('./model', exist_ok=True)\n", + "\n", + "for f in run.get_file_names():\n", + " if f.startswith('outputs/model'):\n", + " output_file_path = os.path.join('./model', f.split('/')[-1])\n", + " print('Downloading from {} to {} ...'.format(f, output_file_path))\n", + " run.download_file(name=f, output_file_path=output_file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Predict on the test set\n", + "Now load the saved TensorFlow graph, and list all operations under the `network` scope. This way we can discover the input tensor `network/X:0` and the output tensor `network/output/MatMul:0`, and use them in the scoring script in the next step.\n", + "\n", + "Note: if your local TensorFlow version is different than the version running in the cluster where the model is trained, you might see a \"compiletime version mismatch\" warning. You can ignore it."
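The filtering and renaming logic of the download loop above can be exercised without a live workspace. The file names below are hypothetical stand-ins for what `run.get_file_names()` might return; only paths under `outputs/model` are kept, and the local name is the last path segment:

```python
import os

# Hypothetical run-history file paths, standing in for run.get_file_names().
run_files = ['outputs/model/mnist-tf.model.meta',
             'outputs/model/checkpoint',
             'azureml-logs/driver_log.txt']

# Mirror the loop above: keep only model files, map each to ./model/<name>.
downloaded = []
for f in run_files:
    if f.startswith('outputs/model'):
        downloaded.append(os.path.join('./model', f.split('/')[-1]))

print(downloaded)
```

Note that splitting on `'/'` (rather than `os.sep`) is deliberate: run-history paths always use forward slashes, regardless of the local OS.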
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "tf.reset_default_graph()\n", + "\n", + "saver = tf.train.import_meta_graph(\"./model/mnist-tf.model.meta\")\n", + "graph = tf.get_default_graph()\n", + "\n", + "for op in graph.get_operations():\n", + " if op.name.startswith('network'):\n", + " print(op.name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feed test dataset to the persisted model to get predictions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# input tensor. this is an array of 784 elements, each representing the intensity of a pixel in the digit image.\n", + "X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", + "# output tensor. this is an array of 10 elements, each representing the probability of predicted value of the digit.\n", + "output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", + "\n", + "with tf.Session() as sess:\n", + " saver.restore(sess, './model/mnist-tf.model')\n", + " k = output.eval(feed_dict={X : X_test})\n", + "# get the prediction, which is the index of the element that has the largest probability value.\n", + "y_hat = np.argmax(k, axis=1)\n", + "\n", + "# print the first 30 labels and predictions\n", + "print('labels: \\t', y_test[:30])\n", + "print('predictions:\\t', y_hat[:30])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calculate the overall accuracy by comparing the predicted value against the test set." 
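The argmax-and-compare step used above can be illustrated with synthetic scores (made-up numbers, not real model output): each row of the network's output holds 10 scores, the predicted digit is the index of the row maximum, and accuracy is the fraction of positions where prediction and label agree.

```python
import numpy as np

# Synthetic stand-in for the network's raw output: one row of 10 scores
# per sample; the predicted digit is the index of the largest score.
k = np.array([[0.1, 0.2, 3.5, 0.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3],
              [0.0, 2.1, 0.3, 0.1, 0.0, 0.4, 0.1, 0.2, 0.1, 0.0]])
y_hat = np.argmax(k, axis=1)
print(y_hat)  # [2 1]

# Accuracy is the fraction of matching positions, exactly as computed
# with np.average(y_hat == y_test) in the next cell.
y_test = np.array([2, 3])
print(np.average(y_hat == y_test))  # 0.5
```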
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Intelligent hyperparameter tuning\n", + "We have trained the model with one set of hyperparameters, now let's see how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", + "from azureml.train.hyperdrive import choice, loguniform\n", + "\n", + "ps = RandomParameterSampling(\n", + " {\n", + " '--batch-size': choice(25, 50, 100),\n", + " '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", + " '--second-layer-neurons': choice(10, 50, 200, 500),\n", + " '--learning-rate': loguniform(-6, -1)\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will create a new estimator without the above parameters since they will be passed in later. Note we still need to keep the `data-folder` parameter since that's not a hyperparameter we will sweep." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "est = TensorFlow(source_directory=script_folder,\n", + " script_params={'--data-folder': ws.get_default_datastore().as_mount()},\n", + " compute_target=compute_target,\n", + " entry_script='tf_mnist.py', \n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we will define an early termination policy. The `BanditPolicy` basically states to check the job every 2 iterations. 
If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we are ready to configure a run configuration object, and specify the primary metric `validation_acc` that's recorded in your training runs. If you go back to visit the training script, you will notice that this value is being logged after every epoch (a full batch set). We also want to tell the service that we are looking to maximize this value. We also set the maximum total number of runs to 8, and the maximum concurrent runs to 4, which is the same as the number of nodes in our compute cluster." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "htc = HyperDriveConfig(estimator=est, \n", + " hyperparameter_sampling=ps, \n", + " policy=policy, \n", + " primary_metric_name='validation_acc', \n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", + " max_total_runs=8,\n", + " max_concurrent_runs=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's launch the hyperparameter tuning job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "htr = exp.submit(config=htc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use a run history widget to show the progress. Be patient as this might take a while to complete."
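The random sampling HyperDrive performs over this space can be sketched in plain Python (an illustration of the sampling semantics, not the HyperDrive implementation; the assumption here is that `loguniform(-6, -1)` draws `exp(x)` with `x` uniform in `[-6, -1]`, i.e. learning rates between roughly 0.0025 and 0.37):

```python
import math
import random

random.seed(0)

def sample_config():
    # One random draw from the same space defined with choice/loguniform.
    return {
        '--batch-size': random.choice([25, 50, 100]),
        '--first-layer-neurons': random.choice([10, 50, 200, 300, 500]),
        '--second-layer-neurons': random.choice([10, 50, 200, 500]),
        # loguniform: uniform in log space, then exponentiate.
        '--learning-rate': math.exp(random.uniform(-6, -1)),
    }

configs = [sample_config() for _ in range(8)]  # mirrors max_total_runs=8
for c in configs[:2]:
    print(c)
```

Sampling in log space is what makes small learning rates (0.003, 0.01, 0.05, ...) as likely as large ones, which is usually what you want for this hyperparameter.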
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "RunDetails(htr).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "htr.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Find and register best model \n", + "When all the jobs finish, we can find out the one that has the highest accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run = htr.get_best_run_by_primary_metric()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's list the model files uploaded during the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(best_run.get_file_names())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then register the folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy the model in ACI\n", + "Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", + "### Create score.py\n", + "First, we will create a scoring script that will be invoked by the web service call. \n", + "\n", + "* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. 
\n", + " * In the `init()` function, you typically load the model into a global object. This function is executed only once, when the Docker container is started. \n", + " * In the `run(input_data)` function, the model is used to predict a value based on the input data. The input and output of `run` typically use JSON as the serialization and deserialization format, but you are not limited to that." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import os\n", + "import tensorflow as tf\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global X, output, sess\n", + " tf.reset_default_graph()\n", + " model_root = Model.get_model_path('tf-dnn-mnist')\n", + " saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n", + " X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", + " output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", + " \n", + " sess = tf.Session()\n", + " saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n", + "\n", + "def run(raw_data):\n", + " data = np.array(json.loads(raw_data)['data'])\n", + " # make prediction\n", + " out = output.eval(session=sess, feed_dict={X: data})\n", + " y_hat = np.argmax(out, axis=1)\n", + " return y_hat.tolist()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create myenv.yml\n", + "We also need to create an environment file so that Azure Machine Learning can install the packages required by your scoring script in the Docker image. In this case, we need to specify the packages `numpy` and `tensorflow`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import CondaDependencies\n", + "\n", + "cd = CondaDependencies.create()\n", + "cd.add_conda_package('numpy')\n", + "cd.add_tensorflow_conda_package()\n", + "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", + "\n", + "print(cd.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy to ACI\n", + "We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", + " description='Tensorflow DNN on MNIST')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Deployment Process\n", + "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it will do the following:\n", + "1. **Use the registered model** \n", + "Use the `tf-dnn-mnist` model that we registered from the best run earlier. The registered model object is passed to the `models` parameter of the `Webservice.deploy_from_model` call.\n", + "2. **Build Docker image** \n", + "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the registered TensorFlow model files. \n", + "3. **Register image** \n", + "Register that image under the workspace. \n", + "4. 
**Ship to ACI** \n", + "Finally, ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", + " runtime=\"python\", \n", + " conda_file=\"myenv.yml\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "\n", + "service = Webservice.deploy_from_model(workspace=ws,\n", + " name='tf-mnist-svc',\n", + " deployment_config=aciconfig,\n", + " models=[model],\n", + " image_config=imgconfig)\n", + "\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.get_logs())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the scoring web service endpoint:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the deployed model\n", + "Let's test the deployed model. Pick 30 random samples from the test set, and send them to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. 
You can also make raw HTTP calls using any HTTP tool such as curl.\n", + "\n", + "After the invocation, we print the returned predictions and plot them along with the input images. We use a red font color and an inverted image (white on black) to highlight the misclassified samples. Note that since the model accuracy is pretty high, you might have to run the cell below a few times before you can see a misclassified sample." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "# find 30 random samples from test set\n", + "n = 30\n", + "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", + "\n", + "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", + "test_samples = bytes(test_samples, encoding='utf8')\n", + "\n", + "# predict using the deployed model\n", + "result = service.run(input_data=test_samples)\n", + "\n", + "# compare actual value vs. the predicted values:\n", + "i = 0\n", + "plt.figure(figsize = (20, 1))\n", + "\n", + "for s in sample_indices:\n", + " plt.subplot(1, n, i + 1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " \n", + " # use a different color for misclassified samples\n", + " font_color = 'red' if y_test[s] != result[i] else 'black'\n", + " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", + " \n", + " # show the predicted label above the image\n", + " plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n", + " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", + " \n", + " i = i + 1\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also send a raw HTTP request to the service."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "\n", + "# send a random row from the test set to score\n", + "random_index = np.random.randint(0, len(X_test)-1)\n", + "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", + "\n", + "headers = {'Content-Type':'application/json'}\n", + "\n", + "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", + "\n", + "print(\"POST to url\", service.scoring_uri)\n", + "#print(\"input data:\", input_data)\n", + "print(\"label:\", y_test[random_index])\n", + "print(\"prediction:\", resp.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at the workspace after the web service was deployed. You should see \n", + "* a registered model named 'tf-dnn-mnist' with the id 'tf-dnn-mnist:1'\n", + "* an image with a Docker image location pointing to your workspace's Azure Container Registry (ACR) \n", + "* a webservice called 'tf-mnist-svc' with a scoring URL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models\n", + "for name, model in models.items():\n", + " print(\"Model: {}, ID: {}\".format(name, model.id))\n", + " \n", + "images = ws.images\n", + "for name, image in images.items():\n", + " print(\"Image: {}, location: {}\".format(name, image.image_location))\n", + " \n", + "webservices = ws.webservices\n", + "for name, webservice in webservices.items():\n", + " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up\n", + "You can delete the ACI deployment with a simple delete API call."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "minxia" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/README.md b/how-to-use-azureml/training/README.md index d2855843..a24ee002 100644 --- a/how-to-use-azureml/training/README.md +++ b/how-to-use-azureml/training/README.md @@ -5,9 +5,10 @@ Follow these sample notebooks to learn: 1. [Train within notebook](train-within-notebook): train a simple scikit-learn model using the Jupyter kernel and deploy the model to Azure Container Service. 2. [Train on local](train-on-local): train a model using local computer as compute target. 3. [Train on remote VM](train-on-remote-vm): train a model using a remote Azure VM as compute target. -4. [Train on AmlCompute](train-on-amlcompute): train a model using an AmlCompute cluster as compute target. +4. [Train on ML Compute](train-on-amlcompute): train a model using an ML Compute cluster as compute target. 5. [Train in an HDI Spark cluster](train-in-spark): train a Spark ML model using an HDInsight Spark cluster as compute target. 6. [Logging API](logging-api): experiment with various logging functions to create runs and automatically generate graphs. -7. [Train and hyperparameter tune on Iris Dataset with Scikit-learn](train-hyperparameter-tune-deploy-with-sklearn): train a model using the Scikit-learn estimator and tune hyperparameters with Hyperdrive. +7. 
[Manage runs](manage-runs): learn different ways to start runs and child runs, monitor them, and cancel them. +8. [Train and hyperparameter tune on Iris Dataset with Scikit-learn](train-hyperparameter-tune-deploy-with-sklearn): train a model using the Scikit-learn estimator and tune hyperparameters with Hyperdrive. ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/README.png) \ No newline at end of file diff --git a/how-to-use-azureml/training/logging-api/logging-api.ipynb b/how-to-use-azureml/training/logging-api/logging-api.ipynb index 7d6fe6f9..d1ea04fa 100644 --- a/how-to-use-azureml/training/logging-api/logging-api.ipynb +++ b/how-to-use-azureml/training/logging-api/logging-api.ipynb @@ -1,545 +1,545 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/logging-api/logging-api.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Logging\n", - "\n", - "_**This notebook showcases various ways to use the Azure Machine Learning service run logging APIs, and view the results in the Azure portal.**_\n", - "\n", - "---\n", - "---\n", - "\n", - "## Table of Contents\n", - "\n", - "1. [Introduction](#Introduction)\n", - "1. [Setup](#Setup)\n", - " 1. Validate Azure ML SDK installation\n", - " 1. Initialize workspace\n", - " 1. Set experiment\n", - "1. [Logging](#Logging)\n", - " 1. Starting a run\n", - " 1. Viewing a run in the portal\n", - " 1. Viewing the experiment in the portal\n", - " 1. Logging metrics\n", - " 1. Logging string metrics\n", - " 1. Logging numeric metrics\n", - " 1. 
Logging vectors\n", - " 1. Logging tables\n", - " 1. Uploading files\n", - "1. [Analyzing results](#Analyzing-results)\n", - " 1. Tagging a run\n", - "1. [Next steps](#Next-steps)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Introduction\n", - "\n", - "Logging metrics from runs in your experiments allows you to track results from one run to another, determine trends in your outputs, and understand how your inputs correspond to your model and script performance. Azure Machine Learning services (AzureML) allows you to track various types of metrics, including images and arbitrary files, in order to understand, analyze, and audit your experimental progress. \n", - "\n", - "Typically you should log all parameters for your experiment and all numerical and string outputs of your experiment. This will allow you to analyze the performance of your experiments across multiple runs, correlate inputs to outputs, and filter runs based on interesting criteria.\n", - "\n", - "The experiment's Run History report page automatically creates a report that can be customized to show the KPI's, charts, and column sets that are interesting to you. \n", - "\n", - "| ![Run Details](./img/run_details.PNG) | ![Run History](./img/run_history.PNG) |\n", - "|:--:|:--:|\n", - "| *Run Details* | *Run History* |\n", - "\n", - "---\n", - "\n", - "## Setup\n", - "\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. 
Also make sure you have tqdm and matplotlib installed in the current kernel.\n", - "\n", - "```\n", - "(myenv) $ conda install -y tqdm matplotlib\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Validate Azure ML SDK installation and get version number for debugging purposes" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "install" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment, Workspace, Run\n", - "import azureml.core\n", - "import numpy as np\n", - "from tqdm import tqdm\n", - "\n", - "# Check core SDK version number\n", - "\n", - "print(\"This notebook was created using SDK version 1.0.43, you are currently running version\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Initialize workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set experiment\n", - "Create a new experiment (or get the one with the specified name). An *experiment* is a container for an arbitrary set of *runs*. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment = Experiment(workspace=ws, name='logging-api-test')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "## Logging\n", - "In this section we will explore the various logging mechanisms.\n", - "\n", - "### Starting a run\n", - "\n", - "A *run* is a singular experimental trial. In this notebook we will create a run directly on the experiment by calling `run = exp.start_logging()`. If you were experimenting by submitting a script file as an experiment using ``experiment.submit()``, you would call `run = Run.get_context()` in your script to access the run context of your code. In either case, the logging methods on the returned run object work the same.\n", - "\n", - "This cell also stores the run id for use later in this notebook. The run_id is not necessary for logging." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start logging for the run\n", - "run = experiment.start_logging()\n", - "\n", - "# access the run id for use later\n", - "run_id = run.id\n", - "\n", - "# change the scale factor on different runs to see how you can compare multiple runs\n", - "scale_factor = 2\n", - "\n", - "# change the category on different runs to see how to organize data in reports\n", - "category = 'Red'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Viewing a run in the Portal\n", - "Once a run is started you can see the run in the portal by simply typing ``run``. Clicking on the \"Link to Portal\" link will take you to the Run Details page that shows the metrics you have logged and other run properties. You can refresh this page after each logging statement to see the updated results." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Viewing an experiment in the portal\n", - "You can also view an experiment similarly by typing `experiment`. The portal link will take you to the experiment's Run History page that shows all runs and allows you to analyze trends across multiple runs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Logging metrics\n", - "Metrics are visible in the run details page in the AzureML portal and can also be analyzed in experiment reports. The run details page looks as below and contains tabs for Details, Outputs, Logs, and Snapshot. \n", - "* The Details page displays attributes about the run, plus logged metrics and images. Metrics that are vectors appear as charts. \n", - "* The Outputs page contains any files, such as models, that were uploaded into storage from your run's \"outputs\" directory. If you place files in the \"outputs\" directory locally, the files are automatically uploaded on your behalf when the run is completed.\n", - "* The Logs page allows you to view any log files created by your run. Logging runs created in notebooks typically do not generate log files.\n", - "* The Snapshot page contains a snapshot of the directory specified in the ``start_logging`` statement, plus the notebook at the time of the ``start_logging`` call. This snapshot and notebook can be downloaded from the Run Details page to continue or reproduce an experiment.\n", - "\n", - "### Logging string metrics\n", - "The following cell logs a string metric. A string metric is simply a string value associated with a name. String metrics are useful for labelling runs and organizing your data. 
Typically you should log all string parameters as metrics for later analysis - even information such as paths can help to understand how individual experiments perform differently.\n", - "\n", - "String metrics can be used in the following ways:\n", - "* Plot in histograms\n", - "* Group by indicators for numerical plots\n", - "* Filter runs\n", - "\n", - "String metrics appear in the **Tracked Metrics** section of the Run Details page and can be added as a column in Run History reports." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# log a string metric\n", - "run.log(name='Category', value=category)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Logging numerical metrics\n", - "The following cell logs some numerical metrics. Numerical metrics can include metrics such as AUC or MSE. You should log any parameter or significant output measure in order to understand trends across multiple experiments. Numerical metrics appear in the **Tracked Metrics** section of the Run Details page, and can be used in charts or KPI's in experiment Run History reports." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# log numerical values\n", - "run.log(name=\"scale factor\", value = scale_factor)\n", - "run.log(name='Magic Number', value=42 * scale_factor)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Logging vectors\n", - "Vectors are good for recording information such as loss curves. You can log a vector by creating a list of numbers, calling ``log_list()`` and supplying a name and the list, or by repeatedly logging a value using the same name.\n", - "\n", - "Vectors are presented in Run Details as a chart, and are directly comparable in experiment reports when placed in a chart. \n", - "\n", - "**Note:** vectors logged into the run are expected to be relatively small. 
Logging very large vectors into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to a file that will be uploaded." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "fibonacci_values = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]\n", - "scaled_values = [i * scale_factor for i in fibonacci_values]\n", - "\n", - "# Log a list of values. Note this will generate a single-variable line chart.\n", - "run.log_list(name='Fibonacci', value=scaled_values)\n", - "\n", - "for i in tqdm(range(-10, 10)):\n", - " # log a metric value repeatedly, this will generate a single-variable line chart.\n", - " run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Logging tables\n", - "Tables are good for recording related sets of information such as accuracy tables, confusion matrices, etc. \n", - "You can log a table in two ways:\n", - "* Create a dictionary of lists where each list represents a column in the table and call ``log_table()``\n", - "* Repeatedly call ``log_row()`` providing the same table name with a consistent set of named args as the column values\n", - "\n", - "Tables are presented in Run Details as a chart using the first two columns of the table. \n", - "\n", - "**Note:** tables logged into the run are expected to be relatively small. Logging very large tables into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to a file that will be uploaded."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a dictionary to hold a table of values\n", - "sines = {}\n", - "sines['angle'] = []\n", - "sines['sine'] = []\n", - "\n", - "for i in tqdm(range(-10, 10)):\n", - " angle = i / 2.0 * scale_factor\n", - " \n", - " # log 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.\n", - " run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))\n", - " \n", - " sines['angle'].append(angle)\n", - " sines['sine'].append(np.sin(angle))\n", - "\n", - "# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns\n", - "run.log_table(name='Sine Wave', value=sines)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Logging images\n", - "You can directly log _matplotlib_ plots and arbitrary images to your run record. This code logs a _matplotlib_ pyplot object. Images show up in the run details page in the Azure ML Portal." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", - "# Create a plot\n", - "import matplotlib.pyplot as plt\n", - "angle = np.linspace(-3, 3, 50) * scale_factor\n", - "plt.plot(angle, np.tanh(angle), label='tanh')\n", - "plt.legend(fontsize=12)\n", - "plt.title('Hyperbolic Tangent', fontsize=16)\n", - "plt.grid(True)\n", - "\n", - "# Log the plot to the run. To log an arbitrary image, use the form run.log_image(name, path='./image_path.png')\n", - "run.log_image(name='Hyperbolic Tangent', plot=plt)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Uploading files\n", - "\n", - "Files can also be uploaded explicitly and stored as artifacts along with the run record. 
These files are also visible in the *Outputs* tab of the Run Details page.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "file_name = 'outputs/myfile.txt'\n", - "\n", - "with open(file_name, \"w\") as f:\n", - " f.write('This is an output file that will be uploaded.\\n')\n", - "\n", - "# Upload the file explicitly into artifacts \n", - "run.upload_file(name = file_name, path_or_stream = file_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Completing the run\n", - "\n", - "Calling `run.complete()` marks the run as completed and triggers the output file collection. If for any reason you need to indicate the run failed or simply need to cancel the run you can call `run.fail()` or `run.cancel()`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.complete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "## Analyzing results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can refresh the run in the Azure portal to see all of your results. In many cases you will want to analyze runs that were performed previously to inspect the contents or compare results. Runs can be fetched from their parent Experiment object using the ``Run()`` constructor or the ``experiment.get_runs()`` method. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "fetched_run = Run(experiment, run_id)\n", - "fetched_run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Call ``run.get_metrics()`` to retrieve all the metrics from a run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "fetched_run.get_metrics()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "See the files uploaded for this run by calling ``run.get_file_names()``" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "fetched_run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once you know the file names in a run, you can download the files using the ``run.download_file()`` method" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "os.makedirs('files', exist_ok=True)\n", - "\n", - "for f in run.get_file_names():\n", - " dest = os.path.join('files', f.split('/')[-1])\n", - " print('Downloading file {} to {}...'.format(f, dest))\n", - " fetched_run.download_file(f, dest) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Tagging a run\n", - "Often when you analyze the results of a run, you may need to tag that run with important personal or external information. You can add a tag to a run using the ``run.tag()`` method. AzureML supports valueless and valued tags." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "fetched_run.tag(\"My Favorite Run\")\n", - "fetched_run.tag(\"Competition Rank\", 1)\n", - "\n", - "fetched_run.get_tags()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next steps\n", - "To experiment more with logging and to understand how metrics can be visualized, go back to the *Start a run* section, try changing the category and scale_factor values and going through the notebook several times. 
Play with the KPI, charting, and column selection options on the experiment's Run History reports page to see how the various metrics can be combined and visualized.\n", - "\n", - "After learning about all of the logging options, go to the [train on remote vm](..\\train-on-remote-vm\\train-on-remote-vm.ipynb) notebook and experiment with logging from remote compute contexts." - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/logging-api/logging-api.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Logging\n", + "\n", + "_**This notebook showcases various ways to use the Azure Machine Learning service run logging APIs, and view the results in the Azure portal.**_\n", + "\n", + "---\n", + "---\n", + "\n", + "## Table of Contents\n", + "\n", + "1. [Introduction](#Introduction)\n", + "1. [Setup](#Setup)\n", + " 1. Validate Azure ML SDK installation\n", + " 1. Initialize workspace\n", + " 1. Set experiment\n", + "1. [Logging](#Logging)\n", + " 1. Starting a run\n", + " 1. Viewing a run in the portal\n", + " 1. Viewing the experiment in the portal\n", + " 1. Logging metrics\n", + " 1. 
Logging string metrics\n", + " 1. Logging numeric metrics\n", + " 1. Logging vectors\n", + " 1. Logging tables\n", + " 1. Uploading files\n", + "1. [Analyzing results](#Analyzing-results)\n", + " 1. Tagging a run\n", + "1. [Next steps](#Next-steps)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "Logging metrics from runs in your experiments allows you to track results from one run to another, determine trends in your outputs, and understand how your inputs correspond to your model and script performance. Azure Machine Learning services (AzureML) allows you to track various types of metrics, including images and arbitrary files, in order to understand, analyze, and audit your experimental progress. \n", + "\n", + "Typically you should log all parameters for your experiment and all numerical and string outputs of your experiment. This will allow you to analyze the performance of your experiments across multiple runs, correlate inputs to outputs, and filter runs based on interesting criteria.\n", + "\n", + "The experiment's Run History report page automatically creates a report that can be customized to show the KPI's, charts, and column sets that are interesting to you. \n", + "\n", + "| ![Run Details](./img/run_details.PNG) | ![Run History](./img/run_history.PNG) |\n", + "|:--:|:--:|\n", + "| *Run Details* | *Run History* |\n", + "\n", + "---\n", + "\n", + "## Setup\n", + "\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. 
Also make sure you have tqdm and matplotlib installed in the current kernel.\n", + "\n", + "```\n", + "(myenv) $ conda install -y tqdm matplotlib\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validate Azure ML SDK installation and get version number for debugging purposes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "install" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Experiment, Workspace, Run\n", + "import azureml.core\n", + "import numpy as np\n", + "from tqdm import tqdm\n", + "\n", + "# Check core SDK version number\n", + "\n", + "print(\"This notebook was created using SDK version AZUREML-SDK-VERSION, you are currently running version\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set experiment\n", + "Create a new experiment (or get the one with the specified name). An *experiment* is a container for an arbitrary set of *runs*. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment = Experiment(workspace=ws, name='logging-api-test')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Logging\n", + "In this section we will explore the various logging mechanisms.\n", + "\n", + "### Starting a run\n", + "\n", + "A *run* is a singular experimental trial. In this notebook we will create a run directly on the experiment by calling `run = exp.start_logging()`. If you were experimenting by submitting a script file as an experiment using ``experiment.submit()``, you would call `run = Run.get_context()` in your script to access the run context of your code. In either case, the logging methods on the returned run object work the same.\n", + "\n", + "This cell also stores the run id for use later in this notebook. The run_id is not necessary for logging." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# start logging for the run\n", + "run = experiment.start_logging()\n", + "\n", + "# access the run id for use later\n", + "run_id = run.id\n", + "\n", + "# change the scale factor on different runs to see how you can compare multiple runs\n", + "scale_factor = 2\n", + "\n", + "# change the category on different runs to see how to organize data in reports\n", + "category = 'Red'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Viewing a run in the Portal\n", + "Once a run is started you can see the run in the portal by simply typing ``run``. Clicking on the \"Link to Portal\" link will take you to the Run Details page that shows the metrics you have logged and other run properties. You can refresh this page after each logging statement to see the updated results." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Viewing an experiment in the portal\n", + "You can also view an experiement similarly by typing `experiment`. The portal link will take you to the experiment's Run History page that shows all runs and allows you to analyze trends across multiple runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Logging metrics\n", + "Metrics are visible in the run details page in the AzureML portal and also can be analyzed in experiment reports. The run details page looks as below and contains tabs for Details, Outputs, Logs, and Snapshot. \n", + "* The Details page displays attributes about the run, plus logged metrics and images. Metrics that are vectors appear as charts. \n", + "* The Outputs page contains any files, such as models, you uploaded into the \"outputs\" directory from your run into storage. If you place files in the \"outputs\" directory locally, the files are automatically uploaded on your behald when the run is completed.\n", + "* The Logs page allows you to view any log files created by your run. Logging runs created in notebooks typically do not generate log files.\n", + "* The Snapshot page contains a snapshot of the directory specified in the ''start_logging'' statement, plus the notebook at the time of the ''start_logging'' call. This snapshot and notebook can be downloaded from the Run Details page to continue or reproduce an experiment.\n", + "\n", + "### Logging string metrics\n", + "The following cell logs a string metric. A string metric is simply a string value associated with a name. A string metric String metrics are useful for labelling runs and to organize your data. 
Typically you should log all string parameters as metrics for later analysis - even information such as paths can help to understand how individual experiements perform differently.\n", + "\n", + "String metrics can be used in the following ways:\n", + "* Plot in hitograms\n", + "* Group by indicators for numerical plots\n", + "* Filtering runs\n", + "\n", + "String metrics appear in the **Tracked Metrics** section of the Run Details page and can be added as a column in Run History reports." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# log a string metric\n", + "run.log(name='Category', value=category)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Logging numerical metrics\n", + "The following cell logs some numerical metrics. Numerical metrics can include metrics such as AUC or MSE. You should log any parameter or significant output measure in order to understand trends across multiple experiments. Numerical metrics appear in the **Tracked Metrics** section of the Run Details page, and can be used in charts or KPI's in experiment Run History reports." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# log numerical values\n", + "run.log(name=\"scale factor\", value = scale_factor)\n", + "run.log(name='Magic Number', value=42 * scale_factor)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Logging vectors\n", + "Vectors are good for recording information such as loss curves. You can log a vector by creating a list of numbers, calling ``log_list()`` and supplying a name and the list, or by repeatedly logging a value using the same name.\n", + "\n", + "Vectors are presented in Run Details as a chart, and are directly comparable in experiment reports when placed in a chart. \n", + "\n", + "**Note:** vectors logged into the run are expected to be relatively small. 
Logging very large vectors into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to file that will be uploaded." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fibonacci_values = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]\n", + "scaled_values = (i * scale_factor for i in fibonacci_values)\n", + "\n", + "# Log a list of values. Note this will generate a single-variable line chart.\n", + "run.log_list(name='Fibonacci', value=scaled_values)\n", + "\n", + "for i in tqdm(range(-10, 10)):\n", + " # log a metric value repeatedly, this will generate a single-variable line chart.\n", + " run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Logging tables\n", + "Tables are good for recording related sets of information such as accuracy tables, confusion matrices, etc. \n", + "You can log a table in two ways:\n", + "* Create a dictionary of lists where each list represents a column in the table and call ``log_table()``\n", + "* Repeatedly call ``log_row()`` providing the same table name with a consistent set of named args as the column values\n", + "\n", + "Tables are presented in Run Details as a chart using the first two columns of the table \n", + "\n", + "**Note:** tables logged into the run are expected to be relatively small. Logging very large tables into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to file that will be uploaded." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create a dictionary to hold a table of values\n", + "sines = {}\n", + "sines['angle'] = []\n", + "sines['sine'] = []\n", + "\n", + "for i in tqdm(range(-10, 10)):\n", + " angle = i / 2.0 * scale_factor\n", + " \n", + " # log a 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.\n", + " run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))\n", + " \n", + " sines['angle'].append(angle)\n", + " sines['sine'].append(np.sin(angle))\n", + "\n", + "# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns\n", + "run.log_table(name='Sine Wave', value=sines)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Logging images\n", + "You can directly log _matplotlib_ plots and arbitrary images to your run record. This code logs a _matplotlib_ pyplot object. Images show up in the run details page in the Azure ML Portal." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# Create a plot\n", + "import matplotlib.pyplot as plt\n", + "angle = np.linspace(-3, 3, 50) * scale_factor\n", + "plt.plot(angle,np.tanh(angle), label='tanh')\n", + "plt.legend(fontsize=12)\n", + "plt.title('Hyperbolic Tangent', fontsize=16)\n", + "plt.grid(True)\n", + "\n", + "# Log the plot to the run. To log an arbitrary image, use the form run.log_image(name, path='./image_path.png')\n", + "run.log_image(name='Hyperbolic Tangent', plot=plt)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Uploading files\n", + "\n", + "Files can also be uploaded explicitly and stored as artifacts along with the run record. 
These files are also visible in the *Outputs* tab of the Run Details page.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "file_name = 'outputs/myfile.txt'\n", + "\n", + "with open(file_name, \"w\") as f:\n", + " f.write('This is an output file that will be uploaded.\\n')\n", + "\n", + "# Upload the file explicitly into artifacts \n", + "run.upload_file(name = file_name, path_or_stream = file_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Completing the run\n", + "\n", + "Calling `run.complete()` marks the run as completed and triggers the output file collection. If for any reason you need to indicate the run failed or simply need to cancel the run you can call `run.fail()` or `run.cancel()`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.complete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Analyzing results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can refresh the run in the Azure portal to see all of your results. In many cases you will want to analyze runs that were performed previously to inspect the contents or compare results. Runs can be fetched from their parent Experiment object using the ``Run()`` constructor or the ``experiment.get_runs()`` method. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fetched_run = Run(experiment, run_id)\n", + "fetched_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call ``run.get_metrics()`` to retrieve all the metrics from a run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fetched_run.get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "See the files uploaded for this run by calling ``run.get_file_names()``" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fetched_run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once you know the file names in a run, you can download the files using the ``run.download_file()`` method" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.makedirs('files', exist_ok=True)\n", + "\n", + "for f in run.get_file_names():\n", + " dest = os.path.join('files', f.split('/')[-1])\n", + " print('Downloading file {} to {}...'.format(f, dest))\n", + " fetched_run.download_file(f, dest) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Tagging a run\n", + "Often when you analyze the results of a run, you may need to tag that run with important personal or external information. You can add a tag to a run using the ``run.tag()`` method. AzureML supports valueless and valued tags." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fetched_run.tag(\"My Favorite Run\")\n", + "fetched_run.tag(\"Competition Rank\", 1)\n", + "\n", + "fetched_run.get_tags()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "To experiment more with logging and to understand how metrics can be visualized, go back to the *Start a run* section, try changing the category and scale_factor values and going through the notebook several times. 
Play with the KPI, charting, and column selection options on the experiment's Run History reports page to see how the various metrics can be combined and visualized.\n", + "\n", + "After learning about all of the logging options, go to the [train on remote vm](..\\train-on-remote-vm\\train-on-remote-vm.ipynb) notebook and experiment with logging from remote compute contexts." + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/manage-runs/manage-runs.ipynb b/how-to-use-azureml/training/manage-runs/manage-runs.ipynb index ff618ef2..7afc0c29 100644 --- a/how-to-use-azureml/training/manage-runs/manage-runs.ipynb +++ b/how-to-use-azureml/training/manage-runs/manage-runs.ipynb @@ -1,602 +1,602 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/manage-runs/manage-runs.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Manage runs\n", - "\n", - "## Table of contents\n", - "\n", - "1. [Introduction](#Introduction)\n", - "1. [Setup](#Setup)\n", - "1. [Start, monitor and complete a run](#Start,-monitor-and-complete-a-run)\n", - "1. [Add properties and tags](#Add-properties-and-tags)\n", - "1. 
[Query properties and tags](#Query-properties-and-tags)\n", - "1. [Start and query child runs](#Start-and-query-child-runs)\n", - "1. [Cancel or fail runs](#Cancel-or-fail-runs)\n", - "1. [Reproduce a run](#Reproduce-a-run)\n", - "1. [Next steps](#Next-steps)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Introduction\n", - "\n", - "When you're building enterprise-grade machine learning models, it is important to track, organize, monitor and reproduce your training runs. For example, you might want to trace the lineage behind a model deployed to production, and re-run the training experiment to troubleshoot issues. \n", - "\n", - "This notebooks shows examples how to use Azure Machine Learning services to manage your training runs." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setup\n", - "\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. Also, if you're new to Azure ML, we recommend that you go through [the tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml) first to learn the basic concepts.\n", - "\n", - "Let's first import required packages, check Azure ML SDK version, connect to your workspace and create an Experiment to hold the runs." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "from azureml.core import Workspace, Experiment, Run\n", - "from azureml.core import ScriptRunConfig\n", - "\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "exp = Experiment(workspace=ws, name=\"explore-runs\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start, monitor and complete a run\n", - "\n", - "A run is an unit of execution, typically to train a model, but for other purposes as well, such as loading or transforming data. Runs are tracked by Azure ML service, and can be instrumented with metrics and artifact logging.\n", - "\n", - "A simplest way to start a run in your interactive Python session is to call *Experiment.start_logging* method. You can then log metrics from within the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "notebook_run = exp.start_logging()\n", - "\n", - "notebook_run.log(name=\"message\", value=\"Hello from run!\")\n", - "\n", - "print(notebook_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use *get_status method* to get the status of the run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(notebook_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Also, you can simply enter the run to get a link to Azure Portal details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "notebook_run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Method *get_details* gives you more details on the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "notebook_run.get_details()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use *complete* method to end the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "notebook_run.complete()\n", - "print(notebook_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also use Python's *with...as* pattern. The run will automatically complete when moving out of scope. This way you don't need to manually complete the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with exp.start_logging() as notebook_run:\n", - " notebook_run.log(name=\"message\", value=\"Hello from run!\")\n", - " print(\"Is it still running?\",notebook_run.get_status())\n", - " \n", - "print(\"Has it completed?\",notebook_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, let's look at submitting a run as a separate Python process. To keep the example simple, we submit the run on local computer. Other targets could include remote VMs and Machine Learning Compute clusters in your Azure ML Workspace.\n", - "\n", - "We use *hello.py* script as an example. 
To perform logging, we need to get a reference to the Run instance from within the scope of the script. We do this using *Run.get_context* method." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!more hello.py" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's submit the run on a local computer. A standard pattern in Azure ML SDK is to create run configuration, and then use *Experiment.submit* method." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run_config = ScriptRunConfig(source_directory='.', script='hello.py')\n", - "\n", - "local_script_run = exp.submit(run_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can view the status of the run as before" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(local_script_run.get_status())\n", - "local_script_run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Submitted runs have additional log files you can inspect using *get_details_with_logs*." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.get_details_with_logs()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use *wait_for_completion* method to block the local execution until remote run is complete." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.wait_for_completion(show_output=True)\n", - "print(local_script_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Add properties and tags\n", - "\n", - "Properties and tags help you organize your runs. 
You can use them to describe, for example, who authored the run, what the results were, and what machine learning approach was used. And as you'll later learn, properties and tags can be used to query the history of your runs to find the important ones.\n", - "\n", - "For example, let's add \"author\" property to the run:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.add_properties({\"author\":\"azureml-user\"})\n", - "print(local_script_run.get_properties())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Properties are immutable. Once you assign a value it cannot be changed, making them useful as a permanent record for auditing purposes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "try:\n", - " local_script_run.add_properties({\"author\":\"different-user\"})\n", - "except Exception as e:\n", - " print(e)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Tags on the other hand can be changed:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.tag(\"quality\", \"great run\")\n", - "print(local_script_run.get_tags())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.tag(\"quality\", \"fantastic run\")\n", - "print(local_script_run.get_tags())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also add a simple string tag. 
It appears in the tag dictionary with value of None" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.tag(\"worth another look\")\n", - "print(local_script_run.get_tags())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Query properties and tags\n", - "\n", - "You can quary runs within an experiment that match specific properties and tags. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "list(exp.get_runs(properties={\"author\":\"azureml-user\"},tags={\"quality\":\"fantastic run\"}))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "list(exp.get_runs(properties={\"author\":\"azureml-user\"},tags=\"worth another look\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start and query child runs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use child runs to group together related runs, for example different hyperparameter tuning iterations.\n", - "\n", - "Let's use *hello_with_children* script to create a batch of 5 child runs from within a submitted run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!more hello_with_children.py" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run_config = ScriptRunConfig(source_directory='.', script='hello_with_children.py')\n", - "\n", - "local_script_run = exp.submit(run_config)\n", - "local_script_run.wait_for_completion(show_output=True)\n", - "print(local_script_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can start child runs one by one. 
Note that this is less efficient than submitting a batch of runs, because each creation results in a network call.\n", - "\n", - "Child runs too complete automatically as they move out of scope." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with exp.start_logging() as parent_run:\n", - " for c,count in enumerate(range(5)):\n", - " with parent_run.child_run() as child:\n", - " child.log(name=\"Hello from child run\", value=c)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To query the child runs belonging to specific parent, use *get_children* method." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "list(parent_run.get_children())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cancel or fail runs\n", - "\n", - "Sometimes, you realize that the run is not performing as intended, and you want to cancel it instead of waiting for it to complete.\n", - "\n", - "As an example, let's create a Python script with a delay in the middle." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!more hello_with_delay.py" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use *cancel* method to cancel a run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run_config = ScriptRunConfig(source_directory='.', script='hello_with_delay.py')\n", - "\n", - "local_script_run = exp.submit(run_config)\n", - "print(\"Did the run start?\",local_script_run.get_status())\n", - "local_script_run.cancel()\n", - "print(\"Did the run cancel?\",local_script_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also mark an unsuccessful run as failed." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run = exp.submit(run_config)\n", - "local_script_run.fail()\n", - "print(local_script_run.get_status())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Reproduce a run\n", - "\n", - "When updating or troubleshooting on a model deployed to production, you sometimes need to revisit the original training run that produced the model. To help you with this, Azure ML service by default creates snapshots of your scripts a the time of run submission:\n", - "\n", - "You can use *restore_snapshot* to obtain a zip package of the latest snapshot of the script folder. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_script_run.restore_snapshot(path=\"snapshots\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can then extract the zip package, examine the code, and submit your run again." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next steps\n", - "\n", - " * To learn more about logging APIs, see [logging API notebook](./logging-api/logging-api.ipynb)\n", - " * To learn more about remote runs, see [train on AML compute notebook](./train-on-amlcompute/train-on-amlcompute.ipynb)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/manage-runs/manage-runs.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Manage runs\n", + "\n", + "## Table of contents\n", + "\n", + "1. [Introduction](#Introduction)\n", + "1. [Setup](#Setup)\n", + "1. [Start, monitor and complete a run](#Start,-monitor-and-complete-a-run)\n", + "1. [Add properties and tags](#Add-properties-and-tags)\n", + "1. [Query properties and tags](#Query-properties-and-tags)\n", + "1. [Start and query child runs](#Start-and-query-child-runs)\n", + "1. [Cancel or fail runs](#Cancel-or-fail-runs)\n", + "1. [Reproduce a run](#Reproduce-a-run)\n", + "1. 
[Next steps](#Next-steps)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "When you're building enterprise-grade machine learning models, it is important to track, organize, monitor and reproduce your training runs. For example, you might want to trace the lineage behind a model deployed to production, and re-run the training experiment to troubleshoot issues. \n", + "\n", + "This notebooks shows examples how to use Azure Machine Learning services to manage your training runs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. Also, if you're new to Azure ML, we recommend that you go through [the tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml) first to learn the basic concepts.\n", + "\n", + "Let's first import required packages, check Azure ML SDK version, connect to your workspace and create an Experiment to hold the runs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "from azureml.core import Workspace, Experiment, Run\n", + "from azureml.core import ScriptRunConfig\n", + "\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "exp = Experiment(workspace=ws, name=\"explore-runs\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start, monitor and complete a run\n", + "\n", + "A run is a unit of execution, typically used to train a model, but it can serve other purposes as well, such as loading or transforming data. Runs are tracked by the Azure ML service, and can be instrumented with metrics and artifact logging.\n", + "\n", + "The simplest way to start a run in your interactive Python session is to call the *Experiment.start_logging* method. You can then log metrics from within the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "notebook_run = exp.start_logging()\n", + "\n", + "notebook_run.log(name=\"message\", value=\"Hello from run!\")\n", + "\n", + "print(notebook_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use the *get_status* method to get the status of the run."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(notebook_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Also, you can simply display the run object to get a link to the run details in the Azure portal." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "notebook_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The *get_details* method gives you more details on the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "notebook_run.get_details()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use the *complete* method to end the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "notebook_run.complete()\n", + "print(notebook_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also use Python's *with...as* pattern. The run will automatically complete when it moves out of scope, so you don't need to complete it manually." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with exp.start_logging() as notebook_run:\n", + " notebook_run.log(name=\"message\", value=\"Hello from run!\")\n", + " print(\"Is it still running?\",notebook_run.get_status())\n", + " \n", + "print(\"Has it completed?\",notebook_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, let's look at submitting a run as a separate Python process. To keep the example simple, we submit the run on the local computer. Other targets could include remote VMs and Machine Learning Compute clusters in your Azure ML Workspace.\n", + "\n", + "We use the *hello.py* script as an example. 
To perform logging, we need to get a reference to the Run instance from within the scope of the script. We do this using the *Run.get_context* method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!more hello.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's submit the run on a local computer. A standard pattern in the Azure ML SDK is to create a run configuration, and then use the *Experiment.submit* method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run_config = ScriptRunConfig(source_directory='.', script='hello.py')\n", + "\n", + "local_script_run = exp.submit(run_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can view the status of the run as before." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(local_script_run.get_status())\n", + "local_script_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Submitted runs have additional log files you can inspect using *get_details_with_logs*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.get_details_with_logs()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use the *wait_for_completion* method to block the local execution until the remote run is complete." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.wait_for_completion(show_output=True)\n", + "print(local_script_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Add properties and tags\n", + "\n", + "Properties and tags help you organize your runs. 
You can use them to describe, for example, who authored the run, what the results were, and what machine learning approach was used. And as you'll later learn, properties and tags can be used to query the history of your runs to find the important ones.\n", + "\n", + "For example, let's add an \"author\" property to the run:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.add_properties({\"author\":\"azureml-user\"})\n", + "print(local_script_run.get_properties())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Properties are immutable. Once you assign a value, it cannot be changed, which makes properties useful as a permanent record for auditing purposes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " local_script_run.add_properties({\"author\":\"different-user\"})\n", + "except Exception as e:\n", + " print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Tags, on the other hand, can be changed:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.tag(\"quality\", \"great run\")\n", + "print(local_script_run.get_tags())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.tag(\"quality\", \"fantastic run\")\n", + "print(local_script_run.get_tags())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also add a simple string tag. 
It appears in the tag dictionary with a value of None." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.tag(\"worth another look\")\n", + "print(local_script_run.get_tags())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Query properties and tags\n", + "\n", + "You can query runs within an experiment that match specific properties and tags." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(exp.get_runs(properties={\"author\":\"azureml-user\"},tags={\"quality\":\"fantastic run\"}))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(exp.get_runs(properties={\"author\":\"azureml-user\"},tags=\"worth another look\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start and query child runs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use child runs to group together related runs, for example different hyperparameter tuning iterations.\n", + "\n", + "Let's use the *hello_with_children* script to create a batch of 5 child runs from within a submitted run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!more hello_with_children.py" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run_config = ScriptRunConfig(source_directory='.', script='hello_with_children.py')\n", + "\n", + "local_script_run = exp.submit(run_config)\n", + "local_script_run.wait_for_completion(show_output=True)\n", + "print(local_script_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can start child runs one by one. 
Note that this is less efficient than submitting a batch of runs, because each creation results in a network call.\n", + "\n", + "Child runs, too, complete automatically as they move out of scope." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with exp.start_logging() as parent_run:\n", + " for c in range(5):\n", + " with parent_run.child_run() as child:\n", + " child.log(name=\"Hello from child run\", value=c)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To query the child runs belonging to a specific parent, use the *get_children* method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(parent_run.get_children())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cancel or fail runs\n", + "\n", + "Sometimes, you realize that the run is not performing as intended, and you want to cancel it instead of waiting for it to complete.\n", + "\n", + "As an example, let's create a Python script with a delay in the middle." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!more hello_with_delay.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use the *cancel* method to cancel a run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run_config = ScriptRunConfig(source_directory='.', script='hello_with_delay.py')\n", + "\n", + "local_script_run = exp.submit(run_config)\n", + "print(\"Did the run start?\",local_script_run.get_status())\n", + "local_script_run.cancel()\n", + "print(\"Did the run cancel?\",local_script_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also mark an unsuccessful run as failed."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run = exp.submit(run_config)\n", + "local_script_run.fail()\n", + "print(local_script_run.get_status())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reproduce a run\n", + "\n", + "When updating or troubleshooting a model deployed to production, you sometimes need to revisit the original training run that produced the model. To help you with this, the Azure ML service by default creates snapshots of your scripts at the time of run submission.\n", + "\n", + "You can use *restore_snapshot* to obtain a zip package of the latest snapshot of the script folder." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_script_run.restore_snapshot(path=\"snapshots\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can then extract the zip package, examine the code, and submit your run again." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + " * To learn more about logging APIs, see [logging API notebook](./logging-api/logging-api.ipynb)\n", + " * To learn more about remote runs, see [train on AML compute notebook](./train-on-amlcompute/train-on-amlcompute.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb index 8a8ab037..b7feed04 100644 --- a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb +++ b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb @@ -1,501 +1,501 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train and hyperparameter tune on Iris Dataset with Scikit-learn\n", - "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a support vector machine (SVM) on a single-node CPU with Scikit-learn to perform classification on the popular [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris). We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Opt-in diagnostics for better experience, quality, and security of future releases." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create AmlCompute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. 
Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './sklearn-iris'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `train_iris`.py. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script.\n", - "\n", - "In `train_iris.py`, we will log some metrics to our Azure ML run. 
To do so, we will access the Azure ML Run object within the script:\n", - "\n", - "```python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "\n", - "Further within `train_iris.py`, we log the kernel and penalty parameters, and the highest accuracy the model achieves:\n", - "\n", - "```python\n", - "run.log('Kernel type', np.string(args.kernel))\n", - "run.log('Penalty', np.float(args.penalty))\n", - "\n", - "run.log('Accuracy', np.float(accuracy))\n", - "```\n", - "\n", - "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section.\n", - "\n", - "Once your script is ready, copy the training script `train_iris.py` into your project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('train_iris.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Scikit-learn tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'train_iris'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Scikit-learn estimator" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The Azure ML SDK's Scikit-learn estimator enables you to easily submit Scikit-learn training jobs for single-node runs. The following code will define a single-node Scikit-learn job." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.sklearn import SKLearn\n", - "\n", - "script_params = {\n", - " '--kernel': 'linear',\n", - " '--penalty': 1.0,\n", - "}\n", - "\n", - "estimator = SKLearn(source_directory=project_folder, \n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='train_iris.py'\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. To leverage the Azure VM's GPU for training, we set `use_gpu=True`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Monitor your run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.cancel()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tune model hyperparameters" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that we've seen how to do a simple Scikit-learn training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a hyperparameter sweep" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we will define the hyperparameter space to sweep over. Let's tune the `kernel` and `penalty` parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, `Accuracy`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n", - "from azureml.train.hyperdrive.sampling import RandomParameterSampling\n", - "from azureml.train.hyperdrive.run import PrimaryMetricGoal\n", - "from azureml.train.hyperdrive.parameter_expressions import choice\n", - " \n", - "\n", - "param_sampling = RandomParameterSampling( {\n", - " \"--kernel\": choice('linear', 'rbf', 'poly', 'sigmoid'),\n", - " \"--penalty\": choice(0.5, 1, 1.5)\n", - " }\n", - ")\n", - "\n", - "hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n", - " hyperparameter_sampling=param_sampling, \n", - " primary_metric_name='Accuracy',\n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", - " max_total_runs=12,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, lauch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the HyperDrive run\n", - "hyperdrive_run = experiment.submit(hyperdrive_run_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Monitor HyperDrive runs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can monitor the progress of the runs with the following Jupyter widget." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(hyperdrive_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "dipeck" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - }, - "msauthor": "dipeck" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train and hyperparameter tune on Iris Dataset with Scikit-learn\n", + "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a support vector machine (SVM) on a single-node CPU with Scikit-learn to perform classification on the popular [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris). We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML Workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Opt in to diagnostics to help improve the experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "\n", + "set_diagnostics_collection(send_diagnostics=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create AmlCompute" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. 
This includes the training script and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './sklearn-iris'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare training script" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `train_iris.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", + "\n", + "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script.\n", + "\n", + "In `train_iris.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML Run object within the script:\n", + "\n", + "```python\n", + "from azureml.core.run import Run\n", + "run = Run.get_context()\n", + "```\n", + "\n", + "Further within `train_iris.py`, we log the kernel and penalty parameters, and the highest accuracy the model achieves:\n", + "\n", + "```python\n", + "run.log('Kernel type', np.str(args.kernel))\n", + "run.log('Penalty', np.float(args.penalty))\n", + "\n", + "run.log('Accuracy', np.float(accuracy))\n", + "```\n", + "\n", + "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section.\n", + "\n", + "Once your script is ready, copy the training script `train_iris.py` into your project directory."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "shutil.copy('train_iris.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Scikit-learn tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'train_iris'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Scikit-learn estimator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Azure ML SDK's Scikit-learn estimator enables you to easily submit Scikit-learn training jobs for single-node runs. The following code will define a single-node Scikit-learn job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.sklearn import SKLearn\n", + "\n", + "script_params = {\n", + " '--kernel': 'linear',\n", + " '--penalty': 1.0,\n", + "}\n", + "\n", + "estimator = SKLearn(source_directory=project_folder, \n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " entry_script='train_iris.py'\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `script_params` parameter is a dictionary containing the command-line arguments to pass to your training script `entry_script`."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Monitor your run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.cancel()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tune model hyperparameters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we've seen how to do a simple Scikit-learn training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Start a hyperparameter sweep" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we will define the hyperparameter space to sweep over. Let's tune the `kernel` and `penalty` parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, `Accuracy`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n", + "from azureml.train.hyperdrive.sampling import RandomParameterSampling\n", + "from azureml.train.hyperdrive.run import PrimaryMetricGoal\n", + "from azureml.train.hyperdrive.parameter_expressions import choice\n", + " \n", + "\n", + "param_sampling = RandomParameterSampling( {\n", + " \"--kernel\": choice('linear', 'rbf', 'poly', 'sigmoid'),\n", + " \"--penalty\": choice(0.5, 1, 1.5)\n", + " }\n", + ")\n", + "\n", + "hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n", + " hyperparameter_sampling=param_sampling, \n", + " primary_metric_name='Accuracy',\n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", + " max_total_runs=12,\n", + " max_concurrent_runs=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, launch the hyperparameter tuning job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# start the HyperDrive run\n", + "hyperdrive_run = experiment.submit(hyperdrive_run_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Monitor HyperDrive runs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can monitor the progress of the runs with the following Jupyter widget."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "RunDetails(hyperdrive_run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hyperdrive_run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "dipeck" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + }, + "msauthor": "dipeck" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb b/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb index d7b17dbf..10b02543 100644 --- a/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb +++ b/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb @@ -1,285 +1,285 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-in-spark/train-in-spark.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 05. Train in Spark\n", - "* Create Workspace\n", - "* Create Experiment\n", - "* Copy relevant files to the script folder\n", - "* Configure and Run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'train-on-spark'\n", - "\n", - "from azureml.core import Experiment\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## View `train-spark.py`\n", - "\n", - "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train-spark.py` in a cell to show the file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('train-spark.py', 'r') as training_script:\n", - " print(training_script.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure & Run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note** You can use Docker-based execution to run the Spark job in local computer or a remote VM. 
Please see the `train-in-remote-vm` notebook for example on how to configure and run in Docker mode in a VM. Make sure you choose a Docker image that has Spark installed, such as `microsoft/mmlspark:0.12`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Attach an HDI cluster\n", - "Here we will use a actual Spark cluster, HDInsight for Spark, to run this job. To use HDI commpute target:\n", - " 1. Create a Spark for HDI cluster in Azure. Here are some [quick instructions](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-spark-sql). Make sure you use the Ubuntu flavor, NOT CentOS.\n", - " 2. Enter the IP address, username and password below" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, HDInsightCompute\n", - "from azureml.exceptions import ComputeTargetException\n", - "import os\n", - "\n", - "try:\n", - " # if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase\n", - " attach_config = HDInsightCompute.attach_configuration(address=os.environ.get('hdiservername', '-ssh.azurehdinsight.net'), \n", - " ssh_port=22, \n", - " username=os.environ.get('hdiusername', ''), \n", - " password=os.environ.get('hdipassword', ''))\n", - " hdi_compute = ComputeTarget.attach(workspace=ws, \n", - " name='myhdi', \n", - " attach_configuration=attach_config)\n", - "\n", - "except ComputeTargetException as e:\n", - " print(\"Caught = {}\".format(e.message))\n", - " \n", - " \n", - "hdi_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure HDI run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Configure an execution using the HDInsight cluster with a conda environment that has `numpy`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# use pyspark framework\n", - "hdi_run_config = RunConfiguration(framework=\"pyspark\")\n", - "\n", - "# Set compute target to the HDI cluster\n", - "hdi_run_config.target = hdi_compute.name\n", - "\n", - "# specify CondaDependencies object to ask system installing numpy\n", - "cd = CondaDependencies()\n", - "cd.add_conda_package('numpy')\n", - "hdi_run_config.environment.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit the script to HDI" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "\n", - "script_run_config = ScriptRunConfig(source_directory = '.',\n", - " script= 'train-spark.py',\n", - " run_config = hdi_run_config)\n", - "run = exp.submit(config=script_run_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Monitor the run using a Juypter widget" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "After the run is succesfully finished, you can check the metrics logged." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get all metris logged in the run\n", - "metrics = run.get_metrics()\n", - "print(metrics)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "aashishb" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-in-spark/train-in-spark.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 05. Train in Spark\n", + "* Create Workspace\n", + "* Create Experiment\n", + "* Copy relevant files to the script folder\n", + "* Configure and Run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Experiment\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'train-on-spark'\n", + "\n", + "from azureml.core import Experiment\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## View `train-spark.py`\n", + "\n", + "For convenience, we created a training script for you. It is printed below as text, but you can also run `%pycat ./train-spark.py` in a cell to show the file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('train-spark.py', 'r') as training_script:\n", + " print(training_script.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure & Run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: You can use Docker-based execution to run the Spark job on a local computer or a remote VM. Please see the `train-in-remote-vm` notebook for an example of how to configure and run in Docker mode in a VM. 
Make sure you choose a Docker image that has Spark installed, such as `microsoft/mmlspark:0.12`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attach an HDI cluster\n", + "Here we will use an actual Spark cluster, HDInsight for Spark, to run this job. To use an HDI compute target:\n", + " 1. Create a Spark for HDI cluster in Azure. Here are some [quick instructions](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-spark-sql). Make sure you use the Ubuntu flavor, NOT CentOS.\n", + " 2. Enter the IP address, username, and password below" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, HDInsightCompute\n", + "from azureml.exceptions import ComputeTargetException\n", + "import os\n", + "\n", + "try:\n", + " # to connect using an SSH key instead of username/password, provide the parameters private_key_file and private_key_passphrase\n", + " attach_config = HDInsightCompute.attach_configuration(address=os.environ.get('hdiservername', '-ssh.azurehdinsight.net'), \n", + " ssh_port=22, \n", + " username=os.environ.get('hdiusername', ''), \n", + " password=os.environ.get('hdipassword', ''))\n", + " hdi_compute = ComputeTarget.attach(workspace=ws, \n", + " name='myhdi', \n", + " attach_configuration=attach_config)\n", + "\n", + "except ComputeTargetException as e:\n", + " print(\"Caught = {}\".format(e.message))\n", + " \n", + " \n", + "hdi_compute.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure HDI run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Configure an execution using the HDInsight cluster with a conda environment that has `numpy`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# use pyspark framework\n", + "hdi_run_config = RunConfiguration(framework=\"pyspark\")\n", + "\n", + "# Set compute target to the HDI cluster\n", + "hdi_run_config.target = hdi_compute.name\n", + "\n", + "# specify a CondaDependencies object so the system installs numpy\n", + "cd = CondaDependencies()\n", + "cd.add_conda_package('numpy')\n", + "hdi_run_config.environment.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit the script to HDI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "script_run_config = ScriptRunConfig(source_directory='.',\n", + " script='train-spark.py',\n", + " run_config=hdi_run_config)\n", + "run = exp.submit(config=script_run_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Monitor the run using a Jupyter widget." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After the run has successfully finished, you can check the logged metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get all metrics logged in the run\n", + "metrics = run.get_metrics()\n", + "print(metrics)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "aashishb" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb b/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb index de79e052..f1aa918c 100644 --- a/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb +++ b/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb @@ -1,448 +1,448 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train using Azure Machine Learning Compute\n", - "\n", - "* Initialize a Workspace\n", - "* Create an Experiment\n", - "* Introduction to AmlCompute\n", - "* Submit an AmlCompute run in a few different ways\n", - " - Provision as a run based compute target \n", - " - Provision as a persistent compute target (Basic)\n", - " - Provision as a persistent compute target (Advanced)\n", - "* Additional operations to perform on AmlCompute\n", - "* Find the best model in the run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize a Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create An Experiment\n", - "\n", - "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "experiment_name = 'train-on-amlcompute'\n", - "experiment = Experiment(workspace = ws, name = experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Introduction to AmlCompute\n", - "\n", - "Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user. \n", - "\n", - "Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. 
\n", - "\n", - "For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)\n", - "\n", - "If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement)\n", - "\n", - "**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", - "\n", - "\n", - "The training script `train.py` is already created for you. Let's have a look." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit an AmlCompute run in a few different ways\n", - "\n", - "First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. 
Since AmlCompute is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n", - "\n", - "You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "\n", - "AmlCompute.supported_vmsizes(workspace = ws)\n", - "#AmlCompute.supported_vmsizes(workspace = ws, location='southcentralus')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create project directory\n", - "\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import shutil\n", - "\n", - "project_folder = './train-on-amlcompute'\n", - "os.makedirs(project_folder, exist_ok=True)\n", - "shutil.copy('train.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create environment\n", - "\n", - "Create Docker based environment with scikit-learn installed." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Environment\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "myenv = Environment(\"myenv\")\n", - "\n", - "myenv.docker.enabled = True\n", - "myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Provision as a persistent compute target (Basic)\n", - "\n", - "You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n", - "\n", - "* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() above\n", - "* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpu-cluster\"\n", - "\n", - "# Verify that cluster does not exist already\n", - "try:\n", - " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", - " print('Found existing cluster, use it.')\n", - "except ComputeTargetException:\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", - " max_nodes=4)\n", - " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", - "\n", - "cpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", 
- "metadata": {}, - "source": [ - "### Configure & Run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n", - "\n", - "src = ScriptRunConfig(source_directory=project_folder, script='train.py')\n", - "\n", - "# Set compute target to the one created in previous step\n", - "src.run_config.target = cpu_cluster.name\n", - "\n", - "# Set environment\n", - "src.run_config.environment = myenv\n", - " \n", - "run = experiment.submit(config=src)\n", - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Provision as a persistent compute target (Advanced)\n", - "\n", - "You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. This is useful when you want a dedicated cluster of 4 nodes (for example you can set the min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n", - "\n", - "In addition to `vm_size` and `max_nodes`, you can specify:\n", - "* `min_nodes`: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n", - "* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. 
Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n", - "* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n", - "* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n", - "* `vnet_name`: Name of VNet\n", - "* `subnet_name`: Name of SubNet within the VNet" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpu-cluster\"\n", - "\n", - "# Verify that cluster does not exist already\n", - "try:\n", - " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", - " print('Found existing cluster, use it.')\n", - "except ComputeTargetException:\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", - " vm_priority='lowpriority',\n", - " min_nodes=2,\n", - " max_nodes=4,\n", - " idle_seconds_before_scaledown='300',\n", - " vnet_resourcegroup_name='',\n", - " vnet_name='',\n", - " subnet_name='')\n", - " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", - "\n", - "cpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure & Run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Set compute target to the one created in previous step\n", - "src.run_config.target = cpu_cluster.name\n", - " \n", - "run = experiment.submit(config=src)\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on 
stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Additional operations to perform on AmlCompute\n", - "\n", - "You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Get_status () gets the latest status of the AmlCompute target\n", - "cpu_cluster.get_status().serialize()\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Update () takes in the min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target\n", - "#cpu_cluster.update(min_nodes=1)\n", - "#cpu_cluster.update(max_nodes=10)\n", - "cpu_cluster.update(idle_seconds_before_scaledown=300)\n", - "#cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n", - "#'cpu-cluster' in this case but use a different VM family for instance.\n", - "\n", - "#cpu_cluster.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Success!\n", - "Great, you are ready to move on to the remaining notebooks." 
- ] - } - ], - "metadata": { - "authors": [ - { - "name": "nigup" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train using Azure Machine Learning Compute\n", + "\n", + "* Initialize a Workspace\n", + "* Create an Experiment\n", + "* Introduction to AmlCompute\n", + "* Submit an AmlCompute run in a few different ways\n", + " - Provision as a run based compute target \n", + " - Provision as a persistent compute target (Basic)\n", + " - Provision as a persistent compute target (Advanced)\n", + "* Additional operations to perform on AmlCompute\n", + "* Find the best model in the run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace." 
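+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Optional: the SDK is pre-installed on an Azure ML Notebook VM; in other environments\n", + "# you can install it yourself (assumes pip is available in this kernel). Uncomment to run:\n", + "# %pip install azureml-sdk"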
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize a Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create An Experiment\n", + "\n", + "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "experiment_name = 'train-on-amlcompute'\n", + "experiment = Experiment(workspace = ws, name = experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction to AmlCompute\n", + "\n", + "Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user. \n", + "\n", + "Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. 
\n", + "\n", + "For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)\n", + "\n", + "If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement)\n", + "\n", + "**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", + "\n", + "\n", + "The training script `train.py` is already created for you. Let's have a look." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit an AmlCompute run in a few different ways\n", + "\n", + "First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. 
Since AmlCompute is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n", + "\n", + "You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "\n", + "AmlCompute.supported_vmsizes(workspace = ws)\n", + "#AmlCompute.supported_vmsizes(workspace = ws, location='southcentralus')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create project directory\n", + "\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import shutil\n", + "\n", + "project_folder = './train-on-amlcompute'\n", + "os.makedirs(project_folder, exist_ok=True)\n", + "shutil.copy('train.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create environment\n", + "\n", + "Create Docker based environment with scikit-learn installed." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "myenv = Environment(\"myenv\")\n", + "\n", + "myenv.docker.enabled = True\n", + "myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Provision as a persistent compute target (Basic)\n", + "\n", + "You can provision a persistent AmlCompute resource by defining just two parameters, thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously reuse the same target, debug it between jobs, or share the resource with other users of your workspace.\n", + "\n", + "* `vm_size`: VM family of the nodes provisioned by AmlCompute. Choose one of the sizes returned by supported_vmsizes() above\n", + "* `max_nodes`: Maximum number of nodes to autoscale to while running a job on AmlCompute" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "\n", + "# Verify that the cluster does not exist already\n", + "try:\n", + " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + " print('Found existing cluster; using it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " max_nodes=4)\n", + " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + "\n", + "cpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown",
+ "metadata": {}, + "source": [ + "### Configure & Run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory=project_folder, script='train.py')\n", + "\n", + "# Set compute target to the one created in the previous step\n", + "src.run_config.target = cpu_cluster.name\n", + "\n", + "# Set environment\n", + "src.run_config.environment = myenv\n", + "\n", + "run = experiment.submit(config=src)\n", + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Provision as a persistent compute target (Advanced)\n", + "\n", + "You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. This is useful when you want a dedicated cluster of 4 nodes (for example, you can set min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n", + "\n", + "In addition to `vm_size` and `max_nodes`, you can specify:\n", + "* `min_nodes`: Minimum number of nodes (default 0) to downscale to while running a job on AmlCompute\n", + "* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low-priority VMs use Azure's excess capacity and are thus cheaper, but risk your run being pre-empted\n", + "* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n", + "* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n", + "* `vnet_name`: Name of the VNet\n", + "* `subnet_name`: Name of the subnet within the VNet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "\n", + "# Verify that the cluster does not exist already\n", + "try:\n", + " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + " print('Found existing cluster; using it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " vm_priority='lowpriority',\n", + " min_nodes=2,\n", + " max_nodes=4,\n", + " idle_seconds_before_scaledown=300)\n", + " # To provision inside an existing VNet, also pass:\n", + " # vnet_resourcegroup_name='<resource group>', vnet_name='<vnet name>', subnet_name='<subnet name>'\n", + " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + "\n", + "cpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure & Run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set compute target to the one created in the previous step\n", + "src.run_config.target = cpu_cluster.name\n", + "\n", + "run = experiment.submit(config=src)\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Additional operations to perform on AmlCompute\n", + "\n", + "You can perform more operations on AmlCompute, such as updating the node counts or deleting the compute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get_status() gets the latest status of the AmlCompute target\n", + "cpu_cluster.get_status().serialize()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# update() takes min_nodes, max_nodes and/or idle_seconds_before_scaledown and updates the AmlCompute target\n", + "#cpu_cluster.update(min_nodes=1)\n", + "#cpu_cluster.update(max_nodes=10)\n", + "cpu_cluster.update(idle_seconds_before_scaledown=300)\n", + "#cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# delete() deprovisions and deletes the AmlCompute target. Useful if you want to reuse the compute name\n", + "# ('cpu-cluster' in this case) but with a different VM family, for instance.\n", + "\n", + "#cpu_cluster.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Success!\n", + "Great, you are ready to move on to the remaining notebooks."
+ ] + } + ], + "metadata": { + "authors": [ + { + "name": "nigup" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/train-on-local/train-on-local.ipynb b/how-to-use-azureml/training/train-on-local/train-on-local.ipynb index 917e14c9..e0023b47 100644 --- a/how-to-use-azureml/training/train-on-local/train-on-local.ipynb +++ b/how-to-use-azureml/training/train-on-local/train-on-local.ipynb @@ -1,674 +1,674 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-local/train-on-local.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 02. Train locally\n", - "_**Train a model locally: Directly on your machine and within a Docker container**_\n", - "\n", - "---\n", - "\n", - "\n", - "## Table of contents\n", - "1. [Introduction](#intro)\n", - "1. [Pre-requisites](#pre-reqs)\n", - "1. [Initialize Workspace](#init)\n", - "1. [Create An Experiment](#exp)\n", - "1. [View training and auxiliary scripts](#view)\n", - "1. [Configure & Run](#config-run)\n", - " 1. User-managed environment\n", - " 1. Set the environment up\n", - " 1. Submit the script to run in the user-managed environment\n", - " 1. Get run history details\n", - " 1. System-managed environment\n", - " 1. Set the environment up\n", - " 1. 
Submit the script to run in the system-managed environment\n", - " 1. Get run history details\n", - " 1. Docker-based execution\n", - " 1. Set the environment up\n", - " 1. Submit the script to run in the system-managed environment\n", - " 1. Get run history details\n", - " 1. Use a custom Docker image\n", - "1. [Query run metrics](#query)\n", - "\n", - "---\n", - "\n", - "## 1. Introduction \n", - "\n", - "In this notebook, we will learn how to:\n", - "\n", - "* Connect to our AML workspace\n", - "* Create or load a workspace\n", - "* Configure & execute a local run in:\n", - " - a user-managed Python environment\n", - " - a system-managed Python environment\n", - " - a Docker environment\n", - "* Query run metrics to find the best model trained in the run\n", - "* Register that model for operationalization" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Pre-requisites \n", - "In this notebook, we assume that you have set your Azure Machine Learning workspace. If you have not, make sure you go through the [configuration notebook](../../../configuration.ipynb) first. In the end, you should have configuration file that contains the subscription ID, resource group and name of your workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. 
Initialize Workspace \n", - "\n", - "Initialize your workspace object from configuration file" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Create An Experiment \n", - "An experiment is a logical container in an Azure ML Workspace. It contains a series of trials called `Runs`. As such, it hosts run records such as run metrics, logs, and other output artifacts from your experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "experiment_name = 'train-on-local'\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. View training and auxiliary scripts \n", - "\n", - "For convenience, we already created the training (`train.py`) script and supportive libraries (`mylib.py`) for you. Take a few minutes to examine both files." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./train.py', 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./mylib.py', 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 6. Configure & Run \n", - "### 6.A User-managed environment\n", - "\n", - "#### 6.A.a Set the environment up\n", - "When using a user-managed environment, you are responsible for ensuring that all the necessary packages are available in the Python environment you choose to run the script in." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Environment\n", - "\n", - "# Editing a run configuration property on-fly.\n", - "user_managed_env = Environment(\"user-managed-env\")\n", - "\n", - "user_managed_env.python.user_managed_dependencies = True\n", - "\n", - "# You can choose a specific Python environment by pointing to a Python path \n", - "#user_managed_env.python.interpreter_path = '/home/johndoe/miniconda3/envs/myenv/bin/python'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.A.b Submit the script to run in the user-managed environment\n", - "Whatever the way you manage your environment, you need to use the `ScriptRunConfig` class. It allows you to further configure your run by pointing to the `train.py` script and to the working directory, which also contains the `mylib.py` file. These inputs indeed provide the commands to execute in the run. Once the run is configured, you submit it to your experiment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory='./', script='train.py')\n", - "src.run_config.environment = user_managed_env" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(src)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.A.c Get run history details\n", - "\n", - "While all calculations were run on your machine (cf. below), by using a `run` you also captured the results of your calculations into your run and experiment. 
You can then see them on the Azure portal, through the link displayed as output of the following cell.\n", - "\n", - "**Note**: The recording of the computation results into your run was made possible by the `run.log()` commands in the `train.py` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Block any execution to wait until the run finishes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note:** All these calculations were run on your local machine, in the conda environment you defined above. You can find the results in:\n", - "- `~/.azureml/envs/azureml_xxxx` for the conda environment you just created\n", - "- `~/AppData/Local/Temp/azureml_runs/train-on-local_xxxx` for the machine learning models you trained (this path may differ depending on the platform you use). This folder also contains\n", - " - Logs (under azureml_logs/)\n", - " - Output pickled files (under outputs/)\n", - " - The configuration files (credentials, local and docker image setups)\n", - " - The train.py and mylib.py scripts\n", - " - The current notebook\n", - "\n", - "Take a few minutes to examine the output of the cell above. It shows the content of some of the log files, and extra information on the conda environment used." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 6.B System-managed environment\n", - "#### 6.B.a Set the environment up\n", - "Now, instead of managing the setup of the environment yourself, you can ask the system to build a new conda environment for you. The environment is built once, and will be reused in subsequent executions as long as the conda dependencies remain unchanged." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "system_managed_env = Environment(\"system-managed-env\")\n", - "\n", - "system_managed_env.python.user_managed_dependencies = False\n", - "\n", - "# Specify conda dependencies with scikit-learn\n", - "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n", - "system_managed_env.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.B.b Submit the script to run in the system-managed environment\n", - "A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes.\n", - "\n", - "The commands used to execute the run are then the same as the ones you used above." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src.run_config.environment = system_managed_env\n", - "run = exp.submit(src)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.B.c Get run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 6.C Docker-based execution\n", - "In this section, you will train the same models, but you will do so in a Docker container, on your local machine. For this, you then need to have the Docker engine installed locally. If you don't have it yet, please follow the instructions below.\n", - "\n", - "#### How to install Docker\n", - "\n", - "- [Linux](https://docs.docker.com/install/linux/docker-ce/ubuntu/)\n", - "- [MacOs](https://docs.docker.com/docker-for-mac/install/)\n", - "- [Windows](https://docs.docker.com/docker-for-windows/install/)\n", - "\n", - " In case of issues, troubleshooting documentation can be found [here](https://docs.docker.com/docker-for-windows/troubleshoot/#running-docker-for-windows-in-nested-virtualization-scenarios). 
Additionally, you can follow the steps below, if Virtualization is not enabled on your machine:\n", - " - Go to Task Manager > Performance\n", - " - Check that Virtualization is enabled\n", - " - If it is not, go to `Start > Settings > Update and security > Recovery > Advanced Startup - Restart now > Troubleshoot > Advanced options > UEFI firmware settings - restart`\n", - " - In the BIOS, go to `Advanced > System options > Click the \"Virtualization Technology (VTx)\" only > Save > Exit > Save all changes` -- This will restart the machine\n", - "\n", - "**Notes**: \n", - "- If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n", - "- If you use a GPU base image, it needs to be used on Microsoft Azure Services such as ACI, AML Compute, Azure VMs, or AKS.\n", - "\n", - "You can also ask the system to pull down a Docker image and execute your scripts in it.\n", - "\n", - "#### 6.C.a Set the environment up\n", - "\n", - "In the cell below, you will configure your run to execute in a Docker container. It will:\n", - "- run on a CPU\n", - "- contain a conda environment in which the scikit-learn library will be installed.\n", - "\n", - "As before, you will finish configuring your run by pointing to the `train.py` and `mylib.py` files." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "docker_env = Environment(\"docker-env\")\n", - "\n", - "docker_env.python.user_managed_dependencies = False\n", - "docker_env.docker.enabled = True\n", - "\n", - "# use the default CPU-based Docker image from Azure ML\n", - "print(docker_env.docker.base_image)\n", - "\n", - "# Specify conda dependencies with scikit-learn\n", - "docker_env.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.C.b Submit the script to run in the system-managed environment\n", - "\n", - "The run is now configured and ready to be executed in a Docker container. If you are running this for the first time, the Docker container will get created, as well as the conda environment inside it. This will take several minutes. Once all this is generated, however, this conda environment will be reused as long as you don't change the conda dependencies." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import subprocess\n", - "\n", - "src.run_config.environment = docker_env\n", - "\n", - "# Check if Docker is installed and Linux containers are enabled\n", - "if subprocess.run(\"docker -v\", shell=True).returncode == 0:\n", - " out = subprocess.check_output(\"docker system info\", shell=True).decode('ascii')\n", - " if not \"OSType: linux\" in out:\n", - " print(\"Switch Docker engine to use Linux containers.\")\n", - " else:\n", - " run = exp.submit(src)\n", - "else:\n", - " print(\"Docker engine is not installed.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### Potential issue on Windows and how to solve it\n", - "\n", - "If you are using a Windows machine, the creation of the Docker image may fail, and you may see the following error message\n", - "`docker: Error response from daemon: Drive has not been shared. 
Failed to launch docker container. Check that docker is running and that C:\\ on Windows and /tmp elsewhere is shared.`\n", - "\n", - "This is because the process above tries to create a linux-based, i.e. non-windows-based, Docker image. To fix this, you can:\n", - "- Open the Docker user interface\n", - "- Navigate to Settings > Shared drives\n", - "- Select C (or both C and D, if you have one)\n", - "- Apply\n", - "\n", - "When this is done, you can try and re-run the command above.\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.C.c Get run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get run history details\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The results obtained here should be the same as those obtained before. However, take a look at the \"Execution summary\" section in the output of the cell above. Look for \"docker\". There, you should see the \"enabled\" field set to True. Compare this to the 2 prior runs (\"enabled\" was then set to False)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6.C.d Use a custom Docker image\n", - "\n", - "You can also specify a custom Docker image, if you don't want to use the default image provided by Azure ML.\n", - "\n", - "You can either pull an image directly from Anaconda:\n", - "```python\n", - "# Use an image available in Docker Hub without authentication\n", - "run_config_docker.environment.docker.base_image = \"continuumio/miniconda3\"\n", - "```\n", - "\n", - "Or one of the images you may already have created:\n", - "```python\n", - "# or, use an image available in your private Azure Container Registry\n", - "run_config_docker.environment.docker.base_image = \"mycustomimage:1.0\"\n", - "run_config_docker.environment.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n", - "run_config_docker.environment.docker.base_image_registry.username = \"username\"\n", - "run_config_docker.environment.docker.base_image_registry.password = \"password\"\n", - "```\n", - "\n", - "##### Where to find my Docker image name and registry credentials\n", - " If you do not know what the name of your Docker image or container registry is, or if you don't know how to access the username and password needed above, proceed as follows:\n", - " - Docker image name:\n", - " - In the portal, under your resource group, click on your current workspace\n", - " - Click on Experiments\n", - " - Click on Images\n", - " - Click on the image of your choice\n", - " - Copy the \"ID\" string\n", - " - In this notebook, replace \"mycustomimage:1/0\" with that ID string\n", - " - Username and password:\n", - " - In the portal, under your resource group, click on the container registry associated with your workspace\n", - " - If you have several and don't know which one you need, click on your workspace, go to Overview and click on the \"Registry\" name on the upper right of the screen\n", - " - There, go to \"Access keys\"\n", - " - Copy the username and one of 
the passwords\n", - " - In this notebook, replace \"username\" and \"password\" by these values\n", - "\n", - "In any case, you will need to use the lines above in place of the line marked as `# Reference Docker image` in section 6.C.a. \n", - "\n", - "When you are using your custom Docker image, you might already have your Python environment properly set up. In that case, you can skip specifying conda dependencies, and just use the `user_managed_dependencies` option instead:\n", - "```python\n", - "run_config_docker.environment.python.user_managed_dependencies = True\n", - "# path to the Python environment in the custom Docker image\n", - "run_config.environment.python.interpreter_path = '/opt/conda/bin/python'\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 7. Query run metrics \n", - "\n", - "Once your run has completed, you can now extract the metrics you captured by using the `get_metrics` method. As shown in the `train.py` file, these metrics are \"alpha\" and \"mse\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history", - "get metrics" - ] - }, - "outputs": [], - "source": [ - "# Get all metris logged in the run\n", - "run.get_metrics()\n", - "metrics = run.get_metrics()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's find the model that has the lowest MSE value logged." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]\n", - "\n", - "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", - " min(metrics['mse']), \n", - " best_alpha\n", - "))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's compare it to the others" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt\n", - "\n", - "plt.plot(metrics['alpha'], metrics['mse'], marker='o')\n", - "plt.ylabel(\"MSE\")\n", - "plt.xlabel(\"Alpha\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also list all the files that are associated with this run record" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the results obtained above, `ridge_0.40.pkl` is the best performing model. You can now register that particular model with the workspace. Once you have done so, go back to the portal and click on \"Models\". You should see it there." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Supply a model name, and the full path to the serialized model file.\n", - "model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"Registered model:\\n --> Name: {}\\n --> Version: {}\\n --> URL: {}\".format(model.name, model.version, model.url))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can now deploy your model by following [this example](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb)." - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-local/train-on-local.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 02. 
Train locally\n", + "_**Train a model locally: Directly on your machine and within a Docker container**_\n", + "\n", + "---\n", + "\n", + "\n", + "## Table of contents\n", + "1. [Introduction](#intro)\n", + "1. [Pre-requisites](#pre-reqs)\n", + "1. [Initialize Workspace](#init)\n", + "1. [Create An Experiment](#exp)\n", + "1. [View training and auxiliary scripts](#view)\n", + "1. [Configure & Run](#config-run)\n", + " 1. User-managed environment\n", + " 1. Set the environment up\n", + " 1. Submit the script to run in the user-managed environment\n", + " 1. Get run history details\n", + " 1. System-managed environment\n", + " 1. Set the environment up\n", + " 1. Submit the script to run in the system-managed environment\n", + " 1. Get run history details\n", + " 1. Docker-based execution\n", + " 1. Set the environment up\n", + " 1. Submit the script to run in the Docker environment\n", + " 1. Get run history details\n", + " 1. Use a custom Docker image\n", + "1. [Query run metrics](#query)\n", + "\n", + "---\n", + "\n", + "## 1. Introduction \n", + "\n", + "In this notebook, we will learn how to:\n", + "\n", + "* Connect to our AML workspace\n", + "* Create or load a workspace\n", + "* Configure & execute a local run in:\n", + " - a user-managed Python environment\n", + " - a system-managed Python environment\n", + " - a Docker environment\n", + "* Query run metrics to find the best model trained in the run\n", + "* Register that model for operationalization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Pre-requisites \n", + "In this notebook, we assume that you have set up your Azure Machine Learning workspace. If you have not, make sure you go through the [configuration notebook](../../../configuration.ipynb) first. In the end, you should have a configuration file that contains the subscription ID, resource group, and name of your workspace."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Initialize Workspace \n", + "\n", + "Initialize your workspace object from the configuration file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Create An Experiment \n", + "An experiment is a logical container in an Azure ML Workspace. It contains a series of trials called `Runs`. As such, it hosts run records such as run metrics, logs, and other output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "experiment_name = 'train-on-local'\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. View training and auxiliary scripts \n", + "\n", + "For convenience, we have already created the training script (`train.py`) and a supporting library (`mylib.py`) for you. Take a few minutes to examine both files."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./train.py', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./mylib.py', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Configure & Run \n", + "### 6.A User-managed environment\n", + "\n", + "#### 6.A.a Set the environment up\n", + "When using a user-managed environment, you are responsible for ensuring that all the necessary packages are available in the Python environment you choose to run the script in." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "\n", + "# Edit a run configuration property on the fly\n", + "user_managed_env = Environment(\"user-managed-env\")\n", + "\n", + "user_managed_env.python.user_managed_dependencies = True\n", + "\n", + "# You can choose a specific Python environment by pointing to a Python path\n", + "#user_managed_env.python.interpreter_path = '/home/johndoe/miniconda3/envs/myenv/bin/python'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.A.b Submit the script to run in the user-managed environment\n", + "However you manage your environment, you need to use the `ScriptRunConfig` class. It lets you configure your run by pointing to the `train.py` script and to the working directory, which also contains the `mylib.py` file. These inputs tell the run what to execute. Once the run is configured, you submit it to your experiment."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory='./', script='train.py')\n", + "src.run_config.environment = user_managed_env" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(src)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.A.c Get run history details\n", + "\n", + "Although all calculations ran on your local machine (see below), by using a `Run` you also captured their results in your run and experiment. You can then see them on the Azure portal, through the link displayed as output of the following cell.\n", + "\n", + "**Note**: The recording of the computation results into your run was made possible by the `run.log()` commands in the `train.py` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Block execution until the run finishes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note:** All these calculations were run on your local machine, in the conda environment you defined above.
You can find the results in:\n", + "- `~/.azureml/envs/azureml_xxxx` for the conda environment you just created\n", + "- `~/AppData/Local/Temp/azureml_runs/train-on-local_xxxx` for the machine learning models you trained (this path may differ depending on the platform you use). This folder also contains\n", + " - Logs (under azureml_logs/)\n", + " - Output pickled files (under outputs/)\n", + " - The configuration files (credentials, local and docker image setups)\n", + " - The train.py and mylib.py scripts\n", + " - The current notebook\n", + "\n", + "Take a few minutes to examine the output of the cell above. It shows the content of some of the log files, and extra information on the conda environment used." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.B System-managed environment\n", + "#### 6.B.a Set the environment up\n", + "Now, instead of managing the setup of the environment yourself, you can ask the system to build a new conda environment for you. The environment is built once, and will be reused in subsequent executions as long as the conda dependencies remain unchanged." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "system_managed_env = Environment(\"system-managed-env\")\n", + "\n", + "system_managed_env.python.user_managed_dependencies = False\n", + "\n", + "# Specify conda dependencies with scikit-learn\n", + "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n", + "system_managed_env.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.B.b Submit the script to run in the system-managed environment\n", + "A new conda environment is built based on the conda dependencies object. 
If you are running this for the first time, this might take up to 5 minutes.\n", + "\n", + "The commands used to execute the run are then the same as the ones you used above." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src.run_config.environment = system_managed_env\n", + "run = exp.submit(src)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.B.c Get run history details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.C Docker-based execution\n", + "In this section, you will train the same models, but you will do so in a Docker container on your local machine. For this, you need to have the Docker engine installed locally. If you don't have it yet, please follow the instructions below.\n", + "\n", + "#### How to install Docker\n", + "\n", + "- [Linux](https://docs.docker.com/install/linux/docker-ce/ubuntu/)\n", + "- [macOS](https://docs.docker.com/docker-for-mac/install/)\n", + "- [Windows](https://docs.docker.com/docker-for-windows/install/)\n", + "\n", + "In case of issues, troubleshooting documentation can be found [here](https://docs.docker.com/docker-for-windows/troubleshoot/#running-docker-for-windows-in-nested-virtualization-scenarios).
Additionally, if virtualization is not enabled on your machine, you can follow the steps below:\n", + " - Go to Task Manager > Performance\n", + " - Check that Virtualization is enabled\n", + " - If it is not, go to `Start > Settings > Update and security > Recovery > Advanced Startup - Restart now > Troubleshoot > Advanced options > UEFI firmware settings - restart`\n", + " - In the BIOS, go to `Advanced > System options > Click the \"Virtualization Technology (VTx)\" only > Save > Exit > Save all changes` -- This will restart the machine\n", + "\n", + "**Notes**: \n", + "- If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n", + "- If you use a GPU base image, it needs to be used on Microsoft Azure Services such as ACI, AML Compute, Azure VMs, or AKS.\n", + "\n", + "You can also ask the system to pull down a Docker image and execute your scripts in it.\n", + "\n", + "#### 6.C.a Set the environment up\n", + "\n", + "In the cell below, you will configure your run to execute in a Docker container. It will:\n", + "- run on a CPU\n", + "- contain a conda environment in which the scikit-learn library will be installed.\n", + "\n", + "As before, you will finish configuring your run by pointing to the `train.py` and `mylib.py` files."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "docker_env = Environment(\"docker-env\")\n", + "\n", + "docker_env.python.user_managed_dependencies = False\n", + "docker_env.docker.enabled = True\n", + "\n", + "# Use the default CPU-based Docker image from Azure ML\n", + "print(docker_env.docker.base_image)\n", + "\n", + "# Specify conda dependencies with scikit-learn\n", + "docker_env.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.C.b Submit the script to run in the Docker environment\n", + "\n", + "The run is now configured and ready to be executed in a Docker container. If you are running this for the first time, the Docker container will get created, as well as the conda environment inside it. This will take several minutes. Once all this is generated, however, this conda environment will be reused as long as you don't change the conda dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import subprocess\n", + "\n", + "src.run_config.environment = docker_env\n", + "\n", + "# Check if Docker is installed and Linux containers are enabled\n", + "if subprocess.run(\"docker -v\", shell=True).returncode == 0:\n", + " out = subprocess.check_output(\"docker system info\", shell=True).decode('ascii')\n", + " if \"OSType: linux\" not in out:\n", + " print(\"Switch Docker engine to use Linux containers.\")\n", + " else:\n", + " run = exp.submit(src)\n", + "else:\n", + " print(\"Docker engine is not installed.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Potential issue on Windows and how to solve it\n", + "\n", + "If you are using a Windows machine, the creation of the Docker image may fail, and you may see the following error message:\n", + "`docker: Error response from daemon: Drive has not been shared.
Failed to launch docker container. Check that docker is running and that C:\\ on Windows and /tmp elsewhere is shared.`\n", + "\n", + "This is because the process above tries to create a Linux-based (i.e., non-Windows) Docker image. To fix this, you can:\n", + "- Open the Docker user interface\n", + "- Navigate to Settings > Shared drives\n", + "- Select C (or both C and D, if you have a D drive)\n", + "- Apply\n", + "\n", + "When this is done, you can re-run the command above.\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.C.c Get run history details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get run history details\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The results obtained here should be the same as those obtained before. However, take a look at the \"Execution summary\" section in the output of the cell above. Look for \"docker\". There, you should see the \"enabled\" field set to True. Compare this to the two prior runs (\"enabled\" was then set to False)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.C.d Use a custom Docker image\n", + "\n", + "You can also specify a custom Docker image, if you don't want to use the default image provided by Azure ML.\n", + "\n", + "You can pull a public image directly from Docker Hub:\n", + "```python\n", + "# Use an image available in Docker Hub without authentication\n", + "run_config_docker.environment.docker.base_image = \"continuumio/miniconda3\"\n", + "```\n", + "\n", + "Or use one of the images you have already created:\n", + "```python\n", + "# or, use an image available in your private Azure Container Registry\n", + "run_config_docker.environment.docker.base_image = \"mycustomimage:1.0\"\n", + "run_config_docker.environment.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n", + "run_config_docker.environment.docker.base_image_registry.username = \"username\"\n", + "run_config_docker.environment.docker.base_image_registry.password = \"password\"\n", + "```\n", + "\n", + "##### Where to find my Docker image name and registry credentials\n", + "If you do not know what the name of your Docker image or container registry is, or if you don't know how to access the username and password needed above, proceed as follows:\n", + " - Docker image name:\n", + " - In the portal, under your resource group, click on your current workspace\n", + " - Click on Experiments\n", + " - Click on Images\n", + " - Click on the image of your choice\n", + " - Copy the \"ID\" string\n", + " - In this notebook, replace \"mycustomimage:1.0\" with that ID string\n", + " - Username and password:\n", + " - In the portal, under your resource group, click on the container registry associated with your workspace\n", + " - If you have several and don't know which one you need, click on your workspace, go to Overview and click on the \"Registry\" name on the upper right of the screen\n", + " - There, go to \"Access keys\"\n", + " - Copy the username and one of
the passwords\n", + " - In this notebook, replace \"username\" and \"password\" with these values\n", + "\n", + "In any case, you will need to use the lines above in place of the default base image line in section 6.C.a. \n", + "\n", + "When you are using your custom Docker image, you might already have your Python environment properly set up. In that case, you can skip specifying conda dependencies, and just use the `user_managed_dependencies` option instead:\n", + "```python\n", + "run_config_docker.environment.python.user_managed_dependencies = True\n", + "# path to the Python environment in the custom Docker image\n", + "run_config_docker.environment.python.interpreter_path = '/opt/conda/bin/python'\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Query run metrics \n", + "\n", + "Once your run has completed, you can extract the metrics you captured by using the `get_metrics` method. As shown in the `train.py` file, these metrics are \"alpha\" and \"mse\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history", + "get metrics" + ] + }, + "outputs": [], + "source": [ + "# Get all metrics logged in the run\n", + "metrics = run.get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's find the model that has the lowest MSE value logged."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]\n", + "\n", + "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", + " min(metrics['mse']), \n", + " best_alpha\n", + "))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's compare it to the other models." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.plot(metrics['alpha'], metrics['mse'], marker='o')\n", + "plt.ylabel(\"MSE\")\n", + "plt.xlabel(\"Alpha\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also list all the files that are associated with this run record." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the results obtained above, `ridge_0.40.pkl` is the best-performing model. You can now register that particular model with the workspace. Once you have done so, go back to the portal and click on \"Models\". You should see it there."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Supply a model name, and the full path to the serialized model file.\n", + "model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Registered model:\\n --> Name: {}\\n --> Version: {}\\n --> URL: {}\".format(model.name, model.version, model.url))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can now deploy your model by following [this example](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb)." + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb b/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb index 7f3fd5ac..7971910b 100644 --- a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb +++ b/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb @@ -1,607 +1,613 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 04. Train in a remote Linux VM\n", - "* Create Workspace\n", - "* Create `train.py` file\n", - "* Create and Attach a Remote VM (eg. DSVM) as compute resource.\n", - "* Upoad data files into default datastore\n", - "* Configure & execute a run in a few different ways\n", - " - Use system-built conda\n", - " - Use existing Python environment\n", - " - Use Docker \n", - "* Find the best model in the run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "**Experiment** is a logical container in an Azure ML Workspace. 
It hosts run records which can include run metrics and output artifacts from your experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'train-on-remote-vm'\n", - "\n", - "from azureml.core import Experiment\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's also create a local folder to hold the training script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "script_folder = './vm-run'\n", - "os.makedirs(script_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload data files into datastore\n", - "Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get the default datastore\n", - "ds = ws.get_default_datastore()\n", - "print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load diabetes data from `scikit-learn` and save it as 2 local files." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_diabetes\n", - "import numpy as np\n", - "\n", - "training_data = load_diabetes()\n", - "np.save(file='./features.npy', arr=training_data['data'])\n", - "np.save(file='./labels.npy', arr=training_data['target'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's upload the 2 files into the default datastore under a path named `diabetes`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload_files(['./features.npy', './labels.npy'], target_path='diabetes', overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## View `train.py`\n", - "\n", - "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file. Please pay special attention on how we are loading the features and labels from files in the `data_folder` path, which is passed in as an argument of the training script (shown later)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# copy train.py into the script folder\n", - "import shutil\n", - "shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n", - "\n", - "with open(os.path.join(script_folder, './train.py'), 'r') as training_script:\n", - " print(training_script.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create and Attach a DSVM as a compute target\n", - "\n", - "**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single to multi-node `AmlCompute`. The DSVM can be created using the below single line command and then attached(like any VM) using the sample code below. 
Also note, that we only support Linux VMs for remote execution from AML and the commands below will spin a Linux VM only.\n", - "\n", - "```shell\n", - "# create a DSVM in your resource group\n", - "# note you need to be at least a contributor to the resource group in order to execute this command successfully\n", - "(myenv) $ az vm create --resource-group --name --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username --admin-password --generate-ssh-keys --authentication-type password\n", - "```\n", - "\n", - "**Note**: You can also use [this url](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal\n", - "\n", - "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the provisioning configuration object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, RemoteCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "username = os.getenv('AZUREML_DSVM_USERNAME', default='')\n", - "address = os.getenv('AZUREML_DSVM_ADDRESS', default='')\n", - "\n", - "compute_target_name = 'cpudsvm'\n", - "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n", - "try:\n", - " attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n", - " print('found existing:', attached_dsvm_compute.name)\n", - "except ComputeTargetException:\n", - " attach_config = RemoteCompute.attach_configuration(address=address,\n", - " ssh_port=22,\n", - " username=username,\n", - " private_key_file='./.ssh/id_rsa')\n", - " attached_dsvm_compute = ComputeTarget.attach(workspace=ws,\n", - " 
name=compute_target_name,\n", - " attach_configuration=attach_config)\n", - " attached_dsvm_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure & Run\n", - "First let's create a `DataReferenceConfiguration` object to inform the system what data folder to download to the compute target." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import DataReferenceConfiguration\n", - "dr = DataReferenceConfiguration(datastore_name=ds.name, \n", - " path_on_datastore='diabetes', \n", - " mode='download', # download files from datastore to compute target\n", - " overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can try a few different ways to run the training script in the VM." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Conda run\n", - "You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Environment\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "conda_env = Environment(\"conda-env\")\n", - "conda_env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory=script_folder, \n", - " script='train.py', \n", - " # pass the datastore reference as a parameter to the training script\n", - " arguments=['--data-folder', str(ds.as_download())] \n", - " ) \n", - "\n", - "src.run_config.framework = \"python\"\n", - "src.run_config.environment = conda_env\n", - "src.run_config.target = attached_dsvm_compute.name\n", - "src.run_config.data_references = {ds.name: dr}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(config=src)\n", - "\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the run object. You can navigate to the Azure portal to see detailed information about the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Native VM run\n", - "You can also configure to use an exiting Python environment in the VM to execute the script without asking the system to create a conda environment for you." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "conda_env.python.user_managed_dependencies = True" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The below run will likely fail because `train.py` needs dependency `azureml`, `scikit-learn` and others, which are not found in that Python environment. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(config=src)\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can choose to SSH into the VM and install Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we simply are going to use another script `train2.py` that doesn't have azureml or data store dependencies, and submit it instead." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# copy train2.py into the script folder\n", - "shutil.copy('./train2.py', os.path.join(script_folder, 'train2.py'))\n", - "\n", - "with open(os.path.join(script_folder, './train2.py'), 'r') as training_script:\n", - " print(training_script.read())\n", - " \n", - "src.run_config.data_references = {}\n", - "src.script = \"train2.py\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's try again. And this time it should work fine." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(config=src)\n", - "\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note even in this case you get a run record with some basic statistics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure a Docker run with new conda environment on the VM\n", - "You can execute in a Docker container in the VM. If you choose this option, the system will pull down a base Docker image, build a new conda environment in it if you ask for (you can also skip this if you are using a customer Docker image when a preconfigured Python environment), start a container, and run your script in there. This image is also uploaded into your ACR (Azure Container Registry) assoicated with your workspace, an reused if your dependencies don't change in the subsequent runs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "conda_env.docker.enabled = True\n", - "conda_env.python.user_managed_dependencies = False\n", - "\n", - "print('Base Docker image is:', conda_env.docker.base_image)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit the Experiment\n", - "Submit script to run in the Docker image in the remote VM. If you run this for the first time, the system will download the base image, layer in packages specified in the `conda_dependencies.yml` file on top of the base image, create a container and then execute the script in the container." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src.script = \"train.py\"\n", - "src.run_config.data_references = {ds.name: dr}\n", - "\n", - "run = exp.submit(config=src)\n", - "\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Use a custom Docker image instead\n", - "\n", - "You can also specify a custom Docker image if you don't want to use the default image provided by Azure ML.\n", - "\n", - "```python\n", - "# use an image available in Docker Hub without authentication\n", - "conda_env.docker.base_image = \"continuumio/miniconda3\"\n", - "\n", - "# or, use an image available in a private Azure Container Registry\n", - "conda_env.docker.base_image = \"mycustomimage:1.0\"\n", - "conda_env.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n", - "conda_env.docker.base_image_registry.username = \"username\"\n", - "conda_env.docker.base_image_registry.password = \"password\"\n", - "```\n", - "\n", - "When you are using a custom Docker image, you might already have your environment setup properly in a Python environment in the Docker image. 
In that case, you can skip specifying conda dependencies, and just use `user_managed_dependencies` option instead:\n", - "```python\n", - "conda_env.python.user_managed_dependencies = True\n", - "# path to the Python environment in the custom Docker image\n", - "conda_env.python.interpreter_path = '/opt/conda/bin/python'\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find the best model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we have tried various execution modes, we can find the best model from the last run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get all metris logged in the run\n", - "run.get_metrics()\n", - "metrics = run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# find the index where MSE is the smallest\n", - "indices = list(range(0, len(metrics['mse'])))\n", - "min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n", - "\n", - "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", - " metrics['mse'][min_mse_index], \n", - " metrics['alpha'][min_mse_index]\n", - "))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up compute resource\n", - "\n", - "Use ```detach()``` to detach an existing DSVM from Workspace without deleting it. Use ```delete()``` if you created a new ```DsvmCompute``` and want to delete it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# dsvm_compute.detach()\n", - "# dsvm_compute.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 04. Train in a remote Linux VM\n", + "* Create Workspace\n", + "* Create `train.py` file\n", + "* Create and Attach a Remote VM (e.g., DSVM) as a compute resource.\n", + "* Upload data files into the default datastore\n", + "* Configure & execute a run in a few different ways\n", + " - Use system-built conda\n", + " - Use existing Python environment\n", + " - Use Docker \n", + "* Find the best model in the run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you haven't already, go through the [configuration notebook](../../../configuration.ipynb) first to establish your connection to the AzureML Workspace."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Experiment\n", + "\n", + "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'train-on-remote-vm'\n", + "\n", + "from azureml.core import Experiment\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's also create a local folder to hold the training script." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "script_folder = './vm-run'\n", + "os.makedirs(script_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload data files into datastore\n", + "Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. 
We can use it to transfer data from the local machine to the cloud, and access it from the compute target." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get the default datastore\n", + "ds = ws.get_default_datastore()\n", + "print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load diabetes data from `scikit-learn` and save it as 2 local files." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "import numpy as np\n", + "\n", + "training_data = load_diabetes()\n", + "np.save(file='./features.npy', arr=training_data['data'])\n", + "np.save(file='./labels.npy', arr=training_data['target'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's upload the 2 files into the default datastore under a path named `diabetes`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload_files(['./features.npy', './labels.npy'], target_path='diabetes', overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## View `train.py`\n", + "\n", + "For convenience, we created a training script for you. It is printed below as text, but you can also run `%pycat ./train.py` in a cell to show the file. Please pay special attention to how we are loading the features and labels from files in the `data_folder` path, which is passed in as an argument to the training script (shown later)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# copy train.py into the script folder\n", + "import shutil\n", + "shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n", + "\n", + "with open(os.path.join(script_folder, './train.py'), 'r') as training_script:\n", + " print(training_script.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create and Attach a DSVM as a compute target\n", + "\n", + "**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single- to multi-node `AmlCompute`. The DSVM can be created using the single-line command below and then attached (like any VM) using the sample code below. Also note that we only support Linux VMs for remote execution from AML, and the commands below will spin up a Linux VM only.\n", + "\n", + "```shell\n", + "# create a DSVM in your resource group\n", + "# note you need to be at least a contributor to the resource group in order to execute this command successfully\n", + "(myenv) $ az vm create --resource-group --name --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username --admin-password --generate-ssh-keys --authentication-type password\n", + "```\n", + "\n", + "**Note**: You can also use [this URL](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal.\n", + "\n", + "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the provisioning configuration object."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, RemoteCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "username = os.getenv('AZUREML_DSVM_USERNAME', default='')\n", + "address = os.getenv('AZUREML_DSVM_ADDRESS', default='')\n", + "\n", + "compute_target_name = 'cpudsvm'\n", + "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n", + "try:\n", + " attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n", + " print('found existing:', attached_dsvm_compute.name)\n", + "except ComputeTargetException:\n", + " attach_config = RemoteCompute.attach_configuration(address=address,\n", + " ssh_port=22,\n", + " username=username,\n", + " private_key_file='./.ssh/id_rsa')\n", + " attached_dsvm_compute = ComputeTarget.attach(workspace=ws,\n", + " name=compute_target_name,\n", + " attach_configuration=attach_config)\n", + " attached_dsvm_compute.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure & Run\n", + "First let's create a `DataReferenceConfiguration` object to inform the system what data folder to download to the compute target." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import DataReferenceConfiguration\n", + "dr = DataReferenceConfiguration(datastore_name=ds.name, \n", + " path_on_datastore='diabetes', \n", + " mode='download', # download files from datastore to compute target\n", + " overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can try a few different ways to run the training script in the VM." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conda run\n", + "You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "conda_env = Environment(\"conda-env\")\n", + "conda_env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory=script_folder, \n", + " script='train.py', \n", + " # pass the datastore reference as a parameter to the training script\n", + " arguments=['--data-folder', str(ds.as_download())] \n", + " ) \n", + "\n", + "src.run_config.framework = \"python\"\n", + "src.run_config.environment = conda_env\n", + "src.run_config.target = attached_dsvm_compute.name\n", + "src.run_config.data_references = {ds.name: dr}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(config=src)\n", + "\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the run object. You can navigate to the Azure portal to see detailed information about the run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Native VM run\n", + "You can also configure the run to use an existing Python environment in the VM to execute the script, without asking the system to create a conda environment for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "conda_env.python.user_managed_dependencies = True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The run below will likely fail because `train.py` depends on `azureml`, `scikit-learn`, and other packages that are not installed in that Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(config=src)\n", + "\n", + "from azureml.exceptions import ActivityFailedException\n", + "\n", + "try:\n", + " run.wait_for_completion(show_output=True)\n", + "except ActivityFailedException as ex:\n", + " print(ex)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can choose to SSH into the VM and install the Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we are simply going to use another script, `train2.py`, that doesn't have azureml or datastore dependencies, and submit it instead."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# copy train2.py into the script folder\n", + "shutil.copy('./train2.py', os.path.join(script_folder, 'train2.py'))\n", + "\n", + "with open(os.path.join(script_folder, './train2.py'), 'r') as training_script:\n", + " print(training_script.read())\n", + " \n", + "src.run_config.data_references = {}\n", + "src.script = \"train2.py\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's try again; this time it should work fine." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(config=src)\n", + "\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that even in this case you get a run record with some basic statistics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure a Docker run with a new conda environment on the VM\n", + "You can also execute your script in a Docker container on the VM. If you choose this option, the system will pull down a base Docker image, build a new conda environment in it if you ask for one (you can skip this if you are using a custom Docker image with a preconfigured Python environment), start a container, and run your script in there. This image is also uploaded into the ACR (Azure Container Registry) associated with your workspace, and reused if your dependencies don't change in subsequent runs."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "conda_env.docker.enabled = True\n", + "conda_env.python.user_managed_dependencies = False\n", + "\n", + "print('Base Docker image is:', conda_env.docker.base_image)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit the Experiment\n", + "Submit the script to run in the Docker image in the remote VM. If you run this for the first time, the system will download the base image, layer in the packages specified in the `conda_dependencies.yml` file on top of the base image, create a container, and then execute the script in the container." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src.script = \"train.py\"\n", + "src.run_config.data_references = {ds.name: dr}\n", + "\n", + "run = exp.submit(config=src)\n", + "\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Use a custom Docker image instead\n", + "\n", + "You can also specify a custom Docker image if you don't want to use the default image provided by Azure ML.\n", + "\n", + "```python\n", + "# use an image available in Docker Hub without authentication\n", + "conda_env.docker.base_image = \"continuumio/miniconda3\"\n", + "\n", + "# or, use an image available in a private Azure Container Registry\n", + "conda_env.docker.base_image = \"mycustomimage:1.0\"\n", + "conda_env.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n", + "conda_env.docker.base_image_registry.username = \"username\"\n", + "conda_env.docker.base_image_registry.password = \"password\"\n", + "```\n", + "\n", + "When you are using a custom Docker image, you might already have your environment set up properly in a Python environment in the Docker image. 
In that case, you can skip specifying conda dependencies, and just use the `user_managed_dependencies` option instead:\n", + "```python\n", + "conda_env.python.user_managed_dependencies = True\n", + "# path to the Python environment in the custom Docker image\n", + "conda_env.python.interpreter_path = '/opt/conda/bin/python'\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### View run history details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Find the best model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have tried various execution modes, we can find the best model from the last run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get all metrics logged in the run\n", + "metrics = run.get_metrics()\n", + "print(metrics)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# find the index where MSE is the smallest\n", + "indices = list(range(0, len(metrics['mse'])))\n", + "min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n", + "\n", + "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", + " metrics['mse'][min_mse_index], \n", + " metrics['alpha'][min_mse_index]\n", + "))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up compute resource\n", + "\n", + "Use ```detach()``` to detach an existing DSVM from the Workspace without deleting it. Use ```delete()``` if you created a new ```DsvmCompute``` and want to delete it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# dsvm_compute.detach()\n", + "# dsvm_compute.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb b/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb index 3b3ecdf8..f3362e84 100644 --- a/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb +++ b/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb @@ -1,705 +1,705 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-within-notebook/train-within-notebook.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train and deploy a model\n", - "_**Create and deploy a model directly from a notebook**_\n", - "\n", - "---\n", - "---\n", - "\n", - "## Contents\n", - "1. [Introduction](#Introduction)\n", - "1. [Setup](#Setup)\n", - "1. [Data](#Data)\n", - "1. [Train](#Train)\n", - " 1. Viewing run results\n", - " 1. Simple parameter sweep\n", - " 1. Viewing experiment results\n", - " 1. Select the best model\n", - "1. 
[Deploy](#Deploy)\n", - " 1. Register the model\n", - " 1. Create a scoring file\n", - " 1. Describe your environment\n", - " 1. Descrice your target compute\n", - " 1. Deploy your webservice\n", - " 1. Test your webservice\n", - " 1. Clean up\n", - "1. [Next Steps](#nextsteps)\n", - "\n", - "---\n", - "\n", - "## Introduction\n", - "Azure Machine Learning provides capabilities to control all aspects of model training and deployment directly from a notebook using the AML Python SDK. In this notebook we will\n", - "* connect to our AML Workspace\n", - "* create an experiment that contains multiple runs with tracked metrics\n", - "* choose the best model created across all runs\n", - "* deploy that model as a service\n", - "\n", - "In the end we will have a model deployed as a web service which we can call from an HTTP endpoint" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "## Setup\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. From the configuration, the important sections are the workspace configuration and ACI regristration.\n", - "\n", - "We will also need the following libraries install to our conda environment. If these are not installed, use the following command to do so and restart the notebook.\n", - "```shell\n", - "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", - "```\n", - "\n", - "For this notebook we need the Azure ML SDK and access to our workspace. The following cell imports the SDK, checks the version, and accesses our already configured AzureML workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "install" - ] - }, - "outputs": [], - "source": [ - "import azureml.core\n", - "from azureml.core import Experiment, Workspace\n", - "\n", - "# Check core SDK version number\n", - "print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n", - "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n", - "print(\"\")\n", - "\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "## Data\n", - "We will use the diabetes dataset for this experiement, a well-known small dataset that comes with scikit-learn. This cell loads the dataset and splits it into random training and testing sets.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_diabetes\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.metrics import mean_squared_error\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.externals import joblib\n", - "\n", - "X, y = load_diabetes(return_X_y = True)\n", - "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", - "data = {\n", - " \"train\":{\"X\": X_train, \"y\": y_train}, \n", - " \"test\":{\"X\": X_test, \"y\": y_test}\n", - "}\n", - "\n", - "print (\"Data contains\", len(data['train']['X']), \"training samples and\",len(data['test']['X']), \"test samples\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "## Train\n", - "\n", - "Let's use 
scikit-learn to train a simple Ridge regression model. We use AML to record interesting information about the model in an Experiment. An Experiment contains a series of trials called Runs. During this trial we use AML in the following way:\n", - "* We access an experiment from our AML workspace by name, which will be created if it doesn't exist\n", - "* We use `start_logging` to create a new run in this experiment\n", - "* We use `run.log()` to record a parameter, alpha, and an accuracy measure - the Mean Squared Error (MSE) to the run. We will be able to review and compare these measures in the Azure Portal at a later time.\n", - "* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n", - "* We use `run.complete()` to indicate that the run is over and results can be captured and finalized" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "local run", - "outputs upload" - ] - }, - "outputs": [], - "source": [ - "# Get an experiment object from Azure Machine Learning\n", - "experiment = Experiment(workspace=ws, name=\"train-within-notebook\")\n", - "\n", - "# Create a run object in the experiment\n", - "run = experiment.start_logging()\n", - "# Log the algorithm parameter alpha to the run\n", - "run.log('alpha', 0.03)\n", - "\n", - "# Create, fit, and test the scikit-learn Ridge regression model\n", - "regression_model = Ridge(alpha=0.03)\n", - "regression_model.fit(data['train']['X'], data['train']['y'])\n", - "preds = regression_model.predict(data['test']['X'])\n", - "\n", - "# Output the Mean Squared Error to the notebook and to the run\n", - "print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n", - "run.log('mse', mean_squared_error(data['test']['y'], preds))\n", - "\n", - "# Save the model to the outputs directory for capture\n", - "model_file_name = 'outputs/model.pkl'\n", - "\n", - "joblib.dump(value = 
regression_model, filename = model_file_name)\n", - "\n", - "# upload the model file explicitly into artifacts \n", - "run.upload_file(name = model_file_name, path_or_stream = model_file_name)\n", - "\n", - "# Complete the run\n", - "run.complete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Viewing run results\n", - "Azure Machine Learning stores all the details about the run in the Azure cloud. Let's access those details by retrieving a link to the run using the default run output. Clicking on the resulting link will take you to an interactive page presenting all run information." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Simple parameter sweep\n", - "Now let's take the same concept from above and modify the **alpha** parameter. For each value of alpha we will create a run that will store metrics and the resulting model. In the end we can use the captured run history to determine which model was the best for us to deploy. 
\n", - "\n", - "Note that by using `with experiment.start_logging() as run` AML will automatically call `run.complete()` at the end of each loop.\n", - "\n", - "This example also uses the **tqdm** library to provide a thermometer feedback" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from tqdm import tqdm\n", - "\n", - "# list of numbers from 0 to 1.0 with a 0.05 interval\n", - "alphas = np.arange(0.0, 1.0, 0.05)\n", - "\n", - "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", - "for alpha in tqdm(alphas):\n", - " # create a bunch of runs, each train a model with a different alpha value\n", - " with experiment.start_logging() as run:\n", - " # Use Ridge algorithm to build a regression model\n", - " regression_model = Ridge(alpha=alpha)\n", - " regression_model.fit(X=data[\"train\"][\"X\"], y=data[\"train\"][\"y\"])\n", - " preds = regression_model.predict(X=data[\"test\"][\"X\"])\n", - " mse = mean_squared_error(y_true=data[\"test\"][\"y\"], y_pred=preds)\n", - "\n", - " # log alpha, mean_squared_error and feature names in run history\n", - " run.log(name=\"alpha\", value=alpha)\n", - " run.log(name=\"mse\", value=mse)\n", - "\n", - " # Save the model to the outputs directory for capture\n", - " joblib.dump(value=regression_model, filename='outputs/model.pkl')\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Viewing experiment results\n", - "Similar to viewing the run, we can also view the entire experiment. The experiment report view in the Azure portal lets us view all the runs in a table, and also allows us to customize charts. 
This way, we can see how the alpha parameter impacts the quality of the model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# now let's take a look at the experiment in Azure portal.\n", - "experiment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Select the best model \n", - "Now that we've created many runs with different parameters, we need to determine which model is the best for deployment. For this, we will iterate over the set of runs. From each run we will take the *run id* using the `id` property, and examine the metrics by calling `run.get_metrics()`. \n", - "\n", - "Since each run may be different, we do need to check if the run has the metric that we are looking for, in this case, **mse**. To find the best run, we create a dictionary mapping the run id's to the metrics.\n", - "\n", - "Finally, we use the `tag` method to mark the best run to make it easier to find later. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "runs = {}\n", - "run_metrics = {}\n", - "\n", - "# Create dictionaries containing the runs and the metrics for all runs containing the 'mse' metric\n", - "for r in tqdm(experiment.get_runs()):\n", - " metrics = r.get_metrics()\n", - " if 'mse' in metrics.keys():\n", - " runs[r.id] = r\n", - " run_metrics[r.id] = metrics\n", - "\n", - "# Find the run with the best (lowest) mean squared error and display the id and metrics\n", - "best_run_id = min(run_metrics, key = lambda k: run_metrics[k]['mse'])\n", - "best_run = runs[best_run_id]\n", - "print('Best run is:', best_run_id)\n", - "print('Metrics:', run_metrics[best_run_id])\n", - "\n", - "# Tag the best run for identification later\n", - "best_run.tag(\"Best Run\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "## Deploy\n", - "Now that we have trained a set of models and identified 
the run containing the best model, we want to deploy the model for real time inferencing. The process of deploying a model involves\n", - "* registering a model in your workspace\n", - "* creating a scoring file containing init and run methods\n", - "* creating an environment dependency file describing packages necessary for your scoring file\n", - "* creating a docker image containing a properly described environment, your model, and your scoring file\n", - "* deploying that docker image as a web service" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register a model\n", - "We have already identified which run contains the \"best model\" by our evaluation criteria. Each run has a file structure associated with it that contains various files collected during the run. Since a run can have many outputs we need to tell AML which file from those outputs represents the model that we want to use for our deployment. We can use the `run.get_file_names()` method to list the files associated with the run, and then use the `run.register_model()` method to place the model in the workspace's model registry.\n", - "\n", - "When using `run.register_model()` we supply a `model_name` that is meaningful for our scenario and the `model_path` of the model relative to the run. In this case, the model path is what is returned from `run.get_file_names()`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history" - ] - }, - "outputs": [], - "source": [ - "# View the files in the run\n", - "for f in best_run.get_file_names():\n", - " print(f)\n", - " \n", - "# Register the model with the workspace\n", - "model = best_run.register_model(model_name='best_model', model_path='outputs/model.pkl')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once a model is registered, it is accessible from the list of models on the AML workspace. 
If you register models with the same name multiple times, AML keeps a version history of those models for you. The `Model.list()` lists all models in a workspace, and can be filtered by name, tags, or model properties. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from history" - ] - }, - "outputs": [], - "source": [ - "# Find all models called \"best_model\" and display their version numbers\n", - "from azureml.core.model import Model\n", - "models = Model.list(ws, name='best_model')\n", - "for m in models:\n", - " print(m.name, m.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a scoring file\n", - "\n", - "Since your model file can essentially be anything you want it to be, you need to supply a scoring script that can load your model and then apply the model to new data. This script is your 'scoring file'. This scoring file is a python program containing, at a minimum, two methods `init()` and `run()`. The `init()` method is called once when your deployment is started so you can load your model and any other required objects. This method uses the `get_model_path` function to locate the registered model inside the docker container. The `run()` method is called interactively when the web service is called with one or more data samples to predict.\n", - "\n", - "The scoring file used for this exercise is [here](score.py). \n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Describe your environment\n", - "\n", - "Each modelling process may require a unique set of packages. Therefore we need to create a dependency file providing instructions to AML on how to contstruct a docker image that can support the models and any other objects required for inferencing. In the following cell, we create a environment dependency file, *myenv.yml* that specifies which libraries are needed by the scoring script. 
You can create this file manually, or use the `CondaDependencies` class to create it for you.\n", - "\n", - "Next we use this environment file to describe the docker container that we need to create in order to deploy our model. This container is created using our environment description and includes our scoring script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "from azureml.core.image import ContainerImage\n", - "\n", - "# Create an empty conda environment and add the scikit-learn package\n", - "env = CondaDependencies()\n", - "env.add_conda_package(\"scikit-learn\")\n", - "\n", - "# Display the environment\n", - "print(env.serialize_to_string())\n", - "\n", - "# Write the environment to disk\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(env.serialize_to_string())\n", - "\n", - "# Create a configuration object indicating how our deployment container needs to be created\n", - "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Describe your target compute\n", - "In addition to the container, we also need to describe the type of compute we want to allocate for our webservice. In in this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require the number of cores you want to run and memory you need. 
Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n", - "\n", - "For production workloads, it is better to use [Azure Kubernentes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Try [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'sample name': 'AML 101'}, \n", - " description='This is a great example.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy your webservice\n", - "The final step to deploying your webservice is to call `WebService.deploy_from_model()`. 
This function uses the deployment and image configurations created above to perform the following:\n", - "* Build a docker image\n", - "* Deploy to the docker image to an Azure Container Instance\n", - "* Copy your model files to the Azure Container Instance\n", - "* Call the `init()` function in your scoring file\n", - "* Provide an HTTP endpoint for scoring calls\n", - "\n", - "The `deploy_from_model` method requires the following parameters\n", - "* `workspace` - the workspace containing the service\n", - "* `name` - a unique named used to identify the service in the workspace\n", - "* `models` - an array of models to be deployed into the container\n", - "* `image_config` - a configuration object describing the image environment\n", - "* `deployment_config` - a configuration object describing the compute type\n", - " \n", - "**Note:** The web service creation can take several minutes. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "# Create the webservice using all of the precreated configurations and our best model\n", - "service = Webservice.deploy_from_model(name='my-aci-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=image_config,\n", - " workspace=ws)\n", - "\n", - "# Wait for the service deployment to complete while displaying log output\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "### Test your webservice" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that your web service is runing you can send JSON data directly to the service using the `run` method. This cell pulls the first test sample from the original dataset into JSON and then sends it to the service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "service = ws.webservices['my-aci-svc']\n", - "\n", - "# scrape the first row from the test set.\n", - "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", - "\n", - "#score on our service\n", - "service.run(input_data = test_samples)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This cell shows how you can send multiple rows to the webservice at once. It then calculates the residuals - that is, the errors - by subtracting out the actual values from the results. These residuals are used later to show a plotted result." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "# score the entire test set.\n", - "test_samples = json.dumps({'data': X_test.tolist()})\n", - "\n", - "result = service.run(input_data = test_samples)\n", - "residual = result - y_test" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This cell shows how you can use the `service.scoring_uri` property to access the HTTP endpoint of the service and call it using standard POST operations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import requests\n", - "\n", - "# use the first row from the test set again\n", - "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", - "\n", - "# create the required header\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "# post the request to the service and display the result\n", - "resp = requests.post(service.scoring_uri, test_samples, headers = headers)\n", - "print(resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Residual graph\n", - "One way to understand the behavior of your model is to see how the data performs against data with known results. This cell uses matplotlib to create a histogram of the residual values, or errors, created from scoring the test samples.\n", - "\n", - "A good model should have residual values that cluster around 0 - that is, no error. Observing the resulting histogram can also show you if the model is skewed in any particular direction." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib.pyplot as plt\n", - "\n", - "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw={'width_ratios':[3, 1], 'wspace':0, 'hspace': 0})\n", - "f.suptitle('Residual Values', fontsize = 18)\n", - "\n", - "f.set_figheight(6)\n", - "f.set_figwidth(14)\n", - "\n", - "a0.plot(residual, 'bo', alpha=0.4)\n", - "a0.plot([0,90], [0,0], 'r', lw=2)\n", - "a0.set_ylabel('residue values', fontsize=14)\n", - "a0.set_xlabel('test data set', fontsize=14)\n", - "\n", - "a1.hist(residual, orientation='horizontal', color='blue', bins=10, histtype='step')\n", - "a1.hist(residual, orientation='horizontal', color='blue', alpha=0.2, bins=10)\n", - "a1.set_yticklabels([])\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Clean up" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Delete the ACI instance to stop the compute and any associated billing." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Next Steps" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this example, you created a series of models inside the notebook using local data, stored them inside an AML experiment, found the best one and deployed it as a live service! 
From here you can continue to use Azure Machine Learning in this regard to run your own experiments and deploy your own models, or you can expand into further capabilities of AML!\n", - "\n", - "If you have a model that is difficult to process locally, either because the data is remote or the model is large, try the [train-on-remote-vm](../train-on-remote-vm) notebook to learn about submitting remote jobs.\n", - "\n", - "If you want to take advantage of multiple cloud machines to perform large parameter sweeps try the [train-hyperparameter-tune-deploy-with-pytorch](../../training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch\n", - ") sample.\n", - "\n", - "If you want to deploy models to a production cluster try the [production-deploy-to-aks](../../deployment/production-deploy-to-aks\n", - ") notebook." - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-within-notebook/train-within-notebook.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train and deploy a model\n", + "_**Create and deploy a model directly from a notebook**_\n", + "\n", + "---\n", + "---\n", + "\n", + "## Contents\n", + "1. [Introduction](#Introduction)\n", + "1. [Setup](#Setup)\n", + "1. [Data](#Data)\n", + "1. [Train](#Train)\n", + " 1. Viewing run results\n", + " 1. Simple parameter sweep\n", + " 1. Viewing experiment results\n", + " 1. Select the best model\n", + "1. [Deploy](#Deploy)\n", + " 1. Register the model\n", + " 1. Create a scoring file\n", + " 1. Describe your environment\n", + " 1. Describe your target compute\n", + " 1. Deploy your webservice\n", + " 1. Test your webservice\n", + " 1. Clean up\n", + "1. [Next Steps](#nextsteps)\n", + "\n", + "---\n", + "\n", + "## Introduction\n", + "Azure Machine Learning provides capabilities to control all aspects of model training and deployment directly from a notebook using the AML Python SDK. In this notebook we will\n", + "* connect to our AML Workspace\n", + "* create an experiment that contains multiple runs with tracked metrics\n", + "* choose the best model created across all runs\n", + "* deploy that model as a service\n", + "\n", + "In the end we will have a model deployed as a web service which we can call from an HTTP endpoint." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Setup\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. From the configuration, the important sections are the workspace configuration and ACI registration.\n", + "\n", + "We will also need the following libraries installed in our conda environment. If these are not installed, use the following command to do so and restart the notebook.\n", + "```shell\n", + "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", + "```\n", + "\n", + "For this notebook we need the Azure ML SDK and access to our workspace. The following cell imports the SDK, checks the version, and accesses our already configured AzureML workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "install" + ] + }, + "outputs": [], + "source": [ + "import azureml.core\n", + "from azureml.core import Experiment, Workspace\n", + "\n", + "# Check core SDK version number\n", + "print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n", + "print(\"\")\n", + "\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Data\n", + "We will use the diabetes dataset for this experiment, a well-known small dataset that comes with scikit-learn. 
This cell loads the dataset and splits it into random training and testing sets.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.externals import joblib\n", + "\n", + "X, y = load_diabetes(return_X_y = True)\n", + "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", + "data = {\n", + " \"train\":{\"X\": X_train, \"y\": y_train}, \n", + " \"test\":{\"X\": X_test, \"y\": y_test}\n", + "}\n", + "\n", + "print(\"Data contains\", len(data['train']['X']), \"training samples and\", len(data['test']['X']), \"test samples\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Train\n", + "\n", + "Let's use scikit-learn to train a simple Ridge regression model. We use AML to record interesting information about the model in an Experiment. An Experiment contains a series of trials called Runs. In this run we use AML in the following way:\n", + "* We access an experiment from our AML workspace by name, which will be created if it doesn't exist\n", + "* We use `start_logging` to create a new run in this experiment\n", + "* We use `run.log()` to record a parameter, alpha, and an accuracy measure, the Mean Squared Error (MSE), to the run. 
We will be able to review and compare these measures in the Azure Portal at a later time.\n", + "* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n", + "* We use `run.complete()` to indicate that the run is over and results can be captured and finalized" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "local run", + "outputs upload" + ] + }, + "outputs": [], + "source": [ + "# Get an experiment object from Azure Machine Learning\n", + "experiment = Experiment(workspace=ws, name=\"train-within-notebook\")\n", + "\n", + "# Create a run object in the experiment\n", + "run = experiment.start_logging()\n", + "# Log the algorithm parameter alpha to the run\n", + "run.log('alpha', 0.03)\n", + "\n", + "# Create, fit, and test the scikit-learn Ridge regression model\n", + "regression_model = Ridge(alpha=0.03)\n", + "regression_model.fit(data['train']['X'], data['train']['y'])\n", + "preds = regression_model.predict(data['test']['X'])\n", + "\n", + "# Output the Mean Squared Error to the notebook and to the run\n", + "print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n", + "run.log('mse', mean_squared_error(data['test']['y'], preds))\n", + "\n", + "# Save the model to the outputs directory for capture\n", + "model_file_name = 'outputs/model.pkl'\n", + "\n", + "joblib.dump(value = regression_model, filename = model_file_name)\n", + "\n", + "# upload the model file explicitly into artifacts \n", + "run.upload_file(name = model_file_name, path_or_stream = model_file_name)\n", + "\n", + "# Complete the run\n", + "run.complete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Viewing run results\n", + "Azure Machine Learning stores all the details about the run in the Azure cloud. Let's access those details by retrieving a link to the run using the default run output. 
Clicking on the resulting link will take you to an interactive page presenting all run information." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simple parameter sweep\n", + "Now let's take the same concept from above and modify the **alpha** parameter. For each value of alpha we will create a run that will store metrics and the resulting model. In the end we can use the captured run history to determine which model was the best for us to deploy. \n", + "\n", + "Note that by using `with experiment.start_logging() as run` AML will automatically call `run.complete()` at the end of each loop.\n", + "\n", + "This example also uses the **tqdm** library to provide progress feedback." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "from tqdm import tqdm\n", + "\n", + "# alpha values from 0.0 to 0.95 in 0.05 increments (np.arange excludes the stop value)\n", + "alphas = np.arange(0.0, 1.0, 0.05)\n", + "\n", + "# try a range of alpha values in a Ridge regression model\n", + "for alpha in tqdm(alphas):\n", + " # create a run for each alpha value, each training a model\n", + " with experiment.start_logging() as run:\n", + " # Use Ridge algorithm to build a regression model\n", + " regression_model = Ridge(alpha=alpha)\n", + " regression_model.fit(X=data[\"train\"][\"X\"], y=data[\"train\"][\"y\"])\n", + " preds = regression_model.predict(X=data[\"test\"][\"X\"])\n", + " mse = mean_squared_error(y_true=data[\"test\"][\"y\"], y_pred=preds)\n", + "\n", + " # log alpha and the mean squared error in run history\n", + " run.log(name=\"alpha\", value=alpha)\n", + " run.log(name=\"mse\", value=mse)\n", + "\n", + " # Save the model to the outputs directory for capture\n", + " joblib.dump(value=regression_model, 
filename='outputs/model.pkl')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Viewing experiment results\n", + "Similar to viewing the run, we can also view the entire experiment. The experiment report view in the Azure portal lets us view all the runs in a table, and also allows us to customize charts. This way, we can see how the alpha parameter impacts the quality of the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# now let's take a look at the experiment in Azure portal.\n", + "experiment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Select the best model \n", + "Now that we've created many runs with different parameters, we need to determine which model is the best for deployment. For this, we will iterate over the set of runs. From each run we will take the *run id* using the `id` property, and examine the metrics by calling `run.get_metrics()`. \n", + "\n", + "Since each run may be different, we do need to check if the run has the metric that we are looking for, in this case, **mse**. To find the best run, we create a dictionary mapping the run id's to the metrics.\n", + "\n", + "Finally, we use the `tag` method to mark the best run to make it easier to find later. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "runs = {}\n", + "run_metrics = {}\n", + "\n", + "# Create dictionaries containing the runs and the metrics for all runs containing the 'mse' metric\n", + "for r in tqdm(experiment.get_runs()):\n", + " metrics = r.get_metrics()\n", + " if 'mse' in metrics.keys():\n", + " runs[r.id] = r\n", + " run_metrics[r.id] = metrics\n", + "\n", + "# Find the run with the best (lowest) mean squared error and display the id and metrics\n", + "best_run_id = min(run_metrics, key = lambda k: run_metrics[k]['mse'])\n", + "best_run = runs[best_run_id]\n", + "print('Best run is:', best_run_id)\n", + "print('Metrics:', run_metrics[best_run_id])\n", + "\n", + "# Tag the best run for identification later\n", + "best_run.tag(\"Best Run\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Deploy\n", + "Now that we have trained a set of models and identified the run containing the best model, we want to deploy the model for real time inferencing. The process of deploying a model involves\n", + "* registering a model in your workspace\n", + "* creating a scoring file containing init and run methods\n", + "* creating an environment dependency file describing packages necessary for your scoring file\n", + "* creating a docker image containing a properly described environment, your model, and your scoring file\n", + "* deploying that docker image as a web service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register a model\n", + "We have already identified which run contains the \"best model\" by our evaluation criteria. Each run has a file structure associated with it that contains various files collected during the run. Since a run can have many outputs we need to tell AML which file from those outputs represents the model that we want to use for our deployment. 
We can use the `run.get_file_names()` method to list the files associated with the run, and then use the `run.register_model()` method to place the model in the workspace's model registry.\n", + "\n", + "When using `run.register_model()` we supply a `model_name` that is meaningful for our scenario and the `model_path` of the model relative to the run. In this case, the model path is what is returned from `run.get_file_names()`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history" + ] + }, + "outputs": [], + "source": [ + "# View the files in the run\n", + "for f in best_run.get_file_names():\n", + " print(f)\n", + " \n", + "# Register the model with the workspace\n", + "model = best_run.register_model(model_name='best_model', model_path='outputs/model.pkl')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once a model is registered, it is accessible from the list of models on the AML workspace. If you register models with the same name multiple times, AML keeps a version history of those models for you. The `Model.list()` lists all models in a workspace, and can be filtered by name, tags, or model properties. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from history" + ] + }, + "outputs": [], + "source": [ + "# Find all models called \"best_model\" and display their version numbers\n", + "from azureml.core.model import Model\n", + "models = Model.list(ws, name='best_model')\n", + "for m in models:\n", + " print(m.name, m.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a scoring file\n", + "\n", + "Since your model file can essentially be anything you want it to be, you need to supply a scoring script that can load your model and then apply the model to new data. This script is your 'scoring file'. 
This scoring file is a Python program containing, at a minimum, two methods: `init()` and `run()`. The `init()` method is called once when your deployment starts, so you can load your model and any other required objects. This method uses the `get_model_path` function to locate the registered model inside the docker container. The `run()` method is called each time the web service is invoked with one or more data samples to predict.\n",
+    "\n",
+    "The scoring file used for this exercise is [here](score.py).\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Describe your environment\n",
+    "\n",
+    "Each modeling process may require a unique set of packages. Therefore we need to create a dependency file that tells AML how to construct a docker image that can support the models and any other objects required for inferencing. In the following cell, we create an environment dependency file, *myenv.yml*, that specifies which libraries are needed by the scoring script. You can create this file manually, or use the `CondaDependencies` class to create it for you.\n",
+    "\n",
+    "Next we use this environment file to describe the docker container that we need to create in order to deploy our model. This container is created using our environment description and includes our scoring script."
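+    ,
+    "\n",
+    "For illustration, the *myenv.yml* written by `serialize_to_string()` looks roughly like this (the exact contents and pinned versions vary by SDK version, so treat this as a sketch rather than the authoritative output):\n",
+    "\n",
+    "```yaml\n",
+    "# Conda environment specification generated by CondaDependencies\n",
+    "name: project_environment\n",
+    "dependencies:\n",
+    "  - python=3.6.2\n",
+    "  - pip:\n",
+    "    - azureml-defaults\n",
+    "  - scikit-learn\n",
+    "```\n"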
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core.conda_dependencies import CondaDependencies \n",
+    "from azureml.core.image import ContainerImage\n",
+    "\n",
+    "# Create an empty conda environment and add the scikit-learn package\n",
+    "env = CondaDependencies()\n",
+    "env.add_conda_package(\"scikit-learn\")\n",
+    "\n",
+    "# Display the environment\n",
+    "print(env.serialize_to_string())\n",
+    "\n",
+    "# Write the environment to disk\n",
+    "with open(\"myenv.yml\",\"w\") as f:\n",
+    "    f.write(env.serialize_to_string())\n",
+    "\n",
+    "# Create a configuration object indicating how our deployment container needs to be created\n",
+    "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
+    "                                                  runtime=\"python\", \n",
+    "                                                  conda_file=\"myenv.yml\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Describe your target compute\n",
+    "In addition to the container, we also need to describe the type of compute we want to allocate for our webservice. In this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/), which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require you to specify the number of cores and the amount of memory you need. Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n",
+    "\n",
+    "For production workloads, it is better to use [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. 
Try [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "deploy service",
+     "aci"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "from azureml.core.webservice import AciWebservice\n",
+    "\n",
+    "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
+    "                                               memory_gb=1, \n",
+    "                                               tags={'sample name': 'AML 101'}, \n",
+    "                                               description='This is a great example.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Deploy your webservice\n",
+    "The final step to deploying your webservice is to call `WebService.deploy_from_model()`. This function uses the deployment and image configurations created above to perform the following:\n",
+    "* Build a docker image\n",
+    "* Deploy the docker image to an Azure Container Instance\n",
+    "* Copy your model files to the Azure Container Instance\n",
+    "* Call the `init()` function in your scoring file\n",
+    "* Provide an HTTP endpoint for scoring calls\n",
+    "\n",
+    "The `deploy_from_model` method requires the following parameters:\n",
+    "* `workspace` - the workspace containing the service\n",
+    "* `name` - a unique name used to identify the service in the workspace\n",
+    "* `models` - an array of models to be deployed into the container\n",
+    "* `image_config` - a configuration object describing the image environment\n",
+    "* `deployment_config` - a configuration object describing the compute type\n",
+    " \n",
+    "**Note:** The web service creation can take several minutes. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "\n", + "# Create the webservice using all of the precreated configurations and our best model\n", + "service = Webservice.deploy_from_model(name='my-aci-svc',\n", + " deployment_config=aciconfig,\n", + " models=[model],\n", + " image_config=image_config,\n", + " workspace=ws)\n", + "\n", + "# Wait for the service deployment to complete while displaying log output\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Test your webservice" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that your web service is runing you can send JSON data directly to the service using the `run` method. This cell pulls the first test sample from the original dataset into JSON and then sends it to the service." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "service = ws.webservices['my-aci-svc']\n", + "\n", + "# scrape the first row from the test set.\n", + "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", + "\n", + "#score on our service\n", + "service.run(input_data = test_samples)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This cell shows how you can send multiple rows to the webservice at once. It then calculates the residuals - that is, the errors - by subtracting out the actual values from the results. These residuals are used later to show a plotted result." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "# score the entire test set.\n", + "test_samples = json.dumps({'data': X_test.tolist()})\n", + "\n", + "result = service.run(input_data = test_samples)\n", + "residual = result - y_test" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This cell shows how you can use the `service.scoring_uri` property to access the HTTP endpoint of the service and call it using standard POST operations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "import requests\n", + "\n", + "# use the first row from the test set again\n", + "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", + "\n", + "# create the required header\n", + "headers = {'Content-Type':'application/json'}\n", + "\n", + "# post the request to the service and display the result\n", + "resp = requests.post(service.scoring_uri, test_samples, headers = headers)\n", + "print(resp.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Residual graph\n", + "One way to understand the behavior of your model is to see how the data performs against data with known results. This cell uses matplotlib to create a histogram of the residual values, or errors, created from scoring the test samples.\n", + "\n", + "A good model should have residual values that cluster around 0 - that is, no error. Observing the resulting histogram can also show you if the model is skewed in any particular direction." 
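+    ,
+    "\n",
+    "Before plotting, a quick numeric summary of the residuals can also be useful. The snippet below is a sketch: it uses an illustrative residual array, but in the notebook you would reuse the `residual` array computed from the web service results above.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Illustrative residuals; in the notebook, reuse the `residual` array from above\n",
+    "residual = np.array([1.2, -0.5, 0.3, -2.1, 0.9])\n",
+    "\n",
+    "print('mean residual:', residual.mean())  # close to 0 for an unbiased model\n",
+    "print('std of residuals:', residual.std())\n",
+    "```\n"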
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "\n", + "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw={'width_ratios':[3, 1], 'wspace':0, 'hspace': 0})\n", + "f.suptitle('Residual Values', fontsize = 18)\n", + "\n", + "f.set_figheight(6)\n", + "f.set_figwidth(14)\n", + "\n", + "a0.plot(residual, 'bo', alpha=0.4)\n", + "a0.plot([0,90], [0,0], 'r', lw=2)\n", + "a0.set_ylabel('residue values', fontsize=14)\n", + "a0.set_xlabel('test data set', fontsize=14)\n", + "\n", + "a1.hist(residual, orientation='horizontal', color='blue', bins=10, histtype='step')\n", + "a1.hist(residual, orientation='horizontal', color='blue', alpha=0.2, bins=10)\n", + "a1.set_yticklabels([])\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Clean up" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Delete the ACI instance to stop the compute and any associated billing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Next Steps" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, you created a series of models inside the notebook using local data, stored them inside an AML experiment, found the best one and deployed it as a live service! 
From here you can continue to use Azure Machine Learning in this regard to run your own experiments and deploy your own models, or you can expand into further capabilities of AML!\n", + "\n", + "If you have a model that is difficult to process locally, either because the data is remote or the model is large, try the [train-on-remote-vm](../train-on-remote-vm) notebook to learn about submitting remote jobs.\n", + "\n", + "If you want to take advantage of multiple cloud machines to perform large parameter sweeps try the [train-hyperparameter-tune-deploy-with-pytorch](../../training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch\n", + ") sample.\n", + "\n", + "If you want to deploy models to a production cluster try the [production-deploy-to-aks](../../deployment/production-deploy-to-aks\n", + ") notebook." + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/how-to-use-azureml/training/using-environments/using-environments.ipynb b/how-to-use-azureml/training/using-environments/using-environments.ipynb index 03691c36..f40e5c9a 100644 --- a/how-to-use-azureml/training/using-environments/using-environments.ipynb +++ b/how-to-use-azureml/training/using-environments/using-environments.ipynb @@ -1,372 +1,372 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. 
All rights reserved.\n", - "\n", - "Licensed under the MIT License" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/using-environments/using-environments.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Using environments\n", - "\n", - "\n", - "## Contents\n", - "\n", - "1. [Introduction](#Introduction)\n", - "1. [Setup](#Setup)\n", - "1. [Create environment](#Create-environment)\n", - " 1. Add Python packages\n", - " 1. Specify environment variables\n", - "1. [Submit run using environment](#Submit-run-using-environment)\n", - "1. [Register environment](#Register-environment)\n", - "1. [List and get existing environments](#List-and-get-existing-environments)\n", - "1. [Other ways to create environments](#Other-ways-to-create-environments)\n", - " 1. From existing Conda environment\n", - " 1. From Conda or pip files\n", - "1. [Docker settings](#Docker-settings)\n", - "1. [Spark and Azure Databricks settings](#Spark-and-Azure-Databricks-settings)\n", - "1. [Next steps](#Next-steps)\n", - "\n", - "## Introduction\n", - "\n", - "Azure ML environments are an encapsulation of the environment where your machine learning training happens. They define Python packages, environment variables, Docker settings and other attributes in declarative fashion. 
Environments are versioned: you can update them and retrieve old versions to revist and review your work.\n", - "\n", - "Environments allow you to:\n", - "* Encapsulate dependencies of your training process, such as Python packages and their versions.\n", - "* Reproduce the Python environment on your local computer in a remote run on VM or ML Compute cluster\n", - "* Reproduce your experimentation environment in production setting.\n", - "* Revisit and audit the environment in which an existing model was trained.\n", - "\n", - "Environment, compute target and training script together form run configuration: the full specification of training run.\n", - "\n", - "## Setup\n", - "\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't.\n", - "\n", - "First, let's validate Azure ML SDK version and connect to workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "ws = Workspace.from_config()\n", - "ws.get_details()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create environment\n", - "\n", - "You can create an environment by instantiating ```Environment``` object and then setting its attributes: set of Python packages, environment variables and others.\n", - "\n", - "### Add Python packages\n", - "\n", - "The recommended way is to specify Conda packages, as they typically come with complete set of pre-built binaries." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Environment\n", - "from azureml.core.environment import CondaDependencies\n", - "\n", - "myenv = Environment(name=\"myenv\")\n", - "conda_dep = CondaDependencies()\n", - "conda_dep.add_conda_package(\"scikit-learn\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also add pip packages, and specify the version of package" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "conda_dep.add_pip_package(\"pillow==5.4.1\")\n", - "myenv.python.conda_dependencies=conda_dep" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify environment variables\n", - "\n", - "You can add environment variables to your environment. These then become available using ```os.environ.get``` in your training script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "myenv.environment_variables = {\"MESSAGE\":\"Hello from Azure Machine Learning\"}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using environment\n", - "\n", - "When you submit a run, you can specify which environment to use. \n", - "\n", - "On the first run in given environment, Azure ML spends some time building the environment. On the subsequent runs, Azure ML keeps track of changes and uses the existing environment, resulting in faster run completion." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig, Experiment\n", - "\n", - "myexp = Experiment(workspace=ws, name = \"environment-example\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To submit a run, create a run configuration that combines the script file and environment, and pass it to ```Experiment.submit```. In this example, the script is submitted to local computer, but you can specify other compute targets such as remote clusters as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "runconfig = ScriptRunConfig(source_directory=\".\", script=\"example.py\")\n", - "runconfig.run_config.target = \"local\"\n", - "runconfig.run_config.environment = myenv\n", - "run = myexp.submit(config=runconfig)\n", - "\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register environment\n", - "\n", - "You can manage environments by registering them. This allows you to track their versions, and reuse them in future runs. For example, once you've constructed an environment that meets your requirements, you can register it and use it in other experiments so as to standardize your workflow.\n", - "\n", - "If you register the environment with same name, the version number is increased by one. Note that Azure ML keeps track of differences between the version, so if you re-register an identical version, the version number is not increased." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "myenv.register(workspace=ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## List and get existing environments\n", - "\n", - "Your workspace contains a dictionary of registered environments. 
You can then use ```Environment.get``` to retrieve a specific environment with specific version." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for name,env in ws.environments.items():\n", - " print(\"Name {} \\t version {}\".format(name,env.version))\n", - "\n", - "restored_environment = Environment.get(workspace=ws,name=\"myenv\",version=\"1\")\n", - "\n", - "print(\"Attributes of restored environment\")\n", - "restored_environment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Other ways to create environments\n", - "\n", - "### From existing Conda environment\n", - "\n", - "You can create an environment from existing conda environment. This make it easy to reuse your local interactive environment in Azure ML remote runs. For example, if you've created conda environment using\n", - "```\n", - "conda create -n mycondaenv\n", - "```\n", - "you can create Azure ML environment out of that conda environment using\n", - "```\n", - "myenv = Environment.from_existing_conda_environment(name=\"myenv\",conda_environment_name=\"mycondaenv\")\n", - "```\n", - "\n", - "### From conda or pip files\n", - "\n", - "You can create environments from conda specification or pip requirements files using\n", - "```\n", - "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"path-to-conda-specification-file\")\n", - "\n", - "myenv = Environment.from_pip_requirements(name=\"myenv\", file_path=\"path-to-pip-requirements-file\")\n", - "```\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Docker settings\n", - "\n", - "Docker container provides an efficient way to encapsulate the dependencies. When you enable Docker, Azure ML builds a Docker image and creates a Python environment within that container, given your specifications. 
The Docker images are reused: the first run in a new environment typically takes longer as the image is build.\n", - "\n", - "**Note:** For runs on local computer or attached virtual machine, that computer must have Docker installed and enabled. Machine Learning Compute has Docker pre-installed.\n", - "\n", - "Attribute ```docker.enabled``` controls whether to use Docker container or host OS for execution. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "myenv.docker.enabled = True" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can specify custom Docker base image and registry. This allows you to customize and control in detail the guest OS in which your training run executes. whether to use GPU, whether to use shared volumes, and shm size." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "myenv.docker.base_image\n", - "myenv.docker.base_image_registry" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also specify whether to use GPU or shared volumes, and shm size." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "myenv.docker.gpu_support\n", - "myenv.docker.shared_volumes\n", - "myenv.docker.shm_size" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Spark and Azure Databricks settings\n", - "\n", - "In addition to Python and Docker settings, Environment also contains attributes for Spark and Azure Databricks runs. These attributes become relevant when you submit runs on those compute targets." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next steps\n", - "\n", - "Learn more about remote runs on different compute targets:\n", - "\n", - "* [Train on ML Compute](../../train-on-amlcompute)\n", - "\n", - "* [Train on remote VM](../../train-on-remote-vm)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License" + ] }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/using-environments/using-environments.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using environments\n", + "\n", + "\n", + "## Contents\n", + "\n", + "1. [Introduction](#Introduction)\n", + "1. [Setup](#Setup)\n", + "1. [Create environment](#Create-environment)\n", + " 1. Add Python packages\n", + " 1. Specify environment variables\n", + "1. [Submit run using environment](#Submit-run-using-environment)\n", + "1. [Register environment](#Register-environment)\n", + "1. [List and get existing environments](#List-and-get-existing-environments)\n", + "1. [Other ways to create environments](#Other-ways-to-create-environments)\n", + " 1. 
From existing Conda environment\n",
+    "    1. From Conda or pip files\n",
+    "1. [Docker settings](#Docker-settings)\n",
+    "1. [Spark and Azure Databricks settings](#Spark-and-Azure-Databricks-settings)\n",
+    "1. [Next steps](#Next-steps)\n",
+    "\n",
+    "## Introduction\n",
+    "\n",
+    "Azure ML environments are an encapsulation of the environment where your machine learning training happens. They define Python packages, environment variables, Docker settings and other attributes in a declarative fashion. Environments are versioned: you can update them and retrieve old versions to revisit and review your work.\n",
+    "\n",
+    "Environments allow you to:\n",
+    "* Encapsulate the dependencies of your training process, such as Python packages and their versions.\n",
+    "* Reproduce the Python environment of your local computer in a remote run on a VM or ML Compute cluster.\n",
+    "* Reproduce your experimentation environment in a production setting.\n",
+    "* Revisit and audit the environment in which an existing model was trained.\n",
+    "\n",
+    "An environment, compute target and training script together form a run configuration: the full specification of a training run.\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't.\n",
+    "\n",
+    "First, let's validate the Azure ML SDK version and connect to the workspace."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "ws = Workspace.from_config()\n", + "ws.get_details()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create environment\n", + "\n", + "You can create an environment by instantiating ```Environment``` object and then setting its attributes: set of Python packages, environment variables and others.\n", + "\n", + "### Add Python packages\n", + "\n", + "The recommended way is to specify Conda packages, as they typically come with complete set of pre-built binaries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "from azureml.core.environment import CondaDependencies\n", + "\n", + "myenv = Environment(name=\"myenv\")\n", + "conda_dep = CondaDependencies()\n", + "conda_dep.add_conda_package(\"scikit-learn\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also add pip packages, and specify the version of package" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "conda_dep.add_pip_package(\"pillow==5.4.1\")\n", + "myenv.python.conda_dependencies=conda_dep" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify environment variables\n", + "\n", + "You can add environment variables to your environment. These then become available using ```os.environ.get``` in your training script." 
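+    ,
+    "\n",
+    "For example, the training script (such as the `example.py` submitted later in this notebook) could read the variable like this; the fallback value here is just an illustration:\n",
+    "\n",
+    "```python\n",
+    "import os\n",
+    "\n",
+    "# \"MESSAGE\" matches the variable set on the environment below;\n",
+    "# the second argument is a default returned when the variable is absent\n",
+    "message = os.environ.get(\"MESSAGE\", \"no message\")\n",
+    "print(message)\n",
+    "```\n"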
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "myenv.environment_variables = {\"MESSAGE\":\"Hello from Azure Machine Learning\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit run using environment\n", + "\n", + "When you submit a run, you can specify which environment to use.\n", + "\n", + "On the first run in a given environment, Azure ML spends some time building the environment. On subsequent runs, Azure ML keeps track of changes and reuses the existing environment, resulting in faster run completion." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig, Experiment\n", + "\n", + "myexp = Experiment(workspace=ws, name=\"environment-example\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To submit a run, create a run configuration that combines the script file and environment, and pass it to ```Experiment.submit```. In this example, the script is submitted to the local computer, but you can specify other compute targets such as remote clusters as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "runconfig = ScriptRunConfig(source_directory=\".\", script=\"example.py\")\n", + "runconfig.run_config.target = \"local\"\n", + "runconfig.run_config.environment = myenv\n", + "run = myexp.submit(config=runconfig)\n", + "\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register environment\n", + "\n", + "You can manage environments by registering them. This allows you to track their versions, and reuse them in future runs.
For example, once you've constructed an environment that meets your requirements, you can register it and use it in other experiments so as to standardize your workflow.\n", + "\n", + "If you register the environment with the same name, the version number is increased by one. Note that Azure ML keeps track of differences between versions, so if you re-register an identical environment, the version number is not increased." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "myenv.register(workspace=ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## List and get existing environments\n", + "\n", + "Your workspace contains a dictionary of registered environments. You can use ```Environment.get``` to retrieve a specific version of a specific environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for name, env in ws.environments.items():\n", + " print(\"Name {} \\t version {}\".format(name, env.version))\n", + "\n", + "restored_environment = Environment.get(workspace=ws, name=\"myenv\", version=\"1\")\n", + "\n", + "print(\"Attributes of restored environment\")\n", + "restored_environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Other ways to create environments\n", + "\n", + "### From existing Conda environment\n", + "\n", + "You can create an environment from an existing conda environment. This makes it easy to reuse your local interactive environment in Azure ML remote runs.
For example, if you've created a conda environment using\n", + "```\n", + "conda create -n mycondaenv\n", + "```\n", + "you can create an Azure ML environment out of that conda environment using\n", + "```\n", + "myenv = Environment.from_existing_conda_environment(name=\"myenv\", conda_environment_name=\"mycondaenv\")\n", + "```\n", + "\n", + "### From Conda or pip files\n", + "\n", + "You can create environments from a conda specification file or a pip requirements file using\n", + "```\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"path-to-conda-specification-file\")\n", + "\n", + "myenv = Environment.from_pip_requirements(name=\"myenv\", file_path=\"path-to-pip-requirements-file\")\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Docker settings\n", + "\n", + "Docker containers provide an efficient way to encapsulate dependencies. When you enable Docker, Azure ML builds a Docker image and creates a Python environment within that container, given your specifications. The Docker images are reused: the first run in a new environment typically takes longer as the image is built.\n", + "\n", + "**Note:** For runs on a local computer or an attached virtual machine, that computer must have Docker installed and enabled. Machine Learning Compute has Docker pre-installed.\n", + "\n", + "The ```docker.enabled``` attribute controls whether to use a Docker container or the host OS for execution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "myenv.docker.enabled = True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can specify a custom Docker base image and registry. This allows you to customize and control in detail the guest OS in which your training run executes."
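, + "\n", + "\n", + "For example, a minimal sketch (the image name and registry address below are hypothetical placeholders, not defaults):\n", + "```\n", + "myenv.docker.base_image = \"my-training-image:latest\"\n", + "myenv.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n", + "```"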
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "myenv.docker.base_image\n", + "myenv.docker.base_image_registry" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also specify whether to use GPU support, whether to mount shared volumes, and the shared memory (shm) size." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "myenv.docker.gpu_support\n", + "myenv.docker.shared_volumes\n", + "myenv.docker.shm_size" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Spark and Azure Databricks settings\n", + "\n", + "In addition to Python and Docker settings, ```Environment``` also contains attributes for Spark and Azure Databricks runs. These attributes become relevant when you submit runs on those compute targets." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "Learn more about remote runs on different compute targets:\n", + "\n", + "* [Train on ML Compute](../../train-on-amlcompute)\n", + "\n", + "* [Train on remote VM](../../train-on-remote-vm)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}