mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
866 lines
33 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Azure Machine Learning Pipeline with HyperDriveStep\n",
|
|
"\n",
|
|
"\n",
|
|
"This notebook is used to demonstrate the use of HyperDriveStep in AML Pipeline."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Prerequisites and Azure Machine Learning Basics\n",
|
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
|
"\n",
|
|
"## Azure Machine Learning and Pipeline SDK-specific imports"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azureml.core\n",
|
|
"from azureml.core import Workspace, Experiment\n",
|
|
"from azureml.core.datastore import Datastore\n",
|
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
|
"from azureml.exceptions import ComputeTargetException\n",
|
|
"from azureml.data.data_reference import DataReference\n",
|
|
"from azureml.pipeline.steps import HyperDriveStep, HyperDriveStepRun\n",
|
|
"from azureml.pipeline.core import Pipeline, PipelineData\n",
|
|
"from azureml.train.dnn import TensorFlow\n",
|
|
"# from azureml.train.hyperdrive import *\n",
|
|
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
|
|
"from azureml.train.hyperdrive import choice, loguniform\n",
|
|
"\n",
|
|
"import os\n",
|
|
"import shutil\n",
|
|
"import urllib\n",
|
|
"import numpy as np\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"\n",
|
|
"# Check core SDK version number\n",
|
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialize workspace\n",
|
|
"\n",
|
|
"Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ws = Workspace.from_config()\n",
|
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create an Azure ML experiment\n",
|
|
"Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. \n",
|
|
"\n",
|
|
 The best practice">
"> The best practice is to use a separate folder for each step's scripts and their dependent files. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Because a change to any file in the `source_directory` triggers a re-upload of the snapshot, keeping folders separate also lets the step be reused when nothing in its `source_directory` has changed. \n",
|
|
"\n",
|
|
"> The script runs will be recorded under the experiment in Azure."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"script_folder = './tf-mnist'\n",
|
|
"os.makedirs(script_folder, exist_ok=True)\n",
|
|
"\n",
|
|
"exp = Experiment(workspace=ws, name='Hyperdrive_sample')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Download MNIST dataset\n",
|
|
"In order to train on the MNIST dataset we will first need to download it from Yan LeCun's web site directly and save them in a `data` folder locally."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"os.makedirs('./data/mnist', exist_ok=True)\n",
|
|
"\n",
|
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n",
|
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n",
|
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n",
|
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Show some sample images\n",
|
|
"Let's load the downloaded compressed file into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from utils import load_data\n",
|
|
"\n",
|
|
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n",
|
|
"X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n",
|
|
"y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n",
|
|
"\n",
|
|
"X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n",
|
|
"y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n",
|
|
"\n",
|
|
"count = 0\n",
|
|
"sample_size = 30\n",
|
|
"plt.figure(figsize = (16, 6))\n",
|
|
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
|
" count = count + 1\n",
|
|
" plt.subplot(1, sample_size, count)\n",
|
|
" plt.axhline('')\n",
|
|
" plt.axvline('')\n",
|
|
" plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n",
|
|
" plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Upload MNIST dataset to blob datastore \n",
|
|
"A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. In the next step, we will use Azure Blob Storage and upload the training and test set into the Azure Blob datastore, which we will then later be mount on a Batch AI cluster for training."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ds = ws.get_default_datastore()\n",
|
|
"ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Retrieve or create a Azure Machine Learning compute\n",
|
|
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
|
|
"\n",
|
|
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
|
|
"\n",
|
|
"1. Create the configuration\n",
|
|
"2. Create the Azure Machine Learning compute\n",
|
|
"\n",
|
|
"**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"cluster_name = \"gpu-cluster\"\n",
|
|
"\n",
|
|
"try:\n",
|
|
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
|
" print('Found existing compute target {}.'.format(cluster_name))\n",
|
|
"except ComputeTargetException:\n",
|
|
" print('Creating a new compute target...')\n",
|
|
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
|
|
" max_nodes=4)\n",
|
|
"\n",
|
|
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
|
" compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
|
|
"\n",
|
|
"print(\"Azure Machine Learning Compute attached\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Copy the training files into the script folder\n",
|
|
"The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load compressed data file into numpy array."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# the training logic is in the tf_mnist.py file.\n",
|
|
"shutil.copy('./tf_mnist.py', script_folder)\n",
|
|
"\n",
|
|
"# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n",
|
|
"shutil.copy('./utils.py', script_folder)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create TensorFlow estimator\n",
|
|
"Next, we construct an [TensorFlow](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) estimator object.\n",
|
|
"The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker.\n",
|
|
"\n",
|
|
"The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release.\n",
|
|
"\n",
|
|
"The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"est = TensorFlow(source_directory=script_folder, \n",
|
|
" compute_target=compute_target,\n",
|
|
" entry_script='tf_mnist.py', \n",
|
|
" use_gpu=True,\n",
|
|
" framework_version='1.13')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Intelligent hyperparameter tuning\n",
|
|
"Now let's try hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling.\n",
|
|
"\n",
|
|
"In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`validation_acc`)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ps = RandomParameterSampling(\n",
|
|
" {\n",
|
|
" '--batch-size': choice(25, 50, 100),\n",
|
|
" '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n",
|
|
" '--second-layer-neurons': choice(10, 50, 200, 500),\n",
|
|
" '--learning-rate': loguniform(-6, -1)\n",
|
|
" }\n",
|
|
")"
|
|
]
|
|
},
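The parameter space above can be pictured with a small standard-library sketch. It does not call the Azure ML SDK: `sample_choice`, `sample_loguniform`, and `sample_configuration` are hypothetical helper names used only for illustration, and the sketch assumes that `loguniform(-6, -1)` means "draw `u` uniformly from [-6, -1] and return `exp(u)`", which is how the HyperDrive documentation describes it.

```python
import math
import random


def sample_choice(rng, *options):
    """Pick one option uniformly at random, like a discrete choice(...) space."""
    return rng.choice(options)


def sample_loguniform(rng, min_value, max_value):
    """Return exp(u) with u drawn uniformly from [min_value, max_value]."""
    return math.exp(rng.uniform(min_value, max_value))


def sample_configuration(rng):
    """Draw one hyperparameter set from the space defined in the notebook."""
    return {
        '--batch-size': sample_choice(rng, 25, 50, 100),
        '--first-layer-neurons': sample_choice(rng, 10, 50, 200, 300, 500),
        '--second-layer-neurons': sample_choice(rng, 10, 50, 200, 500),
        '--learning-rate': sample_loguniform(rng, -6, -1),
    }


# A fixed seed keeps the illustration reproducible.
rng = random.Random(0)
configs = [sample_configuration(rng) for _ in range(10)]
for config in configs:
    print(config)
```

Under this assumption, every sampled learning rate lies between `exp(-6)` (about 0.0025) and `exp(-1)` (about 0.37), so small learning rates are sampled as densely as large ones on a log scale.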
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we will define an early termnination policy. The `BanditPolicy` basically states to check the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminate the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric.\n",
|
|
"\n",
|
|
"Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)"
|
|
]
|
|
},
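To make the slack rule concrete, here is a minimal sketch of the comparison a bandit-style policy performs when the goal is to maximize the metric. It is not the service's implementation: `should_terminate` is a hypothetical helper, the run names and accuracies are made up, and the threshold `best / (1 + slack_factor)` follows the rule as described in the Azure ML hyperparameter tuning documentation.

```python
def should_terminate(run_best_metric, best_metric_so_far, slack_factor=0.1):
    """Return True if a run falls below the slack threshold (maximize goal)."""
    threshold = best_metric_so_far / (1.0 + slack_factor)
    return run_best_metric < threshold


# Hypothetical best validation accuracies reported at an evaluation interval.
runs = {'run_a': 0.95, 'run_b': 0.88, 'run_c': 0.80}
best = max(runs.values())

cancelled = sorted(name for name, metric in runs.items()
                   if should_terminate(metric, best))
print("threshold:", best / 1.1)
print("cancelled:", cancelled)
```

With a best accuracy of 0.95 the threshold is about 0.864, so only the hypothetical `run_c` would be cancelled at this interval.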
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we are ready to configure a run configuration object, and specify the primary metric `validation_acc` that's recorded in your training runs. If you go back to visit the training script, you will notice that this value is being logged after every epoch (a full batch set). We also want to tell the service that we are looking to maximizing this value. We also set the number of samples to 20, and maximal concurrent job to 4, which is the same as the number of nodes in our computer cluster."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"hd_config = HyperDriveConfig(estimator=est, \n",
|
|
" hyperparameter_sampling=ps,\n",
|
|
" policy=early_termination_policy,\n",
|
|
" primary_metric_name='validation_acc', \n",
|
|
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
|
|
" max_total_runs=10,\n",
|
|
" max_concurrent_runs=4)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Add HyperDrive as a step of pipeline\n",
|
|
"\n",
|
|
"### Setup an input for the hypderdrive step\n",
|
|
"Let's setup a data reference for inputs of hyperdrive step."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"data_folder = DataReference(\n",
|
|
" datastore=ds,\n",
|
|
" data_reference_name=\"mnist_data\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### HyperDriveStep\n",
|
|
"HyperDriveStep can be used to run HyperDrive job as a step in pipeline.\n",
|
|
"- **name:** Name of the step\n",
|
|
"- **hyperdrive_config:** A HyperDriveConfig that defines the configuration for this HyperDrive run\n",
|
|
"- **estimator_entry_script_arguments:** List of command-line arguments for estimator entry script\n",
|
|
"- **inputs:** List of input port bindings\n",
|
|
"- **outputs:** List of output port bindings\n",
|
|
"- **metrics_output:** Optional value specifying the location to store HyperDrive run metrics as a JSON file\n",
|
|
"- **allow_reuse:** whether to allow reuse\n",
|
|
"- **version:** version\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"metrics_output_name = 'metrics_output'\n",
|
|
"metirics_data = PipelineData(name='metrics_data',\n",
|
|
" datastore=ds,\n",
|
|
" pipeline_output_name=metrics_output_name)\n",
|
|
"\n",
|
|
"hd_step_name='hd_step01'\n",
|
|
"hd_step = HyperDriveStep(\n",
|
|
" name=hd_step_name,\n",
|
|
" hyperdrive_config=hd_config,\n",
|
|
" estimator_entry_script_arguments=['--data-folder', data_folder],\n",
|
|
" inputs=[data_folder],\n",
|
|
" metrics_output=metirics_data)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Run the pipeline"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pipeline = Pipeline(workspace=ws, steps=[hd_step])\n",
|
|
"pipeline_run = exp.submit(pipeline)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Monitor using widget"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"RunDetails(pipeline_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Wait for the completion of this Pipeline run"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# pipeline_run.wait_for_completion()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Retrieve the metrics\n",
|
|
"Outputs of above run can be used as inputs of other steps in pipeline. In this tutorial, we will show the result metrics."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)\n",
|
|
"# num_file_downloaded = metrics_output.download('.', show_progress=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# import pandas as pd\n",
|
|
"# import json\n",
|
|
"# with open(metrics_output._path_on_datastore) as f: \n",
|
|
"# metrics_output_result = f.read()\n",
|
|
" \n",
|
|
"# deserialized_metrics_output = json.loads(metrics_output_result)\n",
|
|
"# df = pd.DataFrame(deserialized_metrics_output)\n",
|
|
"# df"
|
|
]
|
|
},
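As a rough illustration of what can be done with the downloaded metrics, the sketch below picks the run with the highest `validation_acc` from a JSON document. The layout used here (run ID mapped to metric name mapped to a list of logged values) and the run IDs are assumptions made for the example; inspect the downloaded file to confirm its actual structure. `best_run_by_metric` is a hypothetical helper, not an SDK function.

```python
import json

# Hypothetical stand-in for the contents of the downloaded metrics file.
sample_metrics_json = json.dumps({
    "HD_run_0": {"validation_acc": [0.91, 0.95, 0.96]},
    "HD_run_1": {"validation_acc": [0.89, 0.93]},
    "HD_run_2": {"validation_acc": [0.97, 0.975]},
})


def best_run_by_metric(metrics, metric_name):
    """Return (run_id, best_value) for the run with the highest logged value."""
    best_by_run = {
        run_id: max(values[metric_name])
        for run_id, values in metrics.items()
        if values.get(metric_name)  # skip runs that never logged the metric
    }
    best_run_id = max(best_by_run, key=best_by_run.get)
    return best_run_id, best_by_run[best_run_id]


metrics = json.loads(sample_metrics_json)
best_id, best_value = best_run_by_metric(metrics, "validation_acc")
print(best_id, best_value)
```

In the notebook itself, `get_best_run_by_primary_metric()` on the `HyperDriveStepRun` performs this selection on the service side; the sketch only shows the idea on local data.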
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Find and register best model\n",
|
|
"When all the jobs finish, we can find out the one that has the highest accuracy."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# hd_step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run(hd_step_name)[0])\n",
|
|
"# best_run = hd_step_run.get_best_run_by_primary_metric()\n",
|
|
"# best_run"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now let's list the model files uploaded during the run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# print(best_run.get_file_names())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can then register the folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Deploy the model in ACI\n",
|
|
"Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n",
|
|
"### Create score.py\n",
|
|
"First, we will create a scoring script that will be invoked by the web service call. \n",
|
|
"\n",
|
|
"* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. \n",
|
|
" * In `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n",
|
|
" * In `run(input_data)` function, the model is used to predict a value based on the input data. The input and output to `run` typically use JSON as serialization and de-serialization format but you are not limited to that."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%writefile score.py\n",
|
|
"import json\n",
|
|
"import numpy as np\n",
|
|
"import os\n",
|
|
"import tensorflow as tf\n",
|
|
"\n",
|
|
"from azureml.core.model import Model\n",
|
|
"\n",
|
|
"def init():\n",
|
|
" global X, output, sess\n",
|
|
" tf.reset_default_graph()\n",
|
|
" model_root = Model.get_model_path('tf-dnn-mnist')\n",
|
|
" saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n",
|
|
" X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n",
|
|
" output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n",
|
|
" \n",
|
|
" sess = tf.Session()\n",
|
|
" saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n",
|
|
"\n",
|
|
"def run(raw_data):\n",
|
|
" data = np.array(json.loads(raw_data)['data'])\n",
|
|
" # make prediction\n",
|
|
" out = output.eval(session=sess, feed_dict={X: data})\n",
|
|
" y_hat = np.argmax(out, axis=1)\n",
|
|
" return y_hat.tolist()"
|
|
]
|
|
},
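The `init()` / `run()` contract can be tried locally without TensorFlow or a deployed service. The sketch below is not the notebook's `score.py`: it swaps the restored TensorFlow graph for a hypothetical stand-in "model" that returns the index of the largest value in each row, so only the calling pattern (load once in `init()`, parse JSON and predict in `run()`) matches the real script.

```python
import json

model = None  # populated once by init(), reused by every run() call


def init():
    """Load the model into a global object (here: a trivial argmax stand-in)."""
    global model
    model = lambda rows: [max(range(len(row)), key=row.__getitem__)
                          for row in rows]


def run(raw_data):
    """Parse the JSON request body, predict, and return a JSON-friendly list."""
    data = json.loads(raw_data)['data']
    return model(data)


# Simulate what the web service does: one init() call, then requests.
init()
payload = json.dumps({'data': [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]})
predictions = run(payload)
print(predictions)
```

Exercising the two functions this way is a cheap check that the request and response shapes are what the client code expects before building an image.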
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create myenv.yml\n",
|
|
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# from azureml.core.runconfig import CondaDependencies\n",
|
|
"\n",
|
|
"# cd = CondaDependencies.create()\n",
|
|
"# cd.add_conda_package('numpy')\n",
|
|
"# cd.add_tensorflow_conda_package()\n",
|
|
"# cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
|
|
"\n",
|
|
"# print(cd.serialize_to_string())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Deploy to ACI\n",
|
|
"We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigbyte of RAM needed for your ACI container. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# from azureml.core.webservice import AciWebservice\n",
|
|
"\n",
|
|
"# aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
|
"# memory_gb=1, \n",
|
|
"# tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n",
|
|
"# description='Tensorflow DNN on MNIST')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Deployment Process\n",
|
|
"Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scene, it will do the following:\n",
|
|
"1. **Register model** \n",
|
|
"Take the local `model` folder (which contains our previously downloaded trained model files) and register it (and the files inside that folder) as a model named `model` under the workspace. Azure ML will register the model directory or model file(s) we specify to the `model_paths` parameter of the `Webservice.deploy` call.\n",
|
|
"2. **Build Docker image** \n",
|
|
"Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the `model` folder containing the TensorFlow model files. \n",
|
|
"3. **Register image** \n",
|
|
"Register that image under the workspace. \n",
|
|
"4. **Ship to ACI** \n",
|
|
"And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# from azureml.core.image import ContainerImage\n",
|
|
"\n",
|
|
"# imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
|
|
"# runtime=\"python\", \n",
|
|
"# conda_file=\"myenv.yml\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# %%time\n",
|
|
"# from azureml.core.webservice import Webservice\n",
|
|
"\n",
|
|
"# service = Webservice.deploy_from_model(workspace=ws,\n",
|
|
"# name='tf-mnist-svc',\n",
|
|
"# deployment_config=aciconfig,\n",
|
|
"# models=[model],\n",
|
|
"# image_config=imgconfig)\n",
|
|
"\n",
|
|
"# service.wait_for_deployment(show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# print(service.get_logs())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This is the scoring web service endpoint:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# print(service.scoring_uri)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Test the deployed model\n",
|
|
"Let's test the deployed model. Pick 30 random samples from the test set, and send it to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.\n",
|
|
"\n",
|
|
"After the invocation, we print the returned predictions and plot them along with the input images. Use red font color and inversed image (white on black) to highlight the misclassified samples. Note since the model accuracy is pretty high, you might have to run the below cell a few times before you can see a misclassified sample."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# import json\n",
|
|
"\n",
|
|
"# # find 30 random samples from test set\n",
|
|
"# n = 30\n",
|
|
"# sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
|
|
"\n",
|
|
"# test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
|
|
"# test_samples = bytes(test_samples, encoding='utf8')\n",
|
|
"\n",
|
|
"# # predict using the deployed model\n",
|
|
"# result = service.run(input_data=test_samples)\n",
|
|
"\n",
|
|
"# # compare actual value vs. the predicted values:\n",
|
|
"# i = 0\n",
|
|
"# plt.figure(figsize = (20, 1))\n",
|
|
"\n",
|
|
"# for s in sample_indices:\n",
|
|
"# plt.subplot(1, n, i + 1)\n",
|
|
"# plt.axhline('')\n",
|
|
"# plt.axvline('')\n",
|
|
" \n",
|
|
"# # use different color for misclassified sample\n",
|
|
"# font_color = 'red' if y_test[s] != result[i] else 'black'\n",
|
|
"# clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
|
|
" \n",
|
|
"# plt.text(x=10, y=-10, s=y_hat[s], fontsize=18, color=font_color)\n",
|
|
"# plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
|
|
" \n",
|
|
"# i = i + 1\n",
|
|
"# plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can also send raw HTTP request to the service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# import requests\n",
|
|
"\n",
|
|
"# # send a random row from the test set to score\n",
|
|
"# random_index = np.random.randint(0, len(X_test)-1)\n",
|
|
"# input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
|
|
"\n",
|
|
"# headers = {'Content-Type':'application/json'}\n",
|
|
"\n",
|
|
"# resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
|
"\n",
|
|
"# print(\"POST to url\", service.scoring_uri)\n",
|
|
"# print(\"input data:\", input_data)\n",
|
|
"# print(\"label:\", y_test[random_index])\n",
|
|
"# print(\"prediction:\", resp.text)"
|
|
]
|
|
},
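The request body for the raw HTTP call can also be built with `json.dumps` instead of string concatenation, which avoids depending on how array elements happen to print. The sketch below uses only the standard library; `build_scoring_payload` is a hypothetical helper, and the three-value row stands in for a flattened 784-pixel MNIST image.

```python
import json


def build_scoring_payload(rows):
    """Serialize rows of pixel values into the {"data": [...]} request body."""
    return json.dumps({"data": [[float(value) for value in row]
                                for row in rows]})


# A short stand-in row; a real request would carry 784 values per image.
row = [0.0, 0.5, 1.0]
payload = build_scoring_payload([row])
decoded = json.loads(payload)
print(payload)
```

Converting each value with `float()` means the helper accepts plain Python numbers as well as numpy scalars, and the resulting string can be passed as the body of the POST request with the `Content-Type: application/json` header.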
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's look at the workspace after the web service was deployed. You should see \n",
|
|
"* a registered model named 'model' and with the id 'model:1'\n",
|
|
"* an image called 'tf-mnist' and with a docker image location pointing to your workspace's Azure Container Registry (ACR) \n",
|
|
"* a webservice called 'tf-mnist' with some scoring URL"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# models = ws.models\n",
|
|
"# for name, model in models.items():\n",
|
|
"# print(\"Model: {}, ID: {}\".format(name, model.id))\n",
|
|
" \n",
|
|
"# images = ws.images\n",
|
|
"# for name, image in images.items():\n",
|
|
"# print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
|
|
" \n",
|
|
"# webservices = ws.webservices\n",
|
|
"# for name, webservice in webservices.items():\n",
|
|
"# print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Clean up\n",
|
|
"You can delete the ACI deployment with a simple delete API call."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# PUBLISHONLY\n",
|
|
"# service.delete()"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "sonnyp"
|
|
}
|
|
],
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.7"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |