mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
806 lines
31 KiB
Plaintext
806 lines
31 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Train and hyperparameter tune with Chainer\n",
|
|
"\n",
|
|
"In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a Convolutional Neural Network (CNN) on a single-node GPU with Chainer to perform handwritten digit recognition on the popular MNIST dataset. We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Prerequisites\n",
|
|
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Check core SDK version number\n",
|
|
"import azureml.core\n",
|
|
"\n",
|
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"!jupyter nbextension install --py --user azureml.widgets\n",
|
|
"!jupyter nbextension enable --py --user azureml.widgets"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Diagnostics\n",
|
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": [
|
|
"Diagnostics"
|
|
]
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.telemetry import set_diagnostics_collection\n",
|
|
"\n",
|
|
"set_diagnostics_collection(send_diagnostics=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialize workspace\n",
|
|
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.workspace import Workspace\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print('Workspace name: ' + ws.name, \n",
|
|
" 'Azure region: ' + ws.location, \n",
|
|
" 'Subscription id: ' + ws.subscription_id, \n",
|
|
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create or Attach existing AmlCompute\n",
|
|
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n",
|
|
"\n",
|
|
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
|
|
"\n",
|
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
|
"\n",
|
|
"# choose a name for your cluster\n",
|
|
"cluster_name = \"gpu-cluster\"\n",
|
|
"\n",
|
|
"try:\n",
|
|
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
|
" print('Found existing compute target.')\n",
|
|
"except ComputeTargetException:\n",
|
|
" print('Creating a new compute target...')\n",
|
|
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
|
|
" max_nodes=4)\n",
|
|
"\n",
|
|
" # create the cluster\n",
|
|
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
|
"\n",
|
|
"compute_target.wait_for_completion(show_output=True)\n",
|
|
"\n",
|
|
"# use get_status() to get a detailed status for the current cluster. \n",
|
|
"print(compute_target.get_status().serialize())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Train model on the remote compute\n",
|
|
"Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create a project directory\n",
|
|
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"\n",
|
|
"project_folder = './chainer-mnist'\n",
|
|
"os.makedirs(project_folder, exist_ok=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Prepare training script\n",
|
|
"Now you will need to create your training script. In this tutorial, the training script is already provided for you at `chainer_mnist.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n",
|
|
"\n",
|
|
"However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n",
|
|
"\n",
|
|
"In `chainer_mnist.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML `Run` object within the script:\n",
|
|
"```Python\n",
|
|
"from azureml.core.run import Run\n",
|
|
"run = Run.get_context()\n",
|
|
"```\n",
|
|
"Further within `chainer_mnist.py`, we log the batchsize and epochs parameters, and the highest accuracy the model achieves:\n",
|
|
"```Python\n",
|
|
"run.log('Batch size', np.int(args.batchsize))\n",
|
|
"run.log('Epochs', np.int(args.epochs))\n",
|
|
"\n",
|
|
"run.log('Accuracy', np.float(val_accuracy))\n",
|
|
"```\n",
|
|
"These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Once your script is ready, copy the training script `chainer_mnist.py` into your project directory."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import shutil\n",
|
|
"\n",
|
|
"shutil.copy('chainer_mnist.py', project_folder)\n",
|
|
"shutil.copy('chainer_score.py', project_folder)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create an experiment\n",
|
|
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Chainer tutorial. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Experiment\n",
|
|
"\n",
|
|
"experiment_name = 'chainer-mnist'\n",
|
|
"experiment = Experiment(ws, name=experiment_name)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create an environment\n",
|
|
"\n",
|
|
"Define a conda environment YAML file with your training script dependencies and create an Azure ML environment."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%writefile conda_dependencies.yml\n",
|
|
"\n",
|
|
"channels:\n",
|
|
"- conda-forge\n",
|
|
"dependencies:\n",
|
|
"- python=3.6.2\n",
|
|
"- pip:\n",
|
|
" - azureml-defaults\n",
|
|
" - chainer==5.1.0\n",
|
|
" - cupy-cuda90==5.1.0\n",
|
|
" - mpi4py==3.0.0\n",
|
|
" - pytest"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Environment\n",
|
|
"\n",
|
|
"chainer_env = Environment.from_conda_specification(name = 'chainer-5.1.0-gpu', file_path = './conda_dependencies.yml')\n",
|
|
"\n",
|
|
"# Specify a GPU base image\n",
|
|
"chainer_env.docker.enabled = True\n",
|
|
"chainer_env.docker.base_image = 'mcr.microsoft.com/azureml/intelmpi2018.3-cuda9.0-cudnn7-ubuntu16.04'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Configure your training job\n",
|
|
"\n",
|
|
"Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import ScriptRunConfig\n",
|
|
"\n",
|
|
"src = ScriptRunConfig(source_directory=project_folder,\n",
|
|
" script='chainer_mnist.py',\n",
|
|
" arguments=['--epochs', 10, '--batchsize', 128, '--output_dir', './outputs'],\n",
|
|
" compute_target=compute_target,\n",
|
|
" environment=chainer_env)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Submit job\n",
|
|
"Run your experiment by submitting your ScriptRunConfig object. Note that this call is asynchronous."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"run = experiment.submit(src)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Monitor your run\n",
|
|
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"\n",
|
|
"RunDetails(run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# to get more details of your run\n",
|
|
"print(run.get_details())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Tune model hyperparameters\n",
|
|
"Now that we've seen how to do a simple Chainer training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Start a hyperparameter sweep\n",
|
|
"First, we will define the hyperparameter space to sweep over. Let's tune the batch size and epochs parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, accuracy.\n",
|
|
"\n",
|
|
"Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `Accuracy` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `3` epochs (`delay_evaluation=3`).\n",
|
|
"Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.train.hyperdrive.runconfig import HyperDriveConfig\n",
|
|
"from azureml.train.hyperdrive.sampling import RandomParameterSampling\n",
|
|
"from azureml.train.hyperdrive.policy import BanditPolicy\n",
|
|
"from azureml.train.hyperdrive.run import PrimaryMetricGoal\n",
|
|
"from azureml.train.hyperdrive.parameter_expressions import choice\n",
|
|
" \n",
|
|
"\n",
|
|
"param_sampling = RandomParameterSampling( {\n",
|
|
" \"--batchsize\": choice(128, 256),\n",
|
|
" \"--epochs\": choice(5, 10, 20, 40)\n",
|
|
" }\n",
|
|
")\n",
|
|
"\n",
|
|
"hyperdrive_config = HyperDriveConfig(run_config=src,\n",
|
|
" hyperparameter_sampling=param_sampling, \n",
|
|
" primary_metric_name='Accuracy',\n",
|
|
" policy=BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=3),\n",
|
|
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
|
|
" max_total_runs=8,\n",
|
|
" max_concurrent_runs=4)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Finally, lauch the hyperparameter tuning job."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# start the HyperDrive run\n",
|
|
"hyperdrive_run = experiment.submit(hyperdrive_config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Monitor HyperDrive runs\n",
|
|
"You can monitor the progress of the runs with the following Jupyter widget. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"RunDetails(hyperdrive_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"hyperdrive_run.wait_for_completion(show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"assert(hyperdrive_run.get_status() == \"Completed\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Warm start a Hyperparameter Tuning experiment and resuming child runs\n",
|
|
"Often times, finding the best hyperparameter values for your model can be an iterative process, needing multiple tuning runs that learn from previous hyperparameter tuning runs. Reusing knowledge from these previous runs will accelerate the hyperparameter tuning process, thereby reducing the cost of tuning the model and will potentially improve the primary metric of the resulting model. When warm starting a hyperparameter tuning experiment with Bayesian sampling, trials from the previous run will be used as prior knowledge to intelligently pick new samples, so as to improve the primary metric. Additionally, when using Random or Grid sampling, any early termination decisions will leverage metrics from the previous runs to determine poorly performing training runs. \n",
|
|
"\n",
|
|
"Azure Machine Learning allows you to warm start your hyperparameter tuning run by leveraging knowledge from up to 5 previously completed hyperparameter tuning parent runs. \n",
|
|
"\n",
|
|
"Additionally, there might be occasions when individual training runs of a hyperparameter tuning experiment are cancelled due to budget constraints or fail due to other reasons. It is now possible to resume such individual training runs from the last checkpoint (assuming your training script handles checkpoints). Resuming an individual training run will use the same hyperparameter configuration and mount the storage used for that run. The training script should accept the \"--resume-from\" argument, which contains the checkpoint or model files from which to resume the training run. You can also resume individual runs as part of an experiment that spends additional budget on hyperparameter tuning. Any additional budget, after resuming the specified training runs is used for exploring additional configurations.\n",
|
|
"\n",
|
|
"For more information on warm starting and resuming hyperparameter tuning runs, please refer to the [Hyperparameter Tuning for Azure Machine Learning documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters) \n",
|
|
"\n",
|
|
"### Find and register best model\n",
|
|
"When all jobs finish, we can find out the one that has the highest accuracy."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"best_run = hyperdrive_run.get_best_run_by_primary_metric()\n",
|
|
"print(best_run.get_details()['runDefinition']['arguments'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, let's list the model files uploaded during the run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(best_run.get_file_names())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can then register the folder (and all files in it) as a model named `chainer-dnn-mnist` under the workspace for deployment"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"model = best_run.register_model(model_name='chainer-dnn-mnist', model_path='outputs/model.npz')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Deploy the model in ACI\n",
|
|
"Now, we are ready to deploy the model as a web service running in Azure Container Instance, [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n",
|
|
"\n",
|
|
"### Create scoring script\n",
|
|
"First, we will create a scoring script that will be invoked by the web service call.\n",
|
|
"+ Now that the scoring script must have two required functions, `init()` and `run(input_data)`.\n",
|
|
" + In `init()`, you typically load the model into a global object. This function is executed only once when the Docker contianer is started.\n",
|
|
" + In `run(input_data)`, the model is used to predict a value based on the input data. The input and output to `run` uses NPZ as the serialization and de-serialization format because it is the preferred format for Chainer, but you are not limited to it.\n",
|
|
" \n",
|
|
"Refer to the scoring script `chainer_score.py` for this tutorial. Our web service will use this file to predict. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"shutil.copy('chainer_score.py', project_folder)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create myenv.yml\n",
|
|
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda package `numpy` and pip install `chainer`. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.runconfig import CondaDependencies\n",
|
|
"\n",
|
|
"cd = CondaDependencies.create()\n",
|
|
"cd.add_conda_package('numpy')\n",
|
|
"cd.add_pip_package('chainer==5.1.0')\n",
|
|
"cd.add_pip_package(\"azureml-defaults\")\n",
|
|
"cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
|
|
"\n",
|
|
"print(cd.serialize_to_string())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Deploy to ACI\n",
|
|
"We are almost ready to deploy. Create the inference configuration and deployment configuration and deploy to ACI. This cell will run for about 7-8 minutes."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.webservice import AciWebservice\n",
|
|
"from azureml.core.model import InferenceConfig\n",
|
|
"from azureml.core.webservice import Webservice\n",
|
|
"from azureml.core.model import Model\n",
|
|
"from azureml.core.environment import Environment\n",
|
|
"\n",
|
|
"\n",
|
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
|
"inference_config = InferenceConfig(entry_script=\"chainer_score.py\", environment=myenv)\n",
|
|
"\n",
|
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,\n",
|
|
" auth_enabled=True, # this flag generates API keys to secure access\n",
|
|
" memory_gb=1,\n",
|
|
" tags={'name': 'mnist', 'framework': 'Chainer'},\n",
|
|
" description='Chainer DNN with MNIST')\n",
|
|
"\n",
|
|
"service = Model.deploy(workspace=ws, \n",
|
|
" name='chainer-mnist-1', \n",
|
|
" models=[model], \n",
|
|
" inference_config=inference_config, \n",
|
|
" deployment_config=aciconfig)\n",
|
|
"service.wait_for_deployment(True)\n",
|
|
"print(service.state)\n",
|
|
"print(service.scoring_uri)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:** "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(service.get_logs())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This is the scoring web service endpoint:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(service.scoring_uri)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Test the deployed model\n",
|
|
"Let's test the deployed model. Pick a random sample from the test set, and send it to the web service hosted in ACI for a prediction. Note, here we are using the an HTTP request to invoke the service.\n",
|
|
"\n",
|
|
"We can retrieve the API keys used for accessing the HTTP endpoint and construct a raw HTTP request to send to the service. Don't forget to add key to the HTTP header."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# retreive the API keys. two keys were generated.\n",
|
|
"key1, Key2 = service.get_keys()\n",
|
|
"print(key1)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%matplotlib inline\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"import urllib\n",
|
|
"import gzip\n",
|
|
"import numpy as np\n",
|
|
"import struct\n",
|
|
"import requests\n",
|
|
"\n",
|
|
"\n",
|
|
"# load compressed MNIST gz files and return numpy arrays\n",
|
|
"def load_data(filename, label=False):\n",
|
|
" with gzip.open(filename) as gz:\n",
|
|
" struct.unpack('I', gz.read(4))\n",
|
|
" n_items = struct.unpack('>I', gz.read(4))\n",
|
|
" if not label:\n",
|
|
" n_rows = struct.unpack('>I', gz.read(4))[0]\n",
|
|
" n_cols = struct.unpack('>I', gz.read(4))[0]\n",
|
|
" res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)\n",
|
|
" res = res.reshape(n_items[0], n_rows * n_cols)\n",
|
|
" else:\n",
|
|
" res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)\n",
|
|
" res = res.reshape(n_items[0], 1)\n",
|
|
" return res\n",
|
|
"\n",
|
|
"os.makedirs('./data/mnist', exist_ok=True)\n",
|
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n",
|
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')\n",
|
|
"\n",
|
|
"X_test = load_data('./data/mnist/test-images.gz', False)\n",
|
|
"y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n",
|
|
"\n",
|
|
"\n",
|
|
"# send a random row from the test set to score\n",
|
|
"random_index = np.random.randint(0, len(X_test)-1)\n",
|
|
"input_data = \"{\\\"data\\\": [\" + str(random_index) + \"]}\"\n",
|
|
"\n",
|
|
"headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
|
"\n",
|
|
"# send sample to service for scoring\n",
|
|
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
|
"\n",
|
|
"print(\"label:\", y_test[random_index])\n",
|
|
"print(\"prediction:\", resp.text[1])\n",
|
|
"\n",
|
|
"plt.imshow(X_test[random_index].reshape((28,28)), cmap='gray')\n",
|
|
"plt.axis('off')\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's look at the workspace after the web service was deployed. You should see\n",
|
|
"\n",
|
|
" + a registered model named 'chainer-dnn-mnist' and with the id 'chainer-dnn-mnist:1'\n",
|
|
" + a webservice called 'chainer-mnist-svc' with some scoring URL"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"model = ws.models['chainer-dnn-mnist']\n",
|
|
"print(\"Model: {}, ID: {}\".format('chainer-dnn-mnist', model.id))\n",
|
|
" \n",
|
|
"webservice = ws.webservices['chainer-mnist-1']\n",
|
|
"print(\"Webservice: {}, scoring URI: {}\".format('chainer-mnist-1', webservice.scoring_uri))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Clean up"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can delete the ACI deployment with a simple delete API call."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"service.delete()"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "nagaur"
|
|
}
|
|
],
|
|
"category": "training",
|
|
"compute": [
|
|
"AML Compute"
|
|
],
|
|
"datasets": [
|
|
"MNIST"
|
|
],
|
|
"deployment": [
|
|
"Azure Container Instance"
|
|
],
|
|
"exclude_from_index": false,
|
|
"framework": [
|
|
"Chainer"
|
|
],
|
|
"friendly_name": "Train a model with hyperparameter tuning",
|
|
"index_order": 1,
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.7"
|
|
},
|
|
"tags": [
|
|
"None"
|
|
],
|
|
"task": "Train a Convolutional Neural Network (CNN)"
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |