{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 01. Train, hyperparameter tune, and deploy with PyTorch\n",
"\n",
"In this tutorial, you will train, hyperparameter tune, and deploy a PyTorch model using the Azure Machine Learning (AML) Python SDK.\n",
"\n",
"This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify ants and bees by fine-tuning a ResNet18 model pretrained on the [ImageNet](http://image-net.org/index) dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n",
"    * install the AML SDK\n",
"    * create a workspace and its configuration file (`config.json`)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt in to diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
"      'Azure region: ' + ws.location, \n",
"      'Subscription id: ' + ws.subscription_id, \n",
"      'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a remote compute target\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n",
"\n",
"**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace, this code will skip the cluster creation process."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
"    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
"    print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
"    print('Creating a new compute target...')\n",
"    compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
"                                                               autoscale_enabled=True,\n",
"                                                               cluster_min_nodes=0, \n",
"                                                               cluster_max_nodes=4)\n",
"\n",
"    # create the cluster\n",
"    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
"    compute_target.wait_for_completion(show_output=True)\n",
"\n",
"    # Use the 'status' property to get a detailed status for the current cluster. \n",
"    print(compute_target.status.serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
]
},
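{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a CPU cluster could be provisioned with the same pattern as the cell above. This is only a sketch; the cluster name `cpucluster` and the node counts are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: provision a CPU cluster; the name and node counts are illustrative assumptions\n",
"cpu_cluster_name = \"cpucluster\"\n",
"\n",
"try:\n",
"    cpu_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
"    print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
"    cpu_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
"                                                           autoscale_enabled=True,\n",
"                                                           cluster_min_nodes=0,\n",
"                                                           cluster_max_nodes=4)\n",
"    cpu_target = ComputeTarget.create(ws, cpu_cluster_name, cpu_config)\n",
"    cpu_target.wait_for_completion(show_output=True)"
]
},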
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload training data\n",
"The dataset we will use consists of about 120 training images each for ants and bees, with 75 validation images for each class.\n",
"\n",
"First, download the dataset (located [here](https://download.pytorch.org/tutorial/hymenoptera_data.zip) as a zip file) locally to your current directory and extract the files. This will create a folder called `hymenoptera_data` with two subfolders `train` and `val` that contain the training and validation images, respectively. [Hymenoptera](https://en.wikipedia.org/wiki/Hymenoptera) is the order of insects that includes ants and bees."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import urllib.request\n",
"from zipfile import ZipFile\n",
"\n",
"# download data\n",
"download_url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'\n",
"data_file = './hymenoptera_data.zip'\n",
"urllib.request.urlretrieve(download_url, filename=data_file)\n",
"\n",
"# extract files\n",
"with ZipFile(data_file, 'r') as zip_file:\n",
"    print('extracting files...')\n",
"    zip_file.extractall()\n",
"    print('done')\n",
"\n",
"# delete zip file\n",
"os.remove(data_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. \n",
"\n",
"**Note: If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step.**\n",
"\n",
"Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = ws.get_default_datastore()\n",
"print(ds.datastore_type, ds.account_name, ds.container_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code will upload the training data to the path `./hymenoptera_data` on the default datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.upload(src_dir='./hymenoptera_data', target_path='hymenoptera_data')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path_on_datastore = 'hymenoptera_data'\n",
"ds_data = ds.path(path_on_datastore)\n",
"print(ds_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute\n",
"Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of GPU-enabled Azure compute to cut down your training time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a project directory\n",
"Create a directory that will contain all the code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './pytorch-hymenoptera'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare training script\n",
"Now you will need to create your training script. In this tutorial, the training script is already provided for you at `pytorch_train.py`. In practice, you should be able to take any custom training script as is and run it with AML without having to modify your code.\n",
"\n",
"However, if you would like to use AML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of AML code inside your training script. \n",
"\n",
"In `pytorch_train.py`, we will log some metrics to our AML run. To do so, we will access the AML run object within the script:\n",
"```Python\n",
"from azureml.core.run import Run\n",
"run = Run.get_context()\n",
"```\n",
"Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n",
"```Python\n",
"run.log('lr', np.float(learning_rate))\n",
"run.log('momentum', np.float(momentum))\n",
"\n",
"run.log('best_val_acc', np.float(best_acc))\n",
"```\n",
"These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once your script is ready, copy the training script `pytorch_train.py` into your project directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"shutil.copy('pytorch_train.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'pytorch-hymenoptera'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a PyTorch estimator\n",
"The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code will define a single-node PyTorch job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import PyTorch\n",
"\n",
"script_params = {\n",
"    '--data_dir': ds_data,\n",
"    '--num_epochs': 25,\n",
"    '--output_dir': './outputs'\n",
"}\n",
"\n",
"estimator = PyTorch(source_directory=project_folder, \n",
"                    script_params=script_params,\n",
"                    compute_target=compute_target,\n",
"                    entry_script='pytorch_train.py',\n",
"                    use_gpu=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:\n",
"- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `hymenoptera_data` on our datastore.\n",
"- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by AML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.\n",
"\n",
"To leverage the Azure VM's GPU for training, we set `use_gpu=True`."
]
},
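{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a script consuming these `script_params` would typically parse them with `argparse`. The following is only a sketch of what `pytorch_train.py` presumably does; the argument names match `script_params` above, while the defaults and help text are illustrative assumptions:\n",
"```Python\n",
"import argparse\n",
"\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--data_dir', type=str, help='path to the mounted training data')\n",
"parser.add_argument('--num_epochs', type=int, default=25)\n",
"parser.add_argument('--output_dir', type=str, default='./outputs')\n",
"args = parser.parse_args()\n",
"```"
]
},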
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit job\n",
"Run your experiment by submitting your estimator object. Note that this call is asynchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(estimator)\n",
"print(run.get_details())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor your run\n",
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
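{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can block until the run completes and stream its logs to the notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},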
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tune model hyperparameters\n",
"Now that we've seen how to do a simple PyTorch training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start a hyperparameter sweep\n",
"First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate and the momentum parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`best_val_acc`).\n",
"\n",
"Then, we specify an early termination policy to terminate poorly performing runs early. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).\n",
"Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive import *\n",
"\n",
"param_sampling = RandomParameterSampling( {\n",
"        'learning_rate': uniform(0.0005, 0.005),\n",
"        'momentum': uniform(0.9, 0.99)\n",
"    }\n",
")\n",
"\n",
"early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n",
"\n",
"hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n",
"                                            hyperparameter_sampling=param_sampling, \n",
"                                            policy=early_termination_policy,\n",
"                                            primary_metric_name='best_val_acc',\n",
"                                            primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
"                                            max_total_runs=20,\n",
"                                            max_concurrent_runs=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, launch the hyperparameter tuning job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# start the HyperDrive run\n",
"hyperdrive_run = experiment.submit(hyperdrive_run_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor HyperDrive runs\n",
"You can monitor the progress of the runs with the following Jupyter widget."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"\n",
"RunDetails(hyperdrive_run).show()"
]
},
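{
"cell_type": "markdown",
"metadata": {},
"source": [
"As with the single training run, you can block until the sweep and all of its child runs finish before looking for the best run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hyperdrive_run.wait_for_completion(show_output=True)"
]
},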
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find and register the best model\n",
"Once all the runs complete, we can find the run that produced the model with the highest accuracy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run = hyperdrive_run.get_best_run_by_primary_metric()\n",
"best_run_metrics = best_run.get_metrics()\n",
"print(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Best Run is:\\n Validation accuracy: {0:.5f} \\n Learning rate: {1:.5f} \\n Momentum: {2:.5f}'.format(\n",
"    best_run_metrics['best_val_acc'][-1],\n",
"    best_run_metrics['lr'],\n",
"    best_run_metrics['momentum'])\n",
")"
]
},
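{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also list the files the best run produced. Assuming the training script saved its model to the `outputs` directory as described earlier, `outputs/model.pt` should appear in this listing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(best_run.get_file_names())"
]
},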
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, register the model from your best-performing run to your workspace. The `model_path` parameter takes the relative path on the remote VM to the model file in your `outputs` directory. In the next section, we will deploy this registered model as a web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = best_run.register_model(model_name = 'pytorch-hymenoptera', model_path = 'outputs/model.pt')\n",
"print(model.name, model.id, model.version, sep = '\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy model as web service\n",
"Once you have your trained model, you can deploy the model on Azure. In this tutorial, we will deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI). For more information on deploying models using Azure ML, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create scoring script\n",
"\n",
"First, we will create a scoring script that will be invoked by the web service call. Note that the scoring script must have two required functions:\n",
"* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once, when the Docker container is started. \n",
"* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output are typically serialized as JSON, but you are not limited to that format.\n",
"\n",
"Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is an ant or a bee. When writing your own scoring script, test it locally before you deploy the web service."
]
},
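{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the required structure only (the actual `pytorch_score.py` may differ), a minimal scoring script could look like the following sketch. It assumes the model was saved with `torch.save(model, ...)` and that the client sends a base64-encoded image under a `data` key, matching the test call later in this notebook:\n",
"```Python\n",
"import base64, json\n",
"from io import BytesIO\n",
"\n",
"import torch\n",
"from PIL import Image\n",
"from azureml.core.model import Model\n",
"\n",
"def init():\n",
"    global model\n",
"    # locate the registered model file inside the deployed container\n",
"    model_path = Model.get_model_path('pytorch-hymenoptera')\n",
"    model = torch.load(model_path)\n",
"    model.eval()\n",
"\n",
"def run(input_data):\n",
"    # decode the base64-encoded image sent by the client\n",
"    img = Image.open(BytesIO(base64.b64decode(json.loads(input_data)['data'])))\n",
"    # ... preprocess the image, run the model, and return a JSON-serializable result\n",
"```"
]
},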
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create environment file\n",
"Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image by AML. In this case, we need to specify `torch`, `torchvision`, `pillow`, and `azureml-core`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile myenv.yml\n",
"name: myenv\n",
"channels:\n",
"  - defaults\n",
"dependencies:\n",
"  - pip:\n",
"    - torch\n",
"    - torchvision\n",
"    - pillow\n",
"    - azureml-core"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure the container image\n",
"Now configure the Docker image that you will use to build your ACI container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script='pytorch_score.py', \n",
"                                                  runtime='python', \n",
"                                                  conda_file='myenv.yml',\n",
"                                                  description='Image with hymenoptera model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure the ACI container\n",
"We are almost ready to deploy. Create a deployment configuration file to specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of `1` core and `1` gigabyte of RAM is usually sufficient for many models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
"                                               memory_gb=1, \n",
"                                               tags={'data': 'hymenoptera', 'method': 'transfer learning', 'framework': 'pytorch'},\n",
"                                               description='Classify ants/bees using transfer learning with PyTorch')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the registered model\n",
"Finally, let's deploy a web service from our registered model. First, retrieve the model from your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model(ws, name='pytorch-hymenoptera')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, deploy the web service using the ACI config and image config files created in the previous steps. We pass the `model` object in a list to the `models` parameter. If you would like to deploy more than one registered model, append the additional models to this list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.webservice import Webservice\n",
"\n",
"service_name = 'aci-hymenoptera'\n",
"service = Webservice.deploy_from_model(workspace=ws,\n",
"                                       name=service_name,\n",
"                                       models=[model],\n",
"                                       image_config=image_config,\n",
"                                       deployment_config=aciconfig)\n",
"\n",
"service.wait_for_deployment(show_output=True)\n",
"print(service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If your deployment fails for any reason and you need to redeploy, make sure to delete the service before you do so: `service.delete()`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get the web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the web service\n",
"Finally, let's test our deployed web service. We will send the data as a JSON string to the web service hosted in ACI and use the SDK's `run` API to invoke the service. Here we will take an arbitrary image from our validation data to predict on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os, json, base64\n",
"from io import BytesIO\n",
"from PIL import Image\n",
"import matplotlib.pyplot as plt\n",
"\n",
"def imgToBase64(img):\n",
"    \"\"\"Convert pillow image to base64-encoded image\"\"\"\n",
"    imgio = BytesIO()\n",
"    img.save(imgio, 'JPEG')\n",
"    img_str = base64.b64encode(imgio.getvalue())\n",
"    return img_str.decode('utf-8')\n",
"\n",
"test_img = os.path.join('hymenoptera_data', 'val', 'bees', '10870992_eebeeb3a12.jpg')  # arbitrary image from val dataset\n",
"plt.imshow(Image.open(test_img))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"base64Img = imgToBase64(Image.open(test_img))\n",
"\n",
"result = service.run(input_data=json.dumps({'data': base64Img}))\n",
"print(json.loads(result))"
]
},
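{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equivalently, any REST client can post the same payload directly to the scoring endpoint retrieved earlier. A sketch using the `requests` library (an assumption; it is not used elsewhere in this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# post the same JSON payload the SDK run() call above uses\n",
"headers = {'Content-Type': 'application/json'}\n",
"resp = requests.post(service.scoring_uri, data=json.dumps({'data': base64Img}), headers=headers)\n",
"print(resp.text)"
]
},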
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete web service\n",
"Once you no longer need the web service, you should delete it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"msauthor": "minxia"
},
"nbformat": 4,
"nbformat_minor": 2
}