{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-within-notebook/train-within-notebook.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Train and deploy a model\n", "_**Create and deploy a model directly from a notebook**_\n", "\n", "---\n", "---\n", "\n", "## Contents\n", "1. [Introduction](#Introduction)\n", "1. [Setup](#Setup)\n", "1. [Data](#Data)\n", "1. [Train](#Train)\n", " 1. Viewing run results\n", " 1. Simple parameter sweep\n", " 1. Viewing experiment results\n", " 1. Select the best model\n", "1. [Deploy](#Deploy)\n", " 1. Register the model\n", " 1. Create a scoring file\n", " 1. Describe your environment\n", " 1. Descrice your target compute\n", " 1. Deploy your webservice\n", " 1. Test your webservice\n", " 1. Clean up\n", "1. [Next Steps](#nextsteps)\n", "\n", "---\n", "\n", "## Introduction\n", "Azure Machine Learning provides capabilities to control all aspects of model training and deployment directly from a notebook using the AML Python SDK. In this notebook we will\n", "* connect to our AML Workspace\n", "* create an experiment that contains multiple runs with tracked metrics\n", "* choose the best model created across all runs\n", "* deploy that model as a service\n", "\n", "In the end we will have a model deployed as a web service which we can call from an HTTP endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Setup\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. From the configuration, the important sections are the workspace configuration and ACI regristration.\n", "\n", "We will also need the following libraries install to our conda environment. If these are not installed, use the following command to do so and restart the notebook.\n", "```shell\n", "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", "```\n", "\n", "For this notebook we need the Azure ML SDK and access to our workspace. The following cell imports the SDK, checks the version, and accesses our already configured AzureML workspace." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "install" ], "name": "load_ws", "msdoc": "how-to-track-experiments.md" }, "outputs": [], "source": [ "import azureml.core\n", "from azureml.core import Experiment, Workspace\n", "\n", "# Check core SDK version number\n", "print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n", "print(\"\")\n", "\n", "\n", "ws = Workspace.from_config()\n", "print('Workspace name: ' + ws.name, \n", " 'Azure region: ' + ws.location, \n", " 'Subscription id: ' + ws.subscription_id, \n", " 'Resource group: ' + ws.resource_group, sep='\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Data\n", "We will use the diabetes dataset for this experiement, a well-known small dataset that comes with scikit-learn. This cell loads the dataset and splits it into random training and testing sets.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "name": "load_data", "msdoc": "how-to-track-experiments.md" }, "outputs": [], "source": [ "from sklearn.datasets import load_diabetes\n", "from sklearn.linear_model import Ridge\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.externals import joblib\n", "\n", "X, y = load_diabetes(return_X_y = True)\n", "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", "data = {\n", " \"train\":{\"X\": X_train, \"y\": y_train}, \n", " \"test\":{\"X\": X_test, \"y\": y_test}\n", "}\n", "\n", "print (\"Data contains\", len(data['train']['X']), \"training samples and\",len(data['test']['X']), \"test samples\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Train\n", "\n", "Let's use scikit-learn to train a simple Ridge regression model. We use AML to record interesting information about the model in an Experiment. An Experiment contains a series of trials called Runs. During this trial we use AML in the following way:\n", "* We access an experiment from our AML workspace by name, which will be created if it doesn't exist\n", "* We use `start_logging` to create a new run in this experiment\n", "* We use `run.log()` to record a parameter, alpha, and an accuracy measure - the Mean Squared Error (MSE) to the run. We will be able to review and compare these measures in the Azure Portal at a later time.\n", "* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n", "* We use `run.complete()` to indicate that the run is over and results can be captured and finalized" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "local run", "outputs upload" ], "name": "create_experiment", "msdoc": "how-to-track-experiments.md" }, "outputs": [], "source": [ "# Get an experiment object from Azure Machine Learning\n", "experiment = Experiment(workspace=ws, name=\"train-within-notebook\")\n", "\n", "# Create a run object in the experiment\n", "run = experiment.start_logging()\n", "# Log the algorithm parameter alpha to the run\n", "run.log('alpha', 0.03)\n", "\n", "# Create, fit, and test the scikit-learn Ridge regression model\n", "regression_model = Ridge(alpha=0.03)\n", "regression_model.fit(data['train']['X'], data['train']['y'])\n", "preds = regression_model.predict(data['test']['X'])\n", "\n", "# Output the Mean Squared Error to the notebook and to the run\n", "print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n", "run.log('mse', mean_squared_error(data['test']['y'], preds))\n", "\n", "# Save the model to the outputs directory for capture\n", "model_file_name = 'outputs/model.pkl'\n", "\n", "joblib.dump(value = regression_model, filename = model_file_name)\n", "\n", "# upload the model file explicitly into artifacts \n", "run.upload_file(name = model_file_name, path_or_stream = model_file_name)\n", "\n", "# Complete the run\n", "run.complete()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing run results\n", "Azure Machine Learning stores all the details about the run in the Azure cloud. Let's access those details by retrieving a link to the run using the default run output. Clicking on the resulting link will take you to an interactive page presenting all run information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simple parameter sweep\n", "Now let's take the same concept from above and modify the **alpha** parameter. For each value of alpha we will create a run that will store metrics and the resulting model. In the end we can use the captured run history to determine which model was the best for us to deploy. \n", "\n", "Note that by using `with experiment.start_logging() as run` AML will automatically call `run.complete()` at the end of each loop.\n", "\n", "This example also uses the **tqdm** library to provide a thermometer feedback" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from tqdm import tqdm\n", "\n", "# list of numbers from 0 to 1.0 with a 0.05 interval\n", "alphas = np.arange(0.0, 1.0, 0.05)\n", "\n", "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", "for alpha in tqdm(alphas):\n", " # create a bunch of runs, each train a model with a different alpha value\n", " with experiment.start_logging() as run:\n", " # Use Ridge algorithm to build a regression model\n", " regression_model = Ridge(alpha=alpha)\n", " regression_model.fit(X=data[\"train\"][\"X\"], y=data[\"train\"][\"y\"])\n", " preds = regression_model.predict(X=data[\"test\"][\"X\"])\n", " mse = mean_squared_error(y_true=data[\"test\"][\"y\"], y_pred=preds)\n", "\n", " # log alpha, mean_squared_error and feature names in run history\n", " run.log(name=\"alpha\", value=alpha)\n", " run.log(name=\"mse\", value=mse)\n", "\n", " # Save the model to the outputs directory for capture\n", " joblib.dump(value=regression_model, filename='outputs/model.pkl')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing experiment results\n", "Similar to viewing the run, we can also view the entire experiment. The experiment report view in the Azure portal lets us view all the runs in a table, and also allows us to customize charts. This way, we can see how the alpha parameter impacts the quality of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# now let's take a look at the experiment in Azure portal.\n", "experiment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select the best model \n", "Now that we've created many runs with different parameters, we need to determine which model is the best for deployment. For this, we will iterate over the set of runs. From each run we will take the *run id* using the `id` property, and examine the metrics by calling `run.get_metrics()`. \n", "\n", "Since each run may be different, we do need to check if the run has the metric that we are looking for, in this case, **mse**. To find the best run, we create a dictionary mapping the run id's to the metrics.\n", "\n", "Finally, we use the `tag` method to mark the best run to make it easier to find later. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "runs = {}\n", "run_metrics = {}\n", "\n", "# Create dictionaries containing the runs and the metrics for all runs containing the 'mse' metric\n", "for r in tqdm(experiment.get_runs()):\n", " metrics = r.get_metrics()\n", " if 'mse' in metrics.keys():\n", " runs[r.id] = r\n", " run_metrics[r.id] = metrics\n", "\n", "# Find the run with the best (lowest) mean squared error and display the id and metrics\n", "best_run_id = min(run_metrics, key = lambda k: run_metrics[k]['mse'])\n", "best_run = runs[best_run_id]\n", "print('Best run is:', best_run_id)\n", "print('Metrics:', run_metrics[best_run_id])\n", "\n", "# Tag the best run for identification later\n", "best_run.tag(\"Best Run\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Deploy\n", "Now that we have trained a set of models and identified the run containing the best model, we want to deploy the model for real time inference. The process of deploying a model involves\n", "* registering a model in your workspace\n", "* creating a scoring file containing init and run methods\n", "* creating an environment dependency file describing packages necessary for your scoring file\n", "* deploying the model and packages as a web service" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Register a model\n", "We have already identified which run contains the \"best model\" by our evaluation criteria. Each run has a file structure associated with it that contains various files collected during the run. Since a run can have many outputs we need to tell AML which file from those outputs represents the model that we want to use for our deployment. We can use the `run.get_file_names()` method to list the files associated with the run, and then use the `run.register_model()` method to place the model in the workspace's model registry.\n", "\n", "When using `run.register_model()` we supply a `model_name` that is meaningful for our scenario and the `model_path` of the model relative to the run. In this case, the model path is what is returned from `run.get_file_names()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "query history" ] }, "outputs": [], "source": [ "# View the files in the run\n", "for f in best_run.get_file_names():\n", " print(f)\n", " \n", "# Register the model with the workspace\n", "model = best_run.register_model(model_name='best_model', model_path='outputs/model.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once a model is registered, it is accessible from the list of models on the AML workspace. If you register models with the same name multiple times, AML keeps a version history of those models for you. The `Model.list()` lists all models in a workspace, and can be filtered by name, tags, or model properties. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "register model from history" ] }, "outputs": [], "source": [ "# Find all models called \"best_model\" and display their version numbers\n", "from azureml.core.model import Model\n", "models = Model.list(ws, name='best_model')\n", "for m in models:\n", " print(m.name, m.version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a scoring file\n", "\n", "Since your model file can essentially be anything you want it to be, you need to supply a scoring script that can load your model and then apply the model to new data. This script is your 'scoring file'. This scoring file is a python program containing, at a minimum, two methods `init()` and `run()`. The `init()` method is called once when your deployment is started so you can load your model and any other required objects. This method uses the `get_model_path` function to locate the registered model inside the docker container. The `run()` method is called interactively when the web service is called with one or more data samples to predict.\n", "\n", "The scoring file used for this exercise is [here](score.py). \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Describe your environment\n", "\n", "Each modelling process may require a unique set of packages. Therefore we need to create an environment object describing the dependencies. \n", "\n", "Next we create an inference configuration using this environment object and the scoring script that we created previously." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.conda_dependencies import CondaDependencies\n", "from azureml.core.environment import Environment\n", "from azureml.core.model import InferenceConfig\n", "\n", "env = Environment('deploytocloudenv')\n", "env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],pip_packages=['azureml-defaults'])\n", "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Describe your target compute\n", "In addition to the inference configuration, we also need to describe the type of compute we want to allocate for our webservice. In in this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require the number of cores you want to run and memory you need. Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n", "\n", "For production workloads, it is better to use [Azure Kubernentes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Try [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "deploy service", "aci" ] }, "outputs": [], "source": [ "from azureml.core.webservice import AciWebservice\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", " tags={'sample name': 'AML 101'}, \n", " description='This is a great example.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy your webservice\n", "The final step to deploying your webservice is to call `Model.deploy()`. This function uses the deployment and inference configurations created above to perform the following:\n", "* Build a docker image\n", "* Deploy to the docker image to an Azure Container Instance\n", "* Copy your model files to the Azure Container Instance\n", "* Call the `init()` function in your scoring file\n", "* Provide an HTTP endpoint for scoring calls\n", "\n", "The `Model.deploy` method requires the following parameters\n", "* `workspace` - the workspace containing the service\n", "* `name` - a unique named used to identify the service in the workspace\n", "* `models` - an array of models to be deployed into the container\n", "* `inference_config` - a configuration object describing the image environment\n", "* `deployment_config` - a configuration object describing the compute type\n", " \n", "**Note:** The web service creation can take several minutes. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "deploy service", "aci" ] }, "outputs": [], "source": [ "%%time\n", "from azureml.core.model import Model\n", "from azureml.core.webservice import Webservice\n", "\n", "# Create the webservice using all of the precreated configurations and our best model\n", "service = Model.deploy(workspace=ws,\n", " name='my-aci-svc',\n", " models=[model],\n", " inference_config=inference_config,\n", " deployment_config=aciconfig)\n", "\n", "# Wait for the service deployment to complete while displaying log output\n", "service.wait_for_deployment(show_output=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Test your webservice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that your web service is runing you can send JSON data directly to the service using the `run` method. This cell pulls the first test sample from the original dataset into JSON and then sends it to the service." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "deploy service", "aci" ] }, "outputs": [], "source": [ "import json\n", "\n", "service = ws.webservices['my-aci-svc']\n", "\n", "# scrape the first row from the test set.\n", "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", "\n", "#score on our service\n", "service.run(input_data = test_samples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This cell shows how you can send multiple rows to the webservice at once. It then calculates the residuals - that is, the errors - by subtracting out the actual values from the results. These residuals are used later to show a plotted result." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "deploy service", "aci" ] }, "outputs": [], "source": [ "# score the entire test set.\n", "test_samples = json.dumps({'data': X_test.tolist()})\n", "\n", "result = service.run(input_data = test_samples)\n", "residual = result - y_test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This cell shows how you can use the `service.scoring_uri` property to access the HTTP endpoint of the service and call it using standard POST operations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "deploy service", "aci" ] }, "outputs": [], "source": [ "import requests\n", "\n", "# use the first row from the test set again\n", "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", "\n", "# create the required header\n", "headers = {'Content-Type':'application/json'}\n", "\n", "# post the request to the service and display the result\n", "resp = requests.post(service.scoring_uri, test_samples, headers = headers)\n", "print(resp.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Residual graph\n", "One way to understand the behavior of your model is to see how the data performs against data with known results. This cell uses matplotlib to create a histogram of the residual values, or errors, created from scoring the test samples.\n", "\n", "A good model should have residual values that cluster around 0 - that is, no error. Observing the resulting histogram can also show you if the model is skewed in any particular direction." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw={'width_ratios':[3, 1], 'wspace':0, 'hspace': 0})\n", "f.suptitle('Residual Values', fontsize = 18)\n", "\n", "f.set_figheight(6)\n", "f.set_figwidth(14)\n", "\n", "a0.plot(residual, 'bo', alpha=0.4)\n", "a0.plot([0,90], [0,0], 'r', lw=2)\n", "a0.set_ylabel('residue values', fontsize=14)\n", "a0.set_xlabel('test data set', fontsize=14)\n", "\n", "a1.hist(residual, orientation='horizontal', color='blue', bins=10, histtype='step')\n", "a1.hist(residual, orientation='horizontal', color='blue', alpha=0.2, bins=10)\n", "a1.set_yticklabels([])\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clean up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete the ACI instance to stop the compute and any associated billing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "deploy service", "aci" ] }, "outputs": [], "source": [ "%%time\n", "service.delete()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Next Steps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, you created a series of models inside the notebook using local data, stored them inside an AML experiment, found the best one and deployed it as a live service! From here you can continue to use Azure Machine Learning in this regard to run your own experiments and deploy your own models, or you can expand into further capabilities of AML!\n", "\n", "If you have a model that is difficult to process locally, either because the data is remote or the model is large, try the [train-on-remote-vm](../train-on-remote-vm) notebook to learn about submitting remote jobs.\n", "\n", "If you want to take advantage of multiple cloud machines to perform large parameter sweeps try the [train-hyperparameter-tune-deploy-with-pytorch](../../training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch\n", ") sample.\n", "\n", "If you want to deploy models to a production cluster try the [production-deploy-to-aks](../../deployment/production-deploy-to-aks\n", ") notebook." ] } ], "metadata": { "authors": [ { "name": "roastala" } ], "category": "tutorial", "compute": [ "Local" ], "datasets": [ "Diabetes" ], "deployment": [ "Azure Container Instance" ], "exclude_from_index": false, "framework": [ "None" ], "friendly_name": "Train and deploy a model using Python SDK", "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "tags": [ "None" ], "task": "Training and deploying a model from a notebook" }, "nbformat": 4, "nbformat_minor": 2 }