tutorials/01.train-models.ipynb
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial #1: Train an image classification model with Azure Machine Learning\n",
|
||||
"\n",
|
||||
"In this tutorial, you train a machine learning model both locally and on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook. You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**. \n",
|
||||
"\n",
|
||||
"This tutorial trains a simple logistic regression using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n",
|
||||
"\n",
|
||||
"Learn how to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train a simple logistic regression model locally using the popular scikit-learn machine learning library \n",
|
||||
"> * Train multiple models on a remote cluster\n",
|
||||
"> * Review training results, find and register the best model\n",
|
||||
"\n",
|
||||
"You'll learn how to select a model and deploy it in [part two of this tutorial](deploy-models.ipynb) later. \n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n",
|
||||
"* Create a workspace and its configuration file (**config.json**) \n",
|
||||
"* Save your **config.json** to the same folder as this notebook"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set up your development environment\n",
|
||||
"\n",
|
||||
"All the setup for your development work can be accomplished in a Python notebook. Setup includes:\n",
|
||||
"\n",
|
||||
"* Importing Python packages\n",
|
||||
"* Connecting to a workspace to enable communication between your local computer and remote resources\n",
|
||||
"* Creating an experiment to track all your runs\n",
|
||||
"* Creating a remote compute target to use for training\n",
|
||||
"\n",
|
||||
"### Import packages\n",
|
||||
"\n",
|
||||
"Import Python packages you need in this session. Also display the Azure Machine Learning SDK version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"check version"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib notebook\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"\n",
|
||||
"import azureml\n",
|
||||
"from azureml.core import Workspace, Run\n",
|
||||
"\n",
|
||||
"# check core SDK version number\n",
|
||||
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to workspace\n",
|
||||
"\n",
|
||||
"Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `ws`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"load workspace"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load workspace configuration from the config.json file in the current folder.\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\\t')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create experiment\n",
|
||||
"\n",
|
||||
"Create an experiment to track the runs in your workspace. A workspace can have muliple experiments. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"create experiment"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"experiment_name = 'sklearn-mnist'\n",
|
||||
"\n",
|
||||
"from azureml.core import Experiment\n",
|
||||
"exp = Experiment(workspace=ws, name=experiment_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create remote compute target\n",
|
||||
"\n",
|
||||
"Azure Azure ML Managed Compute is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure Managed Compute cluster as your training environment. This code creates a cluster for you if it does not already exist in your workspace. \n",
|
||||
"\n",
|
||||
" **Creation of the cluster takes approximately 5 minutes.** If the cluster is already in the workspace this code uses it and skips the creation process."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"create mlc",
|
||||
"batchai"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import BatchAiCompute\n",
|
||||
"from azureml.core.compute import ComputeTarget\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"# choose a name for your cluster\n",
|
||||
"batchai_cluster_name = os.environ.get(\"BATCHAI_CLUSTER_NAME\", ws.name + \"gpu\")\n",
|
||||
"cluster_min_nodes = os.environ.get(\"BATCHAI_CLUSTER_MIN_NODES\", 1)\n",
|
||||
"cluster_max_nodes = os.environ.get(\"BATCHAI_CLUSTER_MAX_NODES\", 3)\n",
|
||||
"vm_size = os.environ.get(\"BATCHAI_CLUSTER_SKU\", \"STANDARD_NC6\")\n",
|
||||
"autoscale_enabled = os.environ.get(\"BATCHAI_CLUSTER_AUTOSCALE_ENABLED\", True)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"if batchai_cluster_name in ws.compute_targets:\n",
|
||||
" compute_target = ws.compute_targets[batchai_cluster_name]\n",
|
||||
" if compute_target and type(compute_target) is BatchAiCompute:\n",
|
||||
" print('found compute target. just use it. ' + batchai_cluster_name)\n",
|
||||
"else:\n",
|
||||
" print('creating a new compute target...')\n",
|
||||
" provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n",
|
||||
" vm_priority = 'lowpriority', # optional\n",
|
||||
" autoscale_enabled = autoscale_enabled,\n",
|
||||
" cluster_min_nodes = cluster_min_nodes, \n",
|
||||
" cluster_max_nodes = cluster_max_nodes)\n",
|
||||
"\n",
|
||||
" # create the cluster\n",
|
||||
" compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n",
|
||||
" \n",
|
||||
" # can poll for a minimum number of nodes and for a specific timeout. \n",
|
||||
" # if no min node count is provided it will use the scale settings for the cluster\n",
|
||||
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
||||
" \n",
|
||||
" # For a more detailed view of current BatchAI cluster status, use the 'status' property \n",
|
||||
" print(compute_target.status.serialize())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You now have the necessary packages and compute resources to train a model in the cloud. \n",
|
||||
"\n",
|
||||
"## Explore data\n",
|
||||
"\n",
|
||||
"Before you train a model, you need to understand the data that you are using to train it. You also need to copy the data into the cloud so it can be accessed by your cloud training environment. In this section you learn how to:\n",
|
||||
"\n",
|
||||
"* Download the MNIST dataset\n",
|
||||
"* Display some sample images\n",
|
||||
"* Upload data to the cloud\n",
|
||||
"\n",
|
||||
"### Download the MNIST dataset\n",
|
||||
"\n",
|
||||
"Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import urllib.request\n",
|
||||
"\n",
|
||||
"os.makedirs('./data', exist_ok = True)\n",
|
||||
"\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/train-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/train-labels.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Display some sample images\n",
|
||||
"\n",
|
||||
"Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note this step requires a `load_data` function that's included in an `util.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compresse files into numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make sure utils.py is in the same directory as this code\n",
|
||||
"from utils import load_data\n",
|
||||
"\n",
|
||||
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
|
||||
"X_train = load_data('./data/train-images.gz', False) / 255.0\n",
|
||||
"y_train = load_data('./data/train-labels.gz', True).reshape(-1)\n",
|
||||
"\n",
|
||||
"X_test = load_data('./data/test-images.gz', False) / 255.0\n",
|
||||
"y_test = load_data('./data/test-labels.gz', True).reshape(-1)\n",
|
||||
"\n",
|
||||
"# now let's show some randomly chosen images from the traininng set.\n",
|
||||
"count = 0\n",
|
||||
"sample_size = 30\n",
|
||||
"plt.figure(figsize = (16, 6))\n",
|
||||
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
||||
" count = count + 1\n",
|
||||
" plt.subplot(1, sample_size, count)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
|
||||
" plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you have an idea of what these images look like and the expected prediction outcome.\n",
|
||||
"\n",
|
||||
"### Upload data to the cloud\n",
|
||||
"\n",
|
||||
"Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. It is backed by Azure blob storage account.\n",
|
||||
"\n",
|
||||
"The MNIST files are uploaded into a directory named `mnist` at the root of the datastore."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"use datastore"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ds = ws.get_default_datastore()\n",
|
||||
"print(ds.datastore_type, ds.account_name, ds.container_name)\n",
|
||||
"\n",
|
||||
"ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You now have everything you need to start training a model. \n",
|
||||
"\n",
|
||||
"## Train a local model\n",
|
||||
"\n",
|
||||
"Train a simple logistic regression model using scikit-learn locally.\n",
|
||||
"\n",
|
||||
"**Training locally can take a minute or two** depending on your computer configuration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"\n",
|
||||
"clf = LogisticRegression()\n",
|
||||
"clf.fit(X_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, make predictions using the test set and calculate the accuracy."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_hat = clf.predict(X_test)\n",
|
||||
"print(np.average(y_hat == y_test))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With just a few lines of code, you have a 92% accuracy.\n",
|
||||
"\n",
|
||||
"## Train on a remote cluster\n",
|
||||
"\n",
|
||||
"Now you can expand on this simple model by building a model with a different regularization rate. This time you'll train the model on a remote resource. \n",
|
||||
"\n",
|
||||
"For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:\n",
|
||||
"* Create a directory\n",
|
||||
"* Create a training script\n",
|
||||
"* Create an estimator object\n",
|
||||
"* Submit the job \n",
|
||||
"\n",
|
||||
"### Create a directory\n",
|
||||
"\n",
|
||||
"Create a directory to deliver the necessary code from your computer to the remote resource."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"script_folder = './sklearn-mnist'\n",
|
||||
"os.makedirs(script_folder, exist_ok=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a training script\n",
|
||||
"\n",
|
||||
"To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created. This training adds a regularization rate to the training algorithm, so produces a slightly different model than the local version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile $script_folder/train.py\n",
|
||||
"\n",
|
||||
"import argparse\n",
|
||||
"import os\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"\n",
|
||||
"from azureml.core import Run\n",
|
||||
"from utils import load_data\n",
|
||||
"\n",
|
||||
"# let user feed in 2 parameters, the location of the data files (from datastore), and the regularization rate of the logistic regression model\n",
|
||||
"parser = argparse.ArgumentParser()\n",
|
||||
"parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')\n",
|
||||
"parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n",
|
||||
"args = parser.parse_args()\n",
|
||||
"\n",
|
||||
"data_folder = os.path.join(args.data_folder, 'mnist')\n",
|
||||
"print('Data folder:', data_folder)\n",
|
||||
"\n",
|
||||
"# load train and test set into numpy arrays\n",
|
||||
"# note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.\n",
|
||||
"X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n",
|
||||
"X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n",
|
||||
"y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n",
|
||||
"y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n",
|
||||
"print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n",
|
||||
"\n",
|
||||
"# get hold of the current run\n",
|
||||
"run = Run.get_context()\n",
|
||||
"\n",
|
||||
"print('Train a logistic regression model with regularizaion rate of', args.reg)\n",
|
||||
"clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n",
|
||||
"clf.fit(X_train, y_train)\n",
|
||||
"\n",
|
||||
"print('Predict the test set')\n",
|
||||
"y_hat = clf.predict(X_test)\n",
|
||||
"\n",
|
||||
"# calculate accuracy on the prediction\n",
|
||||
"acc = np.average(y_hat == y_test)\n",
|
||||
"print('Accuracy is', acc)\n",
|
||||
"\n",
|
||||
"run.log('regularization rate', np.float(args.reg))\n",
|
||||
"run.log('accuracy', np.float(acc))\n",
|
||||
"\n",
|
||||
"os.makedirs('outputs', exist_ok=True)\n",
|
||||
"# note file saved in the outputs folder is automatically uploaded into experiment record\n",
|
||||
"joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice how the script gets data and saves models:\n",
|
||||
"\n",
|
||||
"+ The training script reads an argument to find the directory containing the data. When you submit the job later, you point to the datastore for this argument:\n",
|
||||
"`parser.add_argument('--data-folder', type=str, dest='data_folder', help='data directory mounting point')`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"+ The training script saves your model into a directory named outputs. <br/>\n",
|
||||
"`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`<br/>\n",
|
||||
"Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import shutil\n",
|
||||
"shutil.copy('utils.py', script_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create an estimator\n",
|
||||
"\n",
|
||||
"An estimator object is used to submit the run. Create your estimator by running the following code to define:\n",
|
||||
"\n",
|
||||
"* The name of the estimator object, `est`\n",
|
||||
"* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n",
|
||||
"* The compute target. In this case you will use the Batch AI cluster you created\n",
|
||||
"* The training script name, train.py\n",
|
||||
"* Parameters required from the training script \n",
|
||||
"* Python packages needed for training\n",
|
||||
"\n",
|
||||
"In this tutorial, this target is the Batch AI cluster. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure estimator"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.estimator import Estimator\n",
|
||||
"\n",
|
||||
"script_params = {\n",
|
||||
" '--data-folder': ds.as_mount(),\n",
|
||||
" '--regularization': 0.8\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"est = Estimator(source_directory=script_folder,\n",
|
||||
" script_params=script_params,\n",
|
||||
" compute_target=compute_target,\n",
|
||||
" entry_script='train.py',\n",
|
||||
" conda_packages=['scikit-learn'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Submit the job to the cluster\n",
|
||||
"\n",
|
||||
"Run the experiment by submitting the estimator object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"remote run",
|
||||
"batchai",
|
||||
"scikit-learn"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run = exp.submit(config=est)\n",
|
||||
"run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.\n",
|
||||
"\n",
|
||||
"## Monitor a remote run\n",
|
||||
"\n",
|
||||
"In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the script dependencies don't change, the same image is reused and hence the container start up time is much faster.\n",
|
||||
"\n",
|
||||
"Here is what's happening while you wait:\n",
|
||||
"\n",
|
||||
"- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**. \n",
|
||||
"\n",
|
||||
" This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n",
|
||||
"\n",
|
||||
"- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n",
|
||||
"\n",
|
||||
"- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n",
|
||||
"\n",
|
||||
"- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method. \n",
|
||||
"\n",
|
||||
"### Jupyter widget\n",
|
||||
"\n",
|
||||
"Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"use notebook widget"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.widgets import RunDetails\n",
|
||||
"RunDetails(run).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Get log results upon completion\n",
|
||||
"\n",
|
||||
"Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"remote run",
|
||||
"batchai",
|
||||
"scikit-learn"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run.wait_for_completion(show_output=False) # specify True for a verbose log"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Display run results\n",
|
||||
"\n",
|
||||
"You now have a model trained on a remote cluster. Retrieve the accuracy of the model:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"get metrics"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(run.get_metrics())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next tutorial you will explore this model in more detail.\n",
|
||||
"\n",
|
||||
"## Register model\n",
|
||||
"\n",
|
||||
"The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n",
|
||||
"\n",
|
||||
"You can see files associated with that run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"query history"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(run.get_file_names())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"register model from history"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# register model \n",
|
||||
"model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n",
|
||||
"print(model.name, model.id, model.version, sep = '\\t')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"In this Azure Machine Learning tutorial, you used Python to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train a simple logistic regression locally using the popular scikit-learn machine learning library\n",
|
||||
"> * Train multiple models on a remote cluster\n",
|
||||
"> * Review training details and register the best model\n",
|
||||
"\n",
|
||||
"You are ready to deploy this registered model using the instructions in the next part of the tutorial series:\n",
|
||||
"\n",
|
||||
"> [Tutorial 2 - Deploy models](02.deploy-models.ipynb)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "roastala"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
},
|
||||
"msauthor": "sgilley"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
tutorials/02.deploy-models.ipynb
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial #2: Deploy an image classification model in Azure Container Instance (ACI)\n",
|
||||
"\n",
|
||||
"This tutorial is **part two of a two-part tutorial series**. In the [previous tutorial](01.train-models.ipynb), you trained machine learning models and then registered a model in your workspace on the cloud. \n",
|
||||
"\n",
|
||||
"Now, you're ready to deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/azure/container-instances/) (ACI). A web service is an image, in this case a Docker image, that encapsulates the scoring logic and the model itself. \n",
|
||||
"\n",
|
||||
"In this part of the tutorial, you use Azure Machine Learning service (Preview) to:\n",
|
||||
"\n",
|
||||
"> * Set up your testing environment\n",
|
||||
"> * Retrieve the model from your workspace\n",
|
||||
"> * Test the model locally\n",
|
||||
"> * Deploy the model to ACI\n",
|
||||
"> * Test the deployed model\n",
|
||||
"\n",
|
||||
"ACI is not ideal for production deployments, but it is great for testing and understanding the workflow. For scalable production deployments, consider using AKS.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"Complete the model training in the [Tutorial #1: Train an image classification model with Azure Machine Learning](01.train-models.ipynb) notebook. \n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"register model from file"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# If you did NOT complete the tutorial, you can instead run this cell \n",
|
||||
"# This will register a model and download the data needed for this tutorial\n",
|
||||
"# These prerequisites are created in the training tutorial\n",
|
||||
"# Feel free to skip this cell if you completed the training tutorial \n",
|
||||
"\n",
|
||||
"# register a model\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"model_name = \"sklearn_mnist\"\n",
|
||||
"model = Model.register(model_path=\"sklearn_mnist_model.pkl\",\n",
|
||||
" model_name=model_name,\n",
|
||||
" tags={\"data\": \"mnist\", \"model\": \"classification\"},\n",
|
||||
" description=\"Mnist handwriting recognition\",\n",
|
||||
" workspace=ws)\n",
|
||||
"\n",
|
||||
"# download test data\n",
|
||||
"import os\n",
|
||||
"import urllib.request\n",
|
||||
"\n",
|
||||
"os.makedirs('./data', exist_ok=True)\n",
|
||||
"\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set up the environment\n",
|
||||
"\n",
|
||||
"Start by setting up a testing environment.\n",
|
||||
"\n",
|
||||
"### Import packages\n",
|
||||
"\n",
|
||||
"Import the Python packages needed for this tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"check version"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib notebook\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
" \n",
|
||||
"import azureml\n",
|
||||
"from azureml.core import Workspace, Run\n",
|
||||
"\n",
|
||||
"# display the core SDK version number\n",
|
||||
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieve the model\n",
|
||||
"\n",
|
||||
"You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"load workspace",
|
||||
"download model"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import Workspace\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"model=Model(ws, 'sklearn_mnist')\n",
|
||||
"model.download(target_dir='.', exists_ok=True)\n",
|
||||
"import os \n",
|
||||
"# verify the downloaded model file\n",
|
||||
"os.stat('./sklearn_mnist_model.pkl')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test model locally\n",
|
||||
"\n",
|
||||
"Before deploying, make sure your model is working locally by:\n",
|
||||
"* Loading test data\n",
|
||||
"* Predicting test data\n",
|
||||
"* Examining the confusion matrix\n",
|
||||
"\n",
|
||||
"### Load test data\n",
|
||||
"\n",
|
||||
"Load the test data from the **./data/** directory created during the training tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from utils import load_data\n",
|
||||
"\n",
|
||||
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster\n",
|
||||
"X_test = load_data('./data/test-images.gz', False) / 255.0\n",
|
||||
"y_test = load_data('./data/test-labels.gz', True).reshape(-1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Predict test data\n",
|
||||
"\n",
|
||||
"Feed the test dataset to the model to get predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import pickle\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"\n",
|
||||
"clf = joblib.load('./sklearn_mnist_model.pkl')\n",
|
||||
"y_hat = clf.predict(X_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Examine the confusion matrix\n",
|
||||
"\n",
|
||||
"Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.metrics import confusion_matrix\n",
|
||||
"\n",
|
||||
"conf_mx = confusion_matrix(y_test, y_hat)\n",
|
||||
"print(conf_mx)\n",
|
||||
"print('Overall accuracy:', np.average(y_hat == y_test))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the actual values, and the Y axis represents the predicted values. The color in each grid represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright grid at (5,3)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the diagnal cells so that they don't overpower the rest of the cells when visualized\n",
|
||||
"row_sums = conf_mx.sum(axis=1, keepdims=True)\n",
|
||||
"norm_conf_mx = conf_mx / row_sums\n",
|
||||
"np.fill_diagonal(norm_conf_mx, 0)\n",
|
||||
"\n",
|
||||
"fig = plt.figure(figsize=(8,5))\n",
|
||||
"ax = fig.add_subplot(111)\n",
|
||||
"cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n",
|
||||
"ticks = np.arange(0, 10, 1)\n",
|
||||
"ax.set_xticks(ticks)\n",
|
||||
"ax.set_yticks(ticks)\n",
|
||||
"ax.set_xticklabels(ticks)\n",
|
||||
"ax.set_yticklabels(ticks)\n",
|
||||
"fig.colorbar(cax)\n",
|
||||
"plt.ylabel('true labels', fontsize=14)\n",
|
||||
"plt.xlabel('predicted values', fontsize=14)\n",
|
||||
"plt.savefig('conf.png')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy as web service\n",
|
||||
"\n",
|
||||
"Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. \n",
|
||||
"\n",
|
||||
"To build the correct environment for ACI, provide the following:\n",
|
||||
"* A scoring script to show how to use the model\n",
|
||||
"* An environment file to show what packages need to be installed\n",
|
||||
"* A configuration file to build the ACI\n",
|
||||
"* The model you trained before\n",
|
||||
"\n",
|
||||
"### Create scoring script\n",
|
||||
"\n",
|
||||
"Create the scoring script, called score.py, used by the web service call to show how to use the model.\n",
|
||||
"\n",
|
||||
"You must include two required functions into the scoring script:\n",
|
||||
"* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started. \n",
|
||||
"\n",
|
||||
"* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile score.py\n",
|
||||
"import json\n",
|
||||
"import numpy as np\n",
|
||||
"import os\n",
|
||||
"import pickle\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"def init():\n",
|
||||
" global model\n",
|
||||
" # retreive the path to the model file using the model name\n",
|
||||
" model_path = Model.get_model_path('sklearn_mnist')\n",
|
||||
" model = joblib.load(model_path)\n",
|
||||
"\n",
|
||||
"def run(raw_data):\n",
|
||||
" data = np.array(json.loads(raw_data)['data'])\n",
|
||||
" # make prediction\n",
|
||||
" y_hat = model.predict(data)\n",
|
||||
" # you can return any data type as long as it is JSON-serializable\n",
|
||||
" return y_hat.tolist()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create environment file\n",
|
||||
"\n",
|
||||
"Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"set conda dependencies"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||
"\n",
|
||||
"myenv = CondaDependencies()\n",
|
||||
"myenv.add_conda_package(\"scikit-learn\")\n",
|
||||
"\n",
|
||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||
" f.write(myenv.serialize_to_string())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Review the content of the `myenv.yml` file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with open(\"myenv.yml\",\"r\") as f:\n",
|
||||
" print(f.read())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create configuration file\n",
|
||||
"\n",
|
||||
"Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure web service",
|
||||
"aci"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||
" memory_gb=1, \n",
|
||||
" tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n",
|
||||
" description='Predict MNIST with sklearn')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deploy in ACI\n",
|
||||
"Estimated time to complete: **about 7-8 minutes**\n",
|
||||
"\n",
|
||||
"Configure the image and deploy. The following code goes through these steps:\n",
|
||||
"\n",
|
||||
"1. Build an image using:\n",
|
||||
" * The scoring file (`score.py`)\n",
|
||||
" * The environment file (`myenv.yml`)\n",
|
||||
" * The model file\n",
|
||||
"1. Register that image under the workspace. \n",
|
||||
"1. Send the image to the ACI container.\n",
|
||||
"1. Start up a container in ACI using the image.\n",
|
||||
"1. Get the web service HTTP endpoint."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure image",
|
||||
"create image",
|
||||
"deploy web service",
|
||||
"aci"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from azureml.core.webservice import Webservice\n",
|
||||
"from azureml.core.image import ContainerImage\n",
|
||||
"\n",
|
||||
"# configure the image\n",
|
||||
"image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
|
||||
" runtime=\"python\", \n",
|
||||
" conda_file=\"myenv.yml\")\n",
|
||||
"\n",
|
||||
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||
" name='sklearn-mnist-svc',\n",
|
||||
" deployment_config=aciconfig,\n",
|
||||
" models=[model],\n",
|
||||
" image_config=image_config)\n",
|
||||
"\n",
|
||||
"service.wait_for_deployment(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"get scoring uri"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(service.scoring_uri)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test deployed service\n",
|
||||
"\n",
|
||||
"Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n",
|
||||
"\n",
|
||||
"The following code goes through these steps:\n",
|
||||
"1. Send the data as a JSON array to the web service hosted in ACI. \n",
|
||||
"\n",
|
||||
"1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n",
|
||||
"\n",
|
||||
"1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. \n",
|
||||
"\n",
|
||||
" Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"score web service"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"\n",
|
||||
"# find 30 random samples from test set\n",
|
||||
"n = 30\n",
|
||||
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
|
||||
"\n",
|
||||
"test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
|
||||
"test_samples = bytes(test_samples, encoding='utf8')\n",
|
||||
"\n",
|
||||
"# predict using the deployed model\n",
|
||||
"result = service.run(input_data=test_samples)\n",
|
||||
"\n",
|
||||
"# compare actual value vs. the predicted values:\n",
|
||||
"i = 0\n",
|
||||
"plt.figure(figsize = (20, 1))\n",
|
||||
"\n",
|
||||
"for s in sample_indices:\n",
|
||||
" plt.subplot(1, n, i + 1)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" \n",
|
||||
" # use different color for misclassified sample\n",
|
||||
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
|
||||
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
|
||||
" \n",
|
||||
" plt.text(x=10, y =-10, s=result[i], fontsize=18, color=font_color)\n",
|
||||
" plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
|
||||
" \n",
|
||||
" i = i + 1\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also send raw HTTP request to test the web service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"score web service"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import requests\n",
|
||||
"import json\n",
|
||||
"\n",
|
||||
"# send a random row from the test set to score\n",
|
||||
"random_index = np.random.randint(0, len(X_test)-1)\n",
|
||||
"input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
|
||||
"\n",
|
||||
"headers = {'Content-Type':'application/json'}\n",
|
||||
"\n",
|
||||
"# for AKS deployment you'd need to the service key in the header as well\n",
|
||||
"# api_key = service.get_key()\n",
|
||||
"# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n",
|
||||
"\n",
|
||||
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
||||
"\n",
|
||||
"print(\"POST to url\", service.scoring_uri)\n",
|
||||
"#print(\"input data:\", input_data)\n",
|
||||
"print(\"label:\", y_test[random_index])\n",
|
||||
"print(\"prediction:\", resp.text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Clean up resources\n",
|
||||
"\n",
|
||||
"To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"delete web service"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"service.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"In this Azure Machine Learning tutorial, you used Python to:\n",
|
||||
"\n",
|
||||
"> * Set up your testing environment\n",
|
||||
"> * Retrieve the model from your workspace\n",
|
||||
"> * Test the model locally\n",
|
||||
"> * Deploy the model to ACI\n",
|
||||
"> * Test the deployed model\n",
|
||||
" \n",
|
||||
"You can also try out the [Automatic algorithm selection tutorial](03.auto-train-models.ipynb) to see how Azure Machine Learning can auto-select and tune the best algorithm for your model and build that model for you."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "roastala"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
},
|
||||
"msauthor": "sgilley"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
tutorials/03.auto-train-models.ipynb
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial: Train a classification model with automated machine learning\n",
|
||||
"\n",
|
||||
"In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n",
|
||||
"\n",
|
||||
"[flow diagram](./imgs/flow2.png)\n",
|
||||
"\n",
|
||||
"Similar to the [train models tutorial](01.train-models.ipynb), this tutorial classifies handwritten images of digits (0-9) from the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. But this time you don't to specify an algorithm or tune hyperparameters. The automated ML technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n",
|
||||
"\n",
|
||||
"You'll learn how to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train using an automated classifier locally with custom parameters\n",
|
||||
"> * Explore the results\n",
|
||||
"> * Review training results\n",
|
||||
"> * Register the best model\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n",
|
||||
"* Create a workspace and its configuration file (**config.json**) \n",
|
||||
"* Upload your **config.json** to the same folder as this notebook"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Start a notebook\n",
|
||||
"\n",
|
||||
"To follow along, start a new notebook from the same directory as **config.json** and copy the code from the sections below.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Set up your development environment\n",
|
||||
"\n",
|
||||
"All the setup for your development work can be accomplished in the Python notebook. Setup includes:\n",
|
||||
"\n",
|
||||
"* Import Python packages\n",
|
||||
"* Configure a workspace to enable communication between your local computer and remote resources\n",
|
||||
"* Create a directory to store training scripts\n",
|
||||
"\n",
|
||||
"### Import packages\n",
|
||||
"Import Python packages you need in this tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import azureml.core\n",
|
||||
"import pandas as pd\n",
|
||||
"from azureml.core.workspace import Workspace\n",
|
||||
"from azureml.train.automl.run import AutoMLRun\n",
|
||||
"import time\n",
|
||||
"import logging\n",
|
||||
"from sklearn import datasets\n",
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"from matplotlib.pyplot import imshow\n",
|
||||
"import random\n",
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure workspace\n",
|
||||
"\n",
|
||||
"Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **aml_config/config.json** and loads the details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial.\n",
|
||||
"\n",
|
||||
"Once you have a workspace object, specify a name for the experiment and create and register a local directory with the workspace. The history of all runs is recorded under the specified experiment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"# choose a name for the run history container in the workspace\n",
|
||||
"experiment_name = 'automl-classifier'\n",
|
||||
"# project folder\n",
|
||||
"project_folder = './automl-classifier'\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"output = {}\n",
|
||||
"output['SDK version'] = azureml.core.VERSION\n",
|
||||
"output['Subscription ID'] = ws.subscription_id\n",
|
||||
"output['Workspace'] = ws.name\n",
|
||||
"output['Resource Group'] = ws.resource_group\n",
|
||||
"output['Location'] = ws.location\n",
|
||||
"output['Project Directory'] = project_folder\n",
|
||||
"pd.set_option('display.max_colwidth', -1)\n",
|
||||
"pd.DataFrame(data=output, index=['']).T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explore data\n",
|
||||
"\n",
|
||||
"The initial training tutorial used a high-resolution version of the MNIST dataset (28x28 pixels). Since auto training requires many iterations, this tutorial uses a smaller resolution version of the images (8x8 pixels) to demonstrate the concepts while speeding up the time needed for each iteration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn import datasets\n",
|
||||
"\n",
|
||||
"digits = datasets.load_digits()\n",
|
||||
"\n",
|
||||
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
||||
"X_train = digits.data[100:,:]\n",
|
||||
"y_train = digits.target[100:]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Display some sample images\n",
|
||||
"\n",
|
||||
"Load the data into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"count = 0\n",
|
||||
"sample_size = 30\n",
|
||||
"plt.figure(figsize = (16, 6))\n",
|
||||
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
||||
" count = count + 1\n",
|
||||
" plt.subplot(1, sample_size, count)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" plt.text(x = 2, y = -2, s = y_train[i], fontsize = 18)\n",
|
||||
" plt.imshow(X_train[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You now have the necessary packages and data ready for auto training for your model. \n",
|
||||
"\n",
|
||||
"## Auto train a model \n",
|
||||
"\n",
|
||||
"To auto train a model, first define settings for autogeneration and tuning and then run the automatic classifier.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"### Define settings for autogeneration and tuning\n",
|
||||
"\n",
|
||||
"Define the experiment parameters and models settings for autogeneration and tuning. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"|Property| Value in this tutorial |Description|\n",
|
||||
"|----|----|---|\n",
|
||||
"|**primary_metric**|AUC Weighted | Metric that you want to optimize.|\n",
|
||||
"|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n",
|
||||
"|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n",
|
||||
"|**n_cross_validations**|3|Number of cross validation splits|\n",
|
||||
"|**exit_score**|0.9985|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
|
||||
"|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure automl"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.automl import AutoMLConfig\n",
|
||||
"\n",
|
||||
"##Local compute \n",
|
||||
"Automl_config = AutoMLConfig(task = 'classification',\n",
|
||||
" primary_metric = 'AUC_weighted',\n",
|
||||
" max_time_sec = 12000,\n",
|
||||
" iterations = 20,\n",
|
||||
" n_cross_validations = 3,\n",
|
||||
" exit_score = 0.9985,\n",
|
||||
" blacklist_algos = ['kNN','LinearSVM'],\n",
|
||||
" X = X_train,\n",
|
||||
" y = y_train,\n",
|
||||
" path=project_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the automatic classifier\n",
|
||||
"\n",
|
||||
"Start the experiment to run locally. Define the compute target as local and set the output to true to view progress on the experiment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"local submitted run",
|
||||
"automl"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
"experiment=Experiment(ws, experiment_name)\n",
|
||||
"local_run = experiment.submit(Automl_config, show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explore the results\n",
|
||||
"\n",
|
||||
"Explore the results of automatic training with a Jupyter widget or by examining the experiment history.\n",
|
||||
"\n",
|
||||
"### Jupyter widget\n",
|
||||
"\n",
|
||||
"Use the Jupyter notebook widget to see a graph and a table of all results."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"use notebook widget"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.widgets import RunDetails\n",
|
||||
"RunDetails(local_run).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieve all iterations\n",
|
||||
"\n",
|
||||
"View the experiment history and see individual metrics for each iteration run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"get metrics",
|
||||
"query history"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"children = list(local_run.get_children())\n",
|
||||
"metricslist = {}\n",
|
||||
"for run in children:\n",
|
||||
" properties = run.get_properties()\n",
|
||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||
"\n",
|
||||
"import pandas as pd\n",
|
||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||
"rundata"
|
||||
]
|
||||
},
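{
"cell_type": "markdown",
"metadata": {},
"source": [
"From this table you can also pick out the best iteration programmatically. A small sketch, assuming the primary metric `AUC_weighted` appears as a row in `rundata` (iterations are the columns):\n",
"\n",
"```python\n",
"# idxmax over the metric row returns the column label, i.e. the iteration number\n",
"best_iteration = rundata.loc['AUC_weighted'].idxmax()\n",
"print('Best iteration:', best_iteration)\n",
"```"
]
},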
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Register the best model \n",
|
||||
"\n",
|
||||
"Use the `local_run` object to get the best model and register it into the workspace. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"query history",
|
||||
"register model from history"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# find the run with the highest accuracy value.\n",
|
||||
"best_run, fitted_model = local_run.get_output()\n",
|
||||
"\n",
|
||||
"# register model in workspace\n",
|
||||
"description = 'Automated Machine Learning Model'\n",
|
||||
"tags = None\n",
|
||||
"local_run.register_model(description=description, tags=tags)\n",
|
||||
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
|
||||
]
|
||||
},
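{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm the registration succeeded, you can list the models in the workspace. This is an optional check, not a required tutorial step:\n",
"\n",
"```python\n",
"from azureml.core.model import Model\n",
"\n",
"# list every registered model in the workspace with its version\n",
"for m in Model.list(ws):\n",
"    print(m.name, m.version)\n",
"```"
]
},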
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test the best model\n",
|
||||
"\n",
|
||||
"Use the model to predict a few random digits. Display the predicted and the image. Red font and inverse image (white on black) is used to highlight the misclassified samples.\n",
|
||||
"\n",
|
||||
"Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# find 30 random samples from test set\n",
|
||||
"n = 30\n",
|
||||
"X_test = digits.data[:100, :]\n",
|
||||
"y_test = digits.target[:100]\n",
|
||||
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
|
||||
"test_samples = X_test[sample_indices]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# predict using the model\n",
|
||||
"result = fitted_model.predict(test_samples)\n",
|
||||
"\n",
|
||||
"# compare actual value vs. the predicted values:\n",
|
||||
"i = 0\n",
|
||||
"plt.figure(figsize = (20, 1))\n",
|
||||
"\n",
|
||||
"for s in sample_indices:\n",
|
||||
" plt.subplot(1, n, i + 1)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" \n",
|
||||
" # use different color for misclassified sample\n",
|
||||
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
|
||||
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
|
||||
" \n",
|
||||
" plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n",
|
||||
" plt.imshow(X_test[s].reshape(8, 8), cmap = clr_map)\n",
|
||||
" \n",
|
||||
" i = i + 1\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
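{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond eyeballing individual digits, you can score the whole 100-sample hold-out set at once. A short optional check using only variables defined above:\n",
"\n",
"```python\n",
"# overall accuracy on the 100 rows that were excluded from training\n",
"y_pred = fitted_model.predict(X_test)\n",
"print('Hold-out accuracy:', np.average(y_pred == y_test))\n",
"```"
]
},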
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"In this Azure Machine Learning tutorial, you used Python to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train using an automated classifier locally with custom parameters\n",
|
||||
"> * Explore the results\n",
|
||||
"> * Review training results\n",
|
||||
"> * Register the best model\n",
|
||||
"\n",
|
||||
"Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-to-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "jeffshep"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
},
|
||||
"msauthor": "sgilley"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,427 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial: Train a classification model with automated machine learning\n",
|
||||
"\n",
|
||||
"In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n",
|
||||
"\n",
|
||||
"[flow diagram](./imgs/flow2.png)\n",
|
||||
"\n",
|
||||
"Similar to the [train models tutorial](01.train-models.ipynb), this tutorial classifies handwritten images of digits (0-9) from the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. But this time you don't to specify an algorithm or tune hyperparameters. The automated ML technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n",
|
||||
"\n",
|
||||
"You'll learn how to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train using an automated classifier locally with custom parameters\n",
|
||||
"> * Explore the results\n",
|
||||
"> * Review training results\n",
|
||||
"> * Register the best model\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n",
|
||||
"* Create a workspace and its configuration file (**config.json**) \n",
|
||||
"* Upload your **config.json** to the same folder as this notebook"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Start a notebook\n",
|
||||
"\n",
|
||||
"To follow along, start a new notebook from the same directory as **config.json** and copy the code from the sections below.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Set up your development environment\n",
|
||||
"\n",
|
||||
"All the setup for your development work can be accomplished in the Python notebook. Setup includes:\n",
|
||||
"\n",
|
||||
"* Import Python packages\n",
|
||||
"* Configure a workspace to enable communication between your local computer and remote resources\n",
|
||||
"* Create a directory to store training scripts\n",
|
||||
"\n",
|
||||
"### Import packages\n",
|
||||
"Import Python packages you need in this tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import azureml.core\n",
|
||||
"import pandas as pd\n",
|
||||
"from azureml.core.workspace import Workspace\n",
|
||||
"from azureml.train.automl.run import AutoMLRun\n",
|
||||
"import time\n",
|
||||
"import logging\n",
|
||||
"from sklearn import datasets\n",
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"from matplotlib.pyplot import imshow\n",
|
||||
"import random\n",
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure workspace\n",
|
||||
"\n",
|
||||
"Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **aml_config/config.json** and loads the details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial.\n",
|
||||
"\n",
|
||||
"Once you have a workspace object, specify a name for the experiment and create and register a local directory with the workspace. The history of all runs is recorded under the specified experiment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"# choose a name for the run history container in the workspace\n",
|
||||
"experiment_name = 'automl-classifier'\n",
|
||||
"# project folder\n",
|
||||
"project_folder = './automl-classifier'\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"output = {}\n",
|
||||
"output['SDK version'] = azureml.core.VERSION\n",
|
||||
"output['Subscription ID'] = ws.subscription_id\n",
|
||||
"output['Workspace'] = ws.name\n",
|
||||
"output['Resource Group'] = ws.resource_group\n",
|
||||
"output['Location'] = ws.location\n",
|
||||
"output['Project Directory'] = project_folder\n",
|
||||
"pd.set_option('display.max_colwidth', -1)\n",
|
||||
"pd.DataFrame(data=output, index=['']).T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explore data\n",
|
||||
"\n",
|
||||
"The initial training tutorial used a high-resolution version of the MNIST dataset (28x28 pixels). Since auto training requires many iterations, this tutorial uses a smaller resolution version of the images (8x8 pixels) to demonstrate the concepts while speeding up the time needed for each iteration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn import datasets\n",
|
||||
"\n",
|
||||
"digits = datasets.load_digits()\n",
|
||||
"\n",
|
||||
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
||||
"X_train = digits.data[100:,:]\n",
|
||||
"y_train = digits.target[100:]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Display some sample images\n",
|
||||
"\n",
|
||||
"Load the data into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"count = 0\n",
|
||||
"sample_size = 30\n",
|
||||
"plt.figure(figsize = (16, 6))\n",
|
||||
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
||||
" count = count + 1\n",
|
||||
" plt.subplot(1, sample_size, count)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" plt.text(x = 2, y = -2, s = y_train[i], fontsize = 18)\n",
|
||||
" plt.imshow(X_train[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You now have the necessary packages and data ready for auto training for your model. \n",
|
||||
"\n",
|
||||
"## Auto train a model \n",
|
||||
"\n",
|
||||
"To auto train a model, first define settings for autogeneration and tuning and then run the automatic classifier.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"### Define settings for autogeneration and tuning\n",
|
||||
"\n",
|
||||
"Define the experiment parameters and models settings for autogeneration and tuning. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"|Property| Value in this tutorial |Description|\n",
|
||||
"|----|----|---|\n",
|
||||
"|**primary_metric**|AUC Weighted | Metric that you want to optimize.|\n",
|
||||
"|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n",
|
||||
"|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n",
|
||||
"|**n_cross_validations**|3|Number of cross validation splits|\n",
|
||||
"|**exit_score**|0.9985|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
|
||||
"|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure automl"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.automl import AutoMLConfig\n",
|
||||
"\n",
|
||||
"##Local compute \n",
|
||||
"Automl_config = AutoMLConfig(task = 'classification',\n",
|
||||
" primary_metric = 'AUC_weighted',\n",
|
||||
" max_time_sec = 12000,\n",
|
||||
" iterations = 20,\n",
|
||||
" n_cross_validations = 3,\n",
|
||||
" exit_score = 0.9985,\n",
|
||||
" blacklist_algos = ['kNN','LinearSVM'],\n",
|
||||
" X = X_train,\n",
|
||||
" y = y_train,\n",
|
||||
" path=project_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the automatic classifier\n",
|
||||
"\n",
|
||||
"Start the experiment to run locally. Define the compute target as local and set the output to true to view progress on the experiment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"local submitted run",
|
||||
"automl"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
"experiment=Experiment(ws, experiment_name)\n",
|
||||
"local_run = experiment.submit(Automl_config, show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explore the results\n",
|
||||
"\n",
|
||||
"Explore the results of automatic training with a Jupyter widget or by examining the experiment history.\n",
|
||||
"\n",
|
||||
"### Jupyter widget\n",
|
||||
"\n",
|
||||
"Use the Jupyter notebook widget to see a graph and a table of all results."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"use notebook widget"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.widgets import RunDetails\n",
|
||||
"RunDetails(local_run).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieve all iterations\n",
|
||||
"\n",
|
||||
"View the experiment history and see individual metrics for each iteration run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"get metrics",
|
||||
"query history"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"children = list(local_run.get_children())\n",
|
||||
"metricslist = {}\n",
|
||||
"for run in children:\n",
|
||||
" properties = run.get_properties()\n",
|
||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||
"\n",
|
||||
"import pandas as pd\n",
|
||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||
"rundata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Register the best model \n",
|
||||
"\n",
|
||||
"Use the `local_run` object to get the best model and register it into the workspace. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"query history",
|
||||
"register model from history"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# find the run with the highest accuracy value.\n",
|
||||
"best_run, fitted_model = local_run.get_output()\n",
|
||||
"\n",
|
||||
"# register model in workspace\n",
|
||||
"description = 'Automated Machine Learning Model'\n",
|
||||
"tags = None\n",
|
||||
"local_run.register_model(description=description, tags=tags)\n",
|
||||
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test the best model\n",
|
||||
"\n",
|
||||
"Use the model to predict a few random digits. Display the predicted and the image. Red font and inverse image (white on black) is used to highlight the misclassified samples.\n",
|
||||
"\n",
|
||||
"Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# find 30 random samples from test set\n",
|
||||
"n = 30\n",
|
||||
"X_test = digits.data[:100, :]\n",
|
||||
"y_test = digits.target[:100]\n",
|
||||
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
|
||||
"test_samples = X_test[sample_indices]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# predict using the model\n",
|
||||
"result = fitted_model.predict(test_samples)\n",
|
||||
"\n",
|
||||
"# compare actual value vs. the predicted values:\n",
|
||||
"i = 0\n",
|
||||
"plt.figure(figsize = (20, 1))\n",
|
||||
"\n",
|
||||
"for s in sample_indices:\n",
|
||||
" plt.subplot(1, n, i + 1)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" \n",
|
||||
" # use different color for misclassified sample\n",
|
||||
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
|
||||
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
|
||||
" \n",
|
||||
" plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n",
|
||||
" plt.imshow(X_test[s].reshape(8, 8), cmap = clr_map)\n",
|
||||
" \n",
|
||||
" i = i + 1\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"In this Azure Machine Learning tutorial, you used Python to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train using an automated classifier locally with custom parameters\n",
|
||||
"> * Explore the results\n",
|
||||
"> * Review training results\n",
|
||||
"> * Register the best model\n",
|
||||
"\n",
|
||||
"Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-to-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "jeffshep"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
},
|
||||
"msauthor": "sgilley"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,615 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial #2: Deploy an image classification model in Azure Container Instance (ACI)\n",
|
||||
"\n",
|
||||
"This tutorial is **part two of a two-part tutorial series**. In the [previous tutorial](01.train-models.ipynb), you trained machine learning models and then registered a model in your workspace on the cloud. \n",
|
||||
"\n",
|
||||
"Now, you're ready to deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/azure/container-instances/) (ACI). A web service is an image, in this case a Docker image, that encapsulates the scoring logic and the model itself. \n",
|
||||
"\n",
|
||||
"In this part of the tutorial, you use Azure Machine Learning service (Preview) to:\n",
|
||||
"\n",
|
||||
"> * Set up your testing environment\n",
|
||||
"> * Retrieve the model from your workspace\n",
|
||||
"> * Test the model locally\n",
|
||||
"> * Deploy the model to ACI\n",
|
||||
"> * Test the deployed model\n",
|
||||
"\n",
|
||||
"ACI is not ideal for production deployments, but it is great for testing and understanding the workflow. For scalable production deployments, consider using AKS.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"Complete the model training in the [Tutorial #1: Train an image classification model with Azure Machine Learning](train-models.ipynb) notebook. \n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"register model from file"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# If you did NOT complete the tutorial, you can instead run this cell \n",
|
||||
"# This will register a model and download the data needed for this tutorial\n",
|
||||
"# These prerequisites are created in the training tutorial\n",
|
||||
"# Feel free to skip this cell if you completed the training tutorial \n",
|
||||
"\n",
|
||||
"# register a model\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"model_name = \"sklearn_mnist\"\n",
|
||||
"model = Model.register(model_path=\"sklearn_mnist_model.pkl\",\n",
|
||||
" model_name=model_name,\n",
|
||||
" tags={\"data\": \"mnist\", \"model\": \"classification\"},\n",
|
||||
" description=\"Mnist handwriting recognition\",\n",
|
||||
" workspace=ws)\n",
|
||||
"\n",
|
||||
"# download test data\n",
|
||||
"import os\n",
|
||||
"import urllib.request\n",
|
||||
"\n",
|
||||
"os.makedirs('./data', exist_ok=True)\n",
|
||||
"\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set up the environment\n",
|
||||
"\n",
|
||||
"Start by setting up a testing environment.\n",
|
||||
"\n",
|
||||
"### Import packages\n",
|
||||
"\n",
|
||||
"Import the Python packages needed for this tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"check version"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib notebook\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
" \n",
|
||||
"import azureml\n",
|
||||
"from azureml.core import Workspace, Run\n",
|
||||
"\n",
|
||||
"# display the core SDK version number\n",
|
||||
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieve the model\n",
|
||||
"\n",
|
||||
"You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"load workspace",
|
||||
"download model"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import Workspace\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"model=Model(ws, 'sklearn_mnist')\n",
|
||||
"model.download(target_dir='.', exist_ok=True)\n",
|
||||
"import os \n",
|
||||
"# verify the downloaded model file\n",
|
||||
"os.stat('./sklearn_mnist_model.pkl')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test model locally\n",
|
||||
"\n",
|
||||
"Before deploying, make sure your model is working locally by:\n",
|
||||
"* Loading test data\n",
|
||||
"* Predicting test data\n",
|
||||
"* Examining the confusion matrix\n",
|
||||
"\n",
|
||||
"### Load test data\n",
|
||||
"\n",
|
||||
"Load the test data from the **./data/** directory created during the training tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from utils import load_data\n",
|
||||
"\n",
|
||||
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster\n",
|
||||
"X_test = load_data('./data/test-images.gz', False) / 255.0\n",
|
||||
"y_test = load_data('./data/test-labels.gz', True).reshape(-1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Predict test data\n",
|
||||
"\n",
|
||||
"Feed the test dataset to the model to get predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import pickle\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"\n",
|
||||
"clf = joblib.load('./sklearn_mnist_model.pkl')\n",
|
||||
"y_hat = clf.predict(X_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Examine the confusion matrix\n",
|
||||
"\n",
|
||||
"Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.metrics import confusion_matrix\n",
|
||||
"\n",
|
||||
"conf_mx = confusion_matrix(y_test, y_hat)\n",
|
||||
"print(conf_mx)\n",
|
||||
"print('Overall accuracy:', np.average(y_hat == y_test))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the actual values, and the Y axis represents the predicted values. The color in each grid represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright grid at (5,3)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the diagnal cells so that they don't overpower the rest of the cells when visualized\n",
|
||||
"row_sums = conf_mx.sum(axis=1, keepdims=True)\n",
|
||||
"norm_conf_mx = conf_mx / row_sums\n",
|
||||
"np.fill_diagonal(norm_conf_mx, 0)\n",
|
||||
"\n",
|
||||
"fig = plt.figure(figsize=(8,5))\n",
|
||||
"ax = fig.add_subplot(111)\n",
|
||||
"cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n",
|
||||
"ticks = np.arange(0, 10, 1)\n",
|
||||
"ax.set_xticks(ticks)\n",
|
||||
"ax.set_yticks(ticks)\n",
|
||||
"ax.set_xticklabels(ticks)\n",
|
||||
"ax.set_yticklabels(ticks)\n",
|
||||
"fig.colorbar(cax)\n",
|
||||
"plt.ylabel('true labels', fontsize=14)\n",
|
||||
"plt.xlabel('predicted values', fontsize=14)\n",
|
||||
"plt.savefig('conf.png')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy as web service\n",
|
||||
"\n",
|
||||
"Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. \n",
|
||||
"\n",
|
||||
"To build the correct environment for ACI, provide the following:\n",
|
||||
"* A scoring script to show how to use the model\n",
|
||||
"* An environment file to show what packages need to be installed\n",
|
||||
"* A configuration file to build the ACI\n",
|
||||
"* The model you trained before\n",
|
||||
"\n",
|
||||
"### Create scoring script\n",
|
||||
"\n",
|
||||
"Create the scoring script, called score.py, used by the web service call to show how to use the model.\n",
|
||||
"\n",
|
||||
"You must include two required functions into the scoring script:\n",
|
||||
"* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started. \n",
|
||||
"\n",
|
||||
"* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile score.py\n",
|
||||
"import json\n",
|
||||
"import numpy as np\n",
|
||||
"import os\n",
|
||||
"import pickle\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"def init():\n",
|
||||
" global model\n",
|
||||
" # retreive the path to the model file using the model name\n",
|
||||
" model_path = Model.get_model_path('sklearn_mnist')\n",
|
||||
" model = joblib.load(model_path)\n",
|
||||
"\n",
|
||||
"def run(raw_data):\n",
|
||||
" data = np.array(json.loads(raw_data)['data'])\n",
|
||||
" # make prediction\n",
|
||||
" y_hat = model.predict(data)\n",
|
||||
" # you can return any data type as long as it is JSON-serializable\n",
|
||||
" return y_hat.tolist()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create environment file\n",
|
||||
"\n",
|
||||
"Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"set conda dependencies"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||
"\n",
|
||||
"myenv = CondaDependencies()\n",
|
||||
"myenv.add_conda_package(\"scikit-learn\")\n",
|
||||
"\n",
|
||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||
" f.write(myenv.serialize_to_string())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Review the content of the `myenv.yml` file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with open(\"myenv.yml\",\"r\") as f:\n",
|
||||
" print(f.read())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create configuration file\n",
|
||||
"\n",
|
||||
"Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure web service",
|
||||
"aci"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||
" memory_gb=1, \n",
|
||||
" tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n",
|
||||
" description='Predict MNIST with sklearn')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deploy in ACI\n",
|
||||
"Estimated time to complete: **about 7-8 minutes**\n",
|
||||
"\n",
|
||||
"Configure the image and deploy. The following code goes through these steps:\n",
|
||||
"\n",
|
||||
"1. Build an image using:\n",
|
||||
" * The scoring file (`score.py`)\n",
|
||||
" * The environment file (`myenv.yml`)\n",
|
||||
" * The model file\n",
|
||||
"1. Register that image under the workspace. \n",
|
||||
"1. Send the image to the ACI container.\n",
|
||||
"1. Start up a container in ACI using the image.\n",
|
||||
"1. Get the web service HTTP endpoint."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"configure image",
|
||||
"create image",
|
||||
"deploy web service",
|
||||
"aci"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from azureml.core.webservice import Webservice\n",
|
||||
"from azureml.core.image import ContainerImage\n",
|
||||
"\n",
|
||||
"# configure the image\n",
|
||||
"image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
|
||||
" runtime=\"python\", \n",
|
||||
" conda_file=\"myenv.yml\")\n",
|
||||
"\n",
|
||||
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||
" name='sklearn-mnist-svc',\n",
|
||||
" deployment_config=aciconfig,\n",
|
||||
" models=[model],\n",
|
||||
" image_config=image_config)\n",
|
||||
"\n",
|
||||
"service.wait_for_deployment(show_output=True)"
|
||||
]
|
||||
},
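{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `wait_for_deployment` reports a failed or unhealthy service, the container logs usually explain why. An optional troubleshooting step:\n",
"\n",
"```python\n",
"# check the service state and pull the container logs\n",
"print(service.state)\n",
"print(service.get_logs())\n",
"```"
]
},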
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"get scoring uri"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(service.scoring_uri)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test deployed service\n",
|
||||
"\n",
|
||||
"Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n",
|
||||
"\n",
|
||||
"The following code goes through these steps:\n",
|
||||
"1. Send the data as a JSON array to the web service hosted in ACI. \n",
|
||||
"\n",
|
||||
"1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n",
|
||||
"\n",
|
||||
"1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. \n",
|
||||
"\n",
|
||||
" Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"score web service"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"\n",
|
||||
"# find 30 random samples from test set\n",
|
||||
"n = 30\n",
|
||||
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
|
||||
"\n",
|
||||
"test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
|
||||
"test_samples = bytes(test_samples, encoding='utf8')\n",
|
||||
"\n",
|
||||
"# predict using the deployed model\n",
|
||||
"result = service.run(input_data=test_samples)\n",
|
||||
"\n",
|
||||
"# compare actual value vs. the predicted values:\n",
|
||||
"i = 0\n",
|
||||
"plt.figure(figsize = (20, 1))\n",
|
||||
"\n",
|
||||
"for s in sample_indices:\n",
|
||||
" plt.subplot(1, n, i + 1)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" \n",
|
||||
" # use different color for misclassified sample\n",
|
||||
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
|
||||
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
|
||||
" \n",
|
||||
" plt.text(x=10, y =-10, s=result[i], fontsize=18, color=font_color)\n",
|
||||
" plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
|
||||
" \n",
|
||||
" i = i + 1\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also send raw HTTP request to test the web service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"score web service"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import requests\n",
|
||||
"import json\n",
|
||||
"\n",
|
||||
"# send a random row from the test set to score\n",
|
||||
"random_index = np.random.randint(0, len(X_test)-1)\n",
|
||||
"input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
|
||||
"\n",
|
||||
"headers = {'Content-Type':'application/json'}\n",
|
||||
"\n",
|
||||
"# for AKS deployment you'd need to the service key in the header as well\n",
|
||||
"# api_key = service.get_key()\n",
|
||||
"# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n",
|
||||
"\n",
|
||||
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
||||
"\n",
|
||||
"print(\"POST to url\", service.scoring_uri)\n",
|
||||
"#print(\"input data:\", input_data)\n",
|
||||
"print(\"label:\", y_test[random_index])\n",
|
||||
"print(\"prediction:\", resp.text)"
|
||||
]
|
||||
},
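{
"cell_type": "markdown",
"metadata": {},
"source": [
"The response body is JSON, so you can parse it back into a Python list of predicted labels if you want to work with the values. A small sketch; depending on the SDK version the result may be encoded slightly differently:\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# parse the JSON response body into a Python list\n",
"predictions = json.loads(resp.text)\n",
"print(predictions)\n",
"```"
]
},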
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Clean up resources\n",
|
||||
"\n",
|
||||
"To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"delete web service"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"service.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"In this Azure Machine Learning tutorial, you used Python to:\n",
|
||||
"\n",
|
||||
"> * Set up your testing environment\n",
|
||||
"> * Retrieve the model from your workspace\n",
|
||||
"> * Test the model locally\n",
|
||||
"> * Deploy the model to ACI\n",
|
||||
"> * Test the deployed model\n",
|
||||
" \n",
|
||||
"You can also try out the [Automatic algorithm selection tutorial](03.auto-train-models.ipynb) to see how Azure Machine Learning can auto-select and tune the best algorithm for your model and build that model for you."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "roastala"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
},
|
||||
"msauthor": "sgilley"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
BIN
tutorials/imgs/flow2.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 104 KiB |
@@ -1,718 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial #1: Train an image classification model with Azure Machine Learning\n",
|
||||
"\n",
|
||||
"In this tutorial, you train a machine learning model both locally and on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook. You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**. \n",
|
||||
"\n",
|
||||
"This tutorial trains a simple logistic regression using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n",
|
||||
"\n",
|
||||
"Learn how to:\n",
|
||||
"\n",
|
||||
"> * Set up your development environment\n",
|
||||
"> * Access and examine the data\n",
|
||||
"> * Train a simple logistic regression model locally using the popular scikit-learn machine learning library \n",
|
||||
"> * Train multiple models on a remote cluster\n",
|
||||
"> * Review training results, find and register the best model\n",
|
||||
"\n",
|
||||
"You'll learn how to select a model and deploy it in [part two of this tutorial](deploy-models.ipynb) later. \n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n",
|
||||
"* Create a workspace and its configuration file (**config.json**) \n",
|
||||
"* Save your **config.json** to the same folder as this notebook"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set up your development environment\n",
|
||||
"\n",
|
||||
"All the setup for your development work can be accomplished in a Python notebook. Setup includes:\n",
|
||||
"\n",
|
||||
"* Importing Python packages\n",
|
||||
"* Connecting to a workspace to enable communication between your local computer and remote resources\n",
|
||||
"* Creating an experiment to track all your runs\n",
|
||||
"* Creating a remote compute target to use for training\n",
|
||||
"\n",
|
||||
"### Import packages\n",
|
||||
"\n",
|
||||
"Import Python packages you need in this session. Also display the Azure Machine Learning SDK version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"check version"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib notebook\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"\n",
|
||||
"import azureml\n",
|
||||
"from azureml.core import Workspace, Run\n",
|
||||
"\n",
|
||||
"# check core SDK version number\n",
|
||||
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to workspace\n",
|
||||
"\n",
|
||||
"Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `ws`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"load workspace"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load workspace configuration from the config.json file in the current folder.\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\\t')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create experiment\n",
|
||||
"\n",
|
||||
"Create an experiment to track the runs in your workspace. A workspace can have muliple experiments. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"create experiment"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"experiment_name = 'sklearn-mnist'\n",
|
||||
"\n",
|
||||
"from azureml.core import Experiment\n",
|
||||
"exp = Experiment(workspace=ws, name=experiment_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create remote compute target\n",
|
||||
"\n",
|
||||
"Azure Machine Learning Managed Compute(AmlCompute) is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create AmlCompute as your training environment. This code creates compute for you if it does not already exist in your workspace. \n",
|
||||
"\n",
|
||||
" **Creation of the compute takes approximately 5 minutes.** If the compute is already in the workspace this code uses it and skips the creation process."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"create mlc",
|
||||
"batchai"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import AmlCompute\n",
|
||||
"from azureml.core.compute import ComputeTarget\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"# choose a name for your cluster\n",
|
||||
"compute_name = os.environ.get(\"BATCHAI_CLUSTER_NAME\", \"cpucluster\")\n",
|
||||
"compute_min_nodes = os.environ.get(\"BATCHAI_CLUSTER_MIN_NODES\", 0)\n",
|
||||
"compute_max_nodes = os.environ.get(\"BATCHAI_CLUSTER_MAX_NODES\", 4)\n",
|
||||
"\n",
|
||||
"# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6\n",
|
||||
"vm_size = os.environ.get(\"BATCHAI_CLUSTER_SKU\", \"STANDARD_D2_V2\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"if compute_name in ws.compute_targets:\n",
|
||||
" compute_target = ws.compute_targets[compute_name]\n",
|
||||
" if compute_target and type(compute_target) is AmlCompute:\n",
|
||||
" print('found compute target. just use it. ' + compute_name)\n",
|
||||
"else:\n",
|
||||
" print('creating a new compute target...')\n",
|
||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,\n",
|
||||
" min_nodes = compute_min_nodes, \n",
|
||||
" max_nodes = compute_max_nodes)\n",
|
||||
"\n",
|
||||
" # create the cluster\n",
|
||||
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
|
||||
" \n",
|
||||
" # can poll for a minimum number of nodes and for a specific timeout. \n",
|
||||
" # if no min node count is provided it will use the scale settings for the cluster\n",
|
||||
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
||||
" \n",
|
||||
" # For a more detailed view of current BatchAI cluster status, use the 'status' property \n",
|
||||
" print(compute_target.status.serialize())"
|
||||
]
|
||||
},
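{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are unsure what compute already exists, you can enumerate everything attached to the workspace before creating a new cluster. An optional check:\n",
"\n",
"```python\n",
"# list all compute targets attached to the workspace with their types\n",
"for name, target in ws.compute_targets.items():\n",
"    print(name, target.type)\n",
"```"
]
},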
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You now have the necessary packages and compute resources to train a model in the cloud. \n",
|
||||
"\n",
|
||||
"## Explore data\n",
|
||||
"\n",
|
||||
"Before you train a model, you need to understand the data that you are using to train it. You also need to copy the data into the cloud so it can be accessed by your cloud training environment. In this section you learn how to:\n",
|
||||
"\n",
|
||||
"* Download the MNIST dataset\n",
|
||||
"* Display some sample images\n",
|
||||
"* Upload data to the cloud\n",
|
||||
"\n",
|
||||
"### Download the MNIST dataset\n",
|
||||
"\n",
|
||||
"Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import urllib.request\n",
|
||||
"\n",
|
||||
"os.makedirs('./data', exist_ok = True)\n",
|
||||
"\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/train-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/train-labels.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Display some sample images\n",
|
||||
"\n",
|
||||
"Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note this step requires a `load_data` function that's included in an `util.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compresse files into numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make sure utils.py is in the same directory as this code\n",
|
||||
"from utils import load_data\n",
|
||||
"\n",
|
||||
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
|
||||
"X_train = load_data('./data/train-images.gz', False) / 255.0\n",
|
||||
"y_train = load_data('./data/train-labels.gz', True).reshape(-1)\n",
|
||||
"\n",
|
||||
"X_test = load_data('./data/test-images.gz', False) / 255.0\n",
|
||||
"y_test = load_data('./data/test-labels.gz', True).reshape(-1)\n",
|
||||
"\n",
|
||||
"# now let's show some randomly chosen images from the traininng set.\n",
|
||||
"count = 0\n",
|
||||
"sample_size = 30\n",
|
||||
"plt.figure(figsize = (16, 6))\n",
|
||||
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
||||
" count = count + 1\n",
|
||||
" plt.subplot(1, sample_size, count)\n",
|
||||
" plt.axhline('')\n",
|
||||
" plt.axvline('')\n",
|
||||
" plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
|
||||
" plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you have an idea of what these images look like and the expected prediction outcome.\n",
|
||||
"\n",
|
||||
"### Upload data to the cloud\n",
|
||||
"\n",
|
||||
"Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. It is backed by Azure blob storage account.\n",
|
||||
"\n",
|
||||
"The MNIST files are uploaded into a directory named `mnist` at the root of the datastore."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"use datastore"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ds = ws.get_default_datastore()\n",
|
||||
"print(ds.datastore_type, ds.account_name, ds.container_name)\n",
|
||||
"\n",
|
||||
"ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)"
|
||||
]
|
||||
},
|
||||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You now have everything you need to start training a model.\n",
"\n",
"## Train a local model\n",
"\n",
"Train a simple logistic regression model using scikit-learn locally.\n",
"\n",
"**Training locally can take a minute or two** depending on your computer configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"clf = LogisticRegression()\n",
"clf.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, make predictions using the test set and calculate the accuracy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_hat = clf.predict(X_test)\n",
"print(np.average(y_hat == y_test))"
]
},
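{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond the single accuracy number, it can help to see which digits the model confuses. The following is an illustrative sketch (not part of the original tutorial flow) that uses scikit-learn's `confusion_matrix` on the `y_test` and `y_hat` arrays from the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: rows are true digits, columns are predicted digits\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"conf_mx = confusion_matrix(y_test, y_hat)\n",
"print(conf_mx)"
]
},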
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With just a few lines of code, the model reaches about 92% accuracy.\n",
"\n",
"## Train on a remote cluster\n",
"\n",
"Now you can expand on this simple model by building a model with a different regularization rate. This time you'll train the model on a remote resource.\n",
"\n",
"For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:\n",
"* Create a directory\n",
"* Create a training script\n",
"* Create an estimator object\n",
"* Submit the job\n",
"\n",
"### Create a directory\n",
"\n",
"Create a directory to deliver the necessary code from your computer to the remote resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"script_folder = './sklearn-mnist'\n",
"os.makedirs(script_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a training script\n",
"\n",
"To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created. This training script adds a regularization rate to the training algorithm, so it produces a slightly different model than the local version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $script_folder/train.py\n",
"\n",
"import argparse\n",
"import os\n",
"import numpy as np\n",
"\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.externals import joblib\n",
"\n",
"from azureml.core import Run\n",
"from utils import load_data\n",
"\n",
"# let the user feed in 2 parameters: the location of the data files (from the datastore), and the regularization rate of the logistic regression model\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')\n",
"parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n",
"args = parser.parse_args()\n",
"\n",
"data_folder = os.path.join(args.data_folder, 'mnist')\n",
"print('Data folder:', data_folder)\n",
"\n",
"# load the train and test sets into numpy arrays\n",
"# note we scale the pixel intensity values to 0-1 (by dividing by 255.0) so the model can converge faster.\n",
"X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n",
"X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n",
"y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n",
"y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n",
"print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n",
"\n",
"# get hold of the current run\n",
"run = Run.get_context()\n",
"\n",
"print('Train a logistic regression model with regularization rate of', args.reg)\n",
"clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n",
"clf.fit(X_train, y_train)\n",
"\n",
"print('Predict the test set')\n",
"y_hat = clf.predict(X_test)\n",
"\n",
"# calculate accuracy on the prediction\n",
"acc = np.average(y_hat == y_test)\n",
"print('Accuracy is', acc)\n",
"\n",
"run.log('regularization rate', float(args.reg))\n",
"run.log('accuracy', float(acc))\n",
"\n",
"os.makedirs('outputs', exist_ok=True)\n",
"# note: a file saved in the outputs folder is automatically uploaded into the experiment record\n",
"joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how the script gets data and saves models:\n",
"\n",
"+ The training script reads an argument to find the directory containing the data. When you submit the job later, you point to the datastore for this argument:\n",
"`parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"+ The training script saves your model into a directory named `outputs`. <br/>\n",
"`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`<br/>\n",
"Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"shutil.copy('utils.py', script_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an estimator\n",
"\n",
"An estimator object is used to submit the run. Create your estimator by running the following code to define:\n",
"\n",
"* The name of the estimator object, `est`\n",
"* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.\n",
"* The compute target. In this case you will use the Batch AI cluster you created.\n",
"* The training script name, `train.py`\n",
"* Parameters required by the training script\n",
"* Python packages needed for training\n",
"\n",
"The `data_folder` parameter is set to mount the datastore on the cluster nodes (`ds.as_mount()`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"configure estimator"
]
},
"outputs": [],
"source": [
"from azureml.train.estimator import Estimator\n",
"\n",
"script_params = {\n",
"    '--data-folder': ds.as_mount(),\n",
"    '--regularization': 0.8\n",
"}\n",
"\n",
"est = Estimator(source_directory=script_folder,\n",
"                script_params=script_params,\n",
"                compute_target=compute_target,\n",
"                entry_script='train.py',\n",
"                conda_packages=['scikit-learn'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the job to the cluster\n",
"\n",
"Run the experiment by submitting the estimator object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remote run",
"batchai",
"scikit-learn"
]
},
"outputs": [],
"source": [
"run = exp.submit(config=est)\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.\n",
"\n",
"## Monitor a remote run\n",
"\n",
"In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the script dependencies don't change, the same image is reused, and hence the container start-up time is much faster.\n",
"\n",
"Here is what's happening while you wait:\n",
"\n",
"- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**.\n",
"\n",
"  This stage happens once for each Python environment since the image is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n",
"\n",
"- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n",
"\n",
"- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then datastores are mounted or copied, then the `entry_script` is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n",
"\n",
"- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n",
"\n",
"\n",
"You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method.\n",
"\n",
"### Jupyter widget\n",
"\n",
"Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"use notebook widget"
]
},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get log results upon completion\n",
"\n",
"Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remote run",
"batchai",
"scikit-learn"
]
},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=False)  # specify True for a verbose log"
]
},
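{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to blocking on `wait_for_completion`, you can poll the run yourself. The following is an illustrative sketch, not part of the original tutorial flow: it assumes the `run` object from the submission above, and the terminal state names checked below are an assumption about what `get_status()` returns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: poll the run status instead of blocking on wait_for_completion\n",
"import time\n",
"\n",
"status = run.get_status()\n",
"while status not in ['Completed', 'Failed', 'Canceled']:  # assumed terminal states\n",
"    print('Run status:', status)\n",
"    time.sleep(15)\n",
"    status = run.get_status()\n",
"print('Final status:', status)"
]
},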
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Display run results\n",
"\n",
"You now have a model trained on a remote cluster. Retrieve the accuracy of the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"get metrics"
]
},
"outputs": [],
"source": [
"print(run.get_metrics())"
]
},
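{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metrics dictionary holds the `regularization rate` and `accuracy` values that `train.py` logged. As an illustrative sketch (assuming the local `clf`, `X_test`, and `y_test` from earlier in the notebook are still in scope), you can compare the remote run's accuracy against the local baseline:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: compare the remote run's accuracy with the local model's accuracy\n",
"metrics = run.get_metrics()\n",
"local_acc = np.average(clf.predict(X_test) == y_test)  # local model trained earlier\n",
"print('remote accuracy:', metrics['accuracy'])\n",
"print('local accuracy :', local_acc)"
]
},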
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next tutorial, you will explore this model in more detail.\n",
"\n",
"## Register model\n",
"\n",
"The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` to a directory named `outputs` on the VM of the cluster where the job ran. `outputs` is a special directory: all content in it is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n",
"\n",
"You can see the files associated with that run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"query history"
]
},
"outputs": [],
"source": [
"print(run.get_file_names())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from history"
]
},
"outputs": [],
"source": [
"# register model\n",
"model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n",
"print(model.name, model.id, model.version, sep = '\\t')"
]
},
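{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once registered, the model can be fetched by name from any session connected to the workspace. The following is an illustrative sketch using the SDK's `Model` class; it assumes the `ws` workspace object from the setup steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: retrieve the registered model by name and download the .pkl file\n",
"from azureml.core.model import Model\n",
"\n",
"registered = Model(workspace=ws, name='sklearn_mnist')  # fetches the latest version\n",
"print(registered.name, registered.version)\n",
"registered.download(target_dir='.', exist_ok=True)"
]
},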
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"In this Azure Machine Learning tutorial, you used Python to:\n",
"\n",
"> * Set up your development environment\n",
"> * Access and examine the data\n",
"> * Train a simple logistic regression locally using the popular scikit-learn machine learning library\n",
"> * Train multiple models on a remote cluster\n",
"> * Review training details and register the best model\n",
"\n",
"You are ready to deploy this registered model using the instructions in the next part of the tutorial series:\n",
"\n",
"> [Tutorial 2 - Deploy models](02.deploy-models.ipynb)"
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
},
"msauthor": "sgilley"
},
"nbformat": 4,
"nbformat_minor": 2
}