From 231c1062a8ee6648411eab988ae9ffbb78d94987 Mon Sep 17 00:00:00 2001
From: Roope Astala
Date: Mon, 1 Oct 2018 13:45:50 -0400
Subject: [PATCH 1/2] update notebooks for new version

---
 .../01.train-within-notebook.ipynb | 1620 ++++-----
 .../02.train-on-local/02.train-on-local.ipynb | 926 ++---
 .../03.train-on-aci/03.train-on-aci.ipynb | 564 +--
 .../04.train-on-remote-vm.ipynb | 1224 +++----
 .../05.train-in-spark/05.train-in-spark.ipynb | 643 ++--
 ...er-model-create-image-deploy-service.ipynb | 838 ++---
 .../11.production-deploy-to-aks.ipynb | 667 ++--
 ...le-data-collection-for-models-in-aks.ipynb | 5 +-
 automl/00.configuration.ipynb | 526 +--
 automl/01.auto-ml-classification.ipynb | 802 ++--
 automl/02.auto-ml-regression.ipynb | 814 ++---
 automl/03.auto-ml-remote-execution.ipynb | 947 ++---
 automl/03b.auto-ml-remote-batchai.ipynb | 1043 +++---
 ...emote-execution-text-data-blob-store.ipynb | 979 +++--
 ...ing-data-Blacklist-Early-Termination.ipynb | 787 ++--
 ....auto-ml-sparse-data-custom-cv-split.ipynb | 832 ++---
 .../07.auto-ml-exploring-previous-runs.ipynb | 649 ++--
 ...ote-execution-with-text-file-on-DSVM.ipynb | 954 +++--
 ...to-ml-classification-with-deployment.ipynb | 990 +++--
 automl/10.auto-ml-multi-output-example.ipynb | 575 ++-
 automl/11.auto-ml-sample-weight.ipynb | 493 ++-
 ...l-retrieve-the-training-sdk-versions.ipynb | 478 +--
 automl/13.auto-ml-dataprep.ipynb | 1120 +++---
 automl/README.md | 118 +-
 automl/automl_env.yml | 2 +-
 automl/automl_setup.cmd | 3 +-
 automl/automl_setup_linux.sh | 3 +-
 automl/automl_setup_mac.sh | 3 +-
 onnx/README.md | 23 -
 onnx/onnx-inference-emotion-recognition.ipynb | 1533 ++++----
 onnx/onnx-inference-mnist.ipynb | 1561 ++++----
 pipeline/00.pipeline-setup.ipynb | 148 +-
 pipeline/pipeline-batch-scoring.ipynb | 1240 +++----
 .../project-brainwave-custom-weights.ipynb | 1230 +++----
 .../project-brainwave-quickstart.ipynb | 614 ++--
 .../project-brainwave-transfer-learning.ipynb | 1130 +++---
 ...erparameter-tune-deploy-with-pytorch.ipynb | 1516 ++++----
 .../02.distributed-pytorch-with-horovod.ipynb | 574 +--
 ...arameter-tune-deploy-with-tensorflow.ipynb | 3240 ++++++++---------
 ....distributed-tensorflow-with-horovod.ipynb | 716 ++--
 ...ted-tensorflow-with-parameter-server.ipynb | 568 +--
 ....distributed-cntk-with-custom-docker.ipynb | 724 ++--
 training/07.tensorboard/07.tensorboard.ipynb | 1004 ++---
 ...08.export-run-history-to-tensorboard.ipynb | 482 +--
 tutorials/01.train-models.ipynb | 1416 +++----
 tutorials/02.deploy-models.ipynb | 1216 +++----
 tutorials/03.auto-train-models.ipynb | 821 +++--
 47 files changed, 19121 insertions(+), 19240 deletions(-)
 delete mode 100644 onnx/README.md

diff --git a/01.getting-started/01.train-within-notebook/01.train-within-notebook.ipynb b/01.getting-started/01.train-within-notebook/01.train-within-notebook.ipynb
index c74ddc06..42aa1991 100644
--- a/01.getting-started/01.train-within-notebook/01.train-within-notebook.ipynb
+++ b/01.getting-started/01.train-within-notebook/01.train-within-notebook.ipynb
@@ -1,812 +1,812 @@
{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 01. Train in the Notebook & Deploy Model to ACI\n", + "\n", + "* Load workspace\n", + "* Train a simple regression model directly in the Notebook python kernel\n", + "* Record run history\n", + "* Find the best model in run history and download it.\n", + "* Deploy the model as an Azure Container Instance (ACI)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "1. Make sure you go through the [00.
Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't already. \n", + "\n", + "2. Install the following prerequisite libraries into your conda environment and restart the notebook.\n", + "```shell\n", + "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", + "```\n", + "\n", + "3. Check that ACI is registered for your Azure subscription. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az provider show -n Microsoft.ContainerInstance -o table" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If ACI is not registered, run the following command to register it. Note that you have to be a subscription owner, or this command will fail." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az provider register -n Microsoft.ContainerInstance" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Validate Azure ML SDK installation and get version number for debugging purposes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "install" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Experiment, Run, Workspace\n", + "import azureml.core\n", + "\n", + "# Check core SDK version number\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set experiment name\n", + "Choose a name for the experiment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'train-in-notebook'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start a training run in local Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load the diabetes dataset, a well-known small dataset that comes with scikit-learn\n", + "from sklearn.datasets import load_diabetes\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.externals import joblib\n", + "\n", + "X, y = load_diabetes(return_X_y = True)\n", + "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", + "data = {\n", + " \"train\":{\"X\": X_train, \"y\": y_train}, \n", + " \"test\":{\"X\": X_test, \"y\": y_test}\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train a simple Ridge model\n", + "Train a very simple Ridge regression model in scikit-learn, and save it as a pickle file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "reg = Ridge(alpha = 0.03)\n", + "reg.fit(X=data['train']['X'], y=data['train']['y'])\n", + "preds = reg.predict(data['test']['X'])\n", + "print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n", + "joblib.dump(value=reg, filename='model.pkl');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Add experiment tracking\n", + "Now, let's add Azure ML experiment logging and upload the persisted model into the run record as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "local run", + "outputs upload" + ] + }, + "outputs": [], + "source": [ + "experiment = Experiment(workspace=ws, name=experiment_name)\n", + "run = experiment.start_logging()\n", + "\n", + "run.tag(\"Description\",\"My first run!\")\n", + "run.log('alpha', 0.03)\n", + "reg = Ridge(alpha=0.03)\n", + "reg.fit(data['train']['X'], data['train']['y'])\n", + "preds = reg.predict(data['test']['X'])\n", + "run.log('mse', mean_squared_error(data['test']['y'], preds))\n", + "joblib.dump(value=reg, filename='model.pkl')\n", + "run.upload_file(name='outputs/model.pkl', path_or_stream='./model.pkl')\n", + "\n", + "run.complete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can browse to the recorded run. Please make sure you use Chrome to navigate the run history page." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simple parameter sweep\n", + "Sweep over alpha values of a scikit-learn Ridge model, and capture the metrics and trained models in the Azure ML experiment."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import os\n", + "from tqdm import tqdm\n", + "\n", + "model_name = \"model.pkl\"\n", + "\n", + "# list of numbers from 0.0 to 0.95 with a 0.05 interval (np.arange excludes the stop value 1.0)\n", + "alphas = np.arange(0.0, 1.0, 0.05)\n", + "\n", + "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", + "for alpha in tqdm(alphas):\n", + " # create a bunch of runs, each training a model with a different alpha value\n", + " with experiment.start_logging() as run:\n", + " # Use Ridge algorithm to build a regression model\n", + " reg = Ridge(alpha=alpha)\n", + " reg.fit(X=data[\"train\"][\"X\"], y=data[\"train\"][\"y\"])\n", + " preds = reg.predict(X=data[\"test\"][\"X\"])\n", + " mse = mean_squared_error(y_true=data[\"test\"][\"y\"], y_pred=preds)\n", + "\n", + " # log alpha, mean_squared_error and feature names in run history\n", + " run.log(name=\"alpha\", value=alpha)\n", + " run.log(name=\"mse\", value=mse)\n", + " run.log_list(name=\"columns\", value=columns)\n", + "\n", + " with open(model_name, \"wb\") as file:\n", + " joblib.dump(value=reg, filename=file)\n", + " \n", + " # upload the serialized model into run history record\n", + " run.upload_file(name=\"outputs/\" + model_name, path_or_stream=model_name)\n", + "\n", + " # now delete the serialized model from local folder since it is already uploaded to run history \n", + " os.remove(path=model_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# now let's take a look at the experiment in the Azure portal.\n", + "experiment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Select the best model from the experiment\n", + "Load all experiment run metrics recursively from the experiment into a dictionary object."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "runs = {}\n", + "run_metrics = {}\n", + "\n", + "for r in tqdm(experiment.get_runs()):\n", + " metrics = r.get_metrics()\n", + " if 'mse' in metrics.keys():\n", + " runs[r.id] = r\n", + " run_metrics[r.id] = metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now find the run with the lowest Mean Squared Error value" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run_id = min(run_metrics, key = lambda k: run_metrics[k]['mse'])\n", + "best_run = runs[best_run_id]\n", + "print('Best run is:', best_run_id)\n", + "print('Metrics:', run_metrics[best_run_id])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags to your runs to make them easier to catalog" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history" + ] + }, + "outputs": [], + "source": [ + "best_run.tag(key=\"Description\", value=\"The best one\")\n", + "best_run.get_tags()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plot MSE over alpha\n", + "\n", + "Let's observe the best model visually by plotting the MSE values over alpha values:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "\n", + "best_alpha = run_metrics[best_run_id]['alpha']\n", + "min_mse = run_metrics[best_run_id]['mse']\n", + "\n", + "alpha_mse = np.array([(run_metrics[k]['alpha'], run_metrics[k]['mse']) for k in run_metrics.keys()])\n", + "sorted_alpha_mse = alpha_mse[alpha_mse[:,0].argsort()]\n", + "\n", + "plt.plot(sorted_alpha_mse[:,0], sorted_alpha_mse[:,1], 'r--')\n", + "plt.plot(sorted_alpha_mse[:,0], sorted_alpha_mse[:,1], 'bo')\n", + "\n", 
+ "plt.xlabel('alpha', fontsize = 14)\n", + "plt.ylabel('mean squared error', fontsize = 14)\n", + "plt.title('MSE over alpha', fontsize = 16)\n", + "\n", + "# plot arrow\n", + "plt.arrow(x = best_alpha, y = min_mse + 39, dx = 0, dy = -26, ls = '-', lw = 0.4,\n", + " width = 0, head_width = .03, head_length = 8)\n", + "\n", + "# plot \"best run\" text\n", + "plt.text(x = best_alpha - 0.08, y = min_mse + 50, s = 'Best Run', fontsize = 14)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register the best model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Find the model file saved in the run record of the best run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history" + ] + }, + "outputs": [], + "source": [ + "for f in best_run.get_file_names():\n", + " print(f)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can register this model in the model registry of the workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from history" + ] + }, + "outputs": [], + "source": [ + "model = best_run.register_model(model_name='best_model', model_path='outputs/model.pkl')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Verify that the model has been registered properly. If you have done this several times, you will see the version number increment each time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from history" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "models = Model.list(workspace=ws, name='best_model')\n", + "for m in models:\n", + " print(m.name, m.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also download the registered model.
Afterwards, you should see a `model.pkl` file in the current directory. You can then use it for local testing if you'd like." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "download file" + ] + }, + "outputs": [], + "source": [ + "# remove the model file if it is already on disk\n", + "if os.path.isfile('model.pkl'): \n", + " os.remove('model.pkl')\n", + "# download the model\n", + "model.download(target_dir=\"./\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Scoring script\n", + "\n", + "Now we are ready to build a Docker image and deploy the model in it as a web service. The first step is creating the scoring script. For convenience, we have created the scoring script for you. It is printed below as text, but you can also run `%pfile ./score.py` in a cell to show the file.\n", + "\n", + "The scoring script consists of two functions: `init`, which loads the model into memory when the container starts, and `run`, which makes a prediction when the web service is called. Please pay special attention to how the model is loaded in the `init()` function. When the Docker image is built for this model, the actual model file is downloaded and placed on disk, and the `get_model_path` function returns the local path where the model is placed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./score.py', 'r') as scoring_script:\n", + " print(scoring_script.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create environment dependency file\n", + "\n", + "We need an environment dependency file `myenv.yml` to specify which libraries are needed by the scoring script when building the Docker image for web service deployment. We can manually create this file, or we can use the `CondaDependencies` API to automatically create this file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies()\n", + "myenv.add_conda_package(\"scikit-learn\")\n", + "myenv.add_pip_package(\"pynacl==1.2.1\")\n", + "print(myenv.serialize_to_string())\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy web service into an Azure Container Instance\n", + "The deployment process takes the registered model and your scoring script, and builds a Docker image. It then deploys the Docker image into Azure Container Instance as a running container with an HTTP endpoint ready for scoring calls. Read more about [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/).\n", + "\n", + "Note that ACI is great for quick and cost-effective dev/test deployment scenarios. For production workloads, please use [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Please follow the instructions in [this notebook](11.production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n", + " \n", + "**Note:** Web service creation can take 6-7 minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice, Webservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={'sample name': 'AML 101'}, \n", + " description='This is a great example.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the `Webservice.deploy_from_model()` function below takes a model object registered under the workspace.
It then bakes the model file into the Docker image so it can be looked up using the `Model.get_model_path()` function in `score.py`. \n", + "\n", + "If you have a local model file instead of a registered model object, you can also use the `Webservice.deploy()` function, which registers the model and then deploys it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n", + " runtime=\"python\", \n", + " conda_file=\"myenv.yml\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "# this will take 5-10 minutes to finish\n", + "# you can also use the \"az container list\" command to find the ACI being deployed\n", + "service = Webservice.deploy_from_model(name='my-aci-svc',\n", + " deployment_config=aciconfig,\n", + " models=[model],\n", + " image_config=image_config,\n", + " workspace=ws)\n", + "\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Test web service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "print('web service is hosted in ACI:', service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use the `run` API to call the web service with one row of data to get a prediction."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "import json\n", + "# score the first row from the test set.\n", + "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", + "service.run(input_data = test_samples)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feed the entire test set and calculate the errors (residual values)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "# score the entire test set.\n", + "test_samples = json.dumps({'data': X_test.tolist()})\n", + "\n", + "result = json.loads(service.run(input_data = test_samples))['result']\n", + "residual = result - y_test" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also send a raw HTTP request to test the web service." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "\n", + "# 2 rows of input data, each with 10 made-up numerical features\n", + "input_data = \"{\\\"data\\\": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}\"\n", + "\n", + "headers = {'Content-Type':'application/json'}\n", + "\n", + "# for AKS deployment you'd need to include the service key in the header as well\n", + "# api_key = service.get_key()\n", + "# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n", + "\n", + "resp = requests.post(service.scoring_uri, input_data, headers = headers)\n", + "print(resp.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Residual graph\n", + "Plot a residual value graph to chart the errors on the entire test set. Observe the roughly bell-shaped distribution."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw={'width_ratios':[3, 1], 'wspace':0, 'hspace': 0})\n", + "f.suptitle('Residual Values', fontsize = 18)\n", + "\n", + "f.set_figheight(6)\n", + "f.set_figwidth(14)\n", + "\n", + "a0.plot(residual, 'bo', alpha=0.4);\n", + "a0.plot([0,90], [0,0], 'r', lw=2)\n", + "a0.set_ylabel('residual values', fontsize=14)\n", + "a0.set_xlabel('test data set', fontsize=14)\n", + "\n", + "a1.hist(residual, orientation='horizontal', color='blue', bins=10, histtype='step');\n", + "a1.hist(residual, orientation='horizontal', color='blue', alpha=0.2, bins=10);\n", + "a1.set_yticklabels([])\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Delete ACI to clean up" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Deleting ACI is super fast!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "service.delete()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 01.
Train in the Notebook & Deploy Model to ACI\n", - "\n", - "* Load workspace\n", - "* Train a simple regression model directly in the Notebook python kernel\n", - "* Record run history\n", - "* Find the best model in run history and download it.\n", - "* Deploy the model as an Azure Container Instance (ACI)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "1. Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't. \n", - "\n", - "2. Install following pre-requisite libraries to your conda environment and restart notebook.\n", - "```shell\n", - "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n", - "```\n", - "\n", - "3. Check that ACI is registered for your Azure Subscription. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!az provider show -n Microsoft.ContainerInstance -o table" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If ACI is not registered, run following command to register it. Note that you have to be a subscription owner, or this command will fail." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!az provider register -n Microsoft.ContainerInstance" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Validate Azure ML SDK installation and get version number for debugging purposes" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "install" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment, Run, Workspace\n", - "import azureml.core\n", - "\n", - "# Check core SDK version number\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set experiment name\n", - "Choose a name for experiment." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'train-in-notebook'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start a training run in local Notebook" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load diabetes dataset, a well-known small dataset that comes with scikit-learn\n", - "from sklearn.datasets import load_diabetes\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.metrics import mean_squared_error\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.externals import joblib\n", - "\n", - "X, y = load_diabetes(return_X_y = True)\n", - "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", - "data = {\n", - " \"train\":{\"X\": X_train, \"y\": y_train}, \n", - " \"test\":{\"X\": X_test, \"y\": y_test}\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train a simple Ridge model\n", - "Train a very simple Ridge regression model in scikit-learn, and save it as a pickle file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "reg = Ridge(alpha = 0.03)\n", - "reg.fit(X=data['train']['X'], y=data['train']['y'])\n", - "preds = reg.predict(data['test']['X'])\n", - "print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n", - "joblib.dump(value=reg, filename='model.pkl');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Add experiment tracking\n", - "Now, let's add Azure ML experiment logging, and upload persisted model into run record as well." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "local run", - "outputs upload" - ] - }, - "outputs": [], - "source": [ - "experiment = Experiment(workspace=ws, name=experiment_name)\n", - "run = experiment.start_logging()\n", - "\n", - "run.tag(\"Description\",\"My first run!\")\n", - "run.log('alpha', 0.03)\n", - "reg = Ridge(alpha=0.03)\n", - "reg.fit(data['train']['X'], data['train']['y'])\n", - "preds = reg.predict(data['test']['X'])\n", - "run.log('mse', mean_squared_error(data['test']['y'], preds))\n", - "joblib.dump(value=reg, filename='model.pkl')\n", - "run.upload_file(name='outputs/model.pkl', path_or_stream='./model.pkl')\n", - "\n", - "run.complete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can browse to the recorded run. Please make sure you use Chrome to navigate the run history page." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Simple parameter sweep\n", - "Sweep over alpha values of a sklearn ridge model, and capture metrics and trained model in the Azure ML experiment." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import os\n", - "from tqdm import tqdm\n", - "\n", - "model_name = \"model.pkl\"\n", - "\n", - "# list of numbers from 0 to 1.0 with a 0.05 interval\n", - "alphas = np.arange(0.0, 1.0, 0.05)\n", - "\n", - "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", - "for alpha in tqdm(alphas):\n", - " # create a bunch of runs, each train a model with a different alpha value\n", - " with experiment.start_logging() as run:\n", - " # Use Ridge algorithm to build a regression model\n", - " reg = Ridge(alpha=alpha)\n", - " reg.fit(X=data[\"train\"][\"X\"], y=data[\"train\"][\"y\"])\n", - " preds = reg.predict(X=data[\"test\"][\"X\"])\n", - " mse = mean_squared_error(y_true=data[\"test\"][\"y\"], y_pred=preds)\n", - "\n", - " # log alpha, mean_squared_error and feature names in run history\n", - " run.log(name=\"alpha\", value=alpha)\n", - " run.log(name=\"mse\", value=mse)\n", - " run.log_list(name=\"columns\", value=columns)\n", - "\n", - " with open(model_name, \"wb\") as file:\n", - " joblib.dump(value=reg, filename=file)\n", - " \n", - " # upload the serialized model into run history record\n", - " run.upload_file(name=\"outputs/\" + model_name, path_or_stream=model_name)\n", - "\n", - " # now delete the serialized model from local folder since it is already uploaded to run history \n", - " os.remove(path=model_name)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# now let's take a look at the experiment in Azure portal.\n", - "experiment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Select best model from the experiment\n", - "Load all experiment run metrics recursively from the experiment into a dictionary object." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "runs = {}\n", - "run_metrics = {}\n", - "\n", - "for r in tqdm(experiment.get_runs()):\n", - " metrics = r.get_metrics()\n", - " if 'mse' in metrics.keys():\n", - " runs[r.id] = r\n", - " run_metrics[r.id] = metrics" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now find the run with the lowest Mean Squared Error value" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run_id = min(run_metrics, key = lambda k: run_metrics[k]['mse'])\n", - "best_run = runs[best_run_id]\n", - "print('Best run is:', best_run_id)\n", - "print('Metrics:', run_metrics[best_run_id])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can add tags to your runs to make them easier to catalog" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history" - ] - }, - "outputs": [], - "source": [ - "best_run.tag(key=\"Description\", value=\"The best one\")\n", - "best_run.get_tags()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Plot MSE over alpha\n", - "\n", - "Let's observe the best model visually by plotting the MSE values over alpha values:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt\n", - "\n", - "best_alpha = run_metrics[best_run_id]['alpha']\n", - "min_mse = run_metrics[best_run_id]['mse']\n", - "\n", - "alpha_mse = np.array([(run_metrics[k]['alpha'], run_metrics[k]['mse']) for k in run_metrics.keys()])\n", - "sorted_alpha_mse = alpha_mse[alpha_mse[:,0].argsort()]\n", - "\n", - "plt.plot(sorted_alpha_mse[:,0], sorted_alpha_mse[:,1], 'r--')\n", - "plt.plot(sorted_alpha_mse[:,0], sorted_alpha_mse[:,1], 'bo')\n", - "\n", 
- "plt.xlabel('alpha', fontsize = 14)\n", - "plt.ylabel('mean squared error', fontsize = 14)\n", - "plt.title('MSE over alpha', fontsize = 16)\n", - "\n", - "# plot arrow\n", - "plt.arrow(x = best_alpha, y = min_mse + 39, dx = 0, dy = -26, ls = '-', lw = 0.4,\n", - " width = 0, head_width = .03, head_length = 8)\n", - "\n", - "# plot \"best run\" text\n", - "plt.text(x = best_alpha - 0.08, y = min_mse + 50, s = 'Best Run', fontsize = 14)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register the best model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Find the model file saved in the run record of best run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history" - ] - }, - "outputs": [], - "source": [ - "for f in best_run.get_file_names():\n", - " print(f)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can register this model in the model registry of the workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from history" - ] - }, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='best_model', model_path='outputs/model.pkl')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Verify that the model has been registered properly. If you have done this several times you'd see the version number auto-increases each time." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from history" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "models = Model.list(workspace=ws, name='best_model')\n", - "for m in models:\n", - " print(m.name, m.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also download the registered model. 
Afterwards, you should see a `model.pkl` file in the current directory. You can then use it for local testing if you'd like." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "download file" - ] - }, - "outputs": [], - "source": [ - "# remove the model file if it is already on disk\n", - "if os.path.isfile('model.pkl'): \n", - " os.remove('model.pkl')\n", - "# download the model\n", - "model.download(target_dir=\"./\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Scoring script\n", - "\n", - "Now we are ready to build a Docker image and deploy the model in it as a web service. The first step is creating the scoring script. For convenience, we have created the scoring script for you. It is printed below as text, but you can also run `%pfile ./score.py` in a cell to show the file.\n", - "\n", - "Tbe scoring script consists of two functions: `init` that is used to load the model to memory when starting the container, and `run` that makes the prediction when web service is called. Please pay special attention to how the model is loaded in the `init()` function. When Docker image is built for this model, the actual model file is downloaded and placed on disk, and `get_model_path` function returns the local path where the model is placed." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./score.py', 'r') as scoring_script:\n", - " print(scoring_script.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create environment dependency file\n", - "\n", - "We need a environment dependency file `myenv.yml` to specify which libraries are needed by the scoring script when building the Docker image for web service deployment. We can manually create this file, or we can use the `CondaDependencies` API to automatically create this file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies()\n", - "myenv.add_conda_package(\"scikit-learn\")\n", - "myenv.add_pip_package(\"pynacl==1.2.1\")\n", - "print(myenv.serialize_to_string())\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy web service into an Azure Container Instance\n", - "The deployment process takes the registered model and your scoring scrip, and builds a Docker image. It then deploys the Docker image into Azure Container Instance as a running container with an HTTP endpoint readying for scoring calls. Read more about [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/).\n", - "\n", - "Note ACI is great for quick and cost-effective dev/test deployment scenarios. For production workloads, please use [Azure Kubernentes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Please follow in struction in [this notebook](11.production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n", - " \n", - "** Note: ** The web service creation can take 6-7 minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice, Webservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'sample name': 'AML 101'}, \n", - " description='This is a great example.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note the below `WebService.deploy_from_model()` function takes a model object registered under the workspace. 
It then bakes the model file in the Docker image so it can be looked-up using the `Model.get_model_path()` function in `score.py`. \n", - "\n", - "If you have a local model file instead of a registered model object, you can also use the `WebService.deploy()` function which would register the model and then deploy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "# this will take 5-10 minutes to finish\n", - "# you can also use \"az container list\" command to find the ACI being deployed\n", - "service = Webservice.deploy_from_model(name='my-aci-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=image_config,\n", - " workspace=ws)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Test web service" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "print('web service is hosted in ACI:', service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use the `run` API to call the web service with one row of data to get a prediction." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "# score the first row from the test set.\n", - "test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n", - "service.run(input_data = test_samples)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feed the entire test set and calculate the errors (residual values)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "# score the entire test set.\n", - "test_samples = json.dumps({'data': X_test.tolist()})\n", - "\n", - "result = json.loads(service.run(input_data = test_samples))['result']\n", - "residual = result - y_test" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also send raw HTTP request to test the web service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import requests\n", - "import json\n", - "\n", - "# 2 rows of input data, each with 10 made-up numerical features\n", - "input_data = \"{\\\"data\\\": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}\"\n", - "\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "# for AKS deployment you'd need to the service key in the header as well\n", - "# api_key = service.get_key()\n", - "# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers = headers)\n", - "print(resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Residual graph\n", - "Plot a residual value graph to chart the errors on the entire test set. Observe the nice bell curve." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw={'width_ratios':[3, 1], 'wspace':0, 'hspace': 0})\n", - "f.suptitle('Residual Values', fontsize = 18)\n", - "\n", - "f.set_figheight(6)\n", - "f.set_figwidth(14)\n", - "\n", - "a0.plot(residual, 'bo', alpha=0.4);\n", - "a0.plot([0,90], [0,0], 'r', lw=2)\n", - "a0.set_ylabel('residue values', fontsize=14)\n", - "a0.set_xlabel('test data set', fontsize=14)\n", - "\n", - "a1.hist(residual, orientation='horizontal', color='blue', bins=10, histtype='step');\n", - "a1.hist(residual, orientation='horizontal', color='blue', alpha=0.2, bins=10);\n", - "a1.set_yticklabels([])\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Delete ACI to clean up" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Deleting ACI is super fast!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "service.delete()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/02.train-on-local/02.train-on-local.ipynb b/01.getting-started/02.train-on-local/02.train-on-local.ipynb index 8597c74f..ee5ea76f 100644 --- 
a/01.getting-started/02.train-on-local/02.train-on-local.ipynb +++ b/01.getting-started/02.train-on-local/02.train-on-local.ipynb @@ -1,465 +1,465 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 02. Train locally\n", + "* Create or load workspace.\n", + "* Create scripts locally.\n", + "* Create `train.py` in a folder, along with a `my.lib` file.\n", + "* Configure & execute a local run in a user-managed Python environment.\n", + "* Configure & execute a local run in a system-managed Python environment.\n", + "* Configure & execute a local run in a Docker environment.\n", + "* Query run metrics to find the best model\n", + "* Register model for operationalization." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create An Experiment\n", + "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "experiment_name = 'train-on-local'\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## View `train.py`\n", + "\n", + "`train.py` is already created for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./train.py', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note `train.py` also references a `mylib.py` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./mylib.py', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure & Run\n", + "### User-managed environment\n", + "Below, we use a user-managed run, which means you are responsible to ensure all the necessary packages are available in the Python environment you choose to run the script." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "\n", + "# Editing a run configuration property on-fly.\n", + "run_config_user_managed = RunConfiguration()\n", + "\n", + "run_config_user_managed.environment.python.user_managed_dependencies = True\n", + "\n", + "# You can choose a specific Python environment by pointing to a Python path \n", + "#run_config.environment.python.interpreter_path = '/home/johndoe/miniconda3/envs/sdk2/bin/python'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Submit script to run in the user-managed environment\n", + "Note whole script folder is submitted for execution, including the `mylib.py` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)\n", + "run = exp.submit(src)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Get run history details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Block to wait till run finishes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### System-managed environment\n", + "You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "run_config_system_managed = RunConfiguration()\n", + "\n", + "run_config_system_managed.environment.python.user_managed_dependencies = False\n", + "run_config_system_managed.prepare_environment = True\n", + "\n", + "# Specify conda dependencies with scikit-learn\n", + "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n", + "run_config_system_managed.environment.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Submit script to run in the system-managed environment\n", + "A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes. But this conda environment is reused so long as you don't change the conda dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_system_managed)\n", + "run = exp.submit(src)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Get run history details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Block and wait till run finishes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Docker-based execution\n", + "**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode.
If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n", + "\n", + "You can also ask the system to pull down a Docker image and execute your scripts in it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "run_config_docker = RunConfiguration()\n", + "\n", + "run_config_docker.environment.python.user_managed_dependencies = False\n", + "run_config_docker.prepare_environment = True\n", + "run_config_docker.environment.docker.enabled = True\n", + "run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", + "\n", + "# Specify conda dependencies with scikit-learn\n", + "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n", + "run_config_docker.environment.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Submit script to run in the system-managed environment\n", + "A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes.
But this conda environment is reused so long as you don't change the conda dependencies.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_docker)\n", + "run = exp.submit(src)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Get run history details\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Query run metrics" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history", + "get metrics" + ] + }, + "outputs": [], + "source": [ + "# get all metrics logged in the run\n", + "run.get_metrics()\n", + "metrics = run.get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's find the model that has the lowest MSE value logged." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]\n", + "\n", + "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", + " min(metrics['mse']), \n", + " best_alpha\n", + "))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also list all the files that are associated with this run record" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We know the model `ridge_0.40.pkl` is the best performing model from the earlier queries. So let's register it with the workspace."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# supply a model name, and the full path to the serialized model file.\n", + "model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(model.name, model.version, model.url)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now you can deploy this model following the example in the 01 notebook." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 02. Train locally\n", - "* Create or load workspace.\n", - "* Create scripts locally.\n", - "* Create `train.py` in a folder, along with a `my.lib` file.\n", - "* Configure & execute a local run in a user-managed Python environment.\n", - "* Configure & execute a local run in a system-managed Python environment.\n", - "* Configure & execute a local run in a Docker environment.\n", - "* Query run metrics to find the best model\n", - "* Register model for operationalization." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create An Experiment\n", - "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "experiment_name = 'train-on-local'\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## View `train.py`\n", - "\n", - "`train.py` is already created for you." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./train.py', 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note `train.py` also references a `mylib.py` file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./mylib.py', 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure & Run\n", - "### User-managed environment\n", - "Below, we use a user-managed run, which means you are responsible to ensure all the necessary packages are available in the Python environment you choose to run the script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "\n", - "# Editing a run configuration property on-fly.\n", - "run_config_user_managed = RunConfiguration()\n", - "\n", - "run_config_user_managed.environment.python.user_managed_dependencies = True\n", - "\n", - "# You can choose a specific Python environment by pointing to a Python path \n", - "#run_config.environment.python.interpreter_path = '/home/johndoe/miniconda3/envs/sdk2/bin/python'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Submit script to run in the user-managed environment\n", - "Note whole script folder is submitted for execution, including the `mylib.py` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)\n", - "run = exp.submit(src)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Get run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Block to wait till run finishes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### System-managed environment\n", - "You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "run_config_system_managed = RunConfiguration()\n", - "\n", - "run_config_system_managed.environment.python.user_managed_dependencies = False\n", - "run_config_system_managed.prepare_environment = True\n", - "\n", - "# Specify conda dependencies with scikit-learn\n", - "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n", - "run_config_system_managed.environment.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Submit script to run in the system-managed environment\n", - "A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. But this conda environment is reused so long as you don't change the conda dependencies." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_system_managed)\n", - "run = exp.submit(src)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Get run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Block and wait till run finishes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Docker-based execution\n", - "**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode. If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n", - "\n", - "You can also ask the system to pull down a Docker image and execute your scripts in it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "run_config_docker = RunConfiguration()\n", - "\n", - "run_config_docker.environment.python.user_managed_dependencies = False\n", - "run_config_docker.prepare_environment = True\n", - "run_config_docker.environment.docker.enabled = True\n", - "run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", - "\n", - "# Specify conda dependencies with scikit-learn\n", - "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n", - "run_config_docker.environment.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Submit script to run in the system-managed environment\n", - "A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. 
But this conda environment is reused so long as you don't change the conda dependencies.\n", - "\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_docker)\n", - "run = exp.submit(src)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Get run history details\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Query run metrics" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history", - "get metrics" - ] - }, - "outputs": [], - "source": [ - "# get all metris logged in the run\n", - "run.get_metrics()\n", - "metrics = run.get_metrics()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's find the model that has the lowest MSE value logged." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]\n", - "\n", - "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", - " min(metrics['mse']), \n", - " best_alpha\n", - "))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also list all the files that are associated with this run record" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We know the model `ridge_0.40.pkl` is the best performing model from the eariler queries. So let's register it with the workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# supply a model name, and the full path to the serialized model file.\n", - "model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(model.name, model.version, model.url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you can deploy this model following the example in the 01 notebook." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/03.train-on-aci/03.train-on-aci.ipynb b/01.getting-started/03.train-on-aci/03.train-on-aci.ipynb index 83cfe617..521396f5 100644 --- a/01.getting-started/03.train-on-aci/03.train-on-aci.ipynb +++ b/01.getting-started/03.train-on-aci/03.train-on-aci.ipynb @@ -1,284 +1,284 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 03. 
Train on Azure Container Instance\n", + "\n", + "* Create Workspace\n", + "* Create `train.py` in the project folder.\n", + "* Configure an ACI (Azure Container Instance) run\n", + "* Execute in ACI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create An Experiment\n", + "\n", + "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "experiment_name = 'train-on-aci'\n", + "experiment = Experiment(workspace = ws, name = experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Remote execution on ACI\n", + "\n", + "The training script `train.py` is already created for you. Let's have a look." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('./train.py', 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure for using ACI\n", + "Linux-based ACI is available in `West US`, `East US`, `West Europe`, `North Europe`, `West US 2`, `Southeast Asia`, `Australia East`, `East US 2`, and `Central US` regions. See details [here](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-quotas#region-availability)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure run" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# create a new runconfig object\n", + "run_config = RunConfiguration()\n", + "\n", + "# signal that you want to use ACI to execute script.\n", + "run_config.target = \"containerinstance\"\n", + "\n", + "# ACI container group is only supported in certain regions, which can be different than the region the Workspace is in.\n", + "run_config.container_instance.region = 'eastus2'\n", + "\n", + "# set the ACI CPU and Memory \n", + "run_config.container_instance.cpu_cores = 1\n", + "run_config.container_instance.memory_gb = 2\n", + "\n", + "# enable Docker \n", + "run_config.environment.docker.enabled = True\n", + "\n", + "# set Docker base image to the default CPU-based image\n", + "run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", + "\n", + "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", + "run_config.environment.python.user_managed_dependencies = False\n", + "\n", + "# auto-prepare the Docker image when used for execution (if it is not already prepared)\n", + "run_config.auto_prepare_environment = True\n", + "\n", + "# 
specify CondaDependencies obj\n", + "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit the Experiment\n", + "Finally, run the training job on the ACI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "remote run", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time \n", + "from azureml.core.script_run_config import ScriptRunConfig\n", + "\n", + "script_run_config = ScriptRunConfig(source_directory='./',\n", + " script='train.py',\n", + " run_config=run_config)\n", + "\n", + "run = experiment.submit(script_run_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history" + ] + }, + "outputs": [], + "source": [ + "# Show run details\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "remote run", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "get metrics" + ] + }, + "outputs": [], + "source": [ + "# get all metris logged in the run\n", + "run.get_metrics()\n", + "metrics = run.get_metrics()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", + " min(metrics['mse']), \n", + " metrics['alpha'][np.argmin(metrics['mse'])]\n", + "))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# show all the files stored within the run record\n", + "run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now you can take a 
model produced here, register it and then deploy as a web service." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 03. Train on Azure Container Instance\n", - "\n", - "* Create Workspace\n", - "* Create `train.py` in the project folder.\n", - "* Configure an ACI (Azure Container Instance) run\n", - "* Execute in ACI" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create An Experiment\n", - "\n", - "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "experiment_name = 'train-on-aci'\n", - "experiment = Experiment(workspace = ws, name = experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Remote execution on ACI\n", - "\n", - "The training script `train.py` is already created for you. Let's have a look." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('./train.py', 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure for using ACI\n", - "Linux-based ACI is available in `West US`, `East US`, `West Europe`, `North Europe`, `West US 2`, `Southeast Asia`, `Australia East`, `East US 2`, and `Central US` regions. See details [here](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-quotas#region-availability)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "configure run" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# create a new runconfig object\n", - "run_config = RunConfiguration()\n", - "\n", - "# signal that you want to use ACI to execute script.\n", - "run_config.target = \"containerinstance\"\n", - "\n", - "# ACI container group is only supported in certain regions, which can be different than the region the Workspace is in.\n", - "run_config.container_instance.region = 'eastus2'\n", - "\n", - "# set the ACI CPU and Memory \n", - "run_config.container_instance.cpu_cores = 1\n", - "run_config.container_instance.memory_gb = 2\n", - "\n", - "# enable Docker \n", - "run_config.environment.docker.enabled = True\n", - "\n", - "# set Docker base image to the default CPU-based image\n", - "run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", - "\n", - "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", - "run_config.environment.python.user_managed_dependencies = False\n", - "\n", - "# auto-prepare the Docker image when used for execution (if it is not already prepared)\n", - "run_config.auto_prepare_environment = True\n", - "\n", - "# specify CondaDependencies obj\n", - "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit the Experiment\n", - "Finally, run the training job on the ACI" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "remote run", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time \n", - "from azureml.core.script_run_config import ScriptRunConfig\n", - "\n", - "script_run_config = 
ScriptRunConfig(source_directory='./',\n", - " script='train.py',\n", - " run_config=run_config)\n", - "\n", - "run = experiment.submit(script_run_config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history" - ] - }, - "outputs": [], - "source": [ - "# Show run details\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "remote run", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "get metrics" - ] - }, - "outputs": [], - "source": [ - "# get all metris logged in the run\n", - "run.get_metrics()\n", - "metrics = run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", - " min(metrics['mse']), \n", - " metrics['alpha'][np.argmin(metrics['mse'])]\n", - "))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# show all the files stored within the run record\n", - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you can take a model produced here, register it and then deploy as a web service." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/04.train-on-remote-vm/04.train-on-remote-vm.ipynb b/01.getting-started/04.train-on-remote-vm/04.train-on-remote-vm.ipynb index 45af3acc..4587cd83 100644 --- a/01.getting-started/04.train-on-remote-vm/04.train-on-remote-vm.ipynb +++ b/01.getting-started/04.train-on-remote-vm/04.train-on-remote-vm.ipynb @@ -1,613 +1,615 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 04. Train in a remote Linux VM\n", + "* Create Workspace\n", + "* Create `train.py` file\n", + "* Create (or attach) DSVM as compute resource.\n", + "* Upoad data files into default datastore\n", + "* Configure & execute a run in a few different ways\n", + " - Use system-built conda\n", + " - Use existing Python environment\n", + " - Use Docker \n", + "* Find the best model in the run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Experiment\n", + "\n", + "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'train-on-remote-vm'\n", + "\n", + "from azureml.core import Experiment\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's also create a local folder to hold the training script." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "script_folder = './vm-run'\n", + "os.makedirs(script_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload data files into datastore\n", + "Every workspace comes with a default datastore (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get the default datastore\n", + "ds = ws.get_default_datastore()\n", + "print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load diabetes data from `scikit-learn` and save it as 2 local files." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "import numpy as np\n", + "\n", + "training_data = load_diabetes()\n", + "np.save(file='./feeatures.npy', arr=training_data['data'])\n", + "np.save(file='./labels.npy', arr=training_data['target'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's upload the 2 files into the default datastore under a path named `diabetes`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload_files(['./feeatures.npy', './labels.npy'], target_path='diabetes', overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## View `train.py`\n", + "\n", + "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file. Please pay special attention on how we are loading the features and labels from files in the `data_folder` path, which is passed in as an argument of the training script (shown later)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# copy train.py into the script folder\n", + "import shutil\n", + "shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n", + "\n", + "with open(os.path.join(script_folder, './train.py'), 'r') as training_script:\n", + " print(training_script.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Linux DSVM as a compute target\n", + "\n", + "**Note**: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n", + " \n", + "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the provisioning configuration object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import DsvmCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "compute_target_name = 'mydsvm'\n", + "\n", + "try:\n", + " dsvm_compute = DsvmCompute(workspace=ws, name=compute_target_name)\n", + " print('found existing:', dsvm_compute.name)\n", + "except ComputeTargetException:\n", + " print('creating new.')\n", + " dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n", + " dsvm_compute = DsvmCompute.create(ws, name=compute_target_name, provisioning_configuration=dsvm_config)\n", + " dsvm_compute.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Attach an existing Linux DSVM\n", + "You can also attach an existing Linux VM as a compute target. The default port is 22." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'''\n", + "from azureml.core.compute import RemoteCompute \n", + "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n", + "attached_dsvm_compute = RemoteCompute.attach(workspace=ws,\n", + " name=\"attached_vm\",\n", + " username='',\n", + " address='',\n", + " ssh_port=22,\n", + " password='')\n", + "attached_dsvm_compute.wait_for_completion(show_output=True)\n", + "'''\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure & Run\n", + "First let's create a `DataReferenceConfiguration` object to inform the system what data folder to download to the copmute target." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import DataReferenceConfiguration\n", + "dr = DataReferenceConfiguration(datastore_name=ds.name, \n", + " path_on_datastore='diabetes', \n", + " mode='download', # download files from datastore to compute target\n", + " overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can try a few different ways to run the training script in the VM." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conda run\n", + "You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# create a new RunConfig object\n", + "conda_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to the Linux DSVM\n", + "conda_run_config.target = dsvm_compute.name\n", + "\n", + "# set the data reference of the run configuration\n", + "conda_run_config.data_references = {ds.name: dr}\n", + "\n", + "# specify CondaDependencies obj\n", + "conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Run\n", + "from azureml.core import ScriptRunConfig\n", + "\n", + "src = ScriptRunConfig(source_directory=script_folder, \n", + " script='train.py', \n", + " run_config=conda_run_config, \n", + " # pass the datastore reference as a parameter to the training script\n", + " arguments=['--data-folder', str(ds.as_download())] \n", + " ) \n", + "run = exp.submit(config=src)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the run object. You can navigate to the Azure portal to see detailed information about the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Native VM run\n", + "You can also configure to use an exiting Python environment in the VM to execute the script without asking the system to create a conda environment for you." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create a new RunConfig object\n", + "vm_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to the Linux DSVM\n", + "vm_run_config.target = dsvm_compute.name\n", + "\n", + "# set the data reference of the run configuration\n", + "vm_run_config.data_references = {ds.name: dr}\n", + "\n", + "# Let system know that you will configure the Python environment yourself.\n", + "vm_run_config.environment.python.user_managed_dependencies = True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The run below will likely fail because `train.py` needs `azureml`, `scikit-learn` and other dependencies, which are not available in that Python environment. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src = ScriptRunConfig(source_directory=script_folder, \n", + " script='train.py', \n", + " run_config=vm_run_config,\n", + " # pass the datastore reference as a parameter to the training script\n", + " arguments=['--data-folder', str(ds.as_download())])\n", + "run = exp.submit(config=src)\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can choose to SSH into the VM and install the Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we are simply going to create another script `train2.py` that doesn't have azureml dependencies, and submit it instead."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $script_folder/train2.py\n", + "print('####################################')\n", + "print('Hello World (without Azure ML SDK)!')\n", + "print('####################################')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's try again. And this time it should work fine." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src = ScriptRunConfig(source_directory=script_folder, \n", + " script='train2.py', \n", + " run_config=vm_run_config)\n", + "run = exp.submit(config=src)\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note even in this case you get a run record with some basic statistics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure a Docker run with new conda environment on the VM\n", + "You can execute in a Docker container in the VM. If you choose this option, the system will pull down a base Docker image, build a new conda environment in it if you ask for (you can also skip this if you are using a customer Docker image when a preconfigured Python environment), start a container, and run your script in there. This image is also uploaded into your ACR (Azure Container Registry) assoicated with your workspace, an reused if your dependencies don't change in the subsequent runs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "\n", + "# Load the \"cpu-dsvm.runconfig\" file (created by the above attach operation) in memory\n", + "docker_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to the Linux DSVM\n", + "docker_run_config.target = dsvm_compute.name\n", + "\n", + "# Use Docker in the remote VM\n", + "docker_run_config.environment.docker.enabled = True\n", + "\n", + "# Use CPU base image from DockerHub\n", + "docker_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", + "print('Base Docker image is:', docker_run_config.environment.docker.base_image)\n", + "\n", + "# set the data reference of the run coonfiguration\n", + "docker_run_config.data_references = {ds.name: dr}\n", + "\n", + "# specify CondaDependencies obj\n", + "docker_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit the Experiment\n", + "Submit script to run in the Docker image in the remote VM. If you run this for the first time, the system will download the base image, layer in packages specified in the `conda_dependencies.yml` file on top of the base image, create a container and then execute the script in the container." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "src = ScriptRunConfig(source_directory=script_folder, \n", + " script='train.py', \n", + " run_config=docker_run_config,\n", + " # pass the datastore reference as a parameter to the training script\n", + " arguments=['--data-folder', str(ds.as_download())])\n", + "run = exp.submit(config=src)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### View run history details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Find the best model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we have tried various execution modes, we can find the best model from the last run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get all metris logged in the run\n", + "run.get_metrics()\n", + "metrics = run.get_metrics()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# find the index where MSE is the smallest\n", + "indices = list(range(0, len(metrics['mse'])))\n", + "min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n", + "\n", + "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", + " metrics['mse'][min_mse_index], \n", + " metrics['alpha'][min_mse_index]\n", + "))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up compute resource" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dsvm_compute.delete()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 04. Train in a remote Linux VM\n", - "* Create Workspace\n", - "* Create `train.py` file\n", - "* Create (or attach) DSVM as compute resource.\n", - "* Upoad data files into default datastore\n", - "* Configure & execute a run in a few different ways\n", - " - Use system-built conda\n", - " - Use existing Python environment\n", - " - Use Docker \n", - "* Find the best model in the run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'train-on-remote-vm'\n", - "\n", - "from azureml.core import Experiment\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's also create a local folder to hold the training script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "script_folder = './vm-run'\n", - "os.makedirs(script_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload data files into datastore\n", - "Every workspace comes with a default datastore (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get the default datastore\n", - "ds = ws.get_default_datastore()\n", - "print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load diabetes data from `scikit-learn` and save it as 2 local files." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_diabetes\n", - "import numpy as np\n", - "\n", - "training_data = load_diabetes()\n", - "np.save(file='./feeatures.npy', arr=training_data['data'])\n", - "np.save(file='./labels.npy', arr=training_data['target'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's upload the 2 files into the default datastore under a path named `diabetes`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload_files(['./feeatures.npy', './labels.npy'], target_path='diabetes', overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## View `train.py`\n", - "\n", - "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file. Please pay special attention on how we are loading the features and labels from files in the `data_folder` path, which is passed in as an argument of the training script (shown later)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# copy train.py into the script folder\n", - "import shutil\n", - "shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n", - "\n", - "with open(os.path.join(script_folder, './train.py'), 'r') as training_script:\n", - " print(training_script.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Linux DSVM as a compute target\n", - "\n", - "**Note**: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n", - " \n", - "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the provisioning configuration object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import DsvmCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "compute_target_name = 'mydsvm'\n", - "\n", - "try:\n", - " dsvm_compute = DsvmCompute(workspace=ws, name=compute_target_name)\n", - " print('found existing:', dsvm_compute.name)\n", - "except ComputeTargetException:\n", - " print('creating new.')\n", - " dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n", - " dsvm_compute = DsvmCompute.create(ws, name=compute_target_name, provisioning_configuration=dsvm_config)\n", - " dsvm_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Attach an existing Linux DSVM\n", - "You can also attach an existing Linux VM as a compute target. The default port is 22." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import RemoteCompute \n", - "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n", - "attached_dsvm_compute = RemoteCompute.attach(workspace=ws,\n", - " name=\"attached_vm\",\n", - " username='',\n", - " address='',\n", - " ssh_port=22,\n", - " password='')\n", - "attached_dsvm_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure & Run\n", - "First let's create a `DataReferenceConfigruation` object to inform the system what data folder to download to the copmute target." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import DataReferenceConfiguration\n", - "dr = DataReferenceConfiguration(datastore_name=ds.name, \n", - " path_on_datastore='diabetes', \n", - " mode='download', # download files from datastore to compute target\n", - " overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can try a few different ways to run the training script in the VM." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Conda run\n", - "You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# create a new RunConfig object\n", - "conda_run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "# Set compute target to the Linux DSVM\n", - "conda_run_config.target = dsvm_compute.name\n", - "\n", - "# set the data reference of the run coonfiguration\n", - "conda_run_config.data_references = {ds.name: dr}\n", - "\n", - "# specify CondaDependencies obj\n", - "conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Run\n", - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory=script_folder, \n", - " script='train.py', \n", - " run_config=conda_run_config, \n", - " # pass the datastore reference as a parameter to the training script\n", - " arguments=['--data-folder', str(ds.as_download())] \n", - " ) \n", - "run = exp.submit(config=src)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the run object. You can navigate to the Azure portal to see detailed information about the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Native VM run\n", - "You can also configure to use an exiting Python environment in the VM to execute the script without asking the system to create a conda environment for you." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a new RunConfig object\n", - "vm_run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "# Set compute target to the Linux DSVM\n", - "vm_run_config.target = dsvm_compute.name\n", - "\n", - "# set the data reference of the run coonfiguration\n", - "conda_run_config.data_references = {ds.name: dr}\n", - "\n", - "# Let system know that you will configure the Python environment yourself.\n", - "vm_run_config.environment.python.user_managed_dependencies = True" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The below run will likely fail because `train.py` needs dependency `azureml`, `scikit-learn` and others, which are not found in that Python environment. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src = ScriptRunConfig(source_directory=script_folder, \n", - " script='train.py', \n", - " run_config=vm_run_config,\n", - " # pass the datastore reference as a parameter to the training script\n", - " arguments=['--data-folder', str(ds.as_download())])\n", - "run = exp.submit(config=src)\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "You can choose to SSH into the VM and install Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we simply are going to create another script `train2.py` that doesn't have azureml dependencies, and submit it instead." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $script_folder/train2.py\n", - "print('####################################')\n", - "print('Hello World (without Azure ML SDK)!')\n", - "print('####################################')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's try again. And this time it should work fine." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src = ScriptRunConfig(source_directory=script_folder, \n", - " script='train2.py', \n", - " run_config=vm_run_config)\n", - "run = exp.submit(config=src)\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note even in this case you get a run record with some basic statistics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure a Docker run with new conda environment on the VM\n", - "You can execute in a Docker container in the VM. If you choose this option, the system will pull down a base Docker image, build a new conda environment in it if you ask for (you can also skip this if you are using a customer Docker image when a preconfigured Python environment), start a container, and run your script in there. This image is also uploaded into your ACR (Azure Container Registry) assoicated with your workspace, an reused if your dependencies don't change in the subsequent runs." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "\n", - "# Load the \"cpu-dsvm.runconfig\" file (created by the above attach operation) in memory\n", - "docker_run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "# Set compute target to the Linux DSVM\n", - "docker_run_config.target = dsvm_compute.name\n", - "\n", - "# Use Docker in the remote VM\n", - "docker_run_config.environment.docker.enabled = True\n", - "\n", - "# Use CPU base image from DockerHub\n", - "docker_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", - "print('Base Docker image is:', docker_run_config.environment.docker.base_image)\n", - "\n", - "# set the data reference of the run coonfiguration\n", - "docker_run_config.data_references = {ds.name: dr}\n", - "\n", - "# specify CondaDependencies obj\n", - "docker_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit the Experiment\n", - "Submit script to run in the Docker image in the remote VM. If you run this for the first time, the system will download the base image, layer in packages specified in the `conda_dependencies.yml` file on top of the base image, create a container and then execute the script in the container." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "src = ScriptRunConfig(source_directory=script_folder, \n", - " script='train.py', \n", - " run_config=docker_run_config,\n", - " # pass the datastore reference as a parameter to the training script\n", - " arguments=['--data-folder', str(ds.as_download())])\n", - "run = exp.submit(config=src)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find the best model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we have tried various execution modes, we can find the best model from the last run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get all metris logged in the run\n", - "run.get_metrics()\n", - "metrics = run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# find the index where MSE is the smallest\n", - "indices = list(range(0, len(metrics['mse'])))\n", - "min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n", - "\n", - "print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n", - " metrics['mse'][min_mse_index], \n", - " metrics['alpha'][min_mse_index]\n", - "))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up compute resource" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dsvm_compute.delete()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/05.train-in-spark/05.train-in-spark.ipynb b/01.getting-started/05.train-in-spark/05.train-in-spark.ipynb index 3d9a9edd..d944760a 100644 --- a/01.getting-started/05.train-in-spark/05.train-in-spark.ipynb +++ b/01.getting-started/05.train-in-spark/05.train-in-spark.ipynb @@ -1,321 +1,326 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 05. Train in Spark\n", + "* Create Workspace\n", + "* Create Experiment\n", + "* Copy relevant files to the script folder\n", + "* Configure and Run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Experiment\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'train-on-spark'\n", + "\n", + "from azureml.core import Experiment\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## View `train-spark.py`\n", + "\n", + "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train-spark.py` in a cell to show the file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('train-spark.py', 'r') as training_script:\n", + " print(training_script.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure & Run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure an ACI run\n", + "Before you try running on an actual Spark cluster, you can use a Docker image with Spark already baked in, and run it in ACI (Azure Container Instances)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# use pyspark framework\n", + "aci_run_config = RunConfiguration(framework=\"pyspark\")\n", + "\n", + "# use ACI to run the Spark job\n", + "aci_run_config.target = 'containerinstance'\n", + "aci_run_config.container_instance.region = 'eastus2'\n", + "aci_run_config.container_instance.cpu_cores = 1\n", + "aci_run_config.container_instance.memory_gb = 2\n", + "\n", + "# specify base Docker image to use\n", + "aci_run_config.environment.docker.enabled = True\n", + "aci_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_MMLSPARK_CPU_IMAGE\n", + "\n", + "# specify CondaDependencies\n", + "cd = CondaDependencies()\n", + "cd.add_conda_package('numpy')\n", + "aci_run_config.environment.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit script to ACI to run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "script_run_config = ScriptRunConfig(source_directory = '.',\n", + " script= 'train-spark.py',\n", + " run_config = aci_run_config)\n", + "run = 
exp.submit(script_run_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: You can also create a new VM, or attach an existing VM, and use Docker-based execution to run the Spark job. See the `04.train-on-remote-vm` notebook for an example of how to configure and run in Docker mode in a VM." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attach an HDI cluster\n", + "Now we can use a real Spark cluster, HDInsight for Spark, to run this job. To use an HDI compute target:\n", + " 1. Create a Spark cluster on HDInsight in Azure. Here are some [quick instructions](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-spark-sql). Make sure you use the Ubuntu flavor, NOT CentOS.\n", + " 2. 
Enter the IP address, username and password below" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import HDInsightCompute\n", + "from azureml.exceptions import ComputeTargetException\n", + "\n", + "try:\n", + " # if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase\n", + " hdi_compute = HDInsightCompute.attach(workspace=ws, \n", + " name=\"myhdi\", \n", + " address=\".azurehdinsight.net\", \n", + " ssh_port=22, \n", + " username='', \n", + " password='')\n", + "\n", + "except ComputeTargetException as e:\n", + " print(\"Caught = {}\".format(e.message))\n", + " \n", + " \n", + "hdi_compute.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure HDI run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "\n", + "# use pyspark framework\n", + "hdi_run_config = RunConfiguration(framework=\"pyspark\")\n", + "\n", + "# Set compute target to the HDI cluster\n", + "hdi_run_config.target = hdi_compute.name\n", + "\n", + "# specify CondaDependencies object to ask system installing numpy\n", + "cd = CondaDependencies()\n", + "cd.add_conda_package('numpy')\n", + "hdi_run_config.environment.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit the script to HDI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import ScriptRunConfig\n", + "\n", + "script_run_config = ScriptRunConfig(source_directory = '.',\n", + " script= 'train-spark.py',\n", + " run_config = hdi_run_config)\n", + "run = 
exp.submit(config=script_run_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get the URL of the run history web page\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# wait for the run to finish, then get all metrics logged in the run\n", + "run.wait_for_completion(show_output=True)\n", + "metrics = run.get_metrics()\n", + "print(metrics)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 05. Train in Spark\n", - "* Create Workspace\n", - "* Create Experiment\n", - "* Copy relevant files to the script folder\n", - "* Configure and Run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'train-on-spark'\n", - "\n", - "from azureml.core import Experiment\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## View `train-spark.py`\n", - "\n", - "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train-spark.py` in a cell to show the file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('train-spark.py', 'r') as training_script:\n", - " print(training_script.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure & Run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure an ACI run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "# use pyspark framework\n", - "aci_run_config = RunConfiguration(framework=\"pyspark\")\n", - "\n", - "# use ACI to run the Spark job\n", - "aci_run_config.target = 'containerinstance'\n", - "aci_run_config.container_instance.region = 'eastus2'\n", - "aci_run_config.container_instance.cpu_cores = 1\n", - "aci_run_config.container_instance.memory_gb = 2\n", - "\n", - "# specify base Docker image to use\n", - 
"aci_run_config.environment.docker.enabled = True\n", - "aci_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_MMLSPARK_CPU_IMAGE\n", - "\n", - "# specify CondaDependencies\n", - "cd = CondaDependencies()\n", - "cd.add_conda_package('numpy')\n", - "aci_run_config.environment.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit script to ACI to run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "\n", - "script_run_config = ScriptRunConfig(source_directory = '.',\n", - " script= 'train-spark.py',\n", - " run_config = aci_run_config)\n", - "run = exp.submit(script_run_config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Attach an HDI cluster\n", - "Now we can use a real Spark cluster, HDInsight for Spark, to run this job. To use HDI commpute target:\n", - " 1. Create an Spark for HDI cluster in Azure. Here is some [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor, NOT CentOS.\n", - " 2. 
Enter the IP address, username and password below" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import HDInsightCompute\n", - "from azureml.exceptions import ComputeTargetException\n", - "\n", - "try:\n", - " # if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase\n", - " hdi_compute = HDInsightCompute.attach(workspace=ws, \n", - " name=\"myhdi\", \n", - " address=\"myhdi-ssh.azurehdinsight.net\", \n", - " ssh_port=22, \n", - " username='', \n", - " password='')\n", - "\n", - "except ComputeTargetException as e:\n", - " print(\"Caught = {}\".format(e.message))\n", - " \n", - " \n", - "hdi_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure HDI run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "\n", - "# Load the \"cpu-dsvm.runconfig\" file (created by the above attach operation) in memory\n", - "hdi_run_config = RunConfiguration(framework=\"pyspark\")\n", - "\n", - "# Set compute target to the Linux DSVM\n", - "hdi_run_config.target = hdi_compute.name\n", - "\n", - "# Ask system to provision a new one based on the conda_dependencies.yml file\n", - "hdi_run_config.environment.python.user_managed_dependencies = False\n", - "\n", - "# specify CondaDependencies obj\n", - "cd = CondaDependencies()\n", - "cd.add_conda_package('numpy')\n", - "hdi_run_config.environment.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit the script to HDI" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ 
- "from azureml.core import ScriptRunConfig\n", - "\n", - "script_run_config = ScriptRunConfig(source_directory = '.',\n", - " script= 'train-spark.py',\n", - " run_config = hdi_run_config)\n", - "run = exp.submit(script_run_config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get the URL of the run history web page\n", - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# get all metris logged in the run\n", - "metrics = run.get_metrics()\n", - "print(metrics)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/10.register-model-create-image-deploy-service/10.register-model-create-image-deploy-service.ipynb b/01.getting-started/10.register-model-create-image-deploy-service/10.register-model-create-image-deploy-service.ipynb index 97291db8..45d9e429 100644 --- a/01.getting-started/10.register-model-create-image-deploy-service/10.register-model-create-image-deploy-service.ipynb +++ b/01.getting-started/10.register-model-create-image-deploy-service/10.register-model-create-image-deploy-service.ipynb @@ -1,421 +1,421 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10. Register Model, Create Image and Deploy Service\n", + "\n", + "This example shows how to deploy a web service in step-by-step fashion:\n", + "\n", + " 1. Register model\n", + " 2. Query versions of models and select one to deploy\n", + " 3. Create Docker image\n", + " 4. Query versions of images\n", + " 5. Deploy the image as web service\n", + " \n", + "**IMPORTANT**:\n", + " * This notebook requires you to first complete \"01.SDK-101-Train-and-Deploy-to-ACI.ipynb\" Notebook\n", + " \n", + "The 101 Notebook taught you how to deploy a web service directly from model in one step. This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n", + "\n", + "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "import sklearn\n", + "\n", + "library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n", + "\n", + "model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n", + " model_name = \"sklearn_regression_model.pkl\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n", + " description = \"Ridge regression model to predict diabetes\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "regression_models = Model.list(workspace=ws, tags=['area'])\n", + "for m in regression_models:\n", + " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can pick a specific model to deploy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(model.name, model.description, model.version, sep = '\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Docker Image" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show `score.py`. Note that `sklearn_regression_model.pkl` in the `get_model_path` call refers to the model registered under that name in the workspace. It does NOT reference the local file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global model\n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered in the workspace\n", + " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", + " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + "\n", + "# note you can pass in multiple rows for scoring\n", + "def run(raw_data):\n", + " try:\n", + " data = json.loads(raw_data)['data']\n", + " data = numpy.array(data)\n", + " result = model.predict(data)\n", + " except Exception as e:\n", + " result = str(e)\n", + " return json.dumps({\"result\": result.tolist()})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "myenv.add_pip_package(\"pynacl==1.2.1\")\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the following command can take a few minutes.\n", + "\n", + "You can add tags and descriptions to images. Also, an image can contain multiple models."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.image import Image, ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", + " execution_script=\"score.py\",\n", + " conda_file=\"myenv.yml\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", + " description = \"Image with ridge regression model\")\n", + "\n", + "image = Image.create(name = \"myimage1\",\n", + " # this is the model object \n", + " models = [model],\n", + " image_config = image_config, \n", + " workspace = ws)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "List images by tag and find out the detailed build log for debugging." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "for i in Image.list(workspace = ws,tags = [\"area\"]):\n", + " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy image as web service on Azure Container Instance\n", + "\n", + "Note that service creation can take a few minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", + " description = 'Predict diabetes using regression model')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'my-aci-service-2'\n", + "print(aci_service_name)\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test web service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call the web service with some dummy input data to get a prediction." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "test_sample = json.dumps({'data': [\n", + " [1,2,3,4,5,6,7,8,9,10], \n", + " [10,9,8,7,6,5,4,3,2,1]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding = 'utf8')\n", + "\n", + "prediction = aci_service.run(input_data = test_sample)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete ACI to clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "aci_service.delete()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 10. Register Model, Create Image and Deploy Service\n", - "\n", - "This example shows how to deploy a web service in step-by-step fashion:\n", - "\n", - " 1. Register model\n", - " 2. Query versions of models and select one to deploy\n", - " 3. Create Docker image\n", - " 4. Query versions of images\n", - " 5. Deploy the image as web service\n", - " \n", - "**IMPORTANT**:\n", - " * This notebook requires you to first complete \"01.SDK-101-Train-and-Deploy-to-ACI.ipynb\" Notebook\n", - " \n", - "The 101 Notebook taught you how to deploy a web service directly from model in one step. 
This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n", - "\n", - "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "import sklearn\n", - "\n", - "library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n", - "\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n", - " model_name = \"sklearn_regression_model.pkl\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n", - " description = \"Ridge regression model to predict diabetes\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "regression_models = Model.list(workspace=ws, tags=['area'])\n", - "for m in regression_models:\n", - " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can pick a specific model to deploy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(model.name, model.description, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Docker Image" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_linreg_model.pkl` registered under the workspace. 
It is NOT referenceing the local file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", - " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", - " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " except Exception as e:\n", - " result = str(e)\n", - " return json.dumps({\"result\": result.tolist()})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "myenv.add_pip_package(\"pynacl==1.2.1\")\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that following command can take few minutes. \n", - "\n", - "You can add tags and descriptions to images. Also, an image can contain multiple models." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - " description = \"Image with ridge regression model\")\n", - "\n", - "image = Image.create(name = \"myimage1\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "List images by tag and find out the detailed build log for debugging." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "for i in Image.list(workspace = ws,tags = [\"area\"]):\n", - " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy image as web service on Azure Container Instance\n", - "\n", - "Note that the service creation can take few minutes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", - " description = 'Predict diabetes using regression model')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'my-aci-service-2'\n", - "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test web service" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Call the web service with some dummy input data to get a prediction." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,5,6,7,8,9,10], \n", - " [10,9,8,7,6,5,4,3,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "prediction = aci_service.run(input_data = test_sample)\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Delete ACI to clean up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "aci_service.delete()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/11.production-deploy-to-aks/11.production-deploy-to-aks.ipynb b/01.getting-started/11.production-deploy-to-aks/11.production-deploy-to-aks.ipynb index 3b680880..a2058671 100644 --- a/01.getting-started/11.production-deploy-to-aks/11.production-deploy-to-aks.ipynb +++ b/01.getting-started/11.production-deploy-to-aks/11.production-deploy-to-aks.ipynb @@ -1,335 +1,336 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploying a web service to Azure Kubernetes Service (AKS)\n", + "This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. \n", + "We then test and delete the service, image and model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "from azureml.core.compute import AksCompute, ComputeTarget\n", + "from azureml.core.webservice import Webservice, AksWebservice\n", + "from azureml.core.image import Image\n", + "from azureml.core.model import Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Get workspace\n", + "Load existing workspace from the config file info." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register the model\n", + "Register an existing trained model, add descirption and tags." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Register the model\n", + "from azureml.core.model import Model\n", + "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", + " model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", + " description = \"Ridge regression model to predict diabetes\",\n", + " workspace = ws)\n", + "\n", + "print(model.name, model.description, model.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create an image\n", + "Create an image using the registered model the script that will load and run the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global model\n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", + " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", + " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + "\n", + "# note you can pass in multiple rows for scoring\n", + "def run(raw_data):\n", + " try:\n", + " data = json.loads(raw_data)['data']\n", + " data = numpy.array(data)\n", + " result = model.predict(data)\n", + " except Exception as e:\n", + " result = str(e)\n", + " return json.dumps({\"result\": result.tolist()})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + 
"source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "myenv.add_pip_package(\"pynacl==1.2.1\")\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"Image with ridge regression model\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", + " )\n", + "\n", + "image = ContainerImage.create(name = \"myimage1\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Provision the AKS Cluster\n", + "This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the default configuration (can also provide parameters to customize)\n", + "prov_config = AksCompute.provisioning_configuration()\n", + "\n", + "aks_name = 'my-aks-9' \n", + "# Create the cluster\n", + "aks_target = ComputeTarget.create(workspace = ws, \n", + " name = aks_name, \n", + " provisioning_configuration = prov_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_target.wait_for_completion(show_output = True)\n", + "print(aks_target.provisioning_state)\n", + "print(aks_target.provisioning_errors)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optional step: Attach existing AKS cluster\n", + "\n", + "If you have existing AKS cluster in your Azure subscription, you can attach it to the Workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'''\n", + "# Use the default configuration (can also provide parameters to customize)\n", + "resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n", + "\n", + "create_name='my-existing-aks' \n", + "# Create the cluster\n", + "aks_target = AksCompute.attach(workspace=ws, name=create_name, resource_id=resource_id)\n", + "# Wait for the operation to complete\n", + "aks_target.wait_for_completion(True)\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploy web service to AKS" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Set the web service configuration (using default here)\n", + "aks_config = AksWebservice.deploy_configuration()" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service_name ='aks-service-1'\n", + "\n", + "aks_service = Webservice.deploy_from_image(workspace = ws, \n", + " name = aks_service_name,\n", + " image = image,\n", + " deployment_config = aks_config,\n", + " deployment_target = aks_target)\n", + "aks_service.wait_for_deployment(show_output = True)\n", + "print(aks_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Test the web service\n", + "We test the web sevice by passing data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "import json\n", + "\n", + "test_sample = json.dumps({'data': [\n", + " [1,2,3,4,5,6,7,8,9,10], \n", + " [10,9,8,7,6,5,4,3,2,1]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding = 'utf8')\n", + "\n", + "prediction = aks_service.run(input_data = test_sample)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Clean up\n", + "Delete the service, image and model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service.delete()\n", + "image.delete()\n", + "model.delete()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deploying a web service to Azure Kubernetes Service (AKS)\n", - "This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. 
\n", - "We then test and delete the service, image and model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.compute import AksCompute, ComputeTarget\n", - "from azureml.core.webservice import Webservice, AksWebservice\n", - "from azureml.core.image import Image\n", - "from azureml.core.model import Model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Get workspace\n", - "Load existing workspace from the config file info." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Register the model\n", - "Register an existing trained model, add descirption and tags." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Register the model\n", - "from azureml.core.model import Model\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", - " model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - " description = \"Ridge regression model to predict diabetes\",\n", - " workspace = ws)\n", - "\n", - "print(model.name, model.description, model.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Create an image\n", - "Create an image using the registered model the script that will load and run the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", - " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", - " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " except Exception as e:\n", - " result = str(e)\n", - " return json.dumps({\"result\": result.tolist()})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - 
"source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " description = \"Image with ridge regression model\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", - " )\n", - "\n", - "image = ContainerImage.create(name = \"myimage1\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Provision the AKS Cluster\n", - "This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use the default configuration (can also provide parameters to customize)\n", - "prov_config = AksCompute.provisioning_configuration()\n", - "\n", - "aks_name = 'my-aks-9' \n", - "# Create the cluster\n", - "aks_target = ComputeTarget.create(workspace = ws, \n", - " name = aks_name, \n", - " provisioning_configuration = prov_config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_target.wait_for_completion(show_output = True)\n", - "print(aks_target.provisioning_state)\n", - "print(aks_target.provisioning_errors)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional step: Attach existing AKS cluster\n", - "\n", - "If you have existing AKS cluster in your Azure subscription, you can attach it to the Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "'''\n", - "# Use the default configuration (can also provide parameters to customize)\n", - "resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n", - "\n", - "create_name='my-existing-aks' \n", - "# Create the cluster\n", - "aks_target = AksCompute.attach(workspace=ws, name=create_name, resource_id=resource_id)\n", - "# Wait for the operation to complete\n", - "aks_target.wait_for_completion(True)\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deploy web service to AKS" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Set the web service configuration (using default here)\n", - "aks_config = AksWebservice.deploy_configuration()" - ] - }, - { - "cell_type": "code", - "execution_count": 
null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service_name ='aks-service-1'\n", - "\n", - "aks_service = Webservice.deploy_from_image(workspace = ws, \n", - " name = aks_service_name,\n", - " image = image,\n", - " deployment_config = aks_config,\n", - " deployment_target = aks_target)\n", - "aks_service.wait_for_deployment(show_output = True)\n", - "print(aks_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Test the web service\n", - "We test the web sevice by passing data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,5,6,7,8,9,10], \n", - " [10,9,8,7,6,5,4,3,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "prediction = aks_service.run(input_data = test_sample)\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Clean up\n", - "Delete the service, image and model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service.delete()\n", - "image.delete()\n", - "model.delete()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/01.getting-started/12.enable-data-collection-for-models-in-aks/12.enable-data-collection-for-models-in-aks.ipynb b/01.getting-started/12.enable-data-collection-for-models-in-aks/12.enable-data-collection-for-models-in-aks.ipynb index 0db73f0f..852a82c2 100644 --- a/01.getting-started/12.enable-data-collection-for-models-in-aks/12.enable-data-collection-for-models-in-aks.ipynb +++ b/01.getting-started/12.enable-data-collection-for-models-in-aks/12.enable-data-collection-for-models-in-aks.ipynb @@ -102,7 +102,7 @@ "### b. In your init function add:\n", "```python \n", "global inputs_dc, prediction_d\n", - "inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\". 
\"feat4\", \"feat5\", \"Feat6\"])\n", + "inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"Feat6\"])\n", "prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n", " \n", "* Identifier: Identifier is later used for building the folder structure in your Blob, it can be used to divide \"raw\" data versus \"processed\".\n", @@ -180,6 +180,7 @@ "\n", "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", "myenv.add_pip_package(\"azureml-monitoring\")\n", + "myenv.add_pip_package(\"pynacl==1.2.1\")\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -286,7 +287,7 @@ " create_name= 'myaks4'\n", " aks_target = AksCompute.attach(workspace = ws, \n", " name = create_name, \n", - " #esource_id=resource_id)\n", + " resource_id=resource_id)\n", " ## Wait for the operation to complete\n", " aks_target.wait_for_provisioning(True)```" ] diff --git a/automl/00.configuration.ipynb b/automl/00.configuration.ipynb index bc12144d..f9d9ed90 100644 --- a/automl/00.configuration.ipynb +++ b/automl/00.configuration.ipynb @@ -1,265 +1,265 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 00. Configuration\n", + "\n", + "In this example you will create an Azure Machine Learning `Workspace` object and initialize your notebook directory to easily reload this object from a configuration file. 
Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n", + "\n", + "\n", + "## Prerequisites:\n", + "\n", + "Before running this notebook, run the `automl_setup` script described in README.md.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Connect to Your Azure Subscription\n", + "\n", + "In order to use an Azure ML workspace, you need access to an Azure Subscription. You can [create a new Azure Subscription](https://azure.microsoft.com/en-us/free) or get existing subscription information from the [Azure portal](https://portal.azure.com).\n", + "\n", + "First, log in to Azure and follow the prompts to authenticate. Then check that your subscription is correct." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az login" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az account show" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you have multiple subscriptions and need to change the active one, you can use this command:\n", + "```shell\n", + "az account set -s \n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register Machine Learning Services Resource Provider\n", + "\n", + "This step is required to use the Azure ML services backing the SDK."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Register the new resource provider.\n", + "!az provider register -n Microsoft.MachineLearningServices\n", + "\n", + "# Check resource provider registration status.\n", + "!az provider show -n Microsoft.MachineLearningServices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Check the Azure ML Core SDK Version to Validate Your Installation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "\n", + "print(\"SDK Version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize an Azure ML Workspace\n", + "### What is an Azure ML Workspace and Why Do I Need One?\n", + "\n", + "An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources, providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n", + "\n", + "\n", + "### What do I Need?\n", + "\n", + "To create or access an Azure ML workspace, you will need to import the Azure ML library and specify the following information:\n", + "* A name of your choice for the workspace.\n", + "* Your subscription ID. Use the `id` value from the `az account show` command output above.\n", + "* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group.
You can either specify a new one, in which case it is created for your workspace, or use an existing one, such as one created in the [Azure portal](https://portal.azure.com).\n", + "* The workspace region. Supported regions include `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "subscription_id = \"\"\n", + "resource_group = \"myrg\"\n", + "workspace_name = \"myws\"\n", + "workspace_region = \"eastus2\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating a Workspace\n", + "If you already have access to an Azure ML workspace you want to use, you can skip this cell. Otherwise, this cell will create an Azure ML workspace for you in the specified subscription, provided you have the correct permissions for the given `subscription_id`.\n", + "\n", + "This will fail when:\n", + "1. The workspace already exists.\n", + "2. You do not have permission to create a workspace in the resource group.\n", + "3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.\n", + "\n", + "If workspace creation fails for any reason other than the workspace already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n", + "\n", + "**Note:** Creation of a new workspace can take several minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import the Workspace class and create a new workspace.\n", + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.create(name = workspace_name,\n", + " subscription_id = subscription_id,\n", + " resource_group = resource_group, \n", + " location = workspace_region)\n", + "ws.get_details()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configuring Your Local Environment\n", + "You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace(workspace_name = workspace_name,\n", + " subscription_id = subscription_id,\n", + " resource_group = resource_group)\n", + "\n", + "# Persist the subscription ID, resource group name, and workspace name in aml_config/config.json.\n", + "ws.write_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can then load the workspace from this config file from any notebook in the current directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load workspace configuration from ./aml_config/config.json file.\n", + "my_workspace = Workspace.from_config()\n", + "my_workspace.get_details()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a Folder to Host All Sample Projects\n", + "Finally, create a folder where all the sample projects will be hosted."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "sample_projects_folder = './sample_projects'\n", + "\n", + "if not os.path.isdir(sample_projects_folder):\n", + " os.mkdir(sample_projects_folder)\n", + " \n", + "print('Sample projects will be created in {}.'.format(sample_projects_folder))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Success!\n", + "Great, you are ready to move on to the rest of the sample notebooks." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 00. configuration\n", - "\n", - "In this example you will create an Azure Machine Learning Workspace and initialize your notebook directory to easily use this workspace. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n", - "\n", - "\n", - "## Prerequisites:\n", - "\n", - "Before running this notebook, run the automl_setup script described in README.md.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to your Azure Subscription\n", - "\n", - "In order to use an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com).\n", - "\n", - "First login to azure and follow prompts to authenticate. 
Then check that your subscription is correct" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!az login" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!az account show" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you have multiple subscriptions and need to change the active one, you can use a command\n", - "```shell\n", - "az account set -s \n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register Machine Learning Services Resource Provider\n", - "\n", - "This step is required to use the Azure ML services backing the SDK." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# register the new RP\n", - "!az provider register -n Microsoft.MachineLearningServices\n", - "\n", - "# check the registration status\n", - "!az provider show -n Microsoft.MachineLearningServices" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Check core SDK version number for validate your installation and for debugging purposes" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "\n", - "print(\"SDK Version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize an Azure ML Workspace\n", - "### What is an Azure ML Workspace and why do I need one?\n", - "\n", - "An AML Workspace is an Azure resource that organaizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. 
In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n", - "\n", - "\n", - "### What do I need\n", - "\n", - "To create or access an Azure ML Workspace, you will need to import the AML library and specify following information:\n", - "* A name for your workspace. You can choose one.\n", - "* Your subscription id. Use *id* value from *az account show* output above. \n", - "* The resource group name. Resource group organizes Azure resources and provides default region for the resources in the group. You can either specify a new one, in which case it gets created for your Workspace, or use an existing one or create a new one from [Azure portal](https://portal.azure.com)\n", - "* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "subscription_id = \"\"\n", - "resource_group = \"myrg\"\n", - "workspace_name = \"myws\"\n", - "workspace_region = \"eastus2\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating a workspace\n", - "If you already have access to an AML Workspace you want to use, you can skip this cell. Otherwise, this cell will create an AML workspace for you in a subscription provided you have the correct permissions for the given `subscription_id`.\n", - "\n", - "This will fail when:\n", - "1. The workspace already exists\n", - "2. You do not have permission to create a workspace in the resource group\n", - "3. 
You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n", - "\n", - "If workspace creation fails for any reason other than already existing, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.\n", - "\n", - "**Note** The workspace creation can take several minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import the Workspace class and check the azureml SDK version\n", - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.create(name = workspace_name,\n", - " subscription_id = subscription_id,\n", - " resource_group = resource_group, \n", - " location = workspace_region)\n", - "ws.get_details()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuring your local environment\n", - "You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace(workspace_name = workspace_name,\n", - " subscription_id = subscription_id,\n", - " resource_group = resource_group)\n", - "\n", - "# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n", - "ws.write_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can then load the workspace from this config file from any notebook in the current directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load workspace configuratio from ./aml_config/config.json file.\n", - "my_workspace = Workspace.from_config()\n", - "my_workspace.get_details()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a folder to host all sample projects\n", - "Lastly, create a folder where all the sample projects will be hosted." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "sample_projects_folder = './sample_projects'\n", - "\n", - "if not os.path.isdir(sample_projects_folder):\n", - " os.mkdir(sample_projects_folder)\n", - " \n", - "print('Sample projects will be created in {}.'.format(sample_projects_folder))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Success!\n", - "Great, you are ready to move on to the rest of the sample notebooks." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/01.auto-ml-classification.ipynb b/automl/01.auto-ml-classification.ipynb index a5f6fe98..1a67d9e6 100644 --- a/automl/01.auto-ml-classification.ipynb +++ b/automl/01.auto-ml-classification.ipynb @@ -1,399 +1,407 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 01: Classification with Local Compute\n", + "\n", + "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) notebook before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Configure AutoML using `AutoMLConfig`.\n", + "3. Train the model using local compute.\n", + "4. Explore the results.\n", + "5. Test the best fitted model.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the experiment and specify the project folder.\n", + "experiment_name = 'automl-local-classification'\n", + "project_folder = './sample_projects/automl-local-classification'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace Name'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Training Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn import datasets\n", + "\n", + "digits = datasets.load_digits()\n", + "\n", + "# Exclude the first 100 rows from training so that they can be used for test.\n", + "X_train = digits.data[100:,:]\n", + "y_train = digits.target[100:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**task**|classification or regression|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross validation splits.|\n", + "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", + "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n", + "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 3600,\n", + " iterations = 50,\n", + " n_cross_validations = 3,\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", + "In this example, we specify `show_output = True` to print currently running iterations to the console."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = local_run.continue_experiment(X = X_train, \n", + " y = y_train, \n", + " show_output = True,\n", + " iterations = 5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(local_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see the individual metrics logged for each one." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(local_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model that has the smallest `log_loss` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lookup_metric = \"log_loss\"\n", + "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from iteration 3:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 3\n", + "third_run, third_model = local_run.get_output(iteration = iteration)\n", + "print(third_run)\n", + "print(third_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the Best Fitted Model\n", + "\n", + "#### Load Test Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_test = digits.data[:10, :]\n", + "y_test = digits.target[:10]\n", + "images = digits.images[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Testing Our Best Pipeline\n", + "We will try to predict 2 digits and see how our model works."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Randomly select digits and test.\n", + "for index in np.random.choice(len(y_test), 2, replace = False):\n", + " print(index)\n", + " predicted = fitted_model.predict(X_test[index:index + 1])[0]\n", + " label = y_test[index]\n", + " title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n", + " fig = plt.figure(1, figsize = (3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 01: Classification with local compute\n", - "\n", - "In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Creating an Experiment in an existing Workspace\n", - "2. Instantiating AutoMLConfig\n", - "3. Training the Model using local compute\n", - "4. Exploring the results\n", - "5. Testing the fitted model\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. 
An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for experiment\n", - "experiment_name = 'automl-local-classification'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-classification'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load Digits Dataset" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn import datasets\n", - "\n", - "digits = datasets.load_digits()\n", - "\n", - "# Exclude the first 100 rows from training so that they can be used for test.\n", - "X_digits = digits.data[100:,:]\n", - "y_digits = digits.target[100:]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate Auto ML Config\n", - "\n", - "Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**task**|classification or regression|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data |\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " primary_metric = 'AUC_weighted',\n", - " max_time_sec = 3600,\n", - " iterations = 50,\n", - " n_cross_validations = 3,\n", - " verbosity = logging.INFO,\n", - " X = X_digits, \n", - " y = y_digits,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n", - "You will see the currently running iterations printing to the console." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Optionally, you can continue an interrupted local run by calling continue_experiment without the iterations parameter, or run more iterations to a completed run by specifying the iterations parameter:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = local_run.continue_experiment(X = X_digits, \n", - " y = y_digits, \n", - " show_output = True,\n", - " iterations = 5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(local_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric\n", - "Give me the run and the model that has the smallest `log_loss`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lookup_metric = \"log_loss\"\n", - "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration\n", - "Give me the run and the model from the 3rd iteration:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 3\n", - "third_run, third_model = local_run.get_output(iteration = iteration)\n", - 
"print(third_run)\n", - "print(third_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model \n", - "\n", - "#### Load Test Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:10, :]\n", - "y_digits = digits.target[:10]\n", - "images = digits.images[:10]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Testing our best pipeline\n", - "We will try to predict 2 digits and see how our model works." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Randomly select digits and test\n", - "for index in np.random.choice(len(y_digits), 2):\n", - " print(index)\n", - " predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n", - " label = y_digits[index]\n", - " title = \"Label value = %d Predicted value = %d \" % ( label,predicted)\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/02.auto-ml-regression.ipynb b/automl/02.auto-ml-regression.ipynb index d7b7aa16..bc14d3c1 100644 --- a/automl/02.auto-ml-regression.ipynb +++ b/automl/02.auto-ml-regression.ipynb @@ -1,409 +1,409 @@ { - "cells": [ - { - "cell_type": 
"markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 02: Regression with Local Compute\n", + "\n", + "In this example we use scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple regression problem.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) notebook before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Configure AutoML using `AutoMLConfig`.\n", + "3. Train the model using local compute.\n", + "4. Explore the results.\n", + "5. Test the best fitted model.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the experiment and specify the project folder.\n", + "experiment_name = 'automl-local-regression'\n", + "project_folder = './sample_projects/automl-local-regression'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace Name'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load Training Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n", + "from sklearn.datasets import load_diabetes\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y = True)\n", + "\n", + "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**task**|classification or regression|\n", + "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics:<br>spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>normalized_mean_absolute_error|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross validation splits.|\n", + "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", + "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. 
This should be an array of integers.|\n", + "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'regression',\n", + " max_time_sec = 600,\n", + " iterations = 10,\n", + " primary_metric = 'spearman_correlation',\n", + " n_cross_validations = 5,\n", + " debug_log = 'automl.log',\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations, this can run for a while.\n", + "In this example, we specify `show_output = True` to print the currently running iterations to the console." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown.
The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(local_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(local_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis = 1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model that has the smallest `root_mean_squared_error` value (which, in our run, turned out to be the same as the one with the largest `spearman_correlation` value):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lookup_metric = \"root_mean_squared_error\"\n", + "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from iteration 3:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 3\n", + "third_run, third_model = local_run.get_output(iteration = iteration)\n", + "print(third_run)\n", + "print(third_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the Best Fitted Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Predict on the training and test sets, and calculate the residual values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y_pred_train = fitted_model.predict(X_train)\n", + "y_residual_train = y_train - y_pred_train\n", + "\n", + "y_pred_test = fitted_model.predict(X_test)\n", + "y_residual_test = y_test - y_pred_test" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn import datasets\n", + "from sklearn.metrics import mean_squared_error, r2_score\n", + "\n", + "# Set up a multi-plot chart.\n", + "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n", + "f.suptitle('Regression Residual Values', fontsize = 18)\n", + "f.set_figheight(6)\n", + "f.set_figwidth(16)\n", + "\n", + "# Plot residual values of training set.\n", + "a0.axis([0, 360, -200, 200])\n", + "a0.plot(y_residual_train, 'bo', alpha = 0.5)\n", + "a0.plot([-10,360],[0,0], 'r-', lw = 3)\n", + "a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n", + "a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n", + "a0.set_xlabel('Training samples', fontsize = 12)\n", + "a0.set_ylabel('Residual Values', fontsize = 12)\n", + "\n", + "# Plot a histogram.\n", + "a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n", + "a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n", + "\n", + "# Plot residual values of test set.\n", + "a1.axis([0, 90, -200, 200])\n", + "a1.plot(y_residual_test, 'bo', alpha = 0.5)\n", + "a1.plot([-10,360],[0,0], 'r-', lw = 3)\n", + "a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n", + "a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, 
y_pred_test)), fontsize = 12)\n", + "a1.set_xlabel('Test samples', fontsize = 12)\n", + "a1.set_yticklabels([])\n", + "\n", + "# Plot a histogram.\n", + "a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n", + "a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 02: Regression with local compute\n", - "\n", - "In this example we use the scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple regression problem.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Instantiating AutoMLConfig\n", - "3. Training the Model using local compute\n", - "4. Exploring the results\n", - "5. Testing the fitted model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for the experiment\n", - "experiment_name = 'automl-local-regression'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-regression'\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Read Data" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n", - "from sklearn.datasets import load_diabetes\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.metrics import mean_squared_error\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "X, y = load_diabetes(return_X_y = True)\n", - "\n", - "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate Auto ML Config\n", - "\n", - "Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**task**|classification or regression|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Regression supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error
normalized_root_mean_squared_log_error|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task='regression',\n", - " max_time_sec = 600,\n", - " iterations = 10,\n", - " primary_metric = 'spearman_correlation', \n", - " n_cross_validations = 5,\n", - " debug_log = 'automl.log',\n", - " verbosity = logging.INFO,\n", - " X = X_train, \n", - " y = y_train,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n", - "You will see the currently running iterations printing to the console." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(local_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - " \n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric\n", - "Show the run and model that has the smallest `root_mean_squared_error` (which turned out to be the same as the one with largest `spearman_correlation` value):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lookup_metric = \"root_mean_squared_error\"\n", - "best_run, fitted_model = local_run.get_output(metric=lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration\n", - "\n", - "Simply show the run and model from the 3rd iteration:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 3\n", - "third_run, third_model = local_run.get_output(iteration = iteration)\n", - "print(third_run)\n", - "print(third_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Predict on training and test set, and calculate residual values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "y_pred_train = fitted_model.predict(X_train)\n", - "y_residual_train = y_train - y_pred_train\n", - "\n", - "y_pred_test = fitted_model.predict(X_test)\n", - "y_residual_test = y_test - y_pred_test" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn import datasets\n", - "from sklearn.metrics import mean_squared_error, r2_score\n", - "\n", - "# set up a multi-plot chart\n", - "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n", - "f.suptitle('Regression Residual Values', fontsize = 18)\n", - "f.set_figheight(6)\n", - "f.set_figwidth(16)\n", - "\n", - "# plot residual values of training set\n", - "a0.axis([0, 360, -200, 200])\n", - "a0.plot(y_residual_train, 'bo', alpha = 0.5)\n", - "a0.plot([-10,360],[0,0], 'r-', lw = 3)\n", - "a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n", - "a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n", - "a0.set_xlabel('Training samples', fontsize = 12)\n", - "a0.set_ylabel('Residual Values', fontsize = 12)\n", - "# plot histogram\n", - "a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n", - "a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n", - "\n", - "# plot residual values of test set\n", - "a1.axis([0, 90, -200, 200])\n", - "a1.plot(y_residual_test, 'bo', alpha = 0.5)\n", - "a1.plot([-10,360],[0,0], 'r-', lw = 3)\n", - "a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n", - "a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 
12)\n", - "a1.set_xlabel('Test samples', fontsize = 12)\n", - "a1.set_yticklabels([])\n", - "# plot histogram\n", - "a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n", - "a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n", - "\n", - "plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/03.auto-ml-remote-execution.ipynb b/automl/03.auto-ml-remote-execution.ipynb index f3c8b75f..cc892a1a 100644 --- a/automl/03.auto-ml-remote-execution.ipynb +++ b/automl/03.auto-ml-remote-execution.ipynb @@ -1,471 +1,480 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 03: Remote Execution using DSVM (Ubuntu)\n", + "\n", + "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1.
Create an `Experiment` in an existing `Workspace`.\n", + "2. Attach an existing DSVM to a workspace.\n", + "3. Configure AutoML using `AutoMLConfig`.\n", + "4. Train the model using the DSVM.\n", + "5. Explore the results.\n", + "6. Test the best fitted model.\n", + "\n", + "In addition, this notebook showcases the following features:\n", + "- **Parallel** executions for iterations\n", + "- **Asynchronous** tracking of progress\n", + "- **Cancellation** of individual iterations or the entire run\n", + "- Retrieving models for any iteration or logged metric\n", + "- Specifying AutoML settings as `**kwargs`\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the run history container in the workspace.\n", + "experiment_name = 'automl-remote-dsvm4'\n", + "project_folder = './sample_projects/automl-remote-dsvm4'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = 
azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace Name'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a Remote Linux DSVM\n", + "**Note:** If creation fails with a message about Marketplace purchase eligibility, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation.
Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import DsvmCompute\n", + "\n", + "dsvm_name = 'mydsvm'\n", + "try:\n", + " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", + " print('Found an existing DSVM.')\n", + "except:\n", + " print('Creating a new DSVM.')\n", + " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", + " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", + " dsvm_compute.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Get Data File\n", + "For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n", + "In this example, the `get_data()` function returns data from scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.exists(project_folder):\n", + " os.makedirs(project_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $project_folder/get_data.py\n", + "\n", + "from sklearn import datasets\n", + "from scipy import sparse\n", + "import numpy as np\n", + "\n", + "def get_data():\n", + " \n", + " digits = datasets.load_digits()\n", + " X_train = digits.data[100:,:]\n", + " y_train = digits.target[100:]\n", + "\n", + " return { \"X\" : X_train, \"y\" : y_train }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML \n", + "\n", + "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n", + "\n", + "**Note:** When using a remote DSVM, you can't pass NumPy arrays directly to the fit method.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross validation splits.|\n", + "|**concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be less than the number of cores on the DSVM.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_settings = {\n", + " \"max_time_sec\": 600,\n", + " \"iterations\": 20,\n", + " \"n_cross_validations\": 5,\n", + " \"primary_metric\": 'AUC_weighted',\n", + " \"preprocess\": False,\n", + " \"concurrent_iterations\": 2,\n", + " \"verbosity\": logging.INFO\n", + "}\n", + "\n", + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " path = project_folder, \n", + " compute_target = dsvm_compute,\n", + " data_script = project_folder + \"/get_data.py\",\n", + " **automl_settings\n", + " )\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note:** The first run on a new DSVM may take several minutes to prepare the environment." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n", + "\n", + "In this example, we specify `show_output = False` to suppress console output while the run is in progress." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run = experiment.submit(automl_config, show_output = False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results\n", + "\n", + "#### Loading Executed Runs\n", + "In case you need to load a previously executed run, enable the cell below and replace the `run_id` value." + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "remote_run = AutoMLRun(experiment=experiment, run_id = 'AutoML_480d3ed6-fc94-44aa-8f4e-0b945db9d3ef')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(remote_run).show() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Wait until the run finishes.\n", + "remote_run.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run = experiment.submit(automl_config, show_output = False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results\n", + "\n", + "#### Loading Executed Runs\n", + "In case you need to load a previously executed run, enable the cell below and replace the `run_id` value." + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "remote_run = AutoMLRun(experiment=experiment, run_id = 'AutoML_480d3ed6-fc94-44aa-8f4e-0b945db9d3ef')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(remote_run).show() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Wait until the run finishes.\n", + "remote_run.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(remote_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis = 1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cancelling Runs\n", + "\n", + "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Cancel the ongoing experiment and stop scheduling new iterations.\n", + "# remote_run.cancel()\n", + "\n", + "# Cancel iteration 1 and move on to iteration 2.\n", + "# remote_run.cancel_iteration(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `remote_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
value = %d \" % (label, predicted)\n", + " fig = plt.figure(1, figsize=(3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 03: Remote Execution using DSVM (Ubuntu)\n", - "\n", - "In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Attaching an existing DSVM to a workspace\n", - "3. Instantiating AutoMLConfig \n", - "4. Training the Model using the DSVM\n", - "5. Exploring the results\n", - "6. Testing the fitted model\n", - "\n", - "In addition this notebook showcases the following features\n", - "- **Parallel** Executions for iterations\n", - "- Asyncronous tracking of progress\n", - "- **Cancelling** individual iterations or the entire run\n", - "- Retrieving models for any iteration or logged metric\n", - "- specify automl settings as **kwargs**\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a workspace. For AutoML you would need to create a Experiment. 
An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for the run history container in the workspace\n", - "experiment_name = 'automl-remote-dsvm4'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-remote-dsvm4'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, 
- { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a Remote Linux DSVM\n", - "Note: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n", - "\n", - "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import DsvmCompute\n", - "\n", - "dsvm_name = 'mydsvm'\n", - "try:\n", - " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", - " print('found existing dsvm.')\n", - "except:\n", - " print('creating new dsvm.')\n", - " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", - " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", - " dsvm_compute.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Get Data File\n", - "For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if not os.path.exists(project_folder):\n", - " os.makedirs(project_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $project_folder/get_data.py\n", - "\n", - "from sklearn import datasets\n", - "from scipy import sparse\n", - "import numpy as np\n", - "\n", - "def get_data():\n", - " \n", - " digits = datasets.load_digits()\n", - " X_digits = digits.data[100:,:]\n", - " y_digits = digits.target[100:]\n", - "\n", - " return { \"X\" : X_digits, \"y\" : y_digits }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate AutoML \n", - "\n", - "You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n", - "\n", - "Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_settings = {\n", - " \"max_time_sec\": 600,\n", - " \"iterations\": 20,\n", - " \"n_cross_validations\": 5,\n", - " \"primary_metric\": 'AUC_weighted',\n", - " \"preprocess\": False,\n", - " \"concurrent_iterations\": 2,\n", - " \"verbosity\": logging.INFO\n", - "}\n", - "\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " path=project_folder, \n", - " compute_target = dsvm_compute,\n", - " data_script = project_folder + \"/get_data.py\",\n", - " **automl_settings\n", - " )\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the first run on a new DSVM may take a several minutes to preparing the environment." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "remote_run = experiment.submit(automl_config, show_output=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the Results\n", - "\n", - "#### Loading executed runs\n", - "In case you need to load a previously executed run given a run id please enable the below cell" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "remote_run = AutoMLRun(experiment=experiment, run_id='AutoML_480d3ed6-fc94-44aa-8f4e-0b945db9d3ef')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(remote_run).show() " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# wait till the run finishes\n", - "remote_run.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(remote_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Canceling runs\n", - "\n", - "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Cancel the ongoing experiment and stop scheduling new iterations\n", - "# remote_run.cancel()\n", - "\n", - "# Cancel iteration 1 and move onto iteration 2\n", - "# remote_run.cancel_iteration(1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = remote_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric\n", - "Show the run/model which has the smallest `log_loss` value." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lookup_metric = \"log_loss\"\n", - "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration\n", - "Show the run and model from the 3rd iteration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 3\n", - "third_run, third_model = remote_run.get_output(iteration=iteration)\n", - "print(third_run)\n", - "print(third_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model \n", - "\n", - "#### Load Test Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:10, :]\n", - "y_digits = digits.target[:10]\n", - "images = digits.images[:10]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Testing our best pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Randomly select digits and test\n", - "for index in np.random.choice(len(y_digits), 2):\n", - " print(index)\n", - " predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n", - " label = y_digits[index]\n", - " title = \"Label value = %d Predicted value = %d \" % ( label,predicted)\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { -
"name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/03b.auto-ml-remote-batchai.ipynb b/automl/03b.auto-ml-remote-batchai.ipynb index c63fbc37..9870cc48 100644 --- a/automl/03b.auto-ml-remote-batchai.ipynb +++ b/automl/03b.auto-ml-remote-batchai.ipynb @@ -1,522 +1,525 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 03: Remote Execution using Batch AI\n", + "\n", + "In this example we use scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple classification problem.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Attach an existing Batch AI compute to a workspace.\n", + "3. Configure AutoML using `AutoMLConfig`.\n", + "4. Train the model using Batch AI.\n", + "5. Explore the results.\n", + "6.
Test the best fitted model.\n", + "\n", + "In addition this notebook showcases the following features\n", + "- **Parallel** executions for iterations\n", + "- **Asynchronous** tracking of progress\n", + "- **Cancellation** of individual iterations or the entire run\n", + "- Retrieving models for any iteration or logged metric\n", + "- Specifying AutoML settings as `**kwargs`\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the run history container in the workspace.\n", + "experiment_name = 'automl-remote-batchai'\n", + "project_folder = './sample_projects/automl-remote-batchai'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace Name'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project 
Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Batch AI Cluster\n", + "The cluster is created as Machine Learning Compute and will appear under your workspace.\n", + "\n", + "**Note:** Creating the Batch AI cluster can take more than 10 minutes, so please be patient.\n", + "\n", + "As with other Azure services, there are limits on certain resources (e.g., Batch AI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import BatchAiCompute\n", + "from azureml.core.compute import ComputeTarget\n", + "\n", + "# Choose a name for your cluster.\n", + "batchai_cluster_name = \"mybatchai\"\n", + "\n", + "found = False\n", + "# Check if this compute target already exists in the workspace.\n", + "for ct_name, ct in ws.compute_targets().items():\n", + " print(ct.name, ct.type)\n", + " if (ct.name == batchai_cluster_name and ct.type == 'BatchAI'):\n", + " found = True\n", + " print('Found existing compute target.')\n", + " compute_target = ct\n", + " break\n", + " \n", + "if not found:\n", + " print('Creating a new compute target...')\n", + " provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", + " #vm_priority = 'lowpriority', # optional\n", + " autoscale_enabled = True,\n", + " cluster_min_nodes = 1, \n", + " cluster_max_nodes = 4)\n", + "\n", + " # Create the cluster.\n", + " compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n", + " \n", + " # Can poll for a minimum number of nodes and for a specific timeout.\n", + " # If no min_node_count is provided, it will use the scale settings for the cluster.\n", + " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", + " \n", + " # For a more detailed view of current Batch AI cluster status, use the 'status' property." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Get Data File\n", + "For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. 
You can encapsulate code to read data from either Azure Blob storage or a local disk in this file.\n", + "In this example, the `get_data()` function returns data from scikit-learn's [digits dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.exists(project_folder):\n", + " os.makedirs(project_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $project_folder/get_data.py\n", + "\n", + "from sklearn import datasets\n", + "from scipy import sparse\n", + "import numpy as np\n", + "\n", + "def get_data():\n", + " \n", + " digits = datasets.load_digits()\n", + " X_train = digits.data\n", + " y_train = digits.target\n", + "\n", + " return { \"X\" : X_train, \"y\" : y_train }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Instantiate AutoML\n", + "\n", + "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n", + "\n", + "**Note:** When using Batch AI, you can't pass NumPy arrays directly; provide the data through the `get_data.py` script instead.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross-validation splits.|\n", + "|**concurrent_iterations**|Maximum number of iterations that would be executed in parallel. 
This should be less than or equal to the maximum number of nodes in the Batch AI cluster (`cluster_max_nodes`).|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_settings = {\n", + " \"max_time_sec\": 120,\n", + " \"iterations\": 20,\n", + " \"n_cross_validations\": 5,\n", + " \"primary_metric\": 'AUC_weighted',\n", + " \"preprocess\": False,\n", + " \"concurrent_iterations\": 4,\n", + " \"verbosity\": logging.INFO\n", + "}\n", + "\n", + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " path = project_folder,\n", + " compute_target = compute_target,\n", + " data_script = project_folder + \"/get_data.py\",\n", + " **automl_settings\n", + " )\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n", + "In this example, we specify `show_output = False` to suppress console output while the run is in progress."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run = experiment.submit(automl_config, show_output = False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results\n", + "\n", + "#### Loading executed runs\n", + "In case you need to load a previously executed run, enable the cell below and replace the `run_id` value." + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(remote_run).show() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Wait until the run finishes.\n", + "remote_run.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(remote_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis = 1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cancelling Runs\n", + "\n", + "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Cancel the ongoing experiment and stop scheduling new iterations.\n", + "# remote_run.cancel()\n", + "\n", + "# Cancel iteration 1 and move on to iteration 2.\n", + "# remote_run.cancel_iteration(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations.
The `get_output` method on `remote_run` returns the best run and its fitted model, as ranked by the primary metric. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = remote_run.get_output()\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and model with the smallest `log_loss` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lookup_metric = \"log_loss\"\n", + "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from iteration 3:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 3\n", + "third_run, third_model = remote_run.get_output(iteration=iteration)\n", + "print(third_run)\n", + "print(third_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the Fitted Model for Deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "remote_run.register_model(description = description, tags = tags)\n", + "remote_run.model_id # Use this id to deploy the model as a web service in Azure."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Testing the Fitted Model \n", + "\n", + "#### Load Test Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_test = digits.data[:10, :]\n", + "y_test = digits.target[:10]\n", + "images = digits.images[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Testing Our Best Pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Randomly select digits and test.\n", + "for index in np.random.choice(len(y_test), 2, replace = False):\n", + " print(index)\n", + " predicted = fitted_model.predict(X_test[index:index + 1])[0]\n", + " label = y_test[index]\n", + " title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n", + " fig = plt.figure(1, figsize=(3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 03: Remote Execution using Batch AI\n", - "\n", - "In this example we use the scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple classification problem.\n", - "\n", - "Make sure you have executed the [setup](setup.ipynb) before running this notebook.\n", - "\n", - "In this notebook 
you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Attaching an existing Batch AI compute to a workspace\n", - "3. Instantiating AutoMLConfig \n", - "4. Training the Model using the Batch AI\n", - "5. Exploring the results\n", - "6. Testing the fitted model\n", - "\n", - "In addition this notebook showcases the following features\n", - "- **Parallel** Executions for iterations\n", - "- Asyncronous tracking of progress\n", - "- **Cancelling** individual iterations or the entire run\n", - "- Retrieving models for any iteration or logged metric\n", - "- specify automl settings as **kwargs**\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a workspace. For AutoML you would need to create a Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for the run history container in the workspace\n", - "experiment_name = 'automl-remote-batchai'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-remote-batchai'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - 
"output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Batch AI Cluster\n", - "The cluster is created as Machine Learning Compute and will appear under your workspace.\n", - "\n", - "Note: The cluster creation can take over 10 minutes, please be patient.\n", - "\n", - "As with other Azure services, there are limits on certain resources (for eg. BatchAI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import BatchAiCompute\n", - "from azureml.core.compute import ComputeTarget\n", - "\n", - "# choose a name for your cluster\n", - "batchai_cluster_name = ws.name + \"cpu\"\n", - "\n", - "found = False\n", - "# see if this compute target already exists in the workspace\n", - "for ct in ws.compute_targets():\n", - " print(ct.name, ct.type)\n", - " if (ct.name == batchai_cluster_name and ct.type == 'BatchAI'):\n", - " found = True\n", - " print('found compute target. just use it.')\n", - " compute_target = ct\n", - " break\n", - " \n", - "if not found:\n", - " print('creating a new compute target...')\n", - " provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " autoscale_enabled = True,\n", - " cluster_min_nodes = 1, \n", - " cluster_max_nodes = 4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws,batchai_cluster_name, provisioning_config)\n", - " \n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it will use the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - " \n", - " # For a more detailed view of current BatchAI cluster status, use the 'status' property " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Get Data File\n", - "For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if not os.path.exists(project_folder):\n", - " os.makedirs(project_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $project_folder/get_data.py\n", - "\n", - "from sklearn import datasets\n", - "from scipy import sparse\n", - "import numpy as np\n", - "\n", - "def get_data():\n", - " \n", - " digits = datasets.load_digits()\n", - " X_digits = digits.data\n", - " y_digits = digits.target\n", - "\n", - " return { \"X\" : X_digits, \"y\" : y_digits }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate AutoML \n", - "\n", - "You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n", - "\n", - "Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_settings = {\n", - " \"max_time_sec\": 120,\n", - " \"iterations\": 20,\n", - " \"n_cross_validations\": 5,\n", - " \"primary_metric\": 'AUC_weighted',\n", - " \"preprocess\": False,\n", - " \"concurrent_iterations\": 5,\n", - " \"verbosity\": logging.INFO\n", - "}\n", - "\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " path=project_folder,\n", - " compute_target = compute_target,\n", - " data_script = project_folder + \"/get_data.py\",\n", - " **automl_settings\n", - " )\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "remote_run = experiment.submit(automl_config, show_output=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the Results\n", - "\n", - "#### Loading executed runs\n", - "In case you need to load a previously executed run given a run id please enable the below cell" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "remote_run = AutoMLRun(experiment=experiment, run_id='AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. 
It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "remote_run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(remote_run).show() " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# wait till the run finishes\n", - "remote_run.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(remote_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Canceling runs\n", - "\n", - "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Cancel the ongoing experiment and stop scheduling new iterations\n", - "# remote_run.cancel()\n", - "\n", - "# Cancel iteration 1 and move onto iteration 2\n", - "# remote_run.cancel_iteration(1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = remote_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric\n", - "Show the run/model which has the smallest `log_loss` value." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lookup_metric = \"log_loss\"\n", - "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration\n", - "Show the run and model from the 3rd iteration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 3\n", - "third_run, third_model = remote_run.get_output(iteration=iteration)\n", - "print(third_run)\n", - "print(third_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "remote_run.register_model(description=description, tags=tags)\n", - "remote_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model \n", - "\n", - "#### Load Test Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:10, :]\n", - "y_digits = digits.target[:10]\n", - "images = digits.images[:10]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Testing our best pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Randomly select digits and test\n", - "for index in np.random.choice(len(y_digits), 2):\n", - " print(index)\n", - " predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n", - " label = y_digits[index]\n", - " title = \"Label value = 
%d Predicted value = %d \" % ( label,predicted)\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/04.auto-ml-remote-execution-text-data-blob-store.ipynb b/automl/04.auto-ml-remote-execution-text-data-blob-store.ipynb index 6e4bb178..0465c64b 100644 --- a/automl/04.auto-ml-remote-execution-text-data-blob-store.ipynb +++ b/automl/04.auto-ml-remote-execution-text-data-blob-store.ipynb @@ -1,495 +1,488 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Auto ML 04: Remote Execution with Text Data from Azure Blob Storage\n", + "\n", + "In this example we use the [Burning Man 2016 dataset](https://innovate.burningman.org/datasets-page/) to showcase how you can use AutoML to handle text data from Azure Blob Storage.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) notebook before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Attach an existing DSVM to the workspace.\n", + "3. Configure AutoML using `AutoMLConfig`.\n", + "4. Train the model using the DSVM.\n", + "5. Explore the results.\n", + "6. Test the best fitted model.\n", + "\n", + "In addition, this notebook showcases the following features:\n", + "- **Parallel** executions for iterations\n", + "- **Asynchronous** tracking of progress\n", + "- **Cancellation** of individual iterations or the entire run\n", + "- Retrieving models for any iteration or logged metric\n", + "- Specifying AutoML settings as `**kwargs`\n", + "- Handling **text** data using the `preprocess` flag\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the run history container in the workspace.\n", + "experiment_name = 'automl-remote-dsvm-blobstore'\n", + "project_folder = './sample_projects/automl-remote-dsvm-blobstore'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Attach a Remote Linux DSVM\n", + "To use a remote Docker compute target:\n", + "1. Create a Linux DSVM in Azure, following these [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor (not CentOS). Make sure that enough disk space is available under `/tmp`, because AutoML creates files under `/tmp/azureml_runs`. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4GB of memory per core.\n", + "2. Enter the IP address, user name and password below.\n", + "\n", + "**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordingly. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on changing SSH ports for security reasons." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import RemoteCompute\n", + "\n", + "# Add your VM information below\n", + "dsvm_name = 'mydsvm1'\n", + "dsvm_ip_addr = '<>'\n", + "dsvm_ssh_port = 22\n", + "dsvm_username = '<>'\n", + "dsvm_password = '<>'\n", + "\n", + "dsvm_compute = RemoteCompute.attach(workspace=ws, name=dsvm_name, address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=dsvm_ssh_port)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Get Data File\n", + "For remote executions you should author a `get_data.py` file containing a `get_data()` function.
This file should be in the root directory of the project. In this file you can encapsulate code that reads data either from Azure Blob Storage or from local disk.\n", + "In this example, the `get_data()` function returns a [dictionary](README.md#getdata)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.exists(project_folder):\n", + " os.makedirs(project_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $project_folder/get_data.py\n", + "\n", + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import LabelEncoder\n", + "\n", + "def get_data():\n", + " # Load Burning Man 2016 data.\n", + " df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", + " delimiter=\"\\t\", quotechar='\"')\n", + " # Get integer labels.\n", + " le = LabelEncoder()\n", + " le.fit(df[\"Label\"].values)\n", + " y = le.transform(df[\"Label\"].values)\n", + " df = df.drop([\"Label\"], axis=1)\n", + "\n", + " df_train, _, y_train, _ = train_test_split(df, y, test_size = 0.1, random_state = 42)\n", + "\n", + " # Return only the training split; the held-out 10% is used to test the model later.\n", + " return { \"X\" : df_train, \"y\" : y_train }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### View data\n", + "\n", + "You can execute the `get_data()` function locally to view the training data."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%run $project_folder/get_data.py\n", + "data_dict = get_data()\n", + "df = data_dict[\"X\"]\n", + "y = data_dict[\"y\"]\n", + "pd.set_option('display.max_colwidth', 15)\n", + "df['Label'] = pd.Series(y, index=df.index)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML \n", + "\n", + "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n", + "\n", + "**Note:** When using a remote DSVM, you can't pass NumPy arrays directly; provide the data through a `get_data()` script instead.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration, AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross-validation splits.|\n", + "|**concurrent_iterations**|Maximum number of iterations that are executed in parallel. This should be less than the number of cores on the DSVM.|\n", + "|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n", + "|**max_cores_per_iteration**|Indicates how many cores on the compute target are used to train a single pipeline.<br>Default is *1*; you can set it to *-1* to use all cores.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_settings = {\n", + " \"max_time_sec\": 3600,\n", + " \"iterations\": 10,\n", + " \"n_cross_validations\": 5,\n", + " \"primary_metric\": 'AUC_weighted',\n", + " \"preprocess\": True,\n", + " \"max_cores_per_iteration\": 2\n", + "}\n", + "\n", + "automl_config = AutoMLConfig(task = 'classification',\n", + " path = project_folder,\n", + " compute_target = dsvm_compute,\n", + " data_script = project_folder + \"/get_data.py\",\n", + " **automl_settings\n", + " )\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model \n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run = experiment.submit(automl_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exploring the results \n", + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n", + "\n", + "**Note:** The widget displays a link at the bottom.
Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(remote_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(remote_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cancelling runs\n", + "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Cancel the ongoing experiment and stop scheduling new iterations.\n", + "remote_run.cancel()\n", + "\n", + "# Cancel iteration 1 and move on to iteration 2.\n", + "# remote_run.cancel_iteration(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `remote_run` returns the best run and its fitted model, ranked by the primary metric. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = remote_run.get_output()\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model with the highest `accuracy` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# lookup_metric = \"accuracy\"\n", + "# best_run, fitted_model = remote_run.get_output(metric = lookup_metric)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 0\n", + "zero_run, zero_model = remote_run.get_output(iteration = iteration)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the Fitted Model for Deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "remote_run.register_model(description = description, tags = tags)\n", + "print(remote_run.model_id) # Use this id to deploy the model as a web service in Azure."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Testing the Fitted Model \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sklearn\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from pandas_ml import ConfusionMatrix\n", + "\n", + "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", + " delimiter=\"\\t\", quotechar='\"')\n", + "\n", + "# get integer labels\n", + "le = LabelEncoder()\n", + "le.fit(df[\"Label\"].values)\n", + "y = le.transform(df[\"Label\"].values)\n", + "df = df.drop([\"Label\"], axis=1)\n", + "\n", + "_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n", + "\n", + "\n", + "ypred = fitted_model.predict(df_test.values)\n", + "\n", + "\n", + "ypred_strings = le.inverse_transform(ypred)\n", + "ytest_strings = le.inverse_transform(y_test)\n", + "\n", + "cm = ConfusionMatrix(ytest_strings, ypred_strings)\n", + "\n", + "print(cm)\n", + "\n", + "cm.plot()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Auto ML : Remote Execution with Text data from Blobstorage\n", - "\n", - "In this example we use the [Burning Man 2016 dataset](https://innovate.burningman.org/datasets-page/) to showcase how you can use AutoML to handle text data from a Azure blobstorage.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - 
"\n", - "In this notebook you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Attaching an existing DSVM to a workspace\n", - "3. Instantiating AutoMLConfig \n", - "4. Training the Model using the DSVM\n", - "5. Exploring the results\n", - "6. Testing the fitted model\n", - "\n", - "In addition this notebook showcases the following features\n", - "- **Parallel** Executions for iterations\n", - "- Asyncronous tracking of progress\n", - "- **Cancelling** individual iterations or the entire run\n", - "- Retrieving models for any iteration or logged metric\n", - "- specify automl settings as **kwargs**\n", - "- handling **text** data with **preprocess** flag\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for the run history container in the workspace\n", - "experiment_name = 'automl-remote-dsvm-blobstore'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-remote-dsvm-blobstore'\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 
Attach a Remote Linux DSVM\n", - "To use remote docker commpute target:\n", - "1. Create a Linux DSVM in Azure. Here is some [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor, NOT CentOS. Make sure that disk space is available under /tmp because AutoML creates files under /tmp/azureml_runs. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4Gb per core.\n", - "2. Enter the IP address, username and password below\n", - "\n", - "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import RemoteCompute\n", - "\n", - "# Add your VM information below\n", - "dsvm_name = 'mydsvm1'\n", - "dsvm_ip_addr = '<>'\n", - "dsvm_username = '<>'\n", - "dsvm_password = '<>'\n", - "\n", - "dsvm_compute = RemoteCompute.attach(workspace=ws, name=dsvm_name, address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=22)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Get Data File\n", - "For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n", - "\n", - "The *get_data()* function returns a [dictionary](README.md#getdata)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if not os.path.exists(project_folder):\n", - " os.makedirs(project_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $project_folder/get_data.py\n", - "\n", - "import pandas as pd\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import LabelEncoder\n", - "\n", - "def get_data():\n", - " # Burning man 2016 data\n", - " df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", - " delimiter=\"\\t\", quotechar='\"')\n", - " # get integer labels\n", - " le = LabelEncoder()\n", - " le.fit(df[\"Label\"].values)\n", - " y = le.transform(df[\"Label\"].values)\n", - " df = df.drop([\"Label\"], axis=1)\n", - "\n", - " df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n", - "\n", - " return { \"X\" : df, \"y\" : y }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View data\n", - "\n", - "You can execute the *get_data()* function locally to view the *train* data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%run $project_folder/get_data.py\n", - "data_dict = get_data()\n", - "df = data_dict[\"X\"]\n", - "y = data_dict[\"y\"]\n", - "pd.set_option('display.max_colwidth', 15)\n", - "df['Label'] = pd.Series(y, index=df.index)\n", - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate AutoML \n", - "\n", - "You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. 
\n", - "\n", - "Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM\n", - "|**preprocess**| *True/False*
Setting this to *True* enables AutoML to perform preprocessing
on the input to handle *missing data*, and perform some common *feature extraction*|\n", - "|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.
Default is *1*, you can set it to *-1* to use all cores|" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_settings = {\n", - " \"max_time_sec\": 3600,\n", - " \"iterations\": 10,\n", - " \"n_cross_validations\": 5,\n", - " \"primary_metric\": 'AUC_weighted',\n", - " \"preprocess\": True,\n", - " \"max_cores_per_iteration\": 2\n", - "}\n", - "\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " path=project_folder,\n", - " compute_target = dsvm_compute,\n", - " data_script = project_folder + \"/get_data.py\",\n", - " **automl_settings\n", - " )\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model \n", - "\n", - "For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retreive the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "remote_run = experiment.submit(automl_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the Results \n", - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(remote_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(remote_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Canceling runs\n", - "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Cancel the ongoing experiment and stop scheduling new iterations\n", - "remote_run.cancel()\n", - "\n", - "# Cancel iteration 1 and move onto iteration 2\n", - "# remote_run.cancel_iteration(1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = remote_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# lookup_metric = \"accuracy\"\n", - "# best_run, fitted_model = remote_run.get_output(metric=lookup_metric)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 0\n", - "zero_run, zero_model = remote_run.get_output(iteration=iteration)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "remote_run.register_model(description=description, tags=tags)\n", - "remote_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model \n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import sklearn\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import LabelEncoder\n", - "from pandas_ml import ConfusionMatrix\n", - "\n", - "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", - " delimiter=\"\\t\", quotechar='\"')\n", - "\n", - "# get integer labels\n", - "le = LabelEncoder()\n", - "le.fit(df[\"Label\"].values)\n", - "y = 
le.transform(df[\"Label\"].values)\n", - "df = df.drop([\"Label\"], axis=1)\n", - "\n", - "_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n", - "\n", - "\n", - "ypred = fitted_model.predict(df_test.values)\n", - "\n", - "\n", - "ypred_strings = le.inverse_transform(ypred)\n", - "ytest_strings = le.inverse_transform(y_test)\n", - "\n", - "cm = ConfusionMatrix(ytest_strings, ypred_strings)\n", - "\n", - "print(cm)\n", - "\n", - "cm.plot()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb b/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb index 22bff7a0..521d6681 100644 --- a/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb +++ b/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb @@ -1,396 +1,395 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 05: Blacklisting Models, Early Termination, and Handling Missing Data\n", + "\n", + "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML to handle missing values in data. We also provide a stopping metric, a target value for the primary metric, so that AutoML can terminate the run without necessarily going through all the iterations. Finally, if you want to avoid certain pipelines, you can specify a blacklist of algorithms that AutoML will ignore for this run.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) notebook before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Configure AutoML using `AutoMLConfig`.\n", + "3. Train the model.\n", + "4. Explore the results.\n", + "5. Test the best fitted model.\n", + "\n", + "In addition, this notebook showcases the following features:\n", + "- **Blacklisting** certain pipelines\n", + "- Specifying **target metrics** to indicate stopping criteria\n", + "- Handling **missing data** in the input\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the experiment.\n", + "experiment_name = 'automl-local-missing-data'\n", + "project_folder = './sample_projects/automl-local-missing-data'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Creating missing data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from scipy import sparse\n", + "\n", + "digits = datasets.load_digits()\n", + "X_train = digits.data[10:,:]\n", + "y_train = digits.target[10:]\n", + "\n", + "# Add one missing value to each of 75% of the rows.\n", + "missing_rate = 0.75\n", + "n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))\n", + "missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=bool), np.ones(n_missing_samples, dtype=bool)))\n", + "rng = np.random.RandomState(0)\n", + "rng.shuffle(missing_samples)\n", + "missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n", + "X_train[np.where(missing_samples)[0], missing_features] = np.nan" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame(data = X_train)\n", + "df['Label'] = pd.Series(y_train, index=df.index)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**task**|classification or regression|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross validation splits.|\n", + "|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n", + "|**exit_score**|*double* value indicating the target for *primary_metric*.
Once the target is surpassed the run terminates.|\n", + "|**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for AutoML.

Allowed values for **Classification**
LogisticRegression
SGDClassifierWrapper
NBWrapper
BernoulliNB
SVCWrapper
LinearSVMWrapper
KNeighborsClassifier
DecisionTreeClassifier
RandomForestClassifier
ExtraTreesClassifier
LightGBMClassifier

Allowed values for **Regression**
ElasticNet
GradientBoostingRegressor
DecisionTreeRegressor
KNeighborsRegressor
LassoLars
SGDRegressor
RandomForestRegressor
ExtraTreesRegressor|\n", + "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", + "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n", + "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 3600,\n", + " iterations = 20,\n", + " n_cross_validations = 5,\n", + " preprocess = True,\n", + " exit_score = 0.994,\n", + " blacklist_algos = ['KNeighborsClassifier','LinearSVMWrapper'],\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", + "In this example, we specify `show_output = True` to print currently running iterations to the console." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. 
The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(local_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see the individual metrics that are logged." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(local_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run, as ranked by the primary metric, together with its fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model that has the highest `accuracy` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# lookup_metric = \"accuracy\"\n", + "# best_run, fitted_model = local_run.get_output(metric = lookup_metric)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from iteration 3:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# iteration = 3\n", + "# best_run, fitted_model = local_run.get_output(iteration = iteration)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the Fitted Model for Deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "local_run.register_model(description = description, tags = tags)\n", + "local_run.model_id # Use this id to deploy the model as a web service in Azure."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Testing the Fitted Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_test = digits.data[:10, :]\n", + "y_test = digits.target[:10]\n", + "images = digits.images[:10]\n", + "\n", + "# Randomly select digits and test.\n", + "for index in np.random.choice(len(y_test), 2, replace = False):\n", + " print(index)\n", + " predicted = fitted_model.predict(X_test[index:index + 1])[0]\n", + " label = y_test[index]\n", + " title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n", + " fig = plt.figure(1, figsize=(3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 05 : Blacklisting models, Early termination and handling missing data\n", - "\n", - "In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for handling missing values in data. We also provide a stopping metric indicating a target for the primary metric so that AutoML can terminate the run without necessarly going through all the iterations. 
Finally, if you want to avoid a certain pipeline, we allow you to specify a black list of algos that AutoML will ignore for this run.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Instantiating AutoMLConfig\n", - "4. Training the Model\n", - "5. Exploring the results\n", - "6. Testing the fitted model\n", - "\n", - "In addition this notebook showcases the following features\n", - "- **Blacklist** certain pipelines\n", - "- Specify a **target metrics** to indicate stopping criteria\n", - "- Handling **Missing Data** in the input\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for the experiment\n", - "experiment_name = 'automl-local-missing-data'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-missing-data'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Creating Missing Data" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from scipy import sparse\n", - "\n", - "digits = datasets.load_digits()\n", - "X_digits = digits.data[10:,:]\n", - "y_digits = digits.target[10:]\n", - "\n", - "# Add missing values in 75% of the lines\n", - "missing_rate = 0.75\n", - "n_missing_samples = int(np.floor(X_digits.shape[0] * missing_rate))\n", - "missing_samples = np.hstack((np.zeros(X_digits.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n", - "rng = np.random.RandomState(0)\n", - "rng.shuffle(missing_samples)\n", - "missing_features = rng.randint(0, X_digits.shape[1], n_missing_samples)\n", - "X_digits[np.where(missing_samples)[0], missing_features] = np.nan" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df = pd.DataFrame(data=X_digits)\n", - "df['Label'] = pd.Series(y_digits, index=df.index)\n", - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate Auto ML Config\n", - "\n", - "\n", - "This defines the settings and data used to run the experiment.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**task**|classification or regression|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**preprocess**| *True/False*
Setting this to *True* enables Auto ML to perform preprocessing
on the input to handle *missing data*, and perform some common *feature extraction*|\n", - "|**exit_score**|*double* value indicating the target for *primary_metric*.
Once the target is surpassed the run terminates|\n", - "|**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for Auto ML.

Allowed values for **Classification**
LogisticRegression
SGDClassifierWrapper
NBWrapper
BernoulliNB
SVCWrapper
LinearSVMWrapper
KNeighborsClassifier
DecisionTreeClassifier
RandomForestClassifier
ExtraTreesClassifier
LightGBMClassifier

Allowed values for **Regression**
ElasticNet
GradientBoostingRegressor
DecisionTreeRegressor
KNeighborsRegressor
LassoLars
SGDRegressor
RandomForestRegressor
ExtraTreesRegressor|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " primary_metric = 'AUC_weighted',\n", - " max_time_sec = 3600,\n", - " iterations = 20,\n", - " n_cross_validations = 5,\n", - " preprocess = True,\n", - " exit_score = 0.994,\n", - " blacklist_algos = ['KNeighborsClassifier','LinearSVMWrapper'],\n", - " verbosity = logging.INFO,\n", - " X = X_digits, \n", - " y = y_digits,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n", - "You will see the currently running iterations printing to the console." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "NOTE: The widget will display a link at the bottom. 
This will not currently work, but will eventually link to a web-ui to explore the individual run details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(local_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. Each pipeline is a tuple of three elements. The first element is the score for the pipeline the second element is the string description of the pipeline and the last element are the pipeline objects used for each fold in the cross-validation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# lookup_metric = \"accuracy\"\n", - "# best_run, fitted_model = local_run.get_output(metric=lookup_metric)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# iteration = 3\n", - "# best_run, fitted_model = local_run.get_output(iteration=iteration)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "local_run.register_model(description=description, tags=tags)\n", - "local_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:10, :]\n", - "y_digits = digits.target[:10]\n", - "images = digits.images[:10]\n", - "\n", - "#Randomly select digits and test\n", - "for index in np.random.choice(len(y_digits), 2):\n", - " print(index)\n", - " predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n", - " label = y_digits[index]\n", - " title = \"Label value = %d Predicted value = %d \" % ( label,predicted)\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = 
fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/06.auto-ml-sparse-data-custom-cv-split.ipynb b/automl/06.auto-ml-sparse-data-custom-cv-split.ipynb index ac683123..4ab76d6e 100644 --- a/automl/06.auto-ml-sparse-data-custom-cv-split.ipynb +++ b/automl/06.auto-ml-sparse-data-custom-cv-split.ipynb @@ -1,418 +1,418 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 06: Custom CV Splits and Handling Sparse Data\n", + "\n", + "In this example we use scikit-learn's [20 Newsgroups dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html) to show how you can use AutoML to handle sparse data and how to specify custom cross-validation splits.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) notebook before running this one.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2.
Configure AutoML using `AutoMLConfig`.\n", + "3. Train the model.\n", + "4. Explore the results.\n", + "5. Test the best fitted model.\n", + "\n", + "In addition, this notebook showcases the following features:\n", + "- **Custom CV** splits\n", + "- Handling **sparse data** in the input" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the experiment.\n", + "experiment_name = 'automl-local-missing-data'\n", + "# Project folder.\n", + "project_folder = './sample_projects/automl-local-missing-data'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", +
"pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating Sparse Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_20newsgroups\n", + "from sklearn.feature_extraction.text import HashingVectorizer\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "remove = ('headers', 'footers', 'quotes')\n", + "categories = [\n", + " 'alt.atheism',\n", + " 'talk.religion.misc',\n", + " 'comp.graphics',\n", + " 'sci.space',\n", + "]\n", + "data_train = fetch_20newsgroups(subset = 'train', categories = categories,\n", + " shuffle = True, random_state = 42,\n", + " remove = remove)\n", + "\n", + "X_train, X_validation, y_train, y_validation = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n", + "\n", + "\n", + "vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n", + " n_features = 2**16)\n", + "X_train = vectorizer.transform(X_train)\n", + "X_validation = vectorizer.transform(X_validation)\n", + "\n", + "summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n", + "summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n", + "summary_df['Validation Set'] = [X_validation.shape[0], X_validation.shape[1]]\n", + "summary_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + 
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**task**|classification or regression|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.
**Note:** If input data is sparse, you cannot use *True*.|\n", + "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", + "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n", + "|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom validation set.|\n", + "|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification for the custom validation set.|\n", + "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 3600,\n", + " iterations = 5,\n", + " preprocess = False,\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " X_valid = X_validation, \n", + " y_valid = y_validation, \n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", + "In this example, we specify `show_output = True` to print currently running iterations to the console." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "**Note:** The widget displays a link at the bottom. 
Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(local_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see the individual metrics that are logged." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(local_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n", + "rundata" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run, as ranked by the primary metric, together with its fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model which has the highest `accuracy` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# lookup_metric = \"accuracy\"\n", + "# best_run, fitted_model = local_run.get_output(metric = lookup_metric)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from iteration 3:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# iteration = 3\n", + "# best_run, fitted_model = local_run.get_output(iteration = iteration)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the Fitted Model for Deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "local_run.register_model(description = description, tags = tags)\n", + "local_run.model_id # Use this id to deploy the model as a web service in Azure."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Testing the Fitted Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load test data.\n", + "import sklearn\n", + "from pandas_ml import ConfusionMatrix\n", + "\n", + "remove = ('headers', 'footers', 'quotes')\n", + "categories = [\n", + " 'alt.atheism',\n", + " 'talk.religion.misc',\n", + " 'comp.graphics',\n", + " 'sci.space',\n", + "]\n", + "\n", + "\n", + "data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n", + " shuffle = True, random_state = 42,\n", + " remove = remove)\n", + "\n", + "vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n", + " n_features = 2**16)\n", + "\n", + "X_test = vectorizer.transform(data_test.data)\n", + "y_test = data_test.target\n", + "\n", + "# Test our best pipeline.\n", + "\n", + "y_pred = fitted_model.predict(X_test)\n", + "y_pred_strings = [data_test.target_names[i] for i in y_pred]\n", + "y_test_strings = [data_test.target_names[i] for i in y_test]\n", + "\n", + "cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n", + "print(cm)\n", + "cm.plot()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 06: Custom CV splits, handling sparse data\n", - "\n", - "In this example we use the scikit learn's [20newsgroup](In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for handling sparse data and specify 
custom cross validation splits.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Instantiating AutoMLConfig\n", - "4. Training the Model\n", - "5. Exploring the results\n", - "6. Testing the fitted model\n", - "\n", - "In addition this notebook showcases the following features\n", - "- **Custom CV** splits \n", - "- Handling **Sparse Data** in the input" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for the experiment\n", - "experiment_name = 'automl-local-missing-data'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-missing-data'\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - 
"output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating Sparse Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import fetch_20newsgroups\n", - "from sklearn.feature_extraction.text import HashingVectorizer\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "remove = ('headers', 'footers', 'quotes')\n", - "categories = [\n", - " 'alt.atheism',\n", - " 'talk.religion.misc',\n", - " 'comp.graphics',\n", - " 'sci.space',\n", - "]\n", - "data_train = fetch_20newsgroups(subset='train', categories=categories,\n", - " shuffle=True, random_state=42,\n", - " remove=remove)\n", - "\n", - "X_train, X_validation, y_train, y_validation = train_test_split(data_train.data, data_train.target, test_size=0.33, random_state=42)\n", - "\n", - "\n", - "vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,\n", - " n_features=2**16)\n", - "X_train = vectorizer.transform(X_train)\n", - "X_validation = vectorizer.transform(X_validation)\n", - "\n", - "summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n", - "summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n", - 
"summary_df['Validation Set'] = [X_validation.shape[0], X_validation.shape[1]]\n", - "summary_df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate Auto ML Config\n", - "\n", - "This defines the settings and data used to run the experiment.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**task**|classification or regression|\n", - "|**primary_metric**|This is the metric that you want to optimize.<br>Classification supports the following primary metrics<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**preprocess**| *True/False*<br>Setting this to *True* enables Auto ML to perform preprocessing<br>on the input to handle *missing data*, and perform some common *feature extraction*<br>*Note: If input data is Sparse you cannot use preprocess=True*|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n", - "|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom Validation set|\n", - "|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. for the custom Validation set|\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. 
You can specify a new empty folder.|" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log='automl_errors.log',\n", - " primary_metric='AUC_weighted',\n", - " max_time_sec=3600,\n", - " iterations=5,\n", - " preprocess=False,\n", - " verbosity=logging.INFO,\n", - " X = X_train, \n", - " y = y_train,\n", - " X_valid = X_validation, \n", - " y_valid = y_validation, \n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n", - "You will see the currently running iterations printing to the console." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(local_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - " \n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# lookup_metric = \"accuracy\"\n", - "# best_run, fitted_model = local_run.get_output(metric=lookup_metric)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# iteration = 3\n", - "# best_run, fitted_model = local_run.get_output(iteration=iteration)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "local_run.register_model(description=description, tags=tags)\n", - "local_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()### Testing the Fitted Model\n", - "\n", - "#### Load Test Data\n", - "import sklearn\n", - "from pandas_ml import ConfusionMatrix\n", - "\n", - "remove = ('headers', 'footers', 'quotes')\n", - "categories = [\n", - " 'alt.atheism',\n", - " 'talk.religion.misc',\n", - " 'comp.graphics',\n", - " 'sci.space',\n", - "]\n", - "\n", - "\n", - "data_test = fetch_20newsgroups(subset='test', categories=categories,\n", - " shuffle=True, random_state=42,\n", - " remove=remove)\n", - "\n", - 
"vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,\n", - " n_features=2**16)\n", - "\n", - "X_test = vectorizer.transform(data_test.data)\n", - "y_test = data_test.target\n", - "\n", - "#### Testing our best pipeline\n", - "\n", - "ypred = fitted_model.predict(X_test)\n", - "ypred_strings = [categories[i] for i in ypred]\n", - "ytest_strings = [categories[i] for i in y_test]\n", - "\n", - "cm = ConfusionMatrix(ytest_strings, ypred_strings)\n", - "print(cm)\n", - "cm.plot()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/07.auto-ml-exploring-previous-runs.ipynb b/automl/07.auto-ml-exploring-previous-runs.ipynb index 2258385d..d720c3b3 100644 --- a/automl/07.auto-ml-exploring-previous-runs.ipynb +++ b/automl/07.auto-ml-exploring-previous-runs.ipynb @@ -1,326 +1,327 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 07: Exploring Previous Runs\n", + "\n", + "In this example we present some examples on navigating previously executed runs. 
We also show how you can download a fitted model for any previous run.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. List all experiments in a workspace.\n", + "2. List all AutoML runs in an experiment.\n", + "3. Get details for an AutoML run, including settings, run widget, and all metrics.\n", + "4. Download a fitted pipeline for any iteration.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# List all AutoML Experiments in a Workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "import re\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.run import Run\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "experiment_list = Experiment.list(workspace=ws)\n", + "\n", + "summary_df = pd.DataFrame(index = ['No of Runs'])\n", + "pattern = re.compile('^AutoML_[^_]*$')\n", + "for experiment in experiment_list:\n", + " all_runs = list(experiment.get_runs())\n", + " automl_runs = []\n", + " for run in all_runs:\n", + " if(pattern.match(run.id)):\n", + " automl_runs.append(run) \n", + " summary_df[experiment.name] = [len(automl_runs)]\n", + " \n", + "pd.set_option('display.max_colwidth', -1)\n", + "summary_df.T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ 
+ "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# List AutoML runs for an experiment\n", + "Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'automl-local-classification' # Replace this with any experiment name from the previous cell.\n", + "\n", + "proj = ws.experiments()[experiment_name]\n", + "summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n", + "pattern = re.compile('^AutoML_[^_]*$')\n", + "all_runs = list(proj.get_runs(properties={'azureml.runsource': 'automl'}))\n", + "for run in all_runs:\n", + " if(pattern.match(run.id)):\n", + " properties = run.get_properties()\n", + " tags = run.get_tags()\n", + " amlsettings = eval(properties['RawAMLSettingsString'])\n", + " if 'iterations' in tags:\n", + " iterations = tags['iterations']\n", + " else:\n", + " iterations = properties['num_iterations']\n", + " summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]\n", + " \n", + "from IPython.display import HTML\n", + "projname_html = HTML(\"<h3>{}</h3>\".format(proj.name))\n", + "\n", + "from IPython.display import display\n", + "display(projname_html)\n", + "display(summary_df.T)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Get details for an AutoML run\n", + "\n", + "Copy the experiment name and run id from the 
previous cell output to find more details on a particular run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run_id = '' # Fill in your own run_id from the run ids above.\n", + "assert (run_id in summary_df.keys()),\"Run id not found! Please set run id to a value from above run ids\"\n", + "\n", + "from azureml.train.widgets import RunDetails\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "ml_run = AutoMLRun(experiment = experiment, run_id = run_id)\n", + "\n", + "summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name', 'Start Time', 'End Time'])\n", + "properties = ml_run.get_properties()\n", + "tags = ml_run.get_tags()\n", + "status = ml_run.get_details()\n", + "amlsettings = eval(properties['RawAMLSettingsString'])\n", + "if 'iterations' in tags:\n", + " iterations = tags['iterations']\n", + "else:\n", + " iterations = properties['num_iterations']\n", + "start_time = None\n", + "if 'startTimeUtc' in status:\n", + " start_time = status['startTimeUtc']\n", + "end_time = None\n", + "if 'endTimeUtc' in status:\n", + " end_time = status['endTimeUtc']\n", + "summary_df[ml_run.id] = [amlsettings['task_type'], status['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name'], start_time, end_time]\n", + "display(HTML('<h3>Runtime Details</h3>'))\n", + "display(summary_df)\n", + "\n", + "#settings_df = pd.DataFrame(data = amlsettings, index = [''])\n", + "display(HTML('<h3>AutoML Settings</h3>'))\n", + "display(amlsettings)\n", + "\n", + "display(HTML('<h3>Iterations</h3>'))\n", + "RunDetails(ml_run).show() \n", + "\n", + "children = list(ml_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = 
pd.DataFrame(metricslist).sort_index(1)\n", + "display(HTML('<h3>Metrics</h3>'))\n", + "display(rundata)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Download fitted models" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download the Best Model for Any Given Metric" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metric = 'AUC_weighted' # Replace with a metric name.\n", + "best_run, fitted_model = ml_run.get_output(metric = metric)\n", + "fitted_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download the Model for Any Given Iteration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 4 # Replace with an iteration number.\n", + "best_run, fitted_model = ml_run.get_output(iteration = iteration)\n", + "fitted_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register fitted model for deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "ml_run.register_model(description = description, tags = tags)\n", + "ml_run.model_id # Use this id to deploy the model as a web service in Azure." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register the Best Model for Any Given Metric" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metric = 'AUC_weighted' # Replace with a metric name.\n", + "description = 'AutoML Model'\n", + "tags = None\n", + "ml_run.register_model(description = description, tags = tags, metric = metric)\n", + "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register the Model for Any Given Iteration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 4 # Replace with an iteration number.\n", + "description = 'AutoML Model'\n", + "tags = None\n", + "ml_run.register_model(description = description, tags = tags, iteration = iteration)\n", + "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 07: Exploring previous runs\n", - "\n", - "In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. List all Experiments for the workspace\n", - "2. List AutoML runs for an Experiment\n", - "3. Get details for a AutoML Run. (Automl settings, run widget & all metrics)\n", - "4. 
Download fitted pipeline for any iteration\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# List all AutoML Experiments in a Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "import re\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.run import Run\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "experiment_list = Experiment.list(workspace=ws)\n", - "\n", - "summary_df = pd.DataFrame(index = ['No of Runs'])\n", - "pattern = re.compile('^AutoML_[^_]*$')\n", - "for experiment in experiment_list:\n", - " all_runs = list(experiment.get_runs())\n", - " automl_runs = []\n", - " for run in all_runs:\n", - " if(pattern.match(run.id)):\n", - " automl_runs.append(run) \n", - " summary_df[experiment.name] = [len(automl_runs)]\n", - " \n", - "pd.set_option('display.max_colwidth', -1)\n", - "summary_df.T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# List AutoML runs for 
an Experiment\n", - "You can set Experiment name with any experiment name from the result of the Experiment.list cell to load the AutoML runs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell\n", - "\n", - "proj = ws.experiments()[experiment_name]\n", - "summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n", - "pattern = re.compile('^AutoML_[^_]*$')\n", - "all_runs = list(proj.get_runs(properties={'azureml.runsource': 'automl'}))\n", - "for run in all_runs:\n", - " if(pattern.match(run.id)):\n", - " properties = run.get_properties()\n", - " tags = run.get_tags()\n", - " amlsettings = eval(properties['RawAMLSettingsString'])\n", - " if 'iterations' in tags:\n", - " iterations = tags['iterations']\n", - " else:\n", - " iterations = properties['num_iterations']\n", - " summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]\n", - " \n", - "from IPython.display import HTML\n", - "projname_html = HTML(\"<h3>{}</h3>\".format(proj.name))\n", - "\n", - "from IPython.display import display\n", - "display(projname_html)\n", - "display(summary_df.T)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Get Details for a Auto ML Run\n", - "\n", - "Copy the project name and run id from the previous cell output to find more details on a particular run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run_id = '' # Filling your own run_id\n", - "\n", - "from azureml.train.widgets import RunDetails\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "ml_run = AutoMLRun(experiment=experiment, run_id=run_id)\n", - "\n", - "summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name', 'Start Time', 'End Time'])\n", - "properties = ml_run.get_properties()\n", - "tags = ml_run.get_tags()\n", - "status = ml_run.get_details()\n", - "amlsettings = eval(properties['RawAMLSettingsString'])\n", - "if 'iterations' in tags:\n", - " iterations = tags['iterations']\n", - "else:\n", - " iterations = properties['num_iterations']\n", - "start_time = None\n", - "if 'startTimeUtc' in status:\n", - " start_time = status['startTimeUtc']\n", - "end_time = None\n", - "if 'endTimeUtc' in status:\n", - " end_time = status['endTimeUtc']\n", - "summary_df[ml_run.id] = [amlsettings['task_type'], status['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name'], start_time, end_time]\n", - "display(HTML('<h3>Runtime Details</h3>'))\n", - "display(summary_df)\n", - "\n", - "#settings_df = pd.DataFrame(data=amlsettings, index=[''])\n", - "display(HTML('<h3>AutoML Settings</h3>'))\n", - "display(amlsettings)\n", - "\n", - "display(HTML('<h3>Iterations</h3>'))\n", - "RunDetails(ml_run).show() \n", - "\n", - "children = list(ml_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "display(HTML('<h3>Metrics</h3>'))\n", - "display(rundata)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Download fitted 
models" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download best model for any given metric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metric = 'AUC_weighted' # Replace with a metric name\n", - "best_run, fitted_model = ml_run.get_output(metric=metric)\n", - "fitted_model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download model for any given iteration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 4 # Replace with an interation number\n", - "best_run, fitted_model = ml_run.get_output(iteration=iteration)\n", - "fitted_model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "ml_run.register_model(description=description, tags=tags)\n", - "ml_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register best model for any given metric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metric = 'AUC_weighted' # Replace with a metric name\n", - "description = 'AutoML Model'\n", - "tags = None\n", - "ml_run.register_model(description=description, tags=tags, metric=metric)\n", - "ml_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register model for any given iteration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - 
"metadata": {}, - "outputs": [], - "source": [ - "iteration = 4 # Replace with an interation number\n", - "description = 'AutoML Model'\n", - "tags = None\n", - "ml_run.register_model(description=description, tags=tags, iteration=iteration)\n", - "ml_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb b/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb index 9eead8a1..d0113534 100644 --- a/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb +++ b/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb @@ -1,480 +1,478 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 08: Remote Execution with Text File\n", + "\n", + "This sample accesses a data file on a remote DSVM. 
This is more efficient than reading the file from Azure Blob storage in the `get_data` function.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Configure the DSVM to allow files to be accessed directly by the `get_data` function.\n", + "2. Use `get_data` to return data from a local file.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# choose a name for experiment\n", + "experiment_name = 'automl-remote-dsvm-file'\n", + "# project folder\n", + "project_folder = './sample_projects/automl-remote-dsvm-file'\n", + "\n", + "experiment=Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = 
ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a Remote Linux DSVM\n", + "**Note:** If creation fails with a message about Marketplace purchase eligibility, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import DsvmCompute\n", + "\n", + "dsvm_name = 'mydsvm'\n", + "try:\n", + " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", + " print('Found existing DSVM.')\n", + "except:\n", + " print('Creating a new DSVM.')\n", + " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", + " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", + " dsvm_compute.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Copy the Data File to the DSVM\n", + "Download the data file and copy it to the `/tmp/data` folder on the DSVM."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", + " delimiter=\"\\t\", quotechar='\"')\n", + "df.to_csv(\"data.tsv\", sep=\"\\t\", quotechar='\"', index=False)\n", + "\n", + "# Now copy the file data.tsv to the folder /tmp/data on the DSVM." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create the `get_data.py` File\n", + "For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from blob storage or from a local disk in this file.\n", + "In this example, the `get_data()` function returns a [dictionary](README.md#getdata)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.exists(project_folder):\n", + " os.makedirs(project_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $project_folder/get_data.py\n", + "\n", + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import LabelEncoder\n", + "import os\n", + "\n", + "def get_data():\n", + " # Burning man 2016 data\n", + " df = pd.read_csv('/tmp/data/data.tsv',\n", + " delimiter=\"\\t\", quotechar='\"')\n", + " # get integer labels\n", + " le = LabelEncoder()\n", + " le.fit(df[\"Label\"].values)\n", + " y = le.transform(df[\"Label\"].values)\n", + " df = df.drop([\"Label\"], axis=1)\n", + "\n", + " df_train, _, y_train, _ = train_test_split(df, y, test_size = 0.1, random_state = 42)\n", + "\n", + " return { \"X\" : df_train.values, \"y\" : y_train }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML \n", 
+ "\n", + "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n", + "\n", + "**Note:** When using a remote DSVM, you can't pass NumPy arrays directly to the fit method.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross-validation splits.|\n", + "|**concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be less than the number of cores on the DSVM.|\n", + "|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data* and to perform some common *feature extraction*.|\n", + "|**max_cores_per_iteration**|Number of cores on the compute target used to train a single pipeline.
Default is *1*; set it to *-1* to use all cores.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_settings = {\n", + " \"max_time_sec\": 3600,\n", + " \"iterations\": 10,\n", + " \"n_cross_validations\": 5,\n", + " \"primary_metric\": 'AUC_weighted',\n", + " \"preprocess\": True,\n", + " \"max_cores_per_iteration\": 2,\n", + " \"verbosity\": logging.INFO\n", + "}\n", + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " path = project_folder,\n", + " compute_target = dsvm_compute,\n", + " data_script = project_folder + \"/get_data.py\",\n", + " **automl_settings)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model \n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n", + "\n", + "In this example, we specify `show_output = False` to suppress console output while the run is in progress." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run = experiment.submit(automl_config, show_output = False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exploring the Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. 
The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(remote_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(remote_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "rundata = pd.DataFrame(metricslist).sort_index(1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cancelling Runs\n", + "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Cancel the ongoing experiment and stop scheduling new iterations.\n", + "# remote_run.cancel()\n", + "\n", + "# Cancel iteration 1 and move onto iteration 2.\n", + "# remote_run.cancel_iteration(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. 
The `get_output` method on `remote_run` returns the best run and its fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = remote_run.get_output()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model with the highest `accuracy` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# lookup_metric = \"accuracy\"\n", + "# best_run, fitted_model = remote_run.get_output(metric = lookup_metric)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from the first iteration:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# iteration = 1\n", + "# best_run, fitted_model = remote_run.get_output(iteration = iteration)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the Fitted Model for Deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "remote_run.register_model(description = description, tags = tags)\n", + "remote_run.model_id # Use this id to deploy the model as a web service in Azure."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the Best Fitted Model \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sklearn\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from pandas_ml import ConfusionMatrix\n", + "\n", + "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", + " delimiter=\"\\t\", quotechar='\"')\n", + "\n", + "# get integer labels\n", + "le = LabelEncoder()\n", + "le.fit(df[\"Label\"].values)\n", + "y = le.transform(df[\"Label\"].values)\n", + "df = df.drop([\"Label\"], axis = 1)\n", + "\n", + "_, df_test, _, y_test = train_test_split(df, y, test_size = 0.1, random_state = 42)\n", + "\n", + "y_pred = fitted_model.predict(df_test.values)\n", + "\n", + "y_pred_strings = le.inverse_transform(y_pred)\n", + "y_test_strings = le.inverse_transform(y_test)\n", + "\n", + "cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n", + "\n", + "print(cm)\n", + "\n", + "cm.plot()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 08: Remote Execution with Text file\n", - "\n", - "In this sample accesses a data file on a remote DSVM. This is more efficient than reading the file from Blob storage in the get_data method.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. 
Configuring the DSVM to allow files to be access directly by the get_data method.\n", - "2. get_data returning data from a local file.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for experiment\n", - "experiment_name = 'automl-remote-dsvm-file'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-remote-dsvm-file'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", 
- "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a Remote Linux DSVM\n", - "Note: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n", - "\n", - "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import DsvmCompute\n", - "\n", - "dsvm_name = 'mydsvm'\n", - "try:\n", - " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", - " print('found existing dsvm.')\n", - "except:\n", - " print('creating new dsvm.')\n", - " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", - " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", - " dsvm_compute.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Copy data file to the DSVM\n", - "Download the data file.\n", - "Copy the data file to the DSVM under the folder: /tmp/data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", - " delimiter=\"\\t\", quotechar='\"')\n", - "df.to_csv(\"data.tsv\", sep=\"\\t\", quotechar='\"', index=False)\n", - "\n", - "# Now copy the file data.tsv to the folder /tmp/data on the DSVM" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Get Data File\n", - "For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n", - "\n", - "The *get_data()* function returns a [dictionary](README.md#getdata)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if not os.path.exists(project_folder):\n", - " os.makedirs(project_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $project_folder/get_data.py\n", - "\n", - "import pandas as pd\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import LabelEncoder\n", - "import os\n", - "\n", - "def get_data():\n", - " # Burning man 2016 data\n", - " df = pd.read_csv('/tmp/data/data.tsv',\n", - " delimiter=\"\\t\", quotechar='\"')\n", - " # get integer labels\n", - " le = LabelEncoder()\n", - " le.fit(df[\"Label\"].values)\n", - " y = le.transform(df[\"Label\"].values)\n", - " df = df.drop([\"Label\"], axis=1)\n", - "\n", - " df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n", - "\n", - " return { \"X\" : df.values, \"y\" : y }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate AutoML \n", - "\n", - "You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n", - "\n", - "Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM\n", - "|**preprocess**| *True/False*
Setting this to *True* enables Auto ML to perform preprocessing
on the input to handle *missing data*, and perform some common *feature extraction*|\n", - "|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.
Default is *1*, you can set it to *-1* to use all cores|" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_settings = {\n", - " \"max_time_sec\": 3600,\n", - " \"iterations\": 10,\n", - " \"n_cross_validations\": 5,\n", - " \"primary_metric\": 'AUC_weighted',\n", - " \"preprocess\": True,\n", - " \"max_cores_per_iteration\": 2,\n", - " \"verbosity\": logging.INFO\n", - "}\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " path=project_folder,\n", - " compute_target = dsvm_compute,\n", - " data_script = project_folder + \"/get_data.py\",\n", - " **automl_settings\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model \n", - "\n", - "For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retreive the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "remote_run = experiment.submit(automl_config, show_output=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the Results \n", - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n", - "\n", - "NOTE: The widget displays a link at the bottom. 
This links to a web-ui to explore the individual run details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(remote_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "#### Retrieve All Child Runs\n", - "You can also use sdk methods to fetch all the child runs and see individual metrics that we log. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(remote_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Canceling runs\n", - "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Cancel the ongoing experiment and stop scheduling new iterations\n", - "# remote_run.cancel()\n", - "\n", - "# Cancel iteration 1 and move onto iteration 2\n", - "# remote_run.cancel_iteration(1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = remote_run.get_output()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# lookup_metric = \"accuracy\"\n", - "# best_run, fitted_model = remote_run.get_output(metric=lookup_metric)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model from a specific iteration" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# iteration = 1\n", - "# best_run, fitted_model = remote_run.get_output(iteration=iteration)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "remote_run.register_model(description=description, tags=tags)\n", - "remote_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model \n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import sklearn\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import LabelEncoder\n", - "from pandas_ml import ConfusionMatrix\n", - "\n", - "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", - " delimiter=\"\\t\", quotechar='\"')\n", - "\n", - "# get integer labels\n", - "le = LabelEncoder()\n", - "le.fit(df[\"Label\"].values)\n", - "y = le.transform(df[\"Label\"].values)\n", - "df 
= df.drop([\"Label\"], axis=1)\n", - "\n", - "_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n", - "\n", - "ypred = fitted_model.predict(df_test.values)\n", - "\n", - "ypred_strings = le.inverse_transform(ypred)\n", - "ytest_strings = le.inverse_transform(y_test)\n", - "\n", - "cm = ConfusionMatrix(ytest_strings, ypred_strings)\n", - "\n", - "print(cm)\n", - "\n", - "cm.plot()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/09.auto-ml-classification-with-deployment.ipynb b/automl/09.auto-ml-classification-with-deployment.ipynb index 220ad834..34abb474 100644 --- a/automl/09.auto-ml-classification-with-deployment.ipynb +++ b/automl/09.auto-ml-classification-with-deployment.ipynb @@ -1,500 +1,494 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 09: Classification with Deployment\n", + "\n", + "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem and deploy it to an Azure Container Instance (ACI).\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an experiment using an existing workspace.\n", + "2. Configure AutoML using `AutoMLConfig`.\n", + "3. Train the model using local compute.\n", + "4. Explore the results.\n", + "5. Register the model.\n", + "6. Create a container image and an ACI service.\n", + "7. Test the ACI service.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# choose a name for experiment\n", + "experiment_name = 'automl-local-classification'\n", + "# project folder\n", + "project_folder = './sample_projects/automl-local-classification'\n", + "\n", + "experiment=Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + "Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**task**|classification or regression|\n", + "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross-validation splits.|\n", + "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", + "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n", + "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_train = digits.data[10:,:]\n", + "y_train = digits.target[10:]\n", + "\n", + "automl_config = AutoMLConfig(task = 'classification',\n", + " name = experiment_name,\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 1200,\n", + " iterations = 10,\n", + " n_cross_validations = 2,\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Model\n", + "\n", + "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", + "In this example, we specify `show_output = True` to print currently running iterations to the console." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register the Fitted Model for Deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "description = 'AutoML Model'\n", + "tags = None\n", + "model = local_run.register_model(description = description, tags = tags, iteration = 8)\n", + "local_run.model_id # This will be written to the script file later in the notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Scoring Script" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy\n", + "from sklearn.externals import joblib\n", + "from azureml.core.model import Model\n", + "\n", + "\n", + "def init():\n", + " global model\n", + " model_path = Model.get_model_path(model_name = '<>') # this name is model.id of model that we want to deploy\n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + "\n", + "def run(rawdata):\n", + " try:\n", + " data = json.loads(rawdata)['data']\n", + " data = numpy.array(data)\n", + " result = model.predict(data)\n", + " except Exception as e:\n", + " result = str(e)\n", + " return json.dumps({\"error\": result})\n", + " return json.dumps({\"result\":result.tolist()})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a YAML File for the Environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To ensure the consistency of the fit results with the training results, the SDK dependency versions need to be the same as the environment that trains the model. 
Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'automl-local-classification'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dependencies = ml_run.get_run_sdk_dependencies(iteration = 7)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n", + " print('{}\\t{}'.format(p, dependencies[p]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile myenv.yml\n", + "name: myenv\n", + "channels:\n", + " - defaults\n", + "dependencies:\n", + " - pip:\n", + " - numpy==1.14.2\n", + " - scikit-learn==0.19.2\n", + " - pynacl==1.2.1\n", + " - azureml-sdk[notebooks,automl]==<>" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Substitute the actual version number in the environment file.\n", + "\n", + "conda_env_file_name = 'myenv.yml'\n", + "\n", + "with open(conda_env_file_name, 'r') as cefr:\n", + " content = cefr.read()\n", + "\n", + "with open(conda_env_file_name, 'w') as cefw:\n", + " cefw.write(content.replace('<>', dependencies['azureml-sdk']))\n", + "\n", + "# Substitute the actual model id in the script file.\n", + "\n", + "script_file_name = 'score.py'\n", + "\n", + "with open(script_file_name, 'r') as cefr:\n", + " content = cefr.read()\n", + "\n", + "with open(script_file_name, 'w') as cefw:\n", + " cefw.write(content.replace('<>', local_run.model_id))" + 
] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Container Image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import Image, ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", + " execution_script = script_file_name,\n", + " conda_file = conda_env_file_name,\n", + " tags = {'area': \"digits\", 'type': \"automl_classification\"},\n", + " description = \"Image for automl classification sample\")\n", + "\n", + "image = Image.create(name = \"automlsampleimage\",\n", + " # this is the model object \n", + " models = [model],\n", + " image_config = image_config, \n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy the Image as a Web Service on Azure Container Instance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n", + " description = 'sample service for Automl Classification')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'automl-sample-01'\n", + "print(aci_service_name)\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete a Web Service" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get Logs from a Deployed Web Service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#aci_service.get_logs()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test a Web Service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Randomly select digits and test\n", + "digits = datasets.load_digits()\n", + "X_test = digits.data[:10, :]\n", + "y_test = digits.target[:10]\n", + "images = digits.images[:10]\n", + "\n", + "for index in np.random.choice(len(y_test), 3, replace = False):\n", + " print(index)\n", + " test_sample = json.dumps({'data':X_test[index:index + 1].tolist()})\n", + " predicted = aci_service.run(input_data = test_sample)\n", + " label = y_test[index]\n", + " predictedDict = json.loads(predicted)\n", + " title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0])\n", + " fig = plt.figure(1, figsize = (3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 09: Classification with deployment\n", - "\n", - "In this example we use the scikit learn's [digit 
dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Creating an Experiment using an existing Workspace\n", - "2. Instantiating AutoMLConfig\n", - "3. Training the Model using local compute\n", - "4. Exploring the results\n", - "5. Registering the model\n", - "6. Creating Image and creating aci service\n", - "7. Testing the aci service\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for experiment\n", - "experiment_name = 'automl-local-classification'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-classification'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK 
version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate Auto ML Config\n", - "\n", - "Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**task**|classification or regression|\n", - "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration|\n", - "|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n", - "|**n_cross_validations**|Number of cross validation splits|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[10:,:]\n", - "y_digits = digits.target[10:]\n", - "\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " name=experiment_name,\n", - " debug_log='automl_errors.log',\n", - " primary_metric='AUC_weighted',\n", - " max_time_sec=1200,\n", - " iterations=10,\n", - " n_cross_validations=2,\n", - " verbosity=logging.INFO,\n", - " X = X_digits, \n", - " y = y_digits,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n", - "You will see the currently running iterations printing to the console." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register fitted model for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "model = local_run.register_model(description=description, tags=tags, iteration=8)\n", - "local_run.model_id # This will be written to the script file later in the notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Scoring script ###" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy\n", - "from sklearn.externals import joblib\n", - "from azureml.core.model import Model\n", - "\n", - "\n", - "def init():\n", - " global model\n", - " model_path = Model.get_model_path(model_name = '<>') # this name is model.id of model that we want to deploy\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "def run(rawdata):\n", - " try:\n", - " data = json.loads(rawdata)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " except Exception as e:\n", - " result = str(e)\n", - " return json.dumps({\"error\": result})\n", - " return json.dumps({\"result\":result.tolist()})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create yml file for env" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To ensure the consistence the fit results with the training results, the sdk dependence versions need to be the same as the environment that trains the model. 
Details about retrieving the versions can be found in notebook 12.auto-ml-retrieve-the-training-sdk-versions.ipynb." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'automl-local-classification'\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "ml_run = AutoMLRun(experiment=experiment, run_id=local_run.id)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dependencies = ml_run.get_run_sdk_dependencies(iteration=7)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n", - " print('{}\\t{}'.format(p, dependencies[p]))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile myenv.yml\n", - "name: myenv\n", - "channels:\n", - " - defaults\n", - "dependencies:\n", - " - pip:\n", - " - numpy==1.14.2\n", - " - scikit-learn==0.19.2\n", - " - azureml-sdk[notebooks,automl]==<> " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Substitute the actual version number in the environment file.\n", - "\n", - "conda_env_file_name = 'myenv.yml'\n", - "\n", - "with open(conda_env_file_name, 'r') as cefr:\n", - " content = cefr.read()\n", - "\n", - "with open(conda_env_file_name, 'w') as cefw:\n", - " cefw.write(content.replace('<>', dependencies['azureml-sdk']))\n", - "\n", - "# Substitute the actual model id in the script file.\n", - "\n", - "script_file_name = 'score.py'\n", - "\n", - "with open(script_file_name, 'r') as cefr:\n", - " content = cefr.read()\n", - "\n", - "with open(script_file_name, 'w') as cefw:\n", - " cefw.write(content.replace('<>', local_run.model_id))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 
Create Image ###" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'area': \"digits\", 'type': \"automl_classification\"},\n", - " description = \"Image for automl classification sample\")\n", - "\n", - "image = Image.create(name = \"automlsampleimage\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy Image as web service on Azure Container Instance ###" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n", - " description = 'sample service for Automl Classification')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'automl-sample-01'\n", - "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### To delete a service ##" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - 
"#aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### To get logs from deployed service ###" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#aci_service.get_logs()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test Web Service ###" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Randomly select digits and test\n", - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:10, :]\n", - "y_digits = digits.target[:10]\n", - "images = digits.images[:10]\n", - "\n", - "for index in np.random.choice(len(y_digits), 3):\n", - " print(index)\n", - " test_sample = json.dumps({'data':X_digits[index:index + 1].tolist()})\n", - " predicted = aci_service.run(input_data = test_sample)\n", - " label = y_digits[index]\n", - " predictedDict = json.loads(predicted)\n", - " title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0])\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/10.auto-ml-multi-output-example.ipynb 
b/automl/10.auto-ml-multi-output-example.ipynb index e699b7f5..d4b3e430 100644 --- a/automl/10.auto-ml-multi-output-example.ipynb +++ b/automl/10.auto-ml-multi-output-example.ipynb @@ -1,292 +1,287 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 10: Multi-output\n", + "\n", + "This notebook shows how to use AutoML to train multi-output problems by leveraging the correlation between the outputs using indicator vectors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Transformer Functions\n", + "The inputs `X` and `y` are transformed as follows. For example, if `y = {y_1, y_2}`, then `X` becomes\n", + " \n", + "`X 1 0`\n", + " \n", + "`X 0 1`\n", + "\n", + "and `y` becomes\n", + "\n", + "`y_1`\n", + "\n", + "`y_2`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from scipy import sparse\n", + "from scipy import linalg\n", + "\n", + "#Transformer functions\n", + "def multi_output_transform_x_y(X, y):\n", + " X_new = multi_output_transformer_x(X, y.shape[1])\n", + " y_new = multi_output_transform_y(y)\n", + " return X_new, y_new\n", + "\n", + "def multi_output_transformer_x(X, number_of_columns_y):\n", + " indicator_vecs = linalg.block_diag(*([np.ones((X.shape[0], 1))] * number_of_columns_y))\n", + " if sparse.issparse(X):\n", + " X_new = sparse.vstack(np.tile(X, number_of_columns_y))\n", + " indicator_vecs = sparse.coo_matrix(indicator_vecs)\n", + " X_new = sparse.hstack((X_new, indicator_vecs))\n", + " else:\n", + " X_new = np.tile(X, (number_of_columns_y, 1))\n", + " X_new = np.hstack((X_new, indicator_vecs))\n", + " return X_new\n", + "\n", + "def multi_output_transform_y(y):\n", + " return y.reshape(-1, order=\"F\")\n", + "\n", + "def multi_output_inverse_transform_y(y, number_of_columns_y):\n", + " return y.reshape((-1, number_of_columns_y), order = \"F\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## AutoML Experiment Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the
experiment and specify the project folder.\n", + "experiment_name = 'automl-local-multi-output'\n", + "project_folder = './sample_projects/automl-local-multi-output'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a Random Dataset for Test Purposes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rng = np.random.RandomState(1)\n", + "X_train = np.sort(200 * rng.rand(600, 1) - 100, axis = 0)\n", + "y_train = np.array([np.pi * np.sin(X_train).ravel(), np.pi * np.cos(X_train).ravel()]).T\n", + "y_train += (0.5 - rng.rand(*y_train.shape))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Perform X and y transformation using the transformer function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_train_transformed, y_train_transformed = multi_output_transform_x_y(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Configure AutoML using the transformed results." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'regression',\n", + " debug_log = 'automl_errors_multi.log',\n", + " primary_metric = 'r2_score',\n", + " iterations = 10,\n", + " n_cross_validations = 2,\n", + " verbosity = logging.INFO,\n", + " X = X_train_transformed,\n", + " y = y_train_transformed,\n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Fit the Transformed Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the best fit model.\n", + "best_run, fitted_model = local_run.get_output()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate random data set for predicting.\n", + "X_test = np.sort(200 * rng.rand(200, 1) - 100, axis = 0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Transform predict data.\n", + "X_test_transformed = multi_output_transformer_x(X_test, y_train.shape[1])\n", + "\n", + "# Predict and inverse transform the prediction.\n", + "y_predict = fitted_model.predict(X_test_transformed)\n", + "y_predict = multi_output_inverse_transform_y(y_predict, y_train.shape[1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(y_predict)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 10: Multi output Example for AutoML" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This notebook shows an example to use AutoML to train the multi output problems by leveraging the correlation between the outputs using indicator vectors." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Transformer functions\n", - "The transformation of the input are happening for input X and Y as following, e.g. 
Y = {y_1, y_2}, then X becomes\n", - " \n", - "X 1 0\n", - " \n", - "X 0 1\n", - "\n", - "and Y becomes,\n", - "\n", - "y_1\n", - "\n", - "y_2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from scipy import sparse\n", - "from scipy import linalg\n", - "\n", - "#Transformer functions\n", - "def multi_output_transform_x_y(X, Y):\n", - " X_new = multi_output_transformer_x(X, Y.shape[1])\n", - " y_new = multi_output_transform_y(Y)\n", - " return X_new, y_new\n", - "\n", - "def multi_output_transformer_x(X, number_of_columns_Y):\n", - " indicator_vecs = linalg.block_diag(*([np.ones((X.shape[0], 1))] * number_of_columns_Y))\n", - " if sparse.issparse(X):\n", - " X_new = sparse.vstack(np.tile(X, number_of_columns_Y))\n", - " indicator_vecs = sparse.coo_matrix(indicator_vecs)\n", - " X_new = sparse.hstack((X_new, indicator_vecs))\n", - " else:\n", - " X_new = np.tile(X, (number_of_columns_Y, 1))\n", - " X_new = np.hstack((X_new, indicator_vecs))\n", - " return X_new\n", - "\n", - "def multi_output_transform_y(Y):\n", - " return Y.reshape(-1, order=\"F\")\n", - " \n", - "def multi_output_inverse_transform_y(y, number_of_columns_y):\n", - " return y.reshape((-1, number_of_columns_y), order=\"F\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## AutoML experiment set up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for experiment\n", - "experiment_name = 'automl-local-multi-output'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-multi-output'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = 
ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a random dataset for the test purpose " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "rng = np.random.RandomState(1)\n", - "X_train = np.sort(200 * rng.rand(600, 1) - 100, axis=0)\n", - "Y_train = np.array([np.pi * np.sin(X_train).ravel(), np.pi * np.cos(X_train).ravel()]).T\n", - "Y_train += (0.5 - rng.rand(*Y_train.shape))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Perform X and Y transformation using transformer function" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_train_transformed, y_train_transformed = multi_output_transform_x_y(X_train, Y_train)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'regression',\n", - " debug_log='automl_errors_multi.log',\n", - " primary_metric='r2_score',\n", - " iterations=10,\n", - " n_cross_validations=2,\n", - " verbosity=logging.INFO,\n", - " X=X_train_transformed,\n", - " y=y_train_transformed,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Fit the transformed data " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the best fit model\n", - "best_run, fitted_model = local_run.get_output()" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Generate random data set for predicting\n", - "X_predict = np.sort(200 * rng.rand(200, 1) - 100, axis=0)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Transform predict data\n", - "X_predict_transformed = multi_output_transformer_x(X_predict, Y_train.shape[1])\n", - "# Predict and inverse transform the prediction\n", - "y_predict = fitted_model.predict(X_predict_transformed)\n", - "Y_predict = multi_output_inverse_transform_y(y_predict, Y_train.shape[1])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(Y_predict)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/11.auto-ml-sample-weight.ipynb b/automl/11.auto-ml-sample-weight.ipynb index 2afce723..b7cd3730 100644 --- a/automl/11.auto-ml-sample-weight.ipynb +++ b/automl/11.auto-ml-sample-weight.ipynb @@ -1,251 +1,246 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 11: Sample Weight\n", + "\n", + "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use sample weight with AutoML. Sample weight is used where some sample values are more important than others.\n", + "\n", + "Make sure you have executed the [00.configuration](00.configuration.ipynb) notebook before running this notebook.\n", + "\n", + "In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose names for the regular and the sample weight experiments.\n", + "experiment_name = 'non_sample_weight_experiment'\n", + "sample_weight_experiment_name = 'sample_weight_experiment'\n", + "\n", + "project_folder = 
'./sample_projects/automl-local-classification'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "sample_weight_experiment = Experiment(ws, sample_weight_experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace Name'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + "Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_train = digits.data[100:,:]\n", + "y_train = digits.target[100:]\n", + "\n", + "# The example makes the sample weight 0.9 for the digit 4 and 0.01 for all other digits.\n", + "# This makes the model more likely to classify as 4 if the image is not clear.\n", + "sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])\n", + "\n", + "automl_classifier = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 3600,\n", + " iterations = 10,\n", + " n_cross_validations = 2,\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " path = project_folder)\n", + "\n", + "automl_sample_weight = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 3600,\n", + " iterations = 10,\n", + " n_cross_validations = 2,\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " sample_weight = sample_weight,\n", + " path = project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train the Models\n", + "\n", + "Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", + "In this example, we specify `show_output = True` to print currently running iterations to the console."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_classifier, show_output = True)\n", + "sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)\n", + "\n", + "best_run, fitted_model = local_run.get_output()\n", + "best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the Best Fitted Model\n", + "\n", + "#### Load Test Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_test = digits.data[:100, :]\n", + "y_test = digits.target[:100]\n", + "images = digits.images[:100]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Compare the Pipelines\n", + "The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Show each test digit that is labelled 4 or predicted as 4 by either model.\n", + "for index in range(0,len(y_test)):\n", + " predicted = fitted_model.predict(X_test[index:index + 1])[0]\n", + " predicted_sample_weight = fitted_model_sample_weight.predict(X_test[index:index + 1])[0]\n", + " label = y_test[index]\n", + " if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n", + " title = \"Label value = %d Predicted value = %d Predicted with sample weight = %d\" % (label, predicted, predicted_sample_weight)\n", + " fig = plt.figure(1, figsize=(3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 11: Sample weight\n", - "\n", - "In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use sample weight with the AutoML Classifier.\n", - "Sample weight is used where some sample values are more important than others.\n", - "\n", - "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. How to specifying sample_weight\n", - "2.
The difference that it makes to test results\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for experiment\n", - "experiment_name = 'non_sample_weight_experiment'\n", - "sample_weight_experiment_name = 'sample_weight_experiment'\n", - "\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-classification'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate Auto ML Config\n", - "\n", - "Instantiate two AutoMLConfig Objects. One will be used with sample_weight and one without." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[100:,:]\n", - "y_digits = digits.target[100:]\n", - "\n", - "# The example makes the sample weight 0.9 for the digit 4 and 0.1 for all other digits.\n", - "# This makes the model more likely to classify as 4 if the image it not clear.\n", - "sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_digits])\n", - "\n", - "automl_classifier = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " primary_metric = 'AUC_weighted',\n", - " max_time_sec = 3600,\n", - " iterations = 10,\n", - " n_cross_validations = 2,\n", - " verbosity = logging.INFO,\n", - " X = X_digits, \n", - " y = y_digits,\n", - " path=project_folder)\n", - "\n", - "automl_sample_weight = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " primary_metric = 'AUC_weighted',\n", - " max_time_sec = 3600,\n", - " iterations = 10,\n", - " n_cross_validations = 2,\n", - " verbosity = logging.INFO,\n", - " X = X_digits, \n", - " y = y_digits,\n", - " sample_weight = sample_weight,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Models\n", - "\n", - "Call the submit method on the 
experiment and pass the configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n", - "You will see the currently running iterations printing to the console." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_classifier, show_output=True)\n", - "sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output=True)\n", - "\n", - "best_run, fitted_model = local_run.get_output()\n", - "best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Models\n", - "\n", - "#### Load Test Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:100, :]\n", - "y_digits = digits.target[:100]\n", - "images = digits.images[:100]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Compare the pipelines\n", - "The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Randomly select digits and test\n", - "for index in range(0,len(y_digits)):\n", - " predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n", - " predicted_sample_weight = fitted_model_sample_weight.predict(X_digits[index:index + 1])[0]\n", - " label = y_digits[index]\n", - " if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n", - " title = \"Label value = %d Predicted value = %d Prediced with sample weight = %d\" % ( label,predicted,predicted_sample_weight)\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/12.auto-ml-retrieve-the-training-sdk-versions.ipynb b/automl/12.auto-ml-retrieve-the-training-sdk-versions.ipynb index 4d6b5eb8..b661b447 100644 --- a/automl/12.auto-ml-retrieve-the-training-sdk-versions.ipynb +++ b/automl/12.auto-ml-retrieve-the-training-sdk-versions.ipynb @@ -1,240 +1,242 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. 
All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 12: Retrieving Training SDK Versions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "import random\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn import datasets\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.train.automl.run import AutoMLRun\n", + "from azureml.train.automl.utilities import get_sdk_dependencies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. Retrieve the SDK versions in the current environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To retrieve the SDK versions in the current environment, run `get_sdk_dependencies`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "get_sdk_dependencies()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. 
Train model using AutoML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# Choose a name for the experiment and specify the project folder.\n", + "experiment_name = 'automl-local-classification'\n", + "project_folder = './sample_projects/automl-local-classification'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits = datasets.load_digits()\n", + "X_train = digits.data[10:,:]\n", + "y_train = digits.target[10:]\n", + "\n", + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " primary_metric = 'AUC_weighted',\n", + " iterations = 3,\n", + " n_cross_validations = 2,\n", + " verbosity = logging.INFO,\n", + " X = X_train, \n", + " y = y_train,\n", + " path = project_folder)\n", + "\n", + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3. Retrieve the SDK versions from RunHistory" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get the SDK versions from RunHistory, first the run id needs to be recorded. This can either be done by copying it from the output message or by retrieving it after each run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use a run id copied from an output message.\n", + "#run_id = 'AutoML_c0585b1f-a0e6-490b-84c7-3a099468b28e'\n", + "\n", + "# Retrieve the run id from a run.\n", + "run_id = local_run.id\n", + "print(run_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Initialize a new `AutoMLRun` object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'automl-local-classification'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "ml_run = AutoMLRun(experiment = experiment, run_id = run_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get parent training SDK versions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ml_run.get_run_sdk_dependencies()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get the training SDK versions of a specific run."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ml_run.get_run_sdk_dependencies(iteration = 2)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 12: Retrieving Training SDK Versions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun\n", - "from azureml.train.automl.utilities import get_sdk_dependencies" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 1. 
Retrieve the SDK versions in the current env" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To retrieve the SDK versions in the current env, simple running get_sdk_dependencies()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "get_sdk_dependencies()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 2. Training Model Using AutoML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# choose a name for experiment\n", - "experiment_name = 'automl-local-classification'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-local-classification'\n", - "\n", - "experiment=Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits = datasets.load_digits()\n", - "X_digits = digits.data[10:,:]\n", - "y_digits = digits.target[10:]\n", - "\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log='automl_errors.log',\n", - " primary_metric='AUC_weighted',\n", - " iterations=3,\n", - " n_cross_validations=2,\n", - " verbosity=logging.INFO,\n", - " X = X_digits, \n", - " y = y_digits,\n", - " path=project_folder)\n", - "\n", - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - 
"# 3. Retrieve the SDK versions from RunHistory" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To get the SDK versions from RunHistory, first the RunId need to be recorded. This can either be done by copy it from the output message or retieve if after each run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run_id = local_run.id\n", - "print(run_id)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Initialize a new AutoMLRunClass." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'automl-local-classification'\n", - "#run_id = 'AutoML_c0585b1f-a0e6-490b-84c7-3a099468b28e'\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "ml_run = AutoMLRun(experiment=experiment, run_id=run_id)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get parent training SDK versions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ml_run.get_run_sdk_dependencies()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get the traning SDK versions of a specific run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ml_run.get_run_sdk_dependencies(iteration=2)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/13.auto-ml-dataprep.ipynb b/automl/13.auto-ml-dataprep.ipynb index bd8c6175..7b0e6c92 100644 --- a/automl/13.auto-ml-dataprep.ipynb +++ b/automl/13.auto-ml-dataprep.ipynb @@ -1,567 +1,557 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AutoML 13: Prepare Data using `azureml.dataprep`\n", + "In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n", + "\n", + "Make sure you have executed the [setup](00.configuration.ipynb) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n", + "2. Pass the `Dataflow` to AutoML for a local run.\n", + "3. Pass the `Dataflow` to AutoML for a remote run." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install `azureml.dataprep` SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install azureml-dataprep" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Diagnostics\n", + "\n", + "Opt-in diagnostics for better experience, quality, and security of future releases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.telemetry import set_diagnostics_collection\n", + "set_diagnostics_collection(send_diagnostics = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Experiment\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "import os\n", + "\n", + "import pandas as pd\n", + "\n", + "import azureml.core\n", + "from azureml.core.compute import DsvmCompute\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.runconfig import CondaDependencies\n", + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.workspace import Workspace\n", + "import azureml.dataprep as dprep\n", + "from azureml.train.automl import AutoMLConfig" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + " \n", + "# choose a name for experiment\n", + "experiment_name = 'automl-dataprep-classification'\n", + "# project folder\n", + "project_folder = './sample_projects/automl-dataprep-classification'\n", + " \n", + "experiment = Experiment(ws, experiment_name)\n", + 
" \n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace Name'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data = output, index = ['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loading Data using DataPrep" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file.\n", + "# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n", + "simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n", + "X = dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n", + "\n", + "# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n", + "# and convert column types manually.\n", + "# Here we read a comma delimited file and convert all columns to integers.\n", + "y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Review the Data Preparation Result\n", + "\n", + "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X.skip(1).head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure AutoML\n", + "\n", + "This creates a general AutoML settings object applicable for both local and remote runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_settings = {\n", + " \"max_time_sec\": 600,\n", + " \"iterations\": 2,\n", + " \"primary_metric\": 'AUC_weighted',\n", + " \"preprocess\": False,\n", + " \"verbosity\": logging.INFO,\n", + " \"n_cross_validations\": 3\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Local Run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pass Data with `Dataflow` Objects\n", + "\n", + "The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " X = X,\n", + " y = y,\n", + " **automl_settings)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Remote Run\n", + "*Note: This feature might not work properly in your workspace region before the October update. 
You may jump to the \"Explore the Results\" section below to explore other features that AutoML and DataPrep have to offer.*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create or Attach a Remote Linux DSVM" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dsvm_name = 'mydsvm'\n", + "try:\n", + " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", + " print('Found existing DSVM.')\n", + "except Exception:\n", + " print('Creating a new DSVM.')\n", + " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", + " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", + " dsvm_compute.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Update the Conda Dependencies to Include the AutoML and DataPrep SDKs\n", + "\n", + "Currently the AutoML and DataPrep SDKs are not installed with the Azure ML SDK by default. To circumvent this limitation, we update the conda dependency file to add these dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cd = CondaDependencies()\n", + "cd.add_pip_package(pip_package='azureml-dataprep')\n", + "cd.add_pip_package(pip_package='tornado==4.5.1')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a `RunConfiguration` Targeting the DSVM" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run_config = RunConfiguration(conda_dependencies=cd)\n", + "run_config.target = dsvm_compute\n", + "run_config.auto_prepare_environment = True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pass Data with `Dataflow` Objects\n", + "\n", + "The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run.
AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_config = AutoMLConfig(task = 'classification',\n", + " debug_log = 'automl_errors.log',\n", + " path = project_folder,\n", + " run_configuration = run_config,\n", + " X = X,\n", + " y = y,\n", + " **automl_settings)\n", + "# Please uncomment the line below to try out remote run with dataprep. \n", + "# This feature might not work properly in your workspace region before the October update.\n", + "# remote_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(local_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Retrieve All Child Runs\n", + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "children = list(local_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + " \n", + "import pandas as pd\n", + "rundata = pd.DataFrame(metricslist).sort_index(axis = 1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and its fitted model according to the primary metric. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Best Model Based on Any Other Metric\n", + "Show the run and the model that has the smallest `log_loss` value:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lookup_metric = \"log_loss\"\n", + "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Model from a Specific Iteration\n", + "Show the run and the model from the first iteration:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "iteration = 0\n", + "best_run, fitted_model = local_run.get_output(iteration = 
iteration)\n",
"print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the Best Fitted Model\n", + "\n", + "#### Load Test Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn import datasets\n", + "\n", + "digits = datasets.load_digits()\n", + "X_test = digits.data[:10, :]\n", + "y_test = digits.target[:10]\n", + "images = digits.images[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Testing Our Best Pipeline\n", + "We will try to predict 2 digits and see how our model works." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Randomly select digits and test\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import random\n", + "import numpy as np\n", + "\n", + "for index in np.random.choice(len(y_test), 2, replace = False):\n", + " print(index)\n", + " predicted = fitted_model.predict(X_test[index:index + 1])[0]\n", + " label = y_test[index]\n", + " title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n", + " fig = plt.figure(1, figsize=(3,3))\n", + " ax1 = fig.add_axes((0,0,.8,.8))\n", + " ax1.set_title(title)\n", + " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Capture the `Dataflow` Objects for Later Use in AutoML\n", + "\n", + "`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# sklearn.digits.data + target\n", + "digits_complete = dprep.smart_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "digits_complete.to_pandas_dataframe().shape\n", + "labels_column = 'Column64'\n", + "dflow_X = digits_complete.drop_columns(columns = [labels_column])\n", + "dflow_y = digits_complete.keep_columns(columns = [labels_column])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML 13: Prepare Data using `azureml.dataprep`\n", - "In this example we showcase how you can use `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone - full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n", - "\n", - "Make sure you have executed the [setup](00.configuration.ipynb) before running this notebook.\n", - "\n", - "In this notebook you would see\n", - "1. Defining data loading and preparation steps in a `Dataflow` using `azureml.dataprep`\n", - "2. Passing the `Dataflow` to AutoML for local run\n", - "3. 
Passing the `Dataflow` to AutoML for remote run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Install `azureml.dataprep` SDK" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Please restart your kernel after the below installs.\n", - "\n", - "Tornado must be downgraded to a pre-5 version due to a known Tornado x Jupyter event loop bug." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install azureml-dataprep\n", - "!pip install tornado==4.5.1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Experiment\n", - "\n", - "As part of the setup you have already created a Workspace. For AutoML you would need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "\n", - "import pandas as pd\n", - "\n", - "import azureml.core\n", - "from azureml.core.compute import DsvmCompute\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.runconfig import CondaDependencies\n", - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.workspace import Workspace\n", - "import azureml.dataprep as dprep\n", - "from azureml.train.automl import AutoMLConfig" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - " \n", - "# choose a name for experiment\n", - "experiment_name = 'automl-dataprep-classification'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-dataprep-classification'\n", - " \n", - "experiment = Experiment(ws, experiment_name)\n", - " \n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Loading Data using DataPrep" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file\n", - "# data pulled from sklearn.datasets.load_digits()\n", - "simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n", - "X = 
dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # remove header\n", - "\n", - "# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter).\n", - "# and convert column types manually.\n", - "# Here we read a comma delimited file and convert all columns to integers.\n", - "y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Review the Data Preparation Result\n", - "\n", - "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X.skip(1).head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instantiate AutoML Settings\n", - "\n", - "This creates a general Auto ML Settings applicable for both Local and Remote runs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_settings = {\n", - " \"max_time_sec\": 600,\n", - " \"iterations\": 2,\n", - " \"primary_metric\": 'AUC_weighted',\n", - " \"preprocess\": False,\n", - " \"verbosity\": logging.INFO,\n", - " \"n_cross_validations\" : 3\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Local Run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Pass data with Dataflows\n", - "\n", - "The `Dataflow` objects captured above can be passed to `submit` method for local run. AutoML will retrieve the results from the `Dataflow` for model training." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " X = X,\n", - " y = y,\n", - " **automl_settings)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Remote Run\n", - "*Note: This feature might not work properly in your workspace region before the October update. You may jump to the \"Exploring the results\" section below to explore other features AutoML and DataPrep has to offer.*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create or Attach a Remote Linux DSVM" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dsvm_name = 'mydsvm'\n", - "try:\n", - " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", - " print('found existing dsvm.')\n", - "except:\n", - " print('creating new dsvm.')\n", - " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", - " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", - " dsvm_compute.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Update Conda Dependency file to have AutoML and DataPrep SDK\n", - "\n", - "Currently AutoML and DataPrep SDK is not installed with Azure ML SDK by default. Due to this we update the conda dependency file to add such dependencies." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cd = CondaDependencies()\n", - "cd.add_pip_package(pip_package='azureml-dataprep')\n", - "cd.add_pip_package(pip_package='tornado==4.5.1')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a RunConfiguration with DSVM name" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run_config = RunConfiguration(conda_dependencies=cd)\n", - "run_config.target = dsvm_compute\n", - "run_config.auto_prepare_environment = True" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Pass data with Dataflows\n", - "\n", - "The `Dataflow` objects captured above can also be passed to `submit` method for remote run. AutoML will serialize the `Dataflow` and send to remote compute target. The `Dataflow` will not be evaluated locally." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " path = project_folder,\n", - " run_configuration = run_config,\n", - " X = X,\n", - " y = y,\n", - " **automl_settings)\n", - "# Please uncomment the line below to try out remote run with dataprep. \n", - "# This feature might not work properly in your workspace region before the October update.\n", - "# remote_run = experiment.submit(automl_config, show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exploring the results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for monitoring runs\n", - "\n", - "The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. 
It refreshed once per minute, so you should see the graph update as child runs complete.\n", - "\n", - "NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(local_run).show() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Retrieve all child runs\n", - "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", - " metricslist[int(properties['iteration'])] = metrics\n", - " \n", - "import pandas as pd\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model\n", - "\n", - "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any other metric\n", - "Give me the run and the model that has the smallest `log_loss`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lookup_metric = \"log_loss\"\n", - "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model based on any iteration\n", - "Give me the run and the model from the 1st iteration:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "iteration = 0\n", - "best_run, fitted_model = local_run.get_output(iteration = iteration)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Testing the Fitted Model \n", - "\n", - "#### Load Test Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn import datasets\n", - "\n", - "digits = datasets.load_digits()\n", - "X_digits = digits.data[:10, :]\n", - "y_digits = digits.target[:10]\n", - "images = digits.images[:10]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Testing our best pipeline\n", - "We will try to predict 2 digits and see how our model works." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Randomly select digits and test\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import random\n", - "import numpy as np\n", - "\n", - "for index in np.random.choice(len(y_digits), 2):\n", - " print(index)\n", - " predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n", - " label = y_digits[index]\n", - " title = \"Label value = %d Predicted value = %d \" % ( label,predicted)\n", - " fig = plt.figure(1, figsize=(3,3))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Appendix" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Capture the Dataflows to use for AutoML later\n", - "\n", - "`Dataflow` objects are immutable. Each of them is composed of a list of data preparation steps. A `Dataflow` can be branched at any point for further usage." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# sklearn.digits.data + target\n", - "digits_complete = dprep.smart_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`digits_complete` (sourced from `sklearn.datasets.load_digits()`)is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "digits_complete.to_pandas_dataframe().shape\n", - "labels_column = 'Column64'\n", - "dflow_X = digits_complete.drop_columns(columns = [labels_column])\n", - "dflow_y = digits_complete.keep_columns(columns = [labels_column])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/automl/README.md b/automl/README.md index 4b844185..729ad62d 100644 --- a/automl/README.md +++ b/automl/README.md @@ -1,52 +1,24 @@ # Table of Contents -1. [Automated ML Introduction](#introduction) -1. [Running samples in Azure Notebooks](#jupyter) -1. [Running samples in a Local Conda environment](#localconda) -1. [Automated ML SDK Sample Notebooks](#samples) -1. [Documentation](#documentation) -1. [Running using python command](#pythoncommand) -1. [Troubleshooting](#troubleshooting) - - -# Automated ML introduction -Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, automated ML will give you a high quality machine learning model that you can use for predictions. +1. [Auto ML Introduction](#introduction) +2. [Running samples in a Local Conda environment](#localconda) +3. [Auto ML SDK Sample Notebooks](#samples) +4. [Documentation](#documentation) +5. [Running using python command](#pythoncommand) +6. 
[Troubleshooting](#troubleshooting) +# Auto ML Introduction +AutoML builds high quality Machine Learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, AutoML will give you a high quality machine learning model that you can use for predictions. If you are new to Data Science, AutoML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection, hyperparameter selection and in one step creates a high quality trained model for you to use. If you are an experienced data scientist, AutoML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training and generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. AutoML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire. - -## Running samples in Azure Notebooks - Jupyter based notebooks in the Azure cloud -1. [![Azure Notebooks](https://notebooks.azure.com/launch.png)](https://aka.ms/aml-clone-azure-notebooks) -[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks. -1. Follow the instructions in the [../00.configuration](00.configuration.ipynb) notebook to create and connect to a workspace. -1. Open one of the sample notebooks. - - **Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook. - - ![set kernel to Python 3.6](../images/python36.png) +# Running samples in a Local Conda environment - -## Running samples in a Local Conda environment - -To run these notebook on your own notebook server, use these installation instructions. - -The instructions below will install everything you need and then start a Jupyter notebook. 
To start your Jupyter notebook manually, use: - -``` -conda activate azure_automl -jupyter notebook -``` - -or on Mac: - -``` -source activate azure_automl -jupyter notebook -``` +You can run these notebooks in Azure Notebooks without any extra installation. To run these notebook on your own notebook server, use these installation instructions. +It is best if you create a new conda environment locally to try this SDK, so it doesn't mess up with your existing Python environment. ### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose Python 3.7 or higher. - **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda. @@ -76,19 +48,19 @@ bash automl_setup_mac.sh cd to the **automl** folder where the sample notebooks were extracted and then run: ``` -bash automl_setup_linux.sh +automl_setup_linux.sh ``` ### 4. Running configuration.ipynb - Before running any samples you next need to run the configuration notebook. Click on 00.configuration.ipynb notebook +- Please make sure you use the Python [conda env:azure_automl] kernel when running this notebook. - Execute the cells in the notebook to Register Machine Learning Services Resource Provider and create a workspace. (*instructions in notebook*) ### 5. Running Samples - Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample Notebooks. 
- Follow the instructions in the individual notebooks to explore various features in AutoML - -# Automated ML SDK Sample Notebooks +# Auto ML SDK Sample Notebooks - [00.configuration.ipynb](00.configuration.ipynb) - Register Machine Learning Services Resource Provider - Create new Azure ML Workspace @@ -115,7 +87,7 @@ bash automl_setup_linux.sh - [03b.auto-ml-remote-batchai.ipynb](03b.auto-ml-remote-batchai.ipynb) - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits) - - Example of using automated ML for classification using a remote Batch AI compute for training + - Example of using Auto ML for classification using a remote Batch AI compute for training - Parallel execution of iterations - Async tracking of progress - Cancelling individual iterations or entire run @@ -171,17 +143,20 @@ bash automl_setup_linux.sh - [13.auto-ml-dataprep.ipynb](13.auto-ml-dataprep.ipynb) - Using DataPrep for reading data - -# Documentation +- [14a.auto-ml-classification-ensemble.ipynb](14a.auto-ml-classification-ensemble.ipynb) + - Classification with ensembling + +- [14b.auto-ml-regression-ensemble.ipynb](14b.auto-ml-regression-ensemble.ipynb) + - Regression with ensembling + +# Documentation ## Table of Contents -1. [Automated ML Settings ](#automlsettings) -1. [Cross validation split options](#cvsplits) -1. [Get Data Syntax](#getdata) -1. [Data pre-processing and featurization](#preprocessing) - - -## Automated ML Settings +1. [Auto ML Settings ](#automlsettings) +2. [Cross validation split options](#cvsplits) +3. [Get Data Syntax](#getdata) +4. [Data pre-processing and featurization](#preprocessing) +## Auto ML Settings |Property|Description|Default| |-|-|-| |**primary_metric**|This is the metric that you want to optimize.
<br><br>Classification supports the following primary metrics<br>accuracy<br>AUC_weighted<br>balanced_accuracy<br>average_precision_score_weighted<br>precision_score_weighted<br><br>Regression supports the following primary metrics<br>spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>normalized_mean_absolute_error<br>normalized_root_mean_squared_log_error| Classification: accuracy<br><br>Regression: spearman_correlation @@ -195,8 +170,7 @@ bash automl_setup_linux.sh |**exit_score**|*double* value indicating the target for *primary_metric*.<br>Once the target is surpassed the run terminates|None| |**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for Auto ML.<br><br>Allowed values for **Classification**<br>LogisticRegression<br>SGDClassifierWrapper<br>NBWrapper<br>BernoulliNB<br>SVCWrapper<br>LinearSVMWrapper<br>KNeighborsClassifier<br>DecisionTreeClassifier<br>RandomForestClassifier<br>ExtraTreesClassifier<br>gradient boosting<br>LightGBMClassifier<br><br>Allowed values for **Regression**<br>ElasticNet<br>GradientBoostingRegressor<br>DecisionTreeRegressor<br>KNeighborsRegressor<br>LassoLars<br>SGDRegressor<br>RandomForestRegressor<br>
ExtraTreesRegressor|None| - -## Cross validation split options +## Cross validation split options ### K-Folds Cross Validation Use *n_cross_validations* setting to specify the number of cross validations. The training data set will be randomly split into *n_cross_validations* folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for *n_cross_validations* rounds until each fold is used once as validation set. Finally, the average scores accross all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set. @@ -206,8 +180,7 @@ Use *validation_size* to specify the percentage of the training data set that sh ### Custom train and validation set You can specify seperate train and validation set either through the get_data() or directly to the fit method. - -## get_data() syntax +## get_data() syntax The *get_data()* function can be used to return a dictionary with these values: |Key|Type|Dependency|Mutually Exclusive with|Description| @@ -223,23 +196,21 @@ The *get_data()* function can be used to return a dictionary with these values: |columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features| |cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation| - -## Data pre-processing and featurization -If you use `preprocess=True`, the following data preprocessing steps are performed automatically for you: +## Data pre-processing and featurization +If you use "preprocess=True", the following data preprocessing steps are performed automatically for you: +### 1. Dropping high cardinality or no variance features +- Features with no useful information are dropped from training and validation sets. 
These include features with all values missing, same value across all rows or with extremely high cardinality (e.g., hashes, IDs or GUIDs). +### 2. Missing value imputation +- For numerical features, missing values are imputed with average of values in the column. +- For categorical features, missing values are imputed with most frequent value. +### 3. Generating additional features +- For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second. +- For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer. +### 4. Transformations and encodings +- Numeric features with very few unique values are transformed into categorical features. +- Depending on cardinality of categorical features label encoding or (hashing) one-hot encoding is performed. -1. Dropping high cardinality or no variance features - - Features with no useful information are dropped from training and validation sets. These include features with all values missing, same value across all rows or with extremely high cardinality (e.g., hashes, IDs or GUIDs). -2. Missing value imputation - - For numerical features, missing values are imputed with average of values in the column. - - For categorical features, missing values are imputed with most frequent value. -3. Generating additional features - - For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second. - - For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer. -4. Transformations and encodings - - Numeric features with very few unique values are transformed into categorical features. - - -# Running using python command +# Running using python command Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file. You can then run this file using the python command. However, on Windows the file needs to be modified before it can be run. 
@@ -249,8 +220,7 @@ The following condition must be added to the main code in the file: The main code of the file must be indented so that it is under this condition. - -# Troubleshooting +# Troubleshooting ## Iterations fail and the log contains "MemoryError" This can be caused by insufficient memory on the DSVM. AutoML loads all training data into memory. So, the available memory should be more than the training data size. If you are using a remote DSVM, memory is needed for each concurrent iteration. The concurrent_iterations setting specifies the maximum concurrent iterations. For example, if the training data size is 8Gb and concurrent_iterations is set to 10, the minimum memory required is at least 80Gb. diff --git a/automl/automl_env.yml b/automl/automl_env.yml index 4d038465..dd61cc35 100644 --- a/automl/automl_env.yml +++ b/automl/automl_env.yml @@ -8,7 +8,7 @@ dependencies: - numpy>=1.11.0,<1.16.0 - scipy>=0.19.0,<0.20.0 - scikit-learn>=0.18.0,<=0.19.1 -- pandas>=0.19.0,<0.23.0 +- pandas>=0.22.0,<0.23.0 - pip: # Required packages for AzureML execution, history, and data preparation. 
diff --git a/automl/automl_setup.cmd b/automl/automl_setup.cmd index 201a06fe..77a6530b 100644 --- a/automl/automl_setup.cmd +++ b/automl/automl_setup.cmd @@ -6,7 +6,8 @@ IF "%conda_env_name%"=="" SET conda_env_name="azure_automl" call conda activate %conda_env_name% 2>nul: if not errorlevel 1 ( - call conda env update --file automl_env.yml -n %conda_env_name% + echo Upgrading azureml-sdk[automl] in existing conda environment %conda_env_name% + call pip install --upgrade azureml-sdk[automl] if errorlevel 1 goto ErrorExit ) else ( call conda env create -f automl_env.yml -n %conda_env_name% diff --git a/automl/automl_setup_linux.sh b/automl/automl_setup_linux.sh index 6e030054..fe57fe92 100644 --- a/automl/automl_setup_linux.sh +++ b/automl/automl_setup_linux.sh @@ -9,7 +9,8 @@ fi if source activate $CONDA_ENV_NAME 2> /dev/null then - conda env update -file automl_env.yml -n $CONDA_ENV_NAME + echo "Upgrading azureml-sdk[automl] in existing conda environment" $CONDA_ENV_NAME + pip install --upgrade azureml-sdk[automl] else conda env create -f automl_env.yml -n $CONDA_ENV_NAME && source activate $CONDA_ENV_NAME && diff --git a/automl/automl_setup_mac.sh b/automl/automl_setup_mac.sh index 789f143f..452206ec 100644 --- a/automl/automl_setup_mac.sh +++ b/automl/automl_setup_mac.sh @@ -9,7 +9,8 @@ fi if source activate $CONDA_ENV_NAME 2> /dev/null then - conda env update -file automl_env.yml -n $CONDA_ENV_NAME + echo "Upgrading azureml-sdk[automl] in existing conda environment" $CONDA_ENV_NAME + pip install --upgrade azureml-sdk[automl] else conda env create -f automl_env.yml -n $CONDA_ENV_NAME && source activate $CONDA_ENV_NAME && diff --git a/onnx/README.md b/onnx/README.md deleted file mode 100644 index 0b6d7890..00000000 --- a/onnx/README.md +++ /dev/null @@ -1,23 +0,0 @@ -# ONNX Runtime on Azure Machine Learning (AML) - -These tutorials show how to deploy pretrained [ONNX](http://onnx.ai) models on Azure virtual machines using [ONNX 
Runtime](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx) for inference. By the end of the tutorial, you will deploy a state-of-the-art deep learning model on a virtual machine in Azure Machine Learning, using ONNX Runtime for Inference. You can ping the model with your own images to be analyzed! - -## Tutorials -- [Handwritten Digit Classification (MNIST) using ONNX Runtime on AzureML](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-inference-mnist.ipynb) -- [Facial Expression Recognition using ONNX Runtime on AzureML](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-inference-emotion-recognition.ipynb) - -## Documentation -- [ONNX Runtime Python API Documentation](http://aka.ms/onnxruntime-python) -- [Azure Machine Learning API Documentation](http://aka.ms/aml-docs) - -## Related Articles -- [Building and Deploying ONNX Runtime Models](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx) -- [Azure AI – Making AI Real for Business](https://aka.ms/aml-blog-overview) -- [What’s new in Azure Machine Learning](https://aka.ms/aml-blog-whats-new) - - -## License - -Copyright (c) Microsoft Corporation. All rights reserved. -Licensed under the MIT License. - diff --git a/onnx/onnx-inference-emotion-recognition.ipynb b/onnx/onnx-inference-emotion-recognition.ipynb index 68787744..dde8259a 100644 --- a/onnx/onnx-inference-emotion-recognition.ipynb +++ b/onnx/onnx-inference-emotion-recognition.ipynb @@ -1,807 +1,732 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Facial Expression Recognition (Emotion FER+) using ONNX Runtime on Azure ML\n", - "\n", - "This example shows how to deploy an image classification neural network using the Facial Expression Recognition ([FER](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. This tutorial will show you how to deploy a FER+ model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", - "\n", - "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", - "\n", - "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization. We use the CPU version of ONNX Runtime in this tutorial, but will soon be releasing an additional tutorial for deploying this model using ONNX Runtime GPU.\n", - "\n", - "#### Tutorial Objectives:\n", - "\n", - "1. Describe the FER+ dataset and pretrained Convolutional Neural Net ONNX model for Emotion Recognition, stored in the ONNX model zoo.\n", - "2. Deploy and run the pretrained FER+ ONNX model on an Azure Machine Learning instance\n", - "3. 
Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "### 1. Install Azure ML SDK and create a new workspace\n", - "Please follow [Azure ML configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) to set up your environment.\n", - "\n", - "### 2. Install additional packages needed for this Notebook\n", - "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Maching Learning SDK is installed.\n", - "\n", - "```sh\n", - "(myenv) $ pip install matplotlib onnx opencv-python\n", - "```\n", - "\n", - "**Debugging tip**: Make sure that to activate your virtual environment (myenv) before you re-launch this notebook using the `jupyter notebook` comand. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n", - "\n", - "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", - "\n", - "In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# urllib is a built-in Python library to download files from URLs\n", - "\n", - "# Objective: retrieve the latest version of the ONNX Emotion FER+ model files from the\n", - "# ONNX Model Zoo and save it in the same folder as this tutorial\n", - "\n", - "import urllib.request\n", - "\n", - "onnx_model_url = \"https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz\"\n", - "\n", - "urllib.request.urlretrieve(onnx_model_url, filename=\"emotion_ferplus.tar.gz\")\n", - "\n", - "# the ! magic command tells our jupyter notebook kernel to run the following line of \n", - "# code from the command line instead of the notebook kernel\n", - "\n", - "# We use tar and xvcf to unzip the files we just retrieved from the ONNX model zoo\n", - "\n", - "!tar xvzf emotion_ferplus.tar.gz" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy a VM with your ONNX model in the Cloud\n", - "\n", - "### Load Azure ML workspace\n", - "\n", - "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Registering your model with Azure ML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_dir = \"emotion_ferplus\" # replace this with the location of your model files\n", - "\n", - "# leave as is if it's in the same folder as this notebook" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model.register(model_path = model_dir + \"/\" + \"model.onnx\",\n", - " model_name = \"onnx_emotion\",\n", - " tags = {\"onnx\": \"demo\"},\n", - " description = \"FER+ emotion recognition CNN from ONNX Model Zoo\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Optional: Displaying your registered models\n", - "\n", - "This step is not required, so feel free to skip it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models()\n", - "for m in models:\n", - " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### ONNX FER+ Model Methodology\n", - "\n", - "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n", - "\n", - "The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 croud sourced reviewers, creating a more accurate basis for ground truth. \n", - "\n", - "You can see the difference of label quality in the sample model input below. 
The FER labels are the first word below each image, and the FER+ labels are the second word below each image.\n", - "\n", - "![](https://raw.githubusercontent.com/Microsoft/FERPlus/master/FER+vsFER.png)\n", - "\n", - "***Input: Photos of cropped faces from FER+ Dataset***\n", - "\n", - "***Task: Classify each facial image into its appropriate emotions in the emotion table***\n", - "\n", - "``` emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7} ```\n", - "\n", - "***Output: Emotion prediction for input image***\n", - "\n", - "\n", - "Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# for images and plots in this notebook\n", - "import matplotlib.pyplot as plt \n", - "from IPython.display import Image\n", - "\n", - "# display images inline\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Model Description\n", - "\n", - "The FER+ model from the ONNX Model Zoo is summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image from Barsoum et. al's paper [\"Training Deep Networks for Facial Expression Recognition\n", - "with Crowd-Sourced Label Distribution\"](https://arxiv.org/pdf/1608.01041.pdf), with our (64 x 64) input images and our output probabilities for each of the labels." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![](https://raw.githubusercontent.com/vinitra/FERPlus/master/emotion_model_img.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify our Score and Environment Files" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. 
We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating out model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", - "\n", - "### Write Score File\n", - "\n", - "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import onnxruntime\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import time\n", - "\n", - "def init():\n", - " global session, input_name, output_name\n", - " model = Model.get_model_path(model_name = 'onnx_emotion')\n", - " session = onnxruntime.InferenceSession(model, None)\n", - " input_name = session.get_inputs()[0].name\n", - " output_name = session.get_outputs()[0].name \n", - " \n", - "def run(input_data):\n", - " '''Purpose: evaluate test input in Azure Cloud using onnxruntime.\n", - " We will call the run function later from our Jupyter Notebook \n", - " so our azure service can evaluate our model input in the cloud. 
'''\n", - "\n", - " try:\n", - " # load in our data, convert to readable format\n", - " data = np.array(json.loads(input_data)['data']).astype('float32')\n", - " \n", - " start = time.time()\n", - " r = session.run([output_name], {input_name : data})\n", - " end = time.time()\n", - " \n", - " result = emotion_map(postprocess(r[0]))\n", - " \n", - " result_dict = {\"result\": result,\n", - " \"time_in_sec\": [end - start]}\n", - " except Exception as e:\n", - " result_dict = {\"error\": str(e)}\n", - " \n", - " return json.dumps(result_dict)\n", - "\n", - "def emotion_map(classes, N=1):\n", - " \"\"\"Take the most probable labels (output of postprocess) and returns the \n", - " top N emotional labels that fit the picture.\"\"\"\n", - " \n", - " emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n", - " 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", - " \n", - " emotion_keys = list(emotion_table.keys())\n", - " emotions = []\n", - " for i in range(N):\n", - " emotions.append(emotion_keys[classes[i]])\n", - " return emotions\n", - "\n", - "def softmax(x):\n", - " \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", - " x = x.reshape(-1)\n", - " e_x = np.exp(x - np.max(x))\n", - " return e_x / e_x.sum(axis=0)\n", - "\n", - "def postprocess(scores):\n", - " \"\"\"This function takes the scores generated by the network and \n", - " returns the class IDs in decreasing order of probability.\"\"\"\n", - " prob = softmax(scores)\n", - " prob = np.squeeze(prob)\n", - " classes = np.argsort(prob)[::-1]\n", - " return classes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write Environment File" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies()\n", - "myenv.add_pip_package(\"numpy\")\n", - 
"myenv.add_pip_package(\"azureml-core\")\n", - "myenv.add_pip_package(\"onnxruntime\")\n", - "\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create the Container Image\n", - "\n", - "This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " description = \"Emotion ONNX Runtime container\",\n", - " tags = {\"demo\": \"onnx\"})\n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnxtest\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case you need to debug your code, the next line of code accesses the log file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all done specifying what we want our virtual machine to do. 
Let's configure and deploy our container image.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'ONNX for emotion recognition model')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'onnx-demo-emotion'\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - "\n", - " # If your deployment fails, make sure to delete your aci_service before trying again!\n", - " # aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Success!\n", - "\n", - "If you've made it this far, you've deployed a working VM with a facial emotion recognition model running in the cloud using Azure ML. Congratulations!\n", - "\n", - "Let's see how well our model deals with our test images." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Testing and Evaluation\n", - "\n", - "### Useful Helper Functions\n", - "\n", - "We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def emotion_map(classes, N=1):\n", - " \"\"\"Take the most probable labels (output of postprocess) and returns the \n", - " top N emotional labels that fit the picture.\"\"\"\n", - " \n", - " emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n", - " 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", - " \n", - " emotion_keys = list(emotion_table.keys())\n", - " emotions = []\n", - " for i in range(N):\n", - " emotions.append(emotion_keys[classes[i]])\n", - " \n", - " return emotions\n", - "\n", - "def softmax(x):\n", - " \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", - " x = x.reshape(-1)\n", - " e_x = np.exp(x - np.max(x))\n", - " return e_x / e_x.sum(axis=0)\n", - "\n", - "def postprocess(scores):\n", - " \"\"\"This function takes the scores generated by the network and \n", - " returns the class IDs in decreasing order of probability.\"\"\"\n", - " prob = softmax(scores)\n", - " prob = np.squeeze(prob)\n", - " classes = np.argsort(prob)[::-1]\n", - " return classes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Load Test Data\n", - "\n", - "These are already in your directory from your ONNX model download (from the model zoo).\n", - "\n", - "Notice that our Model Zoo files have a .pb extension. 
This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to manipulate our arrays\n", - "import numpy as np \n", - "\n", - "# read in test data protobuf files included with the model\n", - "import onnx\n", - "from onnx import numpy_helper\n", - "\n", - "# to use parsers to read in our model/data\n", - "import json\n", - "import os\n", - "\n", - "test_inputs = []\n", - "test_outputs = []\n", - "\n", - "# read in 3 testing images from .pb files\n", - "test_data_size = 3\n", - "\n", - "for i in np.arange(test_data_size):\n", - " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n", - " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n", - " \n", - " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", - " tensor = onnx.TensorProto()\n", - " with open(input_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " input_data = numpy_helper.to_array(tensor)\n", - " test_inputs.append(input_data)\n", - " \n", - " with open(output_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " output_data = numpy_helper.to_array(tensor)\n", - " output_processed = emotion_map(postprocess(output_data))[0]\n", - " test_outputs.append(output_processed)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Facial Expression Recognition using ONNX Runtime on AzureML\n", + "\n", + "This example shows how to deploy an image classification neural network using the Facial Expression Recognition ([FER](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. This tutorial will show you how to deploy a FER+ model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", + "\n", + "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", + "\n", + "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization. We use the CPU version of ONNX Runtime in this tutorial, but will soon be releasing an additional tutorial for deploying this model using ONNX Runtime GPU.\n", + "\n", + "#### Tutorial Objectives:\n", + "\n", + "1. Describe the FER+ dataset and pretrained Convolutional Neural Net ONNX model for Emotion Recognition, stored in the ONNX model zoo.\n", + "2. Deploy and run the pretrained FER+ ONNX model on an Azure Machine Learning instance\n", + "3. 
Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "### 1. Install Azure ML SDK and create a new workspace\n", + "Please follow the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook.\n", + "\n", + "\n", + "### 2. Install additional packages needed for this Notebook\n", + "You need to install the plotting library `matplotlib`, the image manipulation library `PIL`, the `onnx` library, and `onnxruntime` (the test cells import helper functions from `score.py`, which imports `onnxruntime`) in the conda environment where the Azure Machine Learning SDK is installed.\n", + "\n", + "```sh\n", + "(myenv) $ pip install matplotlib onnx onnxruntime Pillow\n", + "```\n", + "\n", + "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", + "\n", + "[Download the ONNX Emotion FER+ model and corresponding test data](https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz) and place them in the same folder as this tutorial notebook. You can extract the archive with the following command.\n", + "\n", + "```sh\n", + "(myenv) $ tar xvzf emotion_ferplus.tar.gz\n", + "```\n", + "\n", + "More information about the ONNX FER+ model can be found on [GitHub](https://github.com/onnx/models/tree/master/emotion_ferplus). For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Azure ML workspace\n", + "\n", + "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook.
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.location, ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Registering your model with Azure ML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_dir = \"emotion_ferplus\" # replace this with the location of your model files\n", + "\n", + "# leave as is if it's in the same folder as this notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path = model_dir + \"/\" + \"model.onnx\",\n", + " model_name = \"onnx_emotion\",\n", + " tags = {\"onnx\": \"demo\"},\n", + " description = \"FER+ emotion recognition CNN from ONNX Model Zoo\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Optional: Displaying your registered models\n", + "\n", + "This step is not required, so feel free to skip it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models()\n", + "for m in models:\n", + " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ONNX FER+ Model Methodology\n", + "\n", + "The image classification model we are using comes from the [ONNX model zoo](http://github.com/onnx/models) and was pre-trained with the Microsoft Cognitive Toolkit, [CNTK](https://github.com/Microsoft/CNTK). The model zoo has many other models that can be deployed on cloud providers like Azure ML without any additional training. To ensure that our cloud-deployed model works, we use test data from the FER+ dataset, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n", + "\n", + "The original Facial Expression Recognition (FER) dataset was released in 2013, but some of its labels do not accurately match the expression shown. In the FER+ dataset, each photo was evaluated by at least 10 crowd-sourced reviewers, creating a better basis for ground truth. \n", + "\n", + "You can see the difference in label quality in the sample model input below.
The FER labels are the first word below each image, and the FER+ labels are the second word below each image.\n", + "\n", + "![](https://raw.githubusercontent.com/Microsoft/FERPlus/master/FER+vsFER.png)\n", + "\n", + "***Input: Photos of cropped faces from FER+ Dataset***\n", + "\n", + "***Task: Classify each facial image into the appropriate emotion in the emotion table***\n", + "\n", + "``` emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7} ```\n", + "\n", + "***Output: Emotion prediction for input image***\n", + "\n", + "\n", + "Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# for images and plots in this notebook\n", + "import matplotlib.pyplot as plt \n", + "from IPython.display import Image\n", + "\n", + "# display images inline\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Description\n", + "\n", + "The FER+ model from the ONNX Model Zoo is summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image from Barsoum et al.'s paper [\"Training Deep Networks for Facial Expression Recognition\n", + "with Crowd-Sourced Label Distribution\"](https://arxiv.org/pdf/1608.01041.pdf), with our (64 x 64) input images and our output probabilities for each of the labels." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://raw.githubusercontent.com/vinitra/FERPlus/master/emotion_model_img.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy our model on Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are now going to deploy our ONNX model on Azure ML with inference in ONNX Runtime.
We begin by writing a score.py file, which will run the model inside the Azure ML container that hosts our web service, and then specify our environment by writing a yml file.\n", + "\n", + "You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", + "\n", + "### Write Score File\n", + "\n", + "A score file is what tells our Azure cloud service what to do. After locating our registered model with azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import onnxruntime\n", + "import sys\n", + "import os\n", + "from azureml.core.model import Model\n", + "import time\n", + "\n", + "def init():\n", + " global session, input_name, output_name\n", + " model = Model.get_model_path(model_name = 'onnx_emotion')\n", + " session = onnxruntime.InferenceSession(model, None)\n", + " input_name = session.get_inputs()[0].name\n", + " output_name = session.get_outputs()[0].name \n", + " \n", + "def run(input_data):\n", + " '''Purpose: evaluate test input in Azure Cloud using onnxruntime.\n", + " We will call the run function later from our Jupyter Notebook \n", + " so our azure service can evaluate our model input in the cloud.
'''\n", + "\n", + " try:\n", + " # load in our data, convert to readable format\n", + " data = np.array(json.loads(input_data)['data']).astype('float32')\n", + " start = time.time()\n", + " r = session.run([output_name], {input_name : data})\n", + " end = time.time()\n", + " result = emotion_map(postprocess(r[0]))\n", + " result_dict = {\"result\": result,\n", + " \"time_in_sec\": [end - start]}\n", + " except Exception as e:\n", + " result_dict = {\"error\": str(e)}\n", + " \n", + " return json.dumps(result_dict)\n", + "\n", + "def emotion_map(classes, N=1):\n", + " \"\"\"Take the most probable labels (output of postprocess) and returns the top N emotional labels that fit the picture.\"\"\"\n", + " \n", + " emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n", + " emotion_keys = list(emotion_table.keys())\n", + " emotions = []\n", + " for i in range(N):\n", + " emotions.append(emotion_keys[classes[i]])\n", + " return emotions\n", + "\n", + "def softmax(x):\n", + " \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n", + " x = x.reshape(-1)\n", + " e_x = np.exp(x - np.max(x))\n", + " return e_x / e_x.sum(axis=0)\n", + "\n", + "def postprocess(scores):\n", + " \"\"\"This function takes the scores generated by the network and returns the class IDs in decreasing \n", + " order of probability.\"\"\"\n", + " prob = softmax(scores)\n", + " prob = np.squeeze(prob)\n", + " classes = np.argsort(prob)[::-1]\n", + " return classes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write Environment File" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies()\n", + "myenv.add_pip_package(\"numpy\")\n", + "myenv.add_pip_package(\"azureml-core\")\n", + 
"myenv.add_pip_package(\"onnxruntime\")\n", + "\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the Container Image\n", + "\n", + "This step will likely take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"test\",\n", + " tags = {\"demo\": \"onnx\"})\n", + "\n", + "\n", + "image = ContainerImage.create(name = \"onnxtest\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Debugging\n", + "\n", + "In case you need to debug your code, the next line of code accesses the log file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all set! 
Let's get our model chugging.\n", + "\n", + "## Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'demo': 'onnx'}, \n", + " description = 'ONNX for emotion recognition model')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'onnx-demo-emotion'\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())\n", + "\n", + " # If your deployment fails, make sure to delete your aci_service before trying again!\n", + " # aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Success!\n", + "\n", + "If you've made it this far, you've deployed a working VM with a facial emotion recognition model running in the cloud using Azure ML. Congratulations!\n", + "\n", + "Let's see how well our model deals with our test images." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Testing and Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Useful Helper Functions\n", + "\n", + "We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load Test Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# to manipulate our arrays\n", + "import numpy as np \n", + "\n", + "# read in test data protobuf files included with the model\n", + "import onnx\n", + "from onnx import numpy_helper\n", + "\n", + "# to use parsers to read in our model/data\n", + "import json\n", + "import os\n", + "\n", + "from score import emotion_map, softmax, postprocess\n", + "\n", + "test_inputs = []\n", + "test_outputs = []\n", + "\n", + "# read in 3 testing images from .pb files\n", + "test_data_size = 3\n", + "\n", + "for i in np.arange(test_data_size):\n", + " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n", + " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n", + " \n", + " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", + " tensor = onnx.TensorProto()\n", + " with open(input_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " input_data = numpy_helper.to_array(tensor)\n", + " test_inputs.append(input_data)\n", + " \n", + " with open(output_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " output_data = numpy_helper.to_array(tensor)\n", + " output_processed = emotion_map(postprocess(output_data))[0]\n", + " test_outputs.append(output_processed)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "### Show some sample images\n", + "We use `matplotlib` to plot 3 test images from the model zoo with their labels over them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "plt.figure(figsize = (20, 20))\n", + "for test_image in np.arange(3):\n", + " test_inputs[test_image].reshape(1, 64, 64)\n", + " plt.subplot(1, 8, test_image+1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n", + " plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run evaluation / prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize = (16, 6), frameon=False)\n", + "plt.subplot(1, 8, 1)\n", + "\n", + "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", + "plt.text(x = 6, y = 18, s = \"(64 x 64)\", fontsize = 12, color = 'black')\n", + "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", + "\n", + "\n", + "for i in np.arange(test_data_size):\n", + " \n", + " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", + "\n", + " # predict using the deployed model\n", + " r = json.loads(aci_service.run(input_data))\n", + " \n", + " if \"error\" in r:\n", + " print(r['error'])\n", + " break\n", + " \n", + " result = r['result'][0][0]\n", + " time_ms = 
np.round(r['time_in_sec'][0] * 1000, 2)\n", + " \n", + " ground_truth = test_outputs[i]\n", + " \n", + " # compare actual value vs. the predicted values:\n", + " plt.subplot(1, 8, i+2)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + "\n", + " # use different color for misclassified sample\n", + " font_color = 'red' if ground_truth != result else 'black'\n", + " clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n", + "\n", + " # ground truth labels are in blue\n", + " plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n", + " \n", + " # predictions are in black if correct, red if incorrect\n", + " plt.text(x = 10, y = -45, s = result, fontsize = 18, color = font_color)\n", + " plt.text(x = 5, y = -22, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", + "\n", + " \n", + " plt.imshow(test_inputs[i].reshape(64, 64), cmap = clr_map)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Try classifying your own images!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "\n", + "def preprocess(image_path):\n", + " input_shape = (1, 1, 64, 64)\n", + " img = Image.open(image_path)\n", + " img = img.resize((64, 64), Image.ANTIALIAS)\n", + " img_data = np.array(img)\n", + " img_data = np.resize(img_data, input_shape)\n", + " return img_data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Replace the following string with your own path/test image\n", + "# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n", + "\n", + "# Any PNG or JPG image file should work\n", + "# Make sure to include the entire path with // instead of /\n", + "\n", + "# e.g. 
your_test_image = \"C://Users//vinitra.swamy//Pictures//emotion_test_images//img_1.png\"\n", + "\n", + "your_test_image = \"\"\n", + "\n", + "if your_test_image != \"\":\n", + " img = preprocess(your_test_image)\n", + " plt.subplot(1,3,1)\n", + " plt.imshow(img.reshape((64,64)), cmap = plt.cm.gray)\n", + "else:\n", + " img = None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if img is None:\n", + " print(\"Add the path for your image data.\")\n", + "else:\n", + " input_data = json.dumps({'data': img.tolist()})\n", + "\n", + " try:\n", + " r = json.loads(aci_service.run(input_data))\n", + " result = r['result'][0][0]\n", + " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", + " except Exception as e:\n", + " print(str(e))\n", + "\n", + " plt.figure(figsize = (16, 6))\n", + " plt.subplot(1,8,1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = -10, y = -35, s = \"Model prediction: \", fontsize = 14)\n", + " plt.text(x = -10, y = -20, s = \"Inference time: \", fontsize = 14)\n", + " plt.text(x = 100, y = -35, s = str(result), fontsize = 14)\n", + " plt.text(x = 100, y = -20, s = str(time_ms) + \" ms\", fontsize = 14)\n", + " plt.text(x = -10, y = -8, s = \"Input image: \", fontsize = 14)\n", + " plt.imshow(img.reshape(64, 64), cmap = plt.cm.gray) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# remember to delete your service after you are done using it!\n", + "\n", + "# aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "Congratulations!\n", + "\n", + "In this tutorial, you have:\n", + "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", + "- understood a state-of-the-art convolutional neural net image classification model (FER+ in ONNX) and deployed it in the Azure ML cloud\n", + "- 
ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n", + "\n", + "Next steps:\n", + "- If you have not already, check out another interesting ONNX/AML application that lets you set up a state-of-the-art [handwritten image classification model (MNIST)](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-mnist.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model for handwritten digit classification in an Azure ML virtual machine.\n", + "- Keep an eye out for an updated version of this tutorial that uses ONNX Runtime GPU.\n", + "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" + ] } - }, - "source": [ - "### Show some sample images\n", - "We use `matplotlib` to plot 3 test images from the dataset." - ] + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + }, + "msauthor": "vinitra.swamy" }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "plt.figure(figsize = (20, 20))\n", - "for test_image in np.arange(3):\n", - " test_inputs[test_image].reshape(1, 64, 64)\n", - " plt.subplot(1, 8, test_image+1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n", - " plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.gray)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ 
- "### Run evaluation / prediction" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure(figsize = (16, 6), frameon=False)\n", - "plt.subplot(1, 8, 1)\n", - "\n", - "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", - "plt.text(x = 6, y = 18, s = \"(64 x 64)\", fontsize = 12, color = 'black')\n", - "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", - "\n", - "\n", - "for i in np.arange(test_data_size):\n", - " \n", - " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", - "\n", - " # predict using the deployed model\n", - " r = json.loads(aci_service.run(input_data))\n", - " \n", - " if \"error\" in r:\n", - " print(r['error'])\n", - " break\n", - " \n", - " result = r['result'][0]\n", - " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", - " \n", - " ground_truth = test_outputs[i]\n", - " \n", - " # compare actual value vs. 
the predicted values:\n", - " plt.subplot(1, 8, i+2)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - "\n", - " # use different color for misclassified sample\n", - " font_color = 'red' if ground_truth != result else 'black'\n", - " clr_map = plt.cm.Greys if ground_truth != result else plt.cm.gray\n", - "\n", - " # ground truth labels are in blue\n", - " plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n", - " \n", - " # predictions are in black if correct, red if incorrect\n", - " plt.text(x = 10, y = -45, s = result, fontsize = 18, color = font_color)\n", - " plt.text(x = 5, y = -22, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", - "\n", - " \n", - " plt.imshow(test_inputs[i].reshape(64, 64), cmap = clr_map)\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Try classifying your own images!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Preprocessing functions take your image and format it so it can be passed\n", - "# as input into our ONNX model\n", - "\n", - "import cv2\n", - "\n", - "def rgb2gray(rgb):\n", - " \"\"\"Convert the input image into grayscale\"\"\"\n", - " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", - "\n", - "def resize_img(img):\n", - " \"\"\"Resize image to MNIST model input dimensions\"\"\"\n", - " img = cv2.resize(img, dsize=(64, 64), interpolation=cv2.INTER_AREA)\n", - " img.resize((1, 1, 64, 64))\n", - " return img\n", - "\n", - "def preprocess(img):\n", - " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", - " if img.shape == (64, 64):\n", - " img.resize((1, 1, 64, 64))\n", - " return img\n", - " \n", - " grayscale = rgb2gray(img)\n", - " processed_img = resize_img(grayscale)\n", - " return processed_img" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Replace the following 
string with your own path/test image\n", - "# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n", - "\n", - "# Any PNG or JPG image file should work\n", - "# Make sure to include the entire path with // instead of /\n", - "\n", - "# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//emotion_test_images//img_1.png\"\n", - "\n", - "import matplotlib.image as mpimg\n", - "\n", - "if your_test_image != \"\":\n", - " img = mpimg.imread(your_test_image)\n", - " plt.subplot(1,3,1)\n", - " plt.imshow(img, cmap = plt.cm.Greys)\n", - " print(\"Old Dimensions: \", img.shape)\n", - " img = preprocess(img)\n", - " print(\"New Dimensions: \", img.shape)\n", - "else:\n", - " img = None" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if img is None:\n", - " print(\"Add the path for your image data.\")\n", - "else:\n", - " input_data = json.dumps({'data': img.tolist()})\n", - "\n", - " try:\n", - " r = json.loads(aci_service.run(input_data))\n", - " result = r['result'][0]\n", - " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", - " except Exception as e:\n", - " print(str(e))\n", - "\n", - " plt.figure(figsize = (16, 6))\n", - " plt.subplot(1,8,1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = -10, y = -40, s = \"Model prediction: \", fontsize = 14)\n", - " plt.text(x = -10, y = -25, s = \"Inference time: \", fontsize = 14)\n", - " plt.text(x = 100, y = -40, s = str(result), fontsize = 14)\n", - " plt.text(x = 100, y = -25, s = str(time_ms) + \" ms\", fontsize = 14)\n", - " plt.text(x = -10, y = -10, s = \"Model Input image: \", fontsize = 14)\n", - " plt.imshow(img.reshape((64, 64)), cmap = plt.cm.gray) \n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# remember to delete your service after you are done using it!\n", - "\n", - 
"aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "Congratulations!\n", - "\n", - "In this tutorial, you have:\n", - "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", - "- understood a state-of-the-art convolutional neural net image classification model (FER+ in ONNX) and deployed it in the Azure ML cloud\n", - "- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n", - "\n", - "Next steps:\n", - "- If you have not already, check out another interesting ONNX/AML application that lets you set up a state-of-the-art [handwritten image classification model (MNIST)](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-mnist.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model for handwritten digit classification in an Azure ML virtual machine.\n", - "- Keep an eye out for an updated version of this tutorial that uses ONNX Runtime GPU.\n", - "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python [conda env:myenv]", - "language": "python", - "name": "conda-env-myenv-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "vinitra.swamy" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/onnx/onnx-inference-mnist.ipynb b/onnx/onnx-inference-mnist.ipynb index 42a8c2dc..7cd81c74 100644 --- a/onnx/onnx-inference-mnist.ipynb +++ 
b/onnx/onnx-inference-mnist.ipynb @@ -1,799 +1,768 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Handwritten Digit Classification (MNIST) using ONNX Runtime on Azure ML\n", - "\n", - "This example shows how to deploy an image classification neural network using the Modified National Institute of Standards and Technology ([MNIST](http://yann.lecun.com/exdb/mnist/)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing number from 0 to 9. This tutorial will show you how to deploy a MNIST model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", - "\n", - "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. 
For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", - "\n", - "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization.\n", - "\n", - "#### Tutorial Objectives:\n", - "\n", - "- Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n", - "- Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n", - "- Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "### 1. Install Azure ML SDK and create a new workspace\n", - "Please follow [Azure ML configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) to set up your environment.\n", - "\n", - "### 2. Install additional packages needed for this tutorial notebook\n", - "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Maching Learning SDK is installed. \n", - "\n", - "```sh\n", - "(myenv) $ pip install matplotlib onnx opencv-python\n", - "```\n", - "\n", - "**Debugging tip**: Make sure that you run the \"jupyter notebook\" command to launch this notebook after activating your virtual environment. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n", - "\n", - "### 3. 
Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", - "\n", - "In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# urllib is a built-in Python library to download files from URLs\n", - "\n", - "# Objective: retrieve the latest version of the ONNX MNIST model files from the\n", - "# ONNX Model Zoo and save it in the same folder as this tutorial\n", - "\n", - "import urllib.request\n", - "\n", - "onnx_model_url = \"https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz\"\n", - "\n", - "urllib.request.urlretrieve(onnx_model_url, filename=\"mnist.tar.gz\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# the ! magic command tells our jupyter notebook kernel to run the following line of \n", - "# code from the command line instead of the notebook kernel\n", - "\n", - "# We use tar and xvcf to unzip the files we just retrieved from the ONNX model zoo\n", - "\n", - "!tar xvzf mnist.tar.gz" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy a VM with your ONNX model in the Cloud\n", - "\n", - "### Load Azure ML workspace\n", - "\n", - "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Registering your model with Azure ML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_dir = \"mnist\" # replace this with the location of your model files\n", - "\n", - "# leave as is if it's in the same folder as this notebook" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model.register(workspace = ws,\n", - " model_path = model_dir + \"/\" + \"model.onnx\",\n", - " model_name = \"mnist_1\",\n", - " tags = {\"onnx\": \"demo\"},\n", - " description = \"MNIST image classification CNN from ONNX Model Zoo\",)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Optional: Displaying your registered models\n", - "\n", - "This step is not required, so feel free to skip it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models()\n", - "for m in models:\n", - " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. 
All rights reserved. \n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Handwritten Digit Classification (MNIST) using ONNX Runtime on AzureML\n", + "\n", + "This example shows how to deploy an image classification neural network using the Modified National Institute of Standards and Technology ([MNIST](http://yann.lecun.com/exdb/mnist/)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. This tutorial will show you how to deploy an MNIST model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n", + "\n", + "Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai) and [open source files](https://github.com/onnx).\n", + "\n", + "[ONNX Runtime](https://aka.ms/onnxruntime-python) is the runtime engine that enables evaluation of trained machine learning (Traditional ML and Deep Learning) models with high performance and low resource utilization.\n", + "\n", + "#### Tutorial Objectives:\n", + "\n", + "1. Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n", + "2. Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n", + "3. 
Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "### 1. Install Azure ML SDK and create a new workspace\n", + "Please follow the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook.\n", + "\n", + "### 2. Install additional packages needed for this Notebook\n", + "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed.\n", + "\n", + "```sh\n", + "(myenv) $ pip install matplotlib onnx opencv-python\n", + "```\n", + "\n", + "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", + "\n", + "[Download the ONNX MNIST model and corresponding test data](https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz) and place them in the same folder as this tutorial notebook. You can extract the archive with the following command.\n", + "\n", + "```sh\n", + "(myenv) $ tar xvzf mnist.tar.gz\n", + "```\n", + "\n", + "More information about the ONNX MNIST model can be found on [GitHub](https://github.com/onnx/models/tree/master/mnist). For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Registering your model with Azure ML" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_dir = \"mnist\" # replace this with the location of your model files\n", + "\n", + "# leave as is if it's in the same folder as this notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path = model_dir + \"//model.onnx\",\n", + " model_name = \"mnist_1\",\n", + " tags = {\"onnx\": \"demo\"},\n", + " description = \"MNIST image classification CNN from ONNX Model Zoo\",\n", + " workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Optional: Displaying your registered models\n", + "\n", + "This step is not required, so feel free to skip it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "models = ws.models()\n", + "for m in models:\n", + " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "### ONNX MNIST Model Methodology\n", + "\n", + "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/mnist) in the ONNX model zoo.\n", + "\n", + "***Input: Handwritten Images from MNIST Dataset***\n", + "\n", + "***Task: Classify each MNIST image into an appropriate digit***\n", + "\n", + "***Output: Digit prediction for input image***\n", + "\n", + "Run the cell below to look at some of the sample images from the MNIST dataset that we used to train this ONNX model. Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# for images and plots in this notebook\n", + "import matplotlib.pyplot as plt \n", + "from IPython.display import Image\n", + "\n", + "# display images inline\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image(url=\"http://3.bp.blogspot.com/_UpN7DfJA0j4/TJtUBWPk0SI/AAAAAAAAABY/oWPMtmqJn3k/s1600/mnist_originals.png\", width=200, height=200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy our model on Azure ML" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are now going to deploy our ONNX model on Azure ML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a YAML file.\n", + "\n", + "You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", + "\n", + "### Write Score File\n", + "\n", + "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import onnxruntime\n", + "import sys\n", + "import os\n", + "from azureml.core.model import Model\n", + "import time\n", + "\n", + "\n", + "def init():\n", + " global session, input_name, output_name\n", + " model = Model.get_model_path(model_name = 'mnist_1')\n", + " session = onnxruntime.InferenceSession(model, None)\n", + " input_name = session.get_inputs()[0].name\n", + " output_name = session.get_outputs()[0].name \n", + " \n", + "def run(input_data):\n", + " '''Purpose: evaluate test input in Azure Cloud using onnxruntime.\n", + " We will call the run function later from our Jupyter Notebook \n", + " so our azure service can evaluate our model input in the cloud. '''\n", + "\n", + " try:\n", + " # load in our data, convert to readable format\n", + " data = np.array(json.loads(input_data)['data']).astype('float32')\n", + "\n", + " start = time.time()\n", + " r = session.run([output_name], {input_name: data})[0]\n", + " end = time.time()\n", + " result = choose_class(r[0])\n", + " result_dict = {\"result\": [result],\n", + " \"time_in_sec\": [end - start]}\n", + " except Exception as e:\n", + " result_dict = {\"error\": str(e)}\n", + " \n", + " return json.dumps(result_dict)\n", + "\n", + "def choose_class(result_prob):\n", + " \"\"\"Return the index of the largest of the 10 output scores (argmax) as the predicted digit\"\"\"\n", + " return int(np.argmax(result_prob, axis=0))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write Environment File" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This step creates a YAML file that specifies which dependencies we would like to see in our Linux Virtual Machine."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies()\n", + "myenv.add_pip_package(\"numpy\")\n", + "myenv.add_pip_package(\"azureml-core\")\n", + "myenv.add_pip_package(\"onnxruntime\")\n", + "\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the Container Image\n", + "\n", + "This step will likely take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"test\",\n", + " tags = {\"demo\": \"onnx\"})\n", + "\n", + "\n", + "image = ContainerImage.create(name = \"onnxtest\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Debugging\n", + "\n", + "In case you need to debug your code, the next line of code prints the URI of the image build log." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(image.image_build_log_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We're all set! 
Let's get our model chugging.\n", + "\n", + "## Deploy the container image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'demo': 'onnx'}, \n", + " description = 'ONNX for mnist model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will likely take a few minutes to run as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "\n", + "aci_service_name = 'onnx-demo-mnist'\n", + "print(\"Service\", aci_service_name)\n", + "\n", + "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", + " image = image,\n", + " name = aci_service_name,\n", + " workspace = ws)\n", + "\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())\n", + "\n", + " # If your deployment fails, make sure to delete your aci_service or rename your service before trying again!\n", + " # aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Success!\n", + "\n", + "If you've made it this far, you've deployed a working VM with a handwritten digit classifier running in the cloud using Azure ML. Congratulations!\n", + "\n", + "Let's see how well our model deals with our test images." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Testing and Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Test Data\n", + "\n", + "The test data files are already in your directory from your ONNX model download (from the model zoo). If you didn't place your model and test data in the same directory as this notebook, edit the `model_dir` variable defined earlier." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# to manipulate our arrays\n", + "import numpy as np \n", + "\n", + "# read in test data protobuf files included with the model\n", + "import onnx\n", + "from onnx import numpy_helper\n", + "\n", + "# to use parsers to read in our model/data\n", + "import json\n", + "import os\n", + "\n", + "test_inputs = []\n", + "test_outputs = []\n", + "\n", + "# read in 3 testing images from .pb files\n", + "test_data_size = 3\n", + "\n", + "for i in np.arange(test_data_size):\n", + " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n", + " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n", + " \n", + " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", + " tensor = onnx.TensorProto()\n", + " with open(input_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " input_data = numpy_helper.to_array(tensor)\n", + " test_inputs.append(input_data)\n", + " \n", + " with open(output_test_data, 'rb') as f:\n", + " tensor.ParseFromString(f.read())\n", + " \n", + " output_data = numpy_helper.to_array(tensor)\n", + " test_outputs.append(output_data)\n", + " \n", + "if len(test_inputs) == test_data_size:\n", + " print('Test data loaded successfully.')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "### Show some sample images\n", 
+ "We use `matplotlib` to plot 3 test images from the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "plt.figure(figsize = (16, 6))\n", + "for test_image in np.arange(3):\n", + " plt.subplot(1, 15, test_image+1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.imshow(test_inputs[test_image].reshape(28, 28), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run evaluation / prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize = (16, 6), frameon=False)\n", + "plt.subplot(1, 8, 1)\n", + "\n", + "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", + "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", + "plt.text(x = 6, y = 18, s = \"(28 x 28)\", fontsize = 12, color = 'black')\n", + "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", + "\n", + "\n", + "for i in np.arange(test_data_size):\n", + " \n", + " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", + " \n", + " # predict using the deployed model\n", + " r = json.loads(aci_service.run(input_data))\n", + " \n", + " if \"error\" in r:\n", + " print(r['error'])\n", + " break\n", + " \n", + " result = r['result'][0]\n", + " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", + " \n", + " ground_truth = int(np.argmax(test_outputs[i]))\n", + " \n", + " # compare actual value vs. 
the predicted values:\n", + " plt.subplot(1, 8, i+2)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + "\n", + " # use a different color for misclassified samples\n", + " font_color = 'red' if ground_truth != result else 'black'\n", + " clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n", + "\n", + " # ground truth labels are in blue\n", + " plt.text(x = 10, y = -30, s = ground_truth, fontsize = 18, color = 'blue')\n", + " \n", + " # predictions are in black if correct, red if incorrect\n", + " plt.text(x = 10, y = -20, s = result, fontsize = 18, color = font_color)\n", + " plt.text(x = 5, y = -10, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", + "\n", + " \n", + " plt.imshow(test_inputs[i].reshape(28, 28), cmap = clr_map)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Try classifying your own images!\n", + "\n", + "Create your own handwritten image and pass it into the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Preprocessing functions\n", + "import cv2\n", + "\n", + "def rgb2gray(rgb):\n", + " \"\"\"Convert the input image into grayscale\"\"\"\n", + " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", + "\n", + "def resize_img(img):\n", + " img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n", + " img.resize((1, 1, 28, 28))\n", + " return img\n", + "\n", + "def preprocess(img):\n", + " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", + " grayscale = rgb2gray(img)\n", + " processed_img = resize_img(grayscale)\n", + " return processed_img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Replace this string with your own path/test image\n", + "# Make sure your image is square and the dimensions are equal (e.g. 
100 * 100 pixels or 28 * 28 pixels)\n", + "\n", + "# Any PNG or JPG image file should work\n", + "# Make sure to include the entire path with // instead of /\n", + "\n", + "# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//digit.png\"\n", + "\n", + "your_test_image = \"\"\n", + "\n", + "import matplotlib.image as mpimg\n", + "\n", + "if your_test_image != \"\":\n", + " img = mpimg.imread(your_test_image)\n", + " plt.subplot(1,3,1)\n", + " plt.imshow(img, cmap = plt.cm.Greys)\n", + " print(\"Old Dimensions: \", img.shape)\n", + " img = preprocess(img)\n", + " print(\"New Dimensions: \", img.shape)\n", + "else:\n", + " img = None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if img is None:\n", + " print(\"Add the path for your image data.\")\n", + "else:\n", + " input_data = json.dumps({'data': img.tolist()})\n", + "\n", + " try:\n", + " r = json.loads(aci_service.run(input_data))\n", + " result = r['result'][0]\n", + " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", + " except Exception as e:\n", + " print(str(e))\n", + "\n", + " plt.figure(figsize = (16, 6))\n", + " plt.subplot(1, 15,1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = -100, y = -20, s = \"Model prediction: \", fontsize = 14)\n", + " plt.text(x = -100, y = -10, s = \"Inference time: \", fontsize = 14)\n", + " plt.text(x = 0, y = -20, s = str(result), fontsize = 14)\n", + " plt.text(x = 0, y = -10, s = str(time_ms) + \" ms\", fontsize = 14)\n", + " plt.text(x = -100, y = 14, s = \"Input image: \", fontsize = 14)\n", + " plt.imshow(img.reshape(28, 28), cmap = plt.cm.gray) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optional: How does our ONNX MNIST model work? 
\n", + "#### A brief explanation of Convolutional Neural Networks\n", + "\n", + "A [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN, or ConvNet) is a type of [feed-forward](https://en.wikipedia.org/wiki/Feedforward_neural_network) artificial neural network made up of neurons that have learnable weights and biases. CNNs take advantage of the spatial nature of the data. In nature, we perceive different objects by their shapes, sizes and colors. For example, objects in a natural scene are typically composed of edges, corners/vertices (defined by two or more edges), color patches, etc. These primitives are often identified using different detectors (e.g., edge detectors, color detectors) or combinations of detectors interacting to facilitate image interpretation (object classification, region of interest detection, scene description, etc.) in real-world vision tasks. These detectors are also known as filters. Convolution is a mathematical operator that takes an image and a filter as input and produces a filtered output (representing, say, edges, corners, or colors in the input image). \n", + "\n", + "Historically, these filters were sets of weights that were often hand-crafted or modeled with mathematical functions (e.g., [Gaussian](https://en.wikipedia.org/wiki/Gaussian_filter) / [Laplacian](http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm) / [Canny](https://en.wikipedia.org/wiki/Canny_edge_detector) filter). The filter outputs are mapped through non-linear activation functions loosely inspired by brain cells called [neurons](https://en.wikipedia.org/wiki/Neuron). 
Popular deep CNNs or ConvNets (such as [AlexNet](https://en.wikipedia.org/wiki/AlexNet), [VGG](https://arxiv.org/abs/1409.1556), [Inception](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf), [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf)) that are used for various [computer vision](https://en.wikipedia.org/wiki/Computer_vision) tasks have many of these architectural primitives (inspired by biology). \n", + "\n", + "### Convolution Layer\n", + "\n", + "A convolution layer is a set of filters. Each filter is defined by a weight matrix (**W**) and a bias ($b$).\n", + "\n", + "![](https://www.cntk.ai/jup/cntk103d_filterset_v2.png)\n", + "\n", + "These filters are scanned across the image, computing the dot product between the weights and the corresponding input values ($x$). The bias value is added to the output of the dot product and the resulting sum is optionally mapped through an activation function. This process is illustrated in the following animation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image(url=\"https://www.cntk.ai/jup/cntk103d_conv2d_final.gif\", width= 200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Description\n", + "\n", + "The MNIST model from the ONNX Model Zoo applies max pooling to downsample the outputs of its convolution layers, as illustrated by the graphic below. You can see the entire workflow of our pre-trained model in the following image, from the input images to the output scores for each of the 10 labels. If you're interested in exploring the logic behind creating a Deep Learning model further, please look at the [training tutorial for our ONNX MNIST Convolutional Neural Network](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_103D_MNIST_ConvolutionalNeuralNetwork.ipynb). 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Max-Pooling for Convolutional Neural Nets\n", + "\n", + "![](http://www.cntk.ai/jup/c103d_max_pooling.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Pre-Trained Model Architecture\n", + "\n", + "![](http://www.cntk.ai/jup/conv103d_mnist-conv-mp.png)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# remember to delete your service after you are done using it!\n", + "\n", + "# aci_service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "Congratulations!\n", + "\n", + "In this tutorial, you have:\n", + "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", + "- understood a state-of-the-art convolutional neural net image classification model (MNIST in ONNX) and deployed it to the Azure ML cloud\n", + "- checked that your deep learning model works correctly in the cloud on the test data, and tried it on some images of your own!\n", + "\n", + "Next steps:\n", + "- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-emotion-recognition.ipynb) in the cloud! 
This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine with GPU support.\n", + "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" + ] } - }, - "source": [ - "### ONNX MNIST Model Methodology\n", - "\n", - "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/mnist) in the ONNX model zoo.\n", - "\n", - "***Input: Handwritten Images from MNIST Dataset***\n", - "\n", - "***Task: Classify each MNIST image into an appropriate digit***\n", - "\n", - "***Output: Digit prediction for input image***\n", - "\n", - "Run the cell below to look at some of the sample images from the MNIST dataset that we used to train this ONNX model. Remember, once the application is deployed in Azure ML, you can use your own images as input for the model to classify!" 
- ] + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + }, + "msauthor": "vinitra.swamy" }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# for images and plots in this notebook\n", - "import matplotlib.pyplot as plt \n", - "from IPython.display import Image\n", - "\n", - "# display images inline\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Image(url=\"http://3.bp.blogspot.com/_UpN7DfJA0j4/TJtUBWPk0SI/AAAAAAAAABY/oWPMtmqJn3k/s1600/mnist_originals.png\", width=200, height=200)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify our Score and Environment Files" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating out model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n", - "\n", - "### Write Score File\n", - "\n", - "A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import onnxruntime\n", - "import sys\n", - "import os\n", - "from azureml.core.model import Model\n", - "import time\n", - "\n", - "\n", - "def init():\n", - " global session, input_name, output_name\n", - " model = Model.get_model_path(model_name = 'mnist_1')\n", - " session = onnxruntime.InferenceSession(model, None)\n", - " input_name = session.get_inputs()[0].name\n", - " output_name = session.get_outputs()[0].name \n", - " \n", - "def run(input_data):\n", - " '''Purpose: evaluate test input in Azure Cloud using onnxruntime.\n", - " We will call the run function later from our Jupyter Notebook \n", - " so our azure service can evaluate our model input in the cloud. '''\n", - "\n", - " try:\n", - " # load in our data, convert to readable format\n", - " data = np.array(json.loads(input_data)['data']).astype('float32')\n", - "\n", - " start = time.time()\n", - " r = session.run([output_name], {input_name: data})[0]\n", - " end = time.time()\n", - " result = choose_class(r[0])\n", - " result_dict = {\"result\": [result],\n", - " \"time_in_sec\": [end - start]}\n", - " except Exception as e:\n", - " result_dict = {\"error\": str(e)}\n", - " \n", - " return json.dumps(result_dict)\n", - "\n", - "def choose_class(result_prob):\n", - " \"\"\"We use argmax to determine the right label to choose from our output\"\"\"\n", - " return int(np.argmax(result_prob, axis=0))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write Environment File\n", - "\n", - "This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies()\n", - "myenv.add_pip_package(\"numpy\")\n", - "myenv.add_pip_package(\"azureml-core\")\n", - "myenv.add_pip_package(\"onnxruntime\")\n", - "\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create the Container Image\n", - "This step will likely take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "help(ContainerImage.image_configuration)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " description = \"MNIST ONNX Runtime container\",\n", - " tags = {\"demo\": \"onnx\"}) \n", - "\n", - "\n", - "image = ContainerImage.create(name = \"onnxtest\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case you need to debug your code, the next line of code accesses the log file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're all done specifying what we want our virtual machine to do. 
Let's configure and deploy our container image.\n", - "\n", - "### Deploy the container image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'demo': 'onnx'}, \n", - " description = 'ONNX for mnist model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following cell will likely take a few minutes to run as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'onnx-demo-mnist20'\n", - "print(\"Service\", aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if aci_service.state != 'Healthy':\n", - " # run this command for debugging.\n", - " print(aci_service.get_logs())\n", - "\n", - " # If your deployment fails, make sure to delete your aci_service or rename your service before trying again!\n", - " # aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Success!\n", - "\n", - "If you've made it this far, you've deployed a working VM with a handwritten digit classifier running in the cloud using Azure ML. Congratulations!\n", - "\n", - "Let's see how well our model deals with our test images." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Testing and Evaluation\n", - "\n", - "### Load Test Data\n", - "\n", - "These are already in your directory from your ONNX model download (from the model zoo).\n", - "\n", - "Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to manipulate our arrays\n", - "import numpy as np \n", - "\n", - "# read in test data protobuf files included with the model\n", - "import onnx\n", - "from onnx import numpy_helper\n", - "\n", - "# to use parsers to read in our model/data\n", - "import json\n", - "import os\n", - "\n", - "test_inputs = []\n", - "test_outputs = []\n", - "\n", - "# read in 3 testing images from .pb files\n", - "test_data_size = 3\n", - "\n", - "for i in np.arange(test_data_size):\n", - " input_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n", - " output_test_data = os.path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n", - " \n", - " # convert protobuf tensors to np arrays using the TensorProto reader from ONNX\n", - " tensor = onnx.TensorProto()\n", - " with open(input_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " input_data = numpy_helper.to_array(tensor)\n", - " test_inputs.append(input_data)\n", - " \n", - " with open(output_test_data, 'rb') as f:\n", - " tensor.ParseFromString(f.read())\n", - " \n", - " output_data = numpy_helper.to_array(tensor)\n", - " test_outputs.append(output_data)\n", - " \n", - "if len(test_inputs) == test_data_size:\n", - " print('Test data loaded successfully.')" - ] - }, - { - "cell_type": "markdown", 
- "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "### Show some sample images\n", - "We use `matplotlib` to plot 3 test images from the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "plt.figure(figsize = (16, 6))\n", - "for test_image in np.arange(3):\n", - " plt.subplot(1, 15, test_image+1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.imshow(test_inputs[test_image].reshape(28, 28), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run evaluation / prediction" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure(figsize = (16, 6), frameon=False)\n", - "plt.subplot(1, 8, 1)\n", - "\n", - "plt.text(x = 0, y = -30, s = \"True Label: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -20, s = \"Result: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 0, y = -10, s = \"Inference Time: \", fontsize = 13, color = 'black')\n", - "plt.text(x = 3, y = 14, s = \"Model Input\", fontsize = 12, color = 'black')\n", - "plt.text(x = 6, y = 18, s = \"(28 x 28)\", fontsize = 12, color = 'black')\n", - "plt.imshow(np.ones((28,28)), cmap=plt.cm.Greys) \n", - "\n", - "\n", - "for i in np.arange(test_data_size):\n", - " \n", - " input_data = json.dumps({'data': test_inputs[i].tolist()})\n", - " \n", - " # predict using the deployed model\n", - " r = json.loads(aci_service.run(input_data))\n", - " \n", - " if \"error\" in r:\n", - " print(r['error'])\n", - " break\n", - " \n", - " result = r['result'][0]\n", - " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", - " \n", - " ground_truth = int(np.argmax(test_outputs[i]))\n", - " \n", - " # compare actual value vs. 
the predicted values:\n", - " plt.subplot(1, 8, i+2)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - "\n", - " # use different color for misclassified sample\n", - " font_color = 'red' if ground_truth != result else 'black'\n", - " clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n", - "\n", - " # ground truth labels are in blue\n", - " plt.text(x = 10, y = -30, s = ground_truth, fontsize = 18, color = 'blue')\n", - " \n", - " # predictions are in black if correct, red if incorrect\n", - " plt.text(x = 10, y = -20, s = result, fontsize = 18, color = font_color)\n", - " plt.text(x = 5, y = -10, s = str(time_ms) + ' ms', fontsize = 14, color = font_color)\n", - "\n", - " \n", - " plt.imshow(test_inputs[i].reshape(28, 28), cmap = clr_map)\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Try classifying your own images!\n", - "\n", - "Create your own handwritten image and pass it into the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Preprocessing functions take your image and format it so it can be passed\n", - "# as input into our ONNX model\n", - "\n", - "import cv2\n", - "\n", - "def rgb2gray(rgb):\n", - " \"\"\"Convert the input image into grayscale\"\"\"\n", - " return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n", - "\n", - "def resize_img(img):\n", - " \"\"\"Resize image to MNIST model input dimensions\"\"\"\n", - " img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n", - " img.resize((1, 1, 28, 28))\n", - " return img\n", - "\n", - "def preprocess(img):\n", - " \"\"\"Resize input images and convert them to grayscale.\"\"\"\n", - " if img.shape == (28, 28):\n", - " img.resize((1, 1, 28, 28))\n", - " return img\n", - " \n", - " grayscale = rgb2gray(img)\n", - " processed_img = resize_img(grayscale)\n", - " return processed_img" - ] - }, - { - "cell_type": "code", - "execution_count": null, 
- "metadata": {}, - "outputs": [], - "source": [ - "# Replace this string with your own path/test image\n", - "# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n", - "\n", - "# Any PNG or JPG image file should work\n", - "\n", - "# e.g. your_test_image = \"C:/Users/vinitra.swamy/Pictures/handwritten_digit.png\"\n", - "\n", - "import matplotlib.image as mpimg\n", - "\n", - "if your_test_image != \"\":\n", - " img = mpimg.imread(your_test_image)\n", - " plt.subplot(1,3,1)\n", - " plt.imshow(img, cmap = plt.cm.Greys)\n", - " print(\"Old Dimensions: \", img.shape)\n", - " img = preprocess(img)\n", - " print(\"New Dimensions: \", img.shape)\n", - "else:\n", - " img = None" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if img is None:\n", - " print(\"Add the path for your image data.\")\n", - "else:\n", - " input_data = json.dumps({'data': img.tolist()})\n", - "\n", - " try:\n", - " r = json.loads(aci_service.run(input_data))\n", - " result = r['result'][0]\n", - " time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n", - " except Exception as e:\n", - " print(str(e))\n", - "\n", - " plt.figure(figsize = (16, 6))\n", - " plt.subplot(1, 15,1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = -100, y = -20, s = \"Model prediction: \", fontsize = 14)\n", - " plt.text(x = -100, y = -10, s = \"Inference time: \", fontsize = 14)\n", - " plt.text(x = 0, y = -20, s = str(result), fontsize = 14)\n", - " plt.text(x = 0, y = -10, s = str(time_ms) + \" ms\", fontsize = 14)\n", - " plt.text(x = -100, y = 14, s = \"Input image: \", fontsize = 14)\n", - " plt.imshow(img.reshape(28, 28), cmap = plt.cm.gray) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional: How does our ONNX MNIST model work? 
\n", - "#### A brief explanation of Convolutional Neural Networks\n", - "\n", - "A [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN, or ConvNet) is a type of [feed-forward](https://en.wikipedia.org/wiki/Feedforward_neural_network) artificial neural network made up of neurons that have learnable weights and biases. The CNNs take advantage of the spatial nature of the data. In nature, we perceive different objects by their shapes, size and colors. For example, objects in a natural scene are typically edges, corners/vertices (defined by two of more edges), color patches etc. These primitives are often identified using different detectors (e.g., edge detection, color detector) or combination of detectors interacting to facilitate image interpretation (object classification, region of interest detection, scene description etc.) in real world vision related tasks. These detectors are also known as filters. Convolution is a mathematical operator that takes an image and a filter as input and produces a filtered output (representing say edges, corners, or colors in the input image). \n", - "\n", - "Historically, these filters are a set of weights that were often hand crafted or modeled with mathematical functions (e.g., [Gaussian](https://en.wikipedia.org/wiki/Gaussian_filter) / [Laplacian](http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm) / [Canny](https://en.wikipedia.org/wiki/Canny_edge_detector) filter). The filter outputs are mapped through non-linear activation functions mimicking human brain cells called [neurons](https://en.wikipedia.org/wiki/Neuron). 
Popular deep CNNs or ConvNets (such as [AlexNet](https://en.wikipedia.org/wiki/AlexNet), [VGG](https://arxiv.org/abs/1409.1556), [Inception](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf), [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf)) that are used for various [computer vision](https://en.wikipedia.org/wiki/Computer_vision) tasks have many of these architectural primitives (inspired from biology). \n", - "\n", - "### Convolution Layer\n", - "\n", - "A convolution layer is a set of filters. Each filter is defined by a weight (**W**) matrix, and bias ($b$).\n", - "\n", - "![](https://www.cntk.ai/jup/cntk103d_filterset_v2.png)\n", - "\n", - "These filters are scanned across the image performing the dot product between the weights and corresponding input value ($x$). The bias value is added to the output of the dot product and the resulting sum is optionally mapped through an activation function. This process is illustrated in the following animation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Image(url=\"https://www.cntk.ai/jup/cntk103d_conv2d_final.gif\", width= 200)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Model Description\n", - "\n", - "The MNIST model from the ONNX Model Zoo uses maxpooling to update the weights in its convolutions, summarized by the graphic below. You can see the entire workflow of our pre-trained model in the following image, with our input images and our output probabilities of each of our 10 labels. If you're interested in exploring the logic behind creating a Deep Learning model further, please look at the [training tutorial for our ONNX MNIST Convolutional Neural Network](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_103D_MNIST_ConvolutionalNeuralNetwork.ipynb). 
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Max-Pooling for Convolutional Neural Nets\n", - "\n", - "![](http://www.cntk.ai/jup/c103d_max_pooling.gif)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Pre-Trained Model Architecture\n", - "\n", - "![](http://www.cntk.ai/jup/conv103d_mnist-conv-mp.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# remember to delete your service after you are done using it!\n", - "\n", - "aci_service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "Congratulations!\n", - "\n", - "In this tutorial, you have:\n", - "- familiarized yourself with ONNX Runtime inference and the pretrained models in the ONNX model zoo\n", - "- understood a state-of-the-art convolutional neural net image classification model (MNIST in ONNX) and deployed it in Azure ML cloud\n", - "- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n", - "\n", - "Next steps:\n", - "- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-emotion-recognition.ipynb) in the cloud! 
This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine.\n", - "- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python [conda env:myenv]", - "language": "python", - "name": "conda-env-myenv-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "vinitra.swamy" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/pipeline/00.pipeline-setup.ipynb b/pipeline/00.pipeline-setup.ipynb index 3dac5110..022522ff 100644 --- a/pipeline/00.pipeline-setup.ipynb +++ b/pipeline/00.pipeline-setup.ipynb @@ -1,76 +1,76 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Packages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install pandas\n", + "!pip install requests" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Widgets\n", + "Install the following widgets to see the status of each run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!jupyter nbextension install --py --user azureml.train.widgets\n", + "!jupyter nbextension enable --py --user azureml.train.widgets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.3" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Packages" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install pandas\n", - "!pip install requests" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Widgets\n", - "Install the following widgets to see the status of each run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!jupyter nbextension install --py --user azureml.train.widgets\n", - "!jupyter nbextension enable --py --user azureml.train.widgets" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": 
"python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.3" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/pipeline/pipeline-batch-scoring.ipynb b/pipeline/pipeline-batch-scoring.ipynb index 3a244ed2..62fd8cea 100644 --- a/pipeline/pipeline-batch-scoring.ipynb +++ b/pipeline/pipeline-batch-scoring.ipynb @@ -1,622 +1,622 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook demonstrates how to run a batch scoring job using the __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and unlabeled images from the __[ImageNet](http://image-net.org/)__ dataset. It registers a pretrained Inception model in the model registry and then uses the model to score images stored in a blob container." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azureml.core import Workspace, Run, Experiment\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", + "\n", + "# Create a local project folder for the scoring script\n", + "project_folder = \"sample_projects\"\n", + "run_history_name = project_folder\n", + "\n", + "if not os.path.isdir(project_folder):\n", + " os.mkdir(project_folder)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import BatchAiCompute, ComputeTarget\n", + "from azureml.core.datastore import Datastore\n", + "from azureml.data.data_reference import DataReference\n", + "from azureml.pipeline.core import Pipeline, PipelineData\n", + "from azureml.pipeline.steps import PythonScriptStep\n", + "from azureml.core.runconfig import CondaDependencies, RunConfiguration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create and attach Compute targets\n", + "Use the code below to create and attach compute targets. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Batch AI compute\n", + "cluster_name = \"gpu_cluster\"\n", + "try:\n", + " cluster = BatchAiCompute(ws, cluster_name)\n", + " print(\"found existing cluster.\")\n", + "except Exception:\n", + " print(\"creating new cluster\")\n", + " provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n", + " autoscale_enabled = True,\n", + " cluster_min_nodes = 0, \n", + " cluster_max_nodes = 1)\n", + "\n", + " # create the cluster\n", + " cluster = ComputeTarget.create(ws, cluster_name, provisioning_config)\n", + " cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python scripts to run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Python script below runs the batch scoring. `batchai_score.py` reads input images from `dataset_path`, loads the registered model named by `model_name` and the labels from `label_dir`, and writes `result-labels.txt` to `output_dir`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $project_folder/batchai_score.py\n", + "import os\n", + "import argparse\n", + "import datetime, time\n", + "import tensorflow as tf\n", + "from math import ceil\n", + "import numpy as np\n", + "import shutil\n", + "from tensorflow.contrib.slim.python.slim.nets import inception_v3\n", + "from azureml.core.model import Model\n", + "\n", + "slim = tf.contrib.slim\n", + "\n", + "parser = argparse.ArgumentParser(description=\"Run batch scoring with a pretrained Inception-V3 model\")\n", + "parser.add_argument('--model_name', dest=\"model_name\", required=True)\n", + "parser.add_argument('--label_dir', dest=\"label_dir\", required=True)\n", + "parser.add_argument('--dataset_path', dest=\"dataset_path\", required=True)\n", + "parser.add_argument('--output_dir', dest=\"output_dir\", required=True)\n", + "parser.add_argument('--batch_size', dest=\"batch_size\", type=int, required=True)\n", + "\n", + "args = parser.parse_args()\n", + "\n", + "image_size = 299\n", + "num_channel = 3\n", + "\n", + "# create output directory if it does not exist\n", + "os.makedirs(args.output_dir, exist_ok=True)\n", + "\n", + "def get_class_label_dict(label_file):\n", + " label = []\n", + " proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()\n", + " for l in proto_as_ascii_lines:\n", + " label.append(l.rstrip())\n", + " return label\n", + "\n", + "\n", + "class DataIterator:\n", + " def __init__(self, data_dir):\n", + " self.file_paths = []\n", + " image_list = os.listdir(data_dir)\n", + " total_size = len(image_list)\n", + " self.file_paths = [os.path.join(data_dir, file_name.rstrip()) for file_name in image_list]\n", + "\n", + " self.labels = [1 for file_name in self.file_paths] # dummy labels: the images are unlabeled\n", + "\n", + " @property\n", + " def size(self):\n", + " return len(self.labels)\n", + "\n", + " def input_pipeline(self, batch_size):\n", + " images_tensor 
= tf.convert_to_tensor(self.file_paths, 
dtype=tf.string)\n", + " labels_tensor = tf.convert_to_tensor(self.labels, dtype=tf.int64)\n", + " input_queue = tf.train.slice_input_producer([images_tensor, labels_tensor], shuffle=False)\n", + " labels = input_queue[1]\n", + " images_content = tf.read_file(input_queue[0])\n", + "\n", + " image_reader = tf.image.decode_jpeg(images_content, channels=num_channel, name=\"jpeg_reader\")\n", + " float_caster = tf.cast(image_reader, tf.float32)\n", + " new_size = tf.constant([image_size, image_size], dtype=tf.int32)\n", + " images = tf.image.resize_images(float_caster, new_size)\n", + " images = tf.divide(tf.subtract(images, [0]), [255])\n", + "\n", + " image_batch, label_batch = tf.train.batch([images, labels], batch_size=batch_size, capacity=5 * batch_size)\n", + " return image_batch\n", + "\n", + "def main(_):\n", + " start_time = datetime.datetime.now()\n", + " label_file_name = os.path.join(args.label_dir, \"labels.txt\")\n", + " label_dict = get_class_label_dict(label_file_name)\n", + " classes_num = len(label_dict)\n", + " test_feeder = DataIterator(data_dir=args.dataset_path)\n", + " total_size = len(test_feeder.labels)\n", + " count = 0\n", + " # get model from model registry\n", + " model_path = Model.get_model_path(args.model_name)\n", + " with tf.Session() as sess:\n", + " test_images = test_feeder.input_pipeline(batch_size=args.batch_size)\n", + " with slim.arg_scope(inception_v3.inception_v3_arg_scope()):\n", + " input_images = tf.placeholder(tf.float32, [args.batch_size, image_size, image_size, num_channel])\n", + " logits, _ = inception_v3.inception_v3(input_images,\n", + " num_classes=classes_num,\n", + " is_training=False)\n", + " probabilities = tf.argmax(logits, 1)\n", + "\n", + " sess.run(tf.global_variables_initializer())\n", + " sess.run(tf.local_variables_initializer())\n", + " coord = tf.train.Coordinator()\n", + " threads = tf.train.start_queue_runners(sess=sess, coord=coord)\n", + " saver = tf.train.Saver()\n", + " saver.restore(sess, 
model_path)\n", + " out_filename = os.path.join(args.output_dir, \"result-labels.txt\")\n", + " with open(out_filename, \"w\") as result_file:\n", + " i = 0\n", + " while count < total_size and not coord.should_stop():\n", + " test_images_batch = sess.run(test_images)\n", + " file_names_batch = test_feeder.file_paths[i*args.batch_size: min(test_feeder.size, (i+1)*args.batch_size)]\n", + " results = sess.run(probabilities, feed_dict={input_images: test_images_batch})\n", + " new_add = min(args.batch_size, total_size-count)\n", + " count += new_add\n", + " i += 1\n", + " for j in range(new_add):\n", + " result_file.write(os.path.basename(file_names_batch[j]) + \": \" + label_dict[results[j]] + \"\\n\")\n", + " result_file.flush()\n", + " coord.request_stop()\n", + " coord.join(threads)\n", + " \n", + " # copy the file to artifacts\n", + " shutil.copy(out_filename, \"./outputs/\")\n", + " # Move the processed data out of the blob so that the next run can process the data.\n", + "\n", + "if __name__ == \"__main__\":\n", + " tf.app.run()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare Model and Input data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create directory for model\n", + "model_dir = 'models'\n", + "if not os.path.isdir(model_dir):\n", + " os.mkdir(model_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download Model\n", + "This manual step is required before the model can be registered with the workspace.\n", + "\n", + "Download the model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz and extract it to `model_dir`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get sample images and upload to Datastore\n", + "This manual step is required before running `batchai_score.py`.\n", + "\n", + "Download and extract sample images from the ImageNet evaluation set and **upload** them to a blob container that will be 
registered as a Datastore in the next step.\n", + "\n", + "A copy of the sample images from the ImageNet evaluation set is available at __[BatchAI Samples Blob](https://batchaisamples.blob.core.windows.net/samples/imagenet_samples.zip?st=2017-09-29T18%3A29%3A00Z&se=2099-12-31T08%3A00%3A00Z&sp=rl&sv=2016-05-31&sr=c&sig=PmhL%2BYnYAyNTZr1DM2JySvrI12e%2F4wZNIwCtf7TRI%2BM%3D)__ \n", + "\n", + "There are multiple ways to create folders and upload files into an Azure Blob container: you can use the __[Azure Portal](https://ms.portal.azure.com/)__, __[Storage Explorer](http://storageexplorer.com/)__, __[Azure CLI2](https://render.githubusercontent.com/azure-cli-extension)__, or an Azure SDK for your preferred programming language. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "account_name = \"batchscoringdata\"\n", + "sample_data = Datastore.register_azure_blob_container(ws, \"sampledata\", \"sampledata\", \n", + " account_name=account_name, \n", + " overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Output datastore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We write the outputs to the default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "default_ds = \"workspaceblobstore\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Specify where the data is stored or will be written to" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.data.data_reference import DataReference\n", + "from azureml.pipeline.core import Pipeline, PipelineData\n", + "from azureml.core import Datastore\n", + "from azureml.core import Experiment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [ + "input_images = DataReference(datastore=sample_data, \n", + " data_reference_name=\"input_images\",\n", + " path_on_datastore=\"batchscoring/images\",\n", + " mode=\"download\"\n", + " )\n", + "model_dir = DataReference(datastore=sample_data, \n", + " data_reference_name=\"input_model\",\n", + " path_on_datastore=\"batchscoring/models\",\n", + " mode=\"download\" \n", + " )\n", + "label_dir = DataReference(datastore=sample_data, \n", + " data_reference_name=\"input_labels\",\n", + " path_on_datastore=\"batchscoring/labels\",\n", + " mode=\"download\" \n", + " )\n", + "output_dir = PipelineData(name=\"scores\", \n", + " datastore_name=default_ds, \n", + " output_path_on_compute=\"batchscoring/results\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register the model with Workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "from azureml.core.model import Model\n", + "\n", + "# register downloaded model \n", + "model = Model.register(model_path = \"models/inception_v3.ckpt\",\n", + " model_name = \"inception\", # this is the name the model is registered as\n", + " tags = {'pretrained': \"inception\"},\n", + " description = \"Imagenet trained tensorflow inception\",\n", + " workspace = ws)\n", + "# remove the downloaded dir after registration if you wish\n", + "shutil.rmtree(\"models\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Specify environment to run the script" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.4.0\", \"azureml-defaults\"])\n", + "\n", + "# Runconfig\n", + "batchai_run_config = RunConfiguration(conda_dependencies=cd)\n", + "batchai_run_config.environment.docker.enabled = True\n", + 
"batchai_run_config.environment.docker.gpu_support = True\n", + "batchai_run_config.environment.docker.base_image = \"microsoft/mmlspark:gpu-0.12\"\n", + "batchai_run_config.environment.spark.precache_packages = False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Steps to run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some of the Python script's parameters can be supplied as inputs when a `PublishedPipeline` is re-run. In this example, the script's `batch_size` argument is exposed as such a parameter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core.graph import PipelineParameter\n", + "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "inception_model_name = \"inception_v3.ckpt\"\n", + "\n", + "batch_score_step = PythonScriptStep(\n", + " name=\"batch ai scoring\",\n", + " script_name=\"batchai_score.py\",\n", + " arguments=[\"--dataset_path\", input_images, \n", + " \"--model_name\", \"inception\",\n", + " \"--label_dir\", label_dir, \n", + " \"--output_dir\", output_dir, \n", + " \"--batch_size\", batch_size_param],\n", + " target=cluster,\n", + " inputs=[input_images, label_dir],\n", + " outputs=[output_dir],\n", + " runconfig=batchai_run_config,\n", + " source_directory=project_folder\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", + "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_params={\"param_batch_size\": 20})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Monitor run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(pipeline_run).show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Download and review output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "step_run = list(pipeline_run.get_children())[0]\n", + "step_run.download_file(\"./outputs/result-labels.txt\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", + "df.columns = [\"Filename\", \"Prediction\"]\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Publish a pipeline and rerun using a REST call" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a published pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "published_pipeline = pipeline_run.publish_pipeline(\n", + " name=\"Inception v3 scoring\", description=\"Batch scoring using Inception v3 model\", version=\"1.0\")\n", + "\n", + "published_id = published_pipeline.id" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Rerun using REST call" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get AAD token" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.authentication import AzureCliAuthentication\n", + "import requests\n", + "\n", + "cli_auth = AzureCliAuthentication()\n", + "aad_token = cli_auth.get_authentication_header()" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## Run published pipeline using its REST endpoint" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core import PublishedPipeline\n", + "\n", + "rest_endpoint = PublishedPipeline.get_endpoint(published_id, ws)\n", + "# specify batch size when running the pipeline\n", + "response = requests.post(rest_endpoint, headers=aad_token, json={\"param_batch_size\": 50})\n", + "run_id = response.json()[\"Id\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Monitor the new run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core.run import PipelineRun\n", + "published_pipeline_run = PipelineRun(ws.experiments()[\"batch_scoring\"], run_id)\n", + "\n", + "RunDetails(published_pipeline_run).show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This notebook demonstrates how to run batch scoring job. __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and unlabeled images from __[ImageNet](http://image-net.org/)__ dataset will be used. It registers a pretrained inception model in model registry then uses the model to do batch scoring on images in a blob container." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from azureml.core import Workspace, Run, Experiment\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", - "\n", - "# Also create a Project and attach to Workspace\n", - "project_folder = \"sample_projects\"\n", - "run_history_name = project_folder\n", - "\n", - "if not os.path.isdir(project_folder):\n", - " os.mkdir(project_folder)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import BatchAiCompute, ComputeTarget\n", - "from azureml.core.datastore import Datastore\n", - "from azureml.data.data_reference import DataReference\n", - "from azureml.pipeline.core import Pipeline, PipelineData\n", - "from azureml.pipeline.steps import PythonScriptStep\n", - "from azureml.core.runconfig import CondaDependencies, RunConfiguration" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create and attach Compute targets\n", - "Use the below code to create and attach Compute targets. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Batch AI compute\n", - "cluster_name = \"gpu_cluster\"\n", - "try:\n", - " cluster = BatchAiCompute(ws, cluster_name)\n", - " print(\"found existing cluster.\")\n", - "except:\n", - " print(\"creating new cluster\")\n", - " provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n", - " autoscale_enabled = True,\n", - " cluster_min_nodes = 0, \n", - " cluster_max_nodes = 1)\n", - "\n", - " # create the cluster\n", - " cluster = ComputeTarget.create(ws, cluster_name, provisioning_config)\n", - " cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Python scripts to run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Python scripts that run the batch scoring. `batchai_score.py` takes input images in `dataset_path`, pretrained models in `model_dir` and outputs a `results-label.txt` to `output_dir`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $project_folder/batchai_score.py\n", - "import os\n", - "import argparse\n", - "import datetime,time\n", - "import tensorflow as tf\n", - "from math import ceil\n", - "import numpy as np\n", - "import shutil\n", - "from tensorflow.contrib.slim.python.slim.nets import inception_v3\n", - "from azureml.core.model import Model\n", - "\n", - "slim = tf.contrib.slim\n", - "\n", - "parser = argparse.ArgumentParser(description=\"Start a tensorflow model serving\")\n", - "parser.add_argument('--model_name', dest=\"model_name\", required=True)\n", - "parser.add_argument('--label_dir', dest=\"label_dir\", required=True)\n", - "parser.add_argument('--dataset_path', dest=\"dataset_path\", required=True)\n", - "parser.add_argument('--output_dir', dest=\"output_dir\", required=True)\n", - "parser.add_argument('--batch_size', dest=\"batch_size\", type=int, required=True)\n", - "\n", - "args = parser.parse_args()\n", - "\n", - "image_size = 299\n", - "num_channel = 3\n", - "\n", - "# create output directory if it does not exist\n", - "os.makedirs(args.output_dir, exist_ok=True)\n", - "\n", - "def get_class_label_dict(label_file):\n", - " label = []\n", - " proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()\n", - " for l in proto_as_ascii_lines:\n", - " label.append(l.rstrip())\n", - " return label\n", - "\n", - "\n", - "class DataIterator:\n", - " def __init__(self, data_dir):\n", - " self.file_paths = []\n", - " image_list = os.listdir(data_dir)\n", - " total_size = len(image_list)\n", - " self.file_paths = [data_dir + '/' + file_name.rstrip() for file_name in image_list ]\n", - "\n", - " self.labels = [1 for file_name in self.file_paths]\n", - "\n", - " @property\n", - " def size(self):\n", - " return len(self.labels)\n", - "\n", - " def input_pipeline(self, batch_size):\n", - " images_tensor = tf.convert_to_tensor(self.file_paths, 
dtype=tf.string)\n", - " labels_tensor = tf.convert_to_tensor(self.labels, dtype=tf.int64)\n", - " input_queue = tf.train.slice_input_producer([images_tensor, labels_tensor], shuffle=False)\n", - " labels = input_queue[1]\n", - " images_content = tf.read_file(input_queue[0])\n", - "\n", - " image_reader = tf.image.decode_jpeg(images_content, channels=num_channel, name=\"jpeg_reader\")\n", - " float_caster = tf.cast(image_reader, tf.float32)\n", - " new_size = tf.constant([image_size, image_size], dtype=tf.int32)\n", - " images = tf.image.resize_images(float_caster, new_size)\n", - " images = tf.divide(tf.subtract(images, [0]), [255])\n", - "\n", - " image_batch, label_batch = tf.train.batch([images, labels], batch_size=batch_size, capacity=5 * batch_size)\n", - " return image_batch\n", - "\n", - "def main(_):\n", - " start_time = datetime.datetime.now()\n", - " label_file_name = os.path.join(args.label_dir, \"labels.txt\")\n", - " label_dict = get_class_label_dict(label_file_name)\n", - " classes_num = len(label_dict)\n", - " test_feeder = DataIterator(data_dir=args.dataset_path)\n", - " total_size = len(test_feeder.labels)\n", - " count = 0\n", - " # get model from model registry\n", - " model_path = Model.get_model_path(args.model_name)\n", - " with tf.Session() as sess:\n", - " test_images = test_feeder.input_pipeline(batch_size=args.batch_size)\n", - " with slim.arg_scope(inception_v3.inception_v3_arg_scope()):\n", - " input_images = tf.placeholder(tf.float32, [args.batch_size, image_size, image_size, num_channel])\n", - " logits, _ = inception_v3.inception_v3(input_images,\n", - " num_classes=classes_num,\n", - " is_training=False)\n", - " probabilities = tf.argmax(logits, 1)\n", - "\n", - " sess.run(tf.global_variables_initializer())\n", - " sess.run(tf.local_variables_initializer())\n", - " coord = tf.train.Coordinator()\n", - " threads = tf.train.start_queue_runners(sess=sess, coord=coord)\n", - " saver = tf.train.Saver()\n", - " saver.restore(sess, 
model_path)\n", - " out_filename = os.path.join(args.output_dir, \"result-labels.txt\")\n", - " with open(out_filename, \"w\") as result_file:\n", - " i = 0\n", - " while count < total_size and not coord.should_stop():\n", - " test_images_batch = sess.run(test_images)\n", - " file_names_batch = test_feeder.file_paths[i*args.batch_size: min(test_feeder.size, (i+1)*args.batch_size)]\n", - " results = sess.run(probabilities, feed_dict={input_images: test_images_batch})\n", - " new_add = min(args.batch_size, total_size-count)\n", - " count += new_add\n", - " i += 1\n", - " for j in range(new_add):\n", - " result_file.write(os.path.basename(file_names_batch[j]) + \": \" + label_dict[results[j]] + \"\\n\")\n", - " result_file.flush()\n", - " coord.request_stop()\n", - " coord.join(threads)\n", - " \n", - " # copy the file to artifacts\n", - " shutil.copy(out_filename, \"./outputs/\")\n", - " # Move the processed data out of the blob so that the next run can process the data.\n", - "\n", - "if __name__ == \"__main__\":\n", - " tf.app.run()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prepare Model and Input data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create directory for model\n", - "model_dir = 'models'\n", - "if not os.path.isdir(model_dir):\n", - " os.mkdir(model_dir)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download Model\n", - "This manual step is required to register the model to the workspace\n", - "\n", - "Download and extract model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to model_dir" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get samples images and upload to Datastore\n", - "This manual step is required to run batchai_score.py\n", - "\n", - "Download and extract sample images from ImageNet evaluation set and **upload** to a blob that will be 
registered as a Datastore in the next step\n", - "\n", - "A copy of sample images from ImageNet evaluation set can be found at __[BatchAI Samples Blob](https://batchaisamples.blob.core.windows.net/samples/imagenet_samples.zip?st=2017-09-29T18%3A29%3A00Z&se=2099-12-31T08%3A00%3A00Z&sp=rl&sv=2016-05-31&sr=c&sig=PmhL%2BYnYAyNTZr1DM2JySvrI12e%2F4wZNIwCtf7TRI%2BM%3D)__ \n", - "\n", - "There are multiple ways to create folders and upload files into Azure Blob Container - you can use __[Azure Portal](https://ms.portal.azure.com/)__, __[Storage Explorer](http://storageexplorer.com/)__, __[Azure CLI2](https://render.githubusercontent.com/azure-cli-extension)__ or Azure SDK for your preferable programming language. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "account_name = \"batchscoringdata\"\n", - "sample_data = Datastore.register_azure_blob_container(ws, \"sampledata\", \"sampledata\", \n", - " account_name=account_name, \n", - " overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Output datastore" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We write the outputs to the default datastore" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "default_ds = \"workspaceblobstore\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Specify where the data is stored or will be written to" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies\n", - "from azureml.data.data_reference import DataReference\n", - "from azureml.pipeline.core import Pipeline, PipelineData\n", - "from azureml.core import Datastore\n", - "from azureml.core import Experiment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - 
"outputs": [], - "source": [ - "input_images = DataReference(datastore=sample_data, \n", - " data_reference_name=\"input_images\",\n", - " path_on_datastore=\"batchscoring/images\",\n", - " mode=\"download\"\n", - " )\n", - "model_dir = DataReference(datastore=sample_data, \n", - " data_reference_name=\"input_model\",\n", - " path_on_datastore=\"batchscoring/models\",\n", - " mode=\"download\" \n", - " )\n", - "label_dir = DataReference(datastore=sample_data, \n", - " data_reference_name=\"input_labels\",\n", - " path_on_datastore=\"batchscoring/labels\",\n", - " mode=\"download\" \n", - " )\n", - "output_dir = PipelineData(name=\"scores\", \n", - " datastore_name=default_ds, \n", - " output_path_on_compute=\"batchscoring/results\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register the model with Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "from azureml.core.model import Model\n", - "\n", - "# register downloaded model \n", - "model = Model.register(model_path = \"models/inception_v3.ckpt\",\n", - " model_name = \"inception\", # this is the name the model is registered as\n", - " tags = {'pretrained': \"inception\"},\n", - " description = \"Imagenet trained tensorflow inception\",\n", - " workspace = ws)\n", - "# remove the downloaded dir after registration if you wish\n", - "shutil.rmtree(\"models\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Specify environment to run the script" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.4.0\", \"azureml-defaults\"])\n", - "\n", - "# Runconfig\n", - "batchai_run_config = RunConfiguration(conda_dependencies=cd)\n", - "batchai_run_config.environment.docker.enabled = True\n", - 
"batchai_run_config.environment.docker.gpu_support = True\n", - "batchai_run_config.environment.docker.base_image = \"microsoft/mmlspark:gpu-0.12\"\n", - "batchai_run_config.environment.spark.precache_packages = False" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Steps to run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A subset of the parameters to the python script can be given as input when we re-run a `PublishedPipeline`. In the current example, we define `batch_size` taken by the script as such parameter." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.graph import PipelineParameter\n", - "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "inception_model_name = \"inception_v3.ckpt\"\n", - "\n", - "batch_score_step = PythonScriptStep(\n", - " name=\"batch ai scoring\",\n", - " script_name=\"batchai_score.py\",\n", - " arguments=[\"--dataset_path\", input_images, \n", - " \"--model_name\", \"inception\",\n", - " \"--label_dir\", label_dir, \n", - " \"--output_dir\", output_dir, \n", - " \"--batch_size\", batch_size_param],\n", - " target=cluster,\n", - " inputs=[input_images, label_dir],\n", - " outputs=[output_dir],\n", - " runconfig=batchai_run_config,\n", - " source_directory=project_folder\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", - "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_params={\"param_batch_size\": 20})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Monitor run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - 
"outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(pipeline_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline_run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Download and review output" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "step_run = list(pipeline_run.get_children())[0]\n", - "step_run.download_file(\"./outputs/result-labels.txt\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", - "df.columns = [\"Filename\", \"Prediction\"]\n", - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Publish a pipeline and rerun using a REST call" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a published pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "published_pipeline = pipeline_run.publish_pipeline(\n", - " name=\"Inception v3 scoring\", description=\"Batch scoring using Inception v3 model\", version=\"1.0\")\n", - "\n", - "published_id = published_pipeline.id" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Rerun using REST call" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get AAD token" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.authentication import AzureCliAuthentication\n", - "import requests\n", - "\n", - "cli_auth = AzureCliAuthentication()\n", - "aad_token = cli_auth.get_authentication_header()" - ] - }, - { - "cell_type": 
"markdown", - "metadata": {}, - "source": [ - "## Run published pipeline using its REST endpoint" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core import PublishedPipeline\n", - "\n", - "rest_endpoint = PublishedPipeline.get_endpoint(published_id, ws)\n", - "# specify batch size when running the pipeline\n", - "response = requests.post(rest_endpoint, headers=aad_token, json={\"param_batch_size\": 50})\n", - "run_id = response.json()[\"Id\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Monitor the new run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.run import PipelineRun\n", - "published_pipeline_run = PipelineRun(ws.experiments()[\"batch_scoring\"], run_id)\n", - "\n", - "RunDetails(published_pipeline_run).show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/project-brainwave/project-brainwave-custom-weights.ipynb b/project-brainwave/project-brainwave-custom-weights.ipynb index 0c352679..21248261 100644 --- a/project-brainwave/project-brainwave-custom-weights.ipynb +++ b/project-brainwave/project-brainwave-custom-weights.ipynb @@ -1,617 +1,617 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Development with Custom Weights" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This example shows how to retrain a model with custom weights and fine-tune the model with quantization, then deploy the model running on FPGA. Only Windows is supported. We use TensorFlow and Keras to build our model. We are going to use transfer learning, with ResNet50 as a featurizer. We don't use the last layer of ResNet50 in this case and instead add our own classification layer using Keras.\n", + "\n", + "The custom weights are ResNet50 weights trained on ImageNet. We will use the Kaggle Cats and Dogs dataset to retrain and fine-tune the model. The dataset can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Download the zip and extract it to a directory named 'catsanddogs' under your user directory (\"~/catsanddogs\"). \n", + "\n", + "Please set up your environment as described in the [quick start](project-brainwave-quickstart.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import sys\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "from keras import backend as K" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup Environment\n", + "After you train your model in float32, you'll write the weights to a location on disk. We also need a location to store the models that get downloaded."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "custom_weights_dir = os.path.expanduser(\"~/custom-weights\")\n", + "saved_model_dir = os.path.expanduser(\"~/models\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare Data\n", + "Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run relatively quickly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import glob\n", + "import imghdr\n", + "datadir = os.path.expanduser(\"~/catsanddogs\")\n", + "\n", + "cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))\n", + "dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))\n", + "\n", + "# Limit the data set to make the notebook execute quickly.\n", + "cat_files = cat_files[:64]\n", + "dog_files = dog_files[:64]\n", + "\n", + "# The data set has a few images that are not jpeg. 
Remove them.\n", + "cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']\n", + "dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']\n", + "\n", + "if(not len(cat_files) or not len(dog_files)):\n", + " print(\"Please download the Kaggle Cats and Dogs dataset from https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to \" + datadir) \n", + " raise ValueError(\"Data not found\")\n", + "else:\n", + " print(cat_files[0])\n", + " print(dog_files[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Construct a numpy array as labels\n", + "image_paths = cat_files + dog_files\n", + "total_files = len(cat_files) + len(dog_files)\n", + "labels = np.zeros(total_files)\n", + "labels[len(cat_files):] = 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split images data as training data and test data\n", + "from sklearn.model_selection import train_test_split\n", + "onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])\n", + "img_train, img_test, label_train, label_test = train_test_split(image_paths, onehot_labels, random_state=42, shuffle=True)\n", + "\n", + "print(len(img_train), len(img_test), label_train.shape, label_test.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Construct Model\n", + "We use ResNet50 for the featurizer and build our own classifier using Keras layers. We train the featurizer and the classifier as one model. The weights trained on ImageNet are used as the starting point for the retraining of our featurizer. The weights are loaded from TensorFlow checkpoint files." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before passing the images to the ResNet50 featurizer, we need to preprocess the input files to get them into the form expected by ResNet50. 
ResNet50 expects float tensors representing the images in BGR, channels-last order. We've provided a default implementation of the preprocessing that you can use." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.contrib.brainwave.models.utils as utils\n", + "\n", + "def preprocess_images():\n", + " # Convert images to 3D tensors [width,height,channel] - channels are in BGR order.\n", + " in_images = tf.placeholder(tf.string)\n", + " image_tensors = utils.preprocess_array(in_images)\n", + " return in_images, image_tensors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We use Keras layer APIs to construct the classifier. Because we're using the TensorFlow backend, we can train this classifier in one session with our ResNet50 model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def construct_classifier(in_tensor):\n", + " from keras.layers import Dropout, Dense, Flatten\n", + " K.set_session(tf.get_default_session())\n", + " \n", + " FC_SIZE = 1024\n", + " NUM_CLASSES = 2\n", + "\n", + " x = Dropout(0.2, input_shape=(1, 1, 2048,))(in_tensor)\n", + " x = Dense(FC_SIZE, activation='relu', input_dim=(1, 1, 2048,))(x)\n", + " x = Flatten()(x)\n", + " preds = Dense(NUM_CLASSES, activation='softmax', input_dim=FC_SIZE, name='classifier_output')(x)\n", + " return preds" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that every component of the model is defined, we can construct the model. Constructing the model with the Project Brainwave models takes two steps: first we import the graph definition, then we restore the weights of the model into a TensorFlow session. 
Because the quantized graph definition and the float32 graph definition share the same node names, we can initially train the weights in float32, and then reload them with the quantized operations (which take longer) to fine-tune the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def construct_model(quantized, starting_weights_directory = None):\n", + " from azureml.contrib.brainwave.models import Resnet50, QuantizedResnet50\n", + " \n", + " # Convert images to 3D tensors [width,height,channel]\n", + " in_images, image_tensors = preprocess_images()\n", + "\n", + " # Construct featurizer using quantized or unquantized ResNet50 model\n", + " if not quantized:\n", + " featurizer = Resnet50(saved_model_dir)\n", + " else:\n", + " featurizer = QuantizedResnet50(saved_model_dir, custom_weights_directory = starting_weights_directory)\n", + "\n", + "\n", + " features = featurizer.import_graph_def(input_tensor=image_tensors)\n", + " # Construct classifier\n", + " preds = construct_classifier(features)\n", + " \n", + " # Initialize weights\n", + " sess = tf.get_default_session()\n", + " tf.global_variables_initializer().run()\n", + "\n", + " featurizer.restore_weights(sess)\n", + "\n", + " return in_images, image_tensors, features, preds, featurizer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train Model\n", + "First we train the model with custom weights but without quantization. Training is done with native float precision (32-bit floats). We load the training data set and train in batches of 16 images for up to 10 epochs. Training stops early once the average loss drops below 0.001, and the trained weights are then saved as TensorFlow checkpoint files. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def read_files(files):\n", + " \"\"\" Read files to array\"\"\"\n", + " contents = []\n", + " for path in files:\n", + " with open(path, 'rb') as f:\n", + " contents.append(f.read())\n", + " return contents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def train_model(preds, in_images, img_train, label_train, is_retrain = False, train_epoch = 10):\n", + " \"\"\" training model \"\"\"\n", + " from keras.objectives import binary_crossentropy\n", + " from tqdm import tqdm\n", + " \n", + " learning_rate = 0.001 if is_retrain else 0.01\n", + " \n", + " # Specify the loss function\n", + " in_labels = tf.placeholder(tf.float32, shape=(None, 2)) \n", + " cross_entropy = tf.reduce_mean(binary_crossentropy(in_labels, preds))\n", + " optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)\n", + "\n", + " def chunks(a, b, n):\n", + " \"\"\"Yield successive n-sized chunks from a and b.\"\"\"\n", + " if (len(a) != len(b)):\n", + " print(\"a and b are not equal in chunks(a,b,n)\")\n", + " raise ValueError(\"Parameter error\")\n", + "\n", + " for i in range(0, len(a), n):\n", + " yield a[i:i + n], b[i:i + n]\n", + "\n", + " chunk_size = 16\n", + " chunk_num = len(label_train) / chunk_size\n", + "\n", + " sess = tf.get_default_session()\n", + " for epoch in range(train_epoch):\n", + " avg_loss = 0\n", + " for img_chunk, label_chunk in tqdm(chunks(img_train, label_train, chunk_size)):\n", + " contents = read_files(img_chunk)\n", + " _, loss = sess.run([optimizer, cross_entropy],\n", + " feed_dict={in_images: contents,\n", + " in_labels: label_chunk,\n", + " K.learning_phase(): 1})\n", + " avg_loss += loss / chunk_num\n", + " print(\"Epoch:\", (epoch + 1), \"loss = \", \"{:.3f}\".format(avg_loss))\n", + " \n", + " # Reach desired performance\n", + " if (avg_loss < 
0.001):\n", + " break" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def test_model(preds, in_images, img_test, label_test):\n", + " \"\"\"Test the model\"\"\"\n", + " from keras.metrics import categorical_accuracy\n", + "\n", + " in_labels = tf.placeholder(tf.float32, shape=(None, 2))\n", + " accuracy = tf.reduce_mean(categorical_accuracy(in_labels, preds))\n", + " contents = read_files(img_test)\n", + "\n", + " accuracy = accuracy.eval(feed_dict={in_images: contents,\n", + " in_labels: label_test,\n", + " K.learning_phase(): 0})\n", + " return accuracy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Launch the training\n", + "tf.reset_default_graph()\n", + "sess = tf.Session(graph=tf.get_default_graph())\n", + "\n", + "with sess.as_default():\n", + " in_images, image_tensors, features, preds, featurizer = construct_model(quantized=False)\n", + " train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10) \n", + " accuracy = test_model(preds, in_images, img_test, label_test) \n", + " print(\"Accuracy:\", accuracy)\n", + " featurizer.save_weights(custom_weights_dir + \"/rn50\", tf.get_default_session())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test Model\n", + "After training, we evaluate the trained model's accuracy on the test dataset with quantization enabled, so that we know how the model will perform when it is deployed on the FPGA."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.Session(graph=tf.get_default_graph())\n", + "\n", + "with sess.as_default():\n", + " print(\"Testing trained model with quantization\")\n", + " in_images, image_tensors, features, preds, quantized_featurizer = construct_model(quantized=True, starting_weights_directory=custom_weights_dir)\n", + " accuracy = test_model(preds, in_images, img_test, label_test) \n", + " print(\"Accuracy:\", accuracy)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Fine-Tune Model\n", + "Sometimes, the model's accuracy can drop significantly after quantization. In those cases (here, when the test accuracy is below 0.93), we fine-tune the model with quantization enabled to recover accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if (accuracy < 0.93):\n", + " with sess.as_default():\n", + " print(\"Fine-tuning model with quantization\")\n", + " train_model(preds, in_images, img_train, label_train, is_retrain=True, train_epoch=10)\n", + " accuracy = test_model(preds, in_images, img_test, label_test) \n", + " print(\"Accuracy:\", accuracy)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Service Definition\n", + "As in the QuickStart notebook, our service definition pipeline consists of three stages. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage\n", + "\n", + "model_def_path = os.path.join(saved_model_dir, 'model_def.zip')\n", + "\n", + "model_def = ModelDefinition()\n", + "model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))\n", + "model_def.pipeline.append(BrainWaveStage(sess, quantized_featurizer))\n", + "model_def.pipeline.append(TensorflowStage(sess, features, preds))\n", + "model_def.save(model_def_path)\n", + "print(model_def_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy\n", + "Go to our [GitHub repo](https://aka.ms/aml-real-time-ai) \"docs\" folder to learn how to create a Model Management Account and find the required information below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first time the code below runs, it creates a new service running your model. If you want to change the model, make your changes above in this notebook and save a new service definition. Running this code again then deletes the existing service and redeploys it with the new model."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "from azureml.core.image import Image\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n", + "\n", + "model_name = \"catsanddogs-resnet50-model\"\n", + "image_name = \"catsanddogs-resnet50-image\"\n", + "service_name = \"modelbuild-service\"\n", + "\n", + "registered_model = Model.register(ws, model_def_path, model_name)\n", + "\n", + "image_config = BrainwaveImage.image_configuration()\n", + "deployment_config = BrainwaveWebservice.deploy_configuration()\n", + " \n", + "try:\n", + " service = Webservice(ws, service_name)\n", + " service.delete()\n", + " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n", + "except WebserviceException:\n", + " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The service is now running in Azure and ready to serve requests. We can check the address and port." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.ipAddress + ':' + str(service.port))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Client\n", + "There is a simple test client at azureml.contrib.brainwave.client.PredictionClient which can be used for testing. We'll use this client to score an image with our new service."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.client import PredictionClient\n", + "client = PredictionClient(service.ipAddress, service.port)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can adapt the client [code](../../pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](../../sample-clients/csharp).\n", + "\n", + "The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Request\n", + "Let's see how our service does on a few images. It may get a few wrong." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Specify an image to classify\n", + "print('CATS')\n", + "for image_file in cat_files[:8]:\n", + " results = client.score_image(image_file)\n", + " result = 'CORRECT ' if results[0] > results[1] else 'WRONG '\n", + " print(result + str(results))\n", + "print('DOGS')\n", + "for image_file in dog_files[:8]:\n", + " results = client.score_image(image_file)\n", + " result = 'CORRECT ' if results[1] > results[0] else 'WRONG '\n", + " print(result + str(results))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup\n", + "Run the cell below to delete your service." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "License for plot_confusion_matrix:\n", + "\n", + "New BSD License\n", + "\n", + "Copyright (c) 2007-2018 The scikit-learn developers.\n", + "All rights reserved.\n", + "\n", + "\n", + "Redistribution and use in source and binary forms, with or without\n", + "modification, are permitted provided that the following conditions are met:\n", + "\n", + " a. Redistributions of source code must retain the above copyright notice,\n", + " this list of conditions and the following disclaimer.\n", + " b. Redistributions in binary form must reproduce the above copyright\n", + " notice, this list of conditions and the following disclaimer in the\n", + " documentation and/or other materials provided with the distribution.\n", + " c. Neither the name of the Scikit-learn Developers nor the names of\n", + " its contributors may be used to endorse or promote products\n", + " derived from this software without specific prior written\n", + " permission. \n", + "\n", + "\n", + "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n", + "AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n", + "IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n", + "ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR\n", + "ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n", + "DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n", + "SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n", + "CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n", + "LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY\n", + "OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH\n", + "DAMAGE.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Model Development with Custom Weights" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This example shows how to retrain a model with custom weights and fine-tune the model with quantization, then deploy the model running on FPGA. Only Windows is supported. We use TensorFlow and Keras to build our model. We are going to use transfer learning, with ResNet50 as a featurizer. We don't use the last layer of ResNet50 in this case and instead add our own classification layer using Keras.\n", - "\n", - "The custom wegiths are trained with ImageNet on ResNet50. We will use the Kaggle Cats and Dogs dataset to retrain and fine-tune the model. The dataset can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). 
Download the zip and extract to a directory named 'catsanddogs' under your user directory (\"~/catsanddogs\"). \n", - "\n", - "Please set up your environment as described in the [quick start](project-brainwave-quickstart.ipynb)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import sys\n", - "import tensorflow as tf\n", - "import numpy as np\n", - "from keras import backend as K" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setup Environment\n", - "After you train your model in float32, you'll write the weights to a place on disk. We also need a location to store the models that get downloaded." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "custom_weights_dir = os.path.expanduser(\"~/custom-weights\")\n", - "saved_model_dir = os.path.expanduser(\"~/models\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prepare Data\n", - "Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run relatively quickly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import glob\n", - "import imghdr\n", - "datadir = os.path.expanduser(\"~/catsanddogs\")\n", - "\n", - "cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))\n", - "dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))\n", - "\n", - "# Limit the data set to make the notebook execute quickly.\n", - "cat_files = cat_files[:64]\n", - "dog_files = dog_files[:64]\n", - "\n", - "# The data set has a few images that are not jpeg. 
Remove them.\n", - "cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']\n", - "dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']\n", - "\n", - "if(not len(cat_files) or not len(dog_files)):\n", - " print(\"Please download the Kaggle Cats and Dogs dataset form https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to \" + datadir) \n", - " raise ValueError(\"Data not found\")\n", - "else:\n", - " print(cat_files[0])\n", - " print(dog_files[0])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Construct a numpy array as labels\n", - "image_paths = cat_files + dog_files\n", - "total_files = len(cat_files) + len(dog_files)\n", - "labels = np.zeros(total_files)\n", - "labels[len(cat_files):] = 1" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Split images data as training data and test data\n", - "from sklearn.model_selection import train_test_split\n", - "onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])\n", - "img_train, img_test, label_train, label_test = train_test_split(image_paths, onehot_labels, random_state=42, shuffle=True)\n", - "\n", - "print(len(img_train), len(img_test), label_train.shape, label_test.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Construct Model\n", - "We use ResNet50 for the featuirzer and build our own classifier using Keras layers. We train the featurizer and the classifier as one model. The weights trained on ImageNet are used as the starting point for the retraining of our featurizer. The weights are loaded from tensorflow chkeckpoint files." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Before passing image dataset to the ResNet50 featurizer, we need to preprocess the input file to get it into the form expected by ResNet50. 
ResNet50 expects float tensors representing the images in BGR, channel last order. We've provided a default implementation of the preprocessing that you can use." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.contrib.brainwave.models.utils as utils\n", - "\n", - "def preprocess_images():\n", - " # Convert images to 3D tensors [width,height,channel] - channels are in BGR order.\n", - " in_images = tf.placeholder(tf.string)\n", - " image_tensors = utils.preprocess_array(in_images)\n", - " return in_images, image_tensors" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We use Keras layer APIs to construct the classifier. Because we're using the tensorflow backend, we can train this classifier in one session with our Resnet50 model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def construct_classifier(in_tensor):\n", - " from keras.layers import Dropout, Dense, Flatten\n", - " K.set_session(tf.get_default_session())\n", - " \n", - " FC_SIZE = 1024\n", - " NUM_CLASSES = 2\n", - "\n", - " x = Dropout(0.2, input_shape=(1, 1, 2048,))(in_tensor)\n", - " x = Dense(FC_SIZE, activation='relu', input_dim=(1, 1, 2048,))(x)\n", - " x = Flatten()(x)\n", - " preds = Dense(NUM_CLASSES, activation='softmax', input_dim=FC_SIZE, name='classifier_output')(x)\n", - " return preds" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now every component of the model is defined, we can construct the model. Constructing the model with the project brainwave models is two steps - first we import the graph definition, then we restore the weights of the model into a tensorflow session. 
Because the quantized graph defintion and the float32 graph defintion share the same node names in the graph definitions, we can initally train the weights in float32, and then reload them with the quantized operations (which take longer) to fine-tune the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def construct_model(quantized, starting_weights_directory = None):\n", - " from azureml.contrib.brainwave.models import Resnet50, QuantizedResnet50\n", - " \n", - " # Convert images to 3D tensors [width,height,channel]\n", - " in_images, image_tensors = preprocess_images()\n", - "\n", - " # Construct featurizer using quantized or unquantized ResNet50 model\n", - " if not quantized:\n", - " featurizer = Resnet50(saved_model_dir)\n", - " else:\n", - " featurizer = QuantizedResnet50(saved_model_dir, custom_weights_directory = starting_weights_directory)\n", - "\n", - "\n", - " features = featurizer.import_graph_def(input_tensor=image_tensors)\n", - " # Construct classifier\n", - " preds = construct_classifier(features)\n", - " \n", - " # Initialize weights\n", - " sess = tf.get_default_session()\n", - " tf.global_variables_initializer().run()\n", - "\n", - " featurizer.restore_weights(sess)\n", - "\n", - " return in_images, image_tensors, features, preds, featurizer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train Model\n", - "First we train the model with custom weights but without quantization. Training is done with native float precision (32-bit floats). We load the traing data set and batch the training with 10 epochs. When the performance reaches desired level or starts decredation, we stop the training iteration and save the weights as tensorflow checkpoint files. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def read_files(files):\n", - " \"\"\" Read files to array\"\"\"\n", - " contents = []\n", - " for path in files:\n", - " with open(path, 'rb') as f:\n", - " contents.append(f.read())\n", - " return contents" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def train_model(preds, in_images, img_train, label_train, is_retrain = False, train_epoch = 10):\n", - " \"\"\" training model \"\"\"\n", - " from keras.objectives import binary_crossentropy\n", - " from tqdm import tqdm\n", - " \n", - " learning_rate = 0.001 if is_retrain else 0.01\n", - " \n", - " # Specify the loss function\n", - " in_labels = tf.placeholder(tf.float32, shape=(None, 2)) \n", - " cross_entropy = tf.reduce_mean(binary_crossentropy(in_labels, preds))\n", - " optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)\n", - "\n", - " def chunks(a, b, n):\n", - " \"\"\"Yield successive n-sized chunks from a and b.\"\"\"\n", - " if (len(a) != len(b)):\n", - " print(\"a and b are not equal in chunks(a,b,n)\")\n", - " raise ValueError(\"Parameter error\")\n", - "\n", - " for i in range(0, len(a), n):\n", - " yield a[i:i + n], b[i:i + n]\n", - "\n", - " chunk_size = 16\n", - " chunk_num = len(label_train) / chunk_size\n", - "\n", - " sess = tf.get_default_session()\n", - " for epoch in range(train_epoch):\n", - " avg_loss = 0\n", - " for img_chunk, label_chunk in tqdm(chunks(img_train, label_train, chunk_size)):\n", - " contents = read_files(img_chunk)\n", - " _, loss = sess.run([optimizer, cross_entropy],\n", - " feed_dict={in_images: contents,\n", - " in_labels: label_chunk,\n", - " K.learning_phase(): 1})\n", - " avg_loss += loss / chunk_num\n", - " print(\"Epoch:\", (epoch + 1), \"loss = \", \"{:.3f}\".format(avg_loss))\n", - " \n", - " # Reach desired performance\n", - " if (avg_loss < 
0.001):\n", - " break" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def test_model(preds, in_images, img_test, label_test):\n", - " \"\"\"Test the model\"\"\"\n", - " from keras.metrics import categorical_accuracy\n", - "\n", - " in_labels = tf.placeholder(tf.float32, shape=(None, 2))\n", - " accuracy = tf.reduce_mean(categorical_accuracy(in_labels, preds))\n", - " contents = read_files(img_test)\n", - "\n", - " accuracy = accuracy.eval(feed_dict={in_images: contents,\n", - " in_labels: label_test,\n", - " K.learning_phase(): 0})\n", - " return accuracy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Launch the training\n", - "tf.reset_default_graph()\n", - "sess = tf.Session(graph=tf.get_default_graph())\n", - "\n", - "with sess.as_default():\n", - " in_images, image_tensors, features, preds, featurizer = construct_model(quantized=False)\n", - " train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10) \n", - " accuracy = test_model(preds, in_images, img_test, label_test) \n", - " print(\"Accuracy:\", accuracy)\n", - " featurizer.save_weights(custom_weights_dir + \"/rn50\", tf.get_default_session())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test Model\n", - "After training, we evaluate the trained model's accuracy on test dataset with quantization. So that we know the model's performance if it is deployed on the FPGA." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.Session(graph=tf.get_default_graph())\n", - "\n", - "with sess.as_default():\n", - " print(\"Testing trained model with quantization\")\n", - " in_images, image_tensors, features, preds, quantized_featurizer = construct_model(quantized=True, starting_weights_directory=custom_weights_dir)\n", - " accuracy = test_model(preds, in_images, img_test, label_test) \n", - " print(\"Accuracy:\", accuracy)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Fine-Tune Model\n", - "Sometimes, the model's accuracy can drop significantly after quantization. In those cases, we need to retrain the model enabled with quantization to get better model accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if (accuracy < 0.93):\n", - " with sess.as_default():\n", - " print(\"Fine-tuning model with quantization\")\n", - " train_model(preds, in_images, img_train, label_train, is_retrain=True, train_epoch=10)\n", - " accuracy = test_model(preds, in_images, img_test, label_test) \n", - " print(\"Accuracy:\", accuracy)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Service Definition\n", - "Like in the QuickStart notebook our service definition pipeline consists of three stages. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage\n", - "\n", - "model_def_path = os.path.join(saved_model_dir, 'model_def.zip')\n", - "\n", - "model_def = ModelDefinition()\n", - "model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))\n", - "model_def.pipeline.append(BrainWaveStage(sess, quantized_featurizer))\n", - "model_def.pipeline.append(TensorflowStage(sess, features, preds))\n", - "model_def.save(model_def_path)\n", - "print(model_def_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy\n", - "Go to our [GitHub repo](https://aka.ms/aml-real-time-ai) \"docs\" folder to learn how to create a Model Management Account and find the required information below." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The first time the code below runs it will create a new service running your model. If you want to change the model you can make changes above in this notebook and save a new service definition. Then this code will update the running service in place to run the new model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "from azureml.core.image import Image\n", - "from azureml.core.webservice import Webservice\n", - "from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n", - "\n", - "model_name = \"catsanddogs-resnet50-model\"\n", - "image_name = \"catsanddogs-resnet50-image\"\n", - "service_name = \"modelbuild-service\"\n", - "\n", - "registered_model = Model.register(ws, service_def_path, model_name)\n", - "\n", - "image_config = BrainwaveImage.image_configuration()\n", - "deployment_config = BrainwaveWebservice.deploy_configuration()\n", - " \n", - "try:\n", - " service = Webservice(ws, service_name)\n", - " service.delete()\n", - " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n", - "except WebserviceException:\n", - " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The service is now running in Azure and ready to serve requests. We can check the address and port." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.ipAddress + ':' + str(service.port))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Client\n", - "There is a simple test client at amlrealtimeai.PredictionClient which can be used for testing. We'll use this client to score an image with our new service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.client import PredictionClient\n", - "client = PredictionClient(service.ipAddress, service.port)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can adapt the client [code](../../pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](../../sample-clients/csharp).\n", - "\n", - "The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Request\n", - "Let's see how our service does on a few images. It may get a few wrong." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Specify an image to classify\n", - "print('CATS')\n", - "for image_file in cat_files[:8]:\n", - " results = client.score_image(image_file)\n", - " result = 'CORRECT ' if results[0] > results[1] else 'WRONG '\n", - " print(result + str(results))\n", - "print('DOGS')\n", - "for image_file in dog_files[:8]:\n", - " results = client.score_image(image_file)\n", - " result = 'CORRECT ' if results[1] > results[0] else 'WRONG '\n", - " print(result + str(results))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cleanup\n", - "Run the cell below to delete your service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Appendix" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "License for plot_confusion_matrix:\n", - "\n", - "New BSD License\n", - "\n", - "Copyright (c) 2007-2018 The scikit-learn developers.\n", - "All rights reserved.\n", - "\n", - "\n", - "Redistribution and use in source and binary forms, with or without\n", - "modification, are permitted provided that the following conditions are met:\n", - "\n", - " a. Redistributions of source code must retain the above copyright notice,\n", - " this list of conditions and the following disclaimer.\n", - " b. Redistributions in binary form must reproduce the above copyright\n", - " notice, this list of conditions and the following disclaimer in the\n", - " documentation and/or other materials provided with the distribution.\n", - " c. Neither the name of the Scikit-learn Developers nor the names of\n", - " its contributors may be used to endorse or promote products\n", - " derived from this software without specific prior written\n", - " permission. \n", - "\n", - "\n", - "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n", - "AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n", - "IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n", - "ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR\n", - "ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n", - "DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n", - "SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n", - "CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n", - "LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY\n", - "OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH\n", - "DAMAGE.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/project-brainwave/project-brainwave-quickstart.ipynb b/project-brainwave/project-brainwave-quickstart.ipynb index 2bcecd54..876aa827 100644 --- a/project-brainwave/project-brainwave-quickstart.ipynb +++ b/project-brainwave/project-brainwave-quickstart.ipynb @@ -1,309 +1,309 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Azure ML Hardware Accelerated Models Quickstart" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This tutorial will show you how to deploy an image recognition service based on the ResNet 50 classifier in just a few minutes using the Azure Machine Learning Accelerated AI service. Get more help from our [documentation](https://aka.ms/aml-real-time-ai) or [forum](https://aka.ms/aml-forum).\n", + "\n", + "We will use an accelerated ResNet50 featurizer running on an FPGA. This functionality is powered by Project Brainwave, which handles translating deep neural networks (DNN) into an FPGA program.\n", + "\n", + "## Request Quota\n", + "**IMPORTANT:** You must [request quota](https://aka.ms/aml-real-time-ai-request) and be approved before you can successfully run this notebook. Notebook 00 will show you how to create a workspace which you can use to request quota." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import tensorflow as tf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Image preprocessing\n", + "We'd like our service to accept JPEG images as input. However the input to ResNet50 is a tensor. So we need code that decodes JPEG images and does the preprocessing required by ResNet50. The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as strings) and produces a tensor that is ready to be featurized by ResNet50." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Input images as a two-dimensional tensor containing an arbitrary number of images represented a strings\n", + "import azureml.contrib.brainwave.models.utils as utils\n", + "in_images = tf.placeholder(tf.string)\n", + "image_tensors = utils.preprocess_array(in_images)\n", + "print(image_tensors.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Featurizer\n", + "We use ResNet50 as a featurizer. In this step we initialize the model. This downloads a TensorFlow checkpoint of the quantized ResNet50." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.models import QuantizedResnet50, Resnet50\n", + "model_path = os.path.expanduser('~/models')\n", + "model = QuantizedResnet50(model_path, is_frozen = True)\n", + "feature_tensor = model.import_graph_def(image_tensors)\n", + "print(model.version)\n", + "print(feature_tensor.name)\n", + "print(feature_tensor.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Classifier\n", + "The model we downloaded includes a classifier which takes the output of the ResNet50 and identifies an image. This classifier is trained on the ImageNet dataset. We are going to use this classifier for our service. The next [notebook](project-brainwave-trainsfer-learning.ipynb) shows how to train a classifier for a different data set. The input to the classifier is a tensor matching the output of our ResNet50 featurizer." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "classifier_output = model.get_default_classifier(feature_tensor)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Service Definition\n", + "Now that we've defined the image preprocessing, featurizer, and classifier that we will execute on our service, we can create a service definition. The service definition is a set of files generated from the model that allow us to deploy to the FPGA service. The service definition consists of a pipeline. The pipeline is a series of stages that are executed in order. We support TensorFlow stages, Keras stages, and BrainWave stages. The stages will be executed in order on the service, with the output of each stage passed as input to the next stage.\n", + "\n", + "To create a TensorFlow stage, we specify a session containing the graph (in this case we are using the default graph) and the input and output tensors to this stage. We use this information to save the graph so that we can execute it on the service." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage\n", + "\n", + "save_path = os.path.expanduser('~/models/save')\n", + "model_def_path = os.path.join(save_path, 'model_def.zip')\n", + "\n", + "model_def = ModelDefinition()\n", + "with tf.Session() as sess:\n", + " model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))\n", + " model_def.pipeline.append(BrainWaveStage(sess, model))\n", + " model_def.pipeline.append(TensorflowStage(sess, feature_tensor, classifier_output))\n", + " model_def.save(model_def_path)\n", + " print(model_def_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy\n", + "Time to create a service from the service definition.
You need a Workspace in the **East US 2** location. You created this Workspace in the previous notebooks. The code below loads that Workspace from a configuration file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Register the model (the service definition file) with the workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "model_name = \"resnet-50-rtai\"\n", + "registered_model = Model.register(ws, model_def_path, model_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a service from the model that we registered. If no service with this name exists, the code below deploys a new one. If a service with this name already exists, the code reuses it as-is; it is not updated to use this model, so delete the existing service first if you want to redeploy."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "from azureml.exceptions import WebserviceException\n", + "from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n", + "service_name = \"imagenet-infer\"\n", + "service = None\n", + "try:\n", + " service = Webservice(ws, service_name)\n", + "except WebserviceException:\n", + " image_config = BrainwaveImage.image_configuration()\n", + " deployment_config = BrainwaveWebservice.deploy_configuration()\n", + " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n", + " service.wait_for_deployment(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Client\n", + "The service supports gRPC and the TensorFlow Serving \"predict\" API. A client that can call the service to get predictions is available at aka.ms/rtai. You can also invoke the service like any other web service." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To interpret the results, we need a mapping from class IDs to human-readable ImageNet class names." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "classes_entries = requests.get(\"https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt\").text.splitlines()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now send an image to the service and get the predictions. Let's see if it can identify a snow leopard.\n", + "![title](snowleopardgaze.jpg)\n", + "Snow leopard in a zoo.
Photo by Peter Bolliger.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "results = service.run('snowleopardgaze.jpg')\n", + "# map results [class_id] => [confidence]\n", + "results = enumerate(results)\n", + "# sort results by confidence\n", + "sorted_results = sorted(results, key=lambda x: x[1], reverse=True)\n", + "# print top 5 results\n", + "for top in sorted_results[:5]:\n", + " print(classes_entries[top[0]], 'confidence:', top[1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup\n", + "Run the cell below to delete your service and the registered model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()\n", + " \n", + "registered_model.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Congratulations! You've just created a service that does predictions using an FPGA. The next [notebook](project-brainwave-transfer-learning.ipynb) shows how to customize the service using transfer learning to classify different types of images."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Azure ML Hardware Accelerated Models Quickstart" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This tutorial will show you how to deploy an image recognition service based on the ResNet 50 classifier in just a few minutes using the Azure Machine Learning Accelerated AI service. Get more help from our [documentation](https://aka.ms/aml-real-time-ai) or [forum](https://aka.ms/aml-forum).\n", - "\n", - "We will use an accelerated ResNet50 featurizer running on an FPGA. This functionality is powered by Project Brainwave, which handles translating deep neural networks (DNN) into an FPGA program.\n", - "\n", - "## Request Quota\n", - "**IMPORTANT:** You must [request quota](https://aka.ms/aml-real-time-ai-request) and be approved before you can successfully run this notebook. Notebook 00 will show you how to create a workspace which you can use to request quota." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Imports" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import tensorflow as tf" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Image preprocessing\n", - "We'd like our service to accept JPEG images as input. However the input to ResNet50 is a tensor. So we need code that decodes JPEG images and does the preprocessing required by ResNet50. 
The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as strings) and produces a tensor that is ready to be featurized by ResNet50." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Input images as a two-dimensional tensor containing an arbitrary number of images represented a strings\n", - "import azureml.contrib.brainwave.models.utils as utils\n", - "in_images = tf.placeholder(tf.string)\n", - "image_tensors = utils.preprocess_array(in_images)\n", - "print(image_tensors.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Featurizer\n", - "We use ResNet50 as a featurizer. In this step we initialize the model. This downloads a TensorFlow checkpoint of the quantized ResNet50." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.models import QuantizedResnet50, Resnet50\n", - "model_path = os.path.expanduser('~/models')\n", - "model = QuantizedResnet50(model_path, is_frozen = True)\n", - "feature_tensor = model.import_graph_def(image_tensors)\n", - "print(model.version)\n", - "print(feature_tensor.name)\n", - "print(feature_tensor.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Classifier\n", - "The model we downloaded includes a classifier which takes the output of the ResNet50 and identifies an image. This classifier is trained on the ImageNet dataset. We are going to use this classifier for our service. The next [notebook](project-brainwave-trainsfer-learning.ipynb) shows how to train a classifier for a different data set. The input to the classifier is a tensor matching the output of our ResNet50 featurizer." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "classifier_input, classifier_output = Resnet50.get_default_classifier(feature_tensor, model_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Service Definition\n", - "Now that we've definied the image preprocessing, featurizer, and classifier that we will execute on our service we can create a service definition. The service definition is a set of files generated from the model that allow us to deploy to the FPGA service. The service definition consists of a pipeline. The pipeline is a series of stages that are executed in order. We support TensorFlow stages, Keras stages, and BrainWave stages. The stages will be executed in order on the service, with the output of each stage input into the subsequent stage.\n", - "\n", - "To create a TensorFlow stage we specify a session containing the graph (in this case we are using the default graph) and the input and output tensors to this stage. We use this information to save the graph so that we can execute it on the service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage\n", - "\n", - "save_path = os.path.expanduser('~/models/save')\n", - "model_def_path = os.path.join(save_path, 'model_def.zip')\n", - "\n", - "model_def = ModelDefinition()\n", - "with tf.Session() as sess:\n", - " model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))\n", - " model_def.pipeline.append(BrainWaveStage(sess, model))\n", - " model_def.pipeline.append(TensorflowStage(sess, classifier_input, classifier_output))\n", - " model_def.save(model_def_path)\n", - " print(model_def_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy\n", - "Time to create a service from the service definition. You need a Workspace in the **East US 2** location. In the previous notebooks, you've created this Workspace. The code below will load that Workspace from a configuration file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Upload the model to the workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "model_name = \"resnet-50-rtai\"\n", - "registered_model = Model.register(ws, model_def_path, model_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a service from the model that we registered. If this is a new service then we create it. If you already have a service with this name then the existing service will be updated to use this model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "from azureml.exceptions import WebserviceException\n", - "from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n", - "service_name = \"imagenet-infer\"\n", - "service = None\n", - "try:\n", - " service = Webservice(ws, service_name)\n", - "except WebserviceException:\n", - " image_config = BrainwaveImage.image_configuration()\n", - " deployment_config = BrainwaveWebservice.deploy_configuration()\n", - " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n", - " service.wait_for_deployment(true)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Client\n", - "The service supports gRPC and the TensorFlow Serving \"predict\" API. We provide a client that can call the service to get predictions on aka.ms/rtai. You can also invoke the service like any other web service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To understand the results we need a mapping to the human readable imagenet classes" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "classes_entries = requests.get(\"https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt\").text.splitlines()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can now send an image to the service and get the predictions. Let's see if it can identify a snow leopard.\n", - "![title](snowleopardgaze.jpg)\n", - "Snow leopard in a zoo. 
Photo by Peter Bolliger.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "results = service.run('snowleopardgaze.jpg')\n", - "# map results [class_id] => [confidence]\n", - "results = enumerate(results)\n", - "# sort results by confidence\n", - "sorted_results = sorted(results, key=lambda x: x[1], reverse=True)\n", - "# print top 5 results\n", - "for top in sorted_results[:5]:\n", - " print(classes_entries[top[0]], 'confidence:', top[1])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cleanup\n", - "Run the cell below to delete your service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()\n", - " \n", - "registered_model.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Congratulations! You've just created a service that does predictions using an FPGA. The next [notebook](project-brainwave-trainsfer-learning.ipynb) shows how to customize the service using transfer learning to classify different types of images." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/project-brainwave/project-brainwave-transfer-learning.ipynb b/project-brainwave/project-brainwave-transfer-learning.ipynb index 4f0ca4a2..2a338731 100644 --- a/project-brainwave/project-brainwave-transfer-learning.ipynb +++ b/project-brainwave/project-brainwave-transfer-learning.ipynb @@ -1,567 +1,567 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Development" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This example shows how to build, train, evaluate and deploy a model running on FPGA. Only Windows is supported. We use TensorFlow and Keras to build our model. We are going to use transfer learning, with ResNet152 as a featurizer. We don't use the last layer of ResNet152 in this case and instead add and train our own classification layer.\n", + "\n", + "We will use the Kaggle Cats and Dogs dataset to train the classifier. 
The dataset can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Download the zip and extract to a directory named 'catsanddogs' under your user directory (\"~/catsanddogs\").\n", + "\n", + "Please set up your environment as described in the [quick start](project-brainwave-quickstart.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import tensorflow as tf\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Model Construction\n", + "Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run quickly, but doesn't create a very accurate classifier. You can improve the classifier by using more of the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import glob\n", + "import imghdr\n", + "datadir = os.path.expanduser(\"~/catsanddogs\")\n", + "\n", + "cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))\n", + "dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))\n", + "\n", + "# Limit the data set to make the notebook execute quickly.\n", + "cat_files = cat_files[:64]\n", + "dog_files = dog_files[:64]\n", + "\n", + "# The data set has a few images that are not jpeg. 
Remove them.\n", + "cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']\n", + "dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']\n", + "\n", + "if(not len(cat_files) or not len(dog_files)):\n", + " print(\"Please download the Kaggle Cats and Dogs dataset from https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to \" + datadir) \n", + " raise ValueError(\"Data not found\")\n", + "else:\n", + " print(cat_files[0])\n", + " print(dog_files[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# constructing a numpy array as labels\n", + "image_paths = cat_files + dog_files\n", + "total_files = len(cat_files) + len(dog_files)\n", + "labels = np.zeros(total_files)\n", + "labels[len(cat_files):] = 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We need to preprocess the input files to get them into the form expected by ResNet152. We've provided a default implementation of the preprocessing that you can use." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Input images as a two-dimensional tensor containing an arbitrary number of images represented as strings\n", + "import azureml.contrib.brainwave.models.utils as utils\n", + "in_images = tf.placeholder(tf.string)\n", + "image_tensors = utils.preprocess_array(in_images)\n", + "print(image_tensors.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, if you would like to customize the preprocessing, you can write your own preprocessor using TensorFlow operations.\n", + "\n", + "The input to the classifier we are training is the set of features produced by ResNet152. To train the classifier we need to \n", + "featurize the images using ResNet152. You can also run the featurizer locally on CPU or GPU.
We import the featurizer as frozen, so that we are only training the classifier." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.models import QuantizedResnet152\n", + "model_path = os.path.expanduser('~/models')\n", + "bwmodel = QuantizedResnet152(model_path, is_frozen = True)\n", + "print(bwmodel.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calling import_graph_def on the featurizer will create a service that runs the featurizer on FPGA." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "features = bwmodel.import_graph_def(input_tensor=image_tensors)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pre-compute features\n", + "Load the data set and compute the features. These can be precomputed because they don't change during training. This can take a while to run on CPU." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from tqdm import tqdm\n", + "\n", + "def chunks(l, n):\n", + " \"\"\"Yield successive n-sized chunks from l.\"\"\"\n", + " for i in range(0, len(l), n):\n", + " yield l[i:i + n]\n", + "\n", + "def read_files(files):\n", + " contents = []\n", + " for path in files:\n", + " with open(path, 'rb') as f:\n", + " contents.append(f.read())\n", + " return contents\n", + " \n", + "feature_list = []\n", + "with tf.Session() as sess:\n", + " for chunk in tqdm(chunks(image_paths, 5)):\n", + " contents = read_files(chunk)\n", + " result = sess.run([features], feed_dict={in_images: contents})\n", + " feature_list.extend(result[0])\n", + "\n", + "feature_results = np.array(feature_list)\n", + "print(feature_results.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Add and Train the classifier\n", + "We use Keras to define and train a simple classifier." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from keras.models import Sequential\n", + "from keras.layers import Dropout, Dense, Flatten\n", + "from keras import optimizers\n", + "\n", + "FC_SIZE = 1024\n", + "NUM_CLASSES = 2\n", + "\n", + "model = Sequential()\n", + "model.add(Dropout(0.2, input_shape=(1, 1, 2048,)))\n", + "model.add(Dense(FC_SIZE, activation='relu', input_dim=(1, 1, 2048,)))\n", + "model.add(Flatten())\n", + "model.add(Dense(NUM_CLASSES, activation='sigmoid', input_dim=FC_SIZE))\n", + "\n", + "model.compile(optimizer=optimizers.SGD(lr=1e-4,momentum=0.9), loss='binary_crossentropy', metrics=['accuracy'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Prepare the train and test data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])\n", + "X_train, X_test, y_train, y_test = train_test_split(feature_results, onehot_labels, random_state=42, shuffle=True)\n", + "print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Train the classifier." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model.fit(X_train, y_train, epochs=16, batch_size=32)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test the Classifier\n", + "Let's test the classifier and see how well it does. Since we only trained on a few images, we are not expecting to win a Kaggle competition, but it will likely get most of the images correct. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from numpy import argmax\n", + "\n", + "y_probs = model.predict(X_test)\n", + "y_prob_max = np.argmax(y_probs, 1)\n", + "y_test_max = np.argmax(y_test, 1)\n", + "print(y_prob_max)\n", + "print(y_test_max)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score, precision_score, recall_score, f1_score\n", + "import itertools\n", + "import matplotlib\n", + "from matplotlib import pyplot as plt\n", + "\n", + "# compute a bunch of classification metrics \n", + "def classification_metrics(y_true, y_pred, y_prob):\n", + " cm_dict = {}\n", + " cm_dict['Accuracy'] = accuracy_score(y_true, y_pred)\n", + " cm_dict['Precision'] = precision_score(y_true, y_pred)\n", + " cm_dict['Recall'] = recall_score(y_true, y_pred)\n", + " cm_dict['F1'] = 
f1_score(y_true, y_pred) \n", + " cm_dict['AUC'] = roc_auc_score(y_true, y_prob[:,1]) # score of the positive class (dog = 1)\n", + " cm_dict['Confusion Matrix'] = confusion_matrix(y_true, y_pred).tolist()\n", + " return cm_dict\n", + "\n", + "def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):\n", + " \"\"\"Plots a confusion matrix.\n", + " Source: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html\n", + " New BSD License - see appendix\n", + " \"\"\"\n", + " cm_max = cm.max()\n", + " cm_min = cm.min()\n", + " if cm_min > 0: cm_min = 0\n", + " if normalize:\n", + " cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n", + " cm_max = 1\n", + " plt.imshow(cm, interpolation='nearest', cmap=cmap)\n", + " plt.title(title)\n", + " plt.colorbar()\n", + " tick_marks = np.arange(len(classes))\n", + " plt.xticks(tick_marks, classes, rotation=45)\n", + " plt.yticks(tick_marks, classes)\n", + " thresh = cm_max / 2.\n", + " plt.clim(cm_min, cm_max)\n", + "\n", + " for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n", + " plt.text(j, i,\n", + " round(cm[i, j], 3), # round to 3 decimals if they are float\n", + " horizontalalignment=\"center\",\n", + " color=\"white\" if cm[i, j] > thresh else \"black\")\n", + " plt.ylabel('True label')\n", + " plt.xlabel('Predicted label')\n", + " plt.show()\n", + " \n", + "cm_dict = classification_metrics(y_test_max, y_prob_max, y_probs)\n", + "for m in cm_dict:\n", + " print(m, cm_dict[m])\n", + "cm = np.asarray(cm_dict['Confusion Matrix'])\n", + "plot_confusion_matrix(cm, ['cat','dog'], normalize=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Service Definition\n", + "As in the QuickStart notebook, our service definition pipeline consists of three stages. Because the preprocessing and featurizing stages don't contain any variables, we can use a default session.\n", + "Here we use the Keras classifier as the final stage."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage, KerasStage\n", + "\n", + "model_def = ModelDefinition()\n", + "model_def.pipeline.append(TensorflowStage(tf.Session(), in_images, image_tensors))\n", + "model_def.pipeline.append(BrainWaveStage(tf.Session(), bwmodel))\n", + "model_def.pipeline.append(KerasStage(model))\n", + "\n", + "model_def_path = os.path.join(datadir, 'save', 'model_def')\n", + "model_def.save(model_def_path)\n", + "print(model_def_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n", + "model_name = \"catsanddogs-model\"\n", + "service_name = \"modelbuild-service\"\n", + "\n", + "registered_model = Model.register(ws, model_def_path, model_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first time the code below runs it will create a new service running your model. If you want to change the model you can make changes above in this notebook and save a new service definition. Then this code will update the running service in place to run the new model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "from azureml.exceptions import WebserviceException\n", + "from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n", + "try:\n", + " service = Webservice(ws, service_name)\n", + "except WebserviceException:\n", + " image_config = BrainwaveImage.image_configuration()\n", + " deployment_config = BrainwaveWebservice.deploy_configuration()\n", + " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n", + " service.wait_for_deployment(true)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The service is now running in Azure and ready to serve requests. We can check the address and port." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.ipAddress + ':' + str(service.port))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Client\n", + "There is a simple test client at amlrealtimeai.PredictionClient which can be used for testing. We'll use this client to score an image with our new service." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.brainwave.client import PredictionClient\n", + "client = PredictionClient(service.ipAddress, service.port)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can adapt the client [code](../../pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](../../sample-clients/csharp).\n", + "\n", + "The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Request\n", + "Let's see how our service does on a few images. It may get a few wrong." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Specify an image to classify\n", + "print('CATS')\n", + "for image_file in cat_files[:8]:\n", + " results = client.score_image(image_file)\n", + " result = 'CORRECT ' if results[0] > results[1] else 'WRONG '\n", + " print(result + str(results))\n", + "print('DOGS')\n", + "for image_file in dog_files[:8]:\n", + " results = client.score_image(image_file)\n", + " result = 'CORRECT ' if results[1] > results[0] else 'WRONG '\n", + " print(result + str(results))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup\n", + "Run the cell below to delete your service. In the [next notebook](project-brainwave-custom-weights.ipynb) you will learn how to retrain all the weights of one of the models" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()\n", + " \n", + "registered_model.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "License for plot_confusion_matrix:\n", + "\n", + "New BSD License\n", + "\n", + "Copyright (c) 2007\u00e2\u20ac\u201c2018 The scikit-learn developers.\n", + "All rights reserved.\n", + "\n", + "\n", + "Redistribution and use in source and binary forms, with or without\n", + "modification, are permitted provided that the following conditions are met:\n", + "\n", + " a. Redistributions of source code must retain the above copyright notice,\n", + " this list of conditions and the following disclaimer.\n", + " b. 
Redistributions in binary form must reproduce the above copyright\n", + " notice, this list of conditions and the following disclaimer in the\n", + " documentation and/or other materials provided with the distribution.\n", + " c. Neither the name of the Scikit-learn Developers nor the names of\n", + " its contributors may be used to endorse or promote products\n", + " derived from this software without specific prior written\n", + " permission. \n", + "\n", + "\n", + "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n", + "AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n", + "IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n", + "ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR\n", + "ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n", + "DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n", + "SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n", + "CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n", + "LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY\n", + "OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH\n", + "DAMAGE.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Model Development" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This example shows how to build, train, evaluate and deploy a model running on FPGA. Only Windows is supported. 
We use TensorFlow and Keras to build our model. We are going to use transfer learning, with ResNet152 as a featurizer. We don't use the last layer of ResNet152 in this case and instead add and train our own classification layer.\n", - "\n", - "We will use the Kaggle Cats and Dogs dataset to train the classifier. The dataset can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Download the zip and extract to a directory named 'catsanddogs' under your user directory (\"~/catsanddogs\").\n", - "\n", - "Please set up your environment as described in the [quick start](project-brainwave-quickstart.ipynb)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import tensorflow as tf\n", - "import numpy as np" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Model Construction\n", - "Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run quickly, but doesn't create a very accurate classifier. You can improve the classifier by using more of the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import glob\n", - "import imghdr\n", - "datadir = os.path.expanduser(\"~/catsanddogs\")\n", - "\n", - "cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))\n", - "dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))\n", - "\n", - "# Limit the data set to make the notebook execute quickly.\n", - "cat_files = cat_files[:64]\n", - "dog_files = dog_files[:64]\n", - "\n", - "# The data set has a few images that are not jpeg. 
Remove them.\n", - "cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']\n", - "dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']\n", - "\n", - "if(not len(cat_files) or not len(dog_files)):\n", - " print(\"Please download the Kaggle Cats and Dogs dataset form https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to \" + datadir) \n", - " raise ValueError(\"Data not found\")\n", - "else:\n", - " print(cat_files[0])\n", - " print(dog_files[0])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# constructing a numpy array as labels\n", - "image_paths = cat_files + dog_files\n", - "total_files = len(cat_files) + len(dog_files)\n", - "labels = np.zeros(total_files)\n", - "labels[len(cat_files):] = 1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We need to preprocess the input file to get it into the form expected by ResNet152. We've provided a default implementation of the preprocessing that you can use." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Input images as a two-dimensional tensor containing an arbitrary number of images represented a strings\n", - "import azureml.contrib.brainwave.models.utils as utils\n", - "in_images = tf.placeholder(tf.string)\n", - "image_tensors = utils.preprocess_array(in_images)\n", - "print(image_tensors.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, if you would like to customize the preprocessing, you can write your own preprocessor using TensorFlow operations.\n", - "\n", - "The input to the classifier we are training is the set of features produced by ResNet50. To train the classifier we need to \n", - "featurize the images using ResNet50. You can also run the featurizer locally on CPU or GPU. 
We import the featurizer as frozen, so that we are only training the classifier." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.models import QuantizedResnet152\n", - "model_path = os.path.expanduser('~/models')\n", - "bwmodel = QuantizedResnet152(model_path, is_frozen = True)\n", - "print(bwmodel.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calling import_graph_def on the featurizer will create a service that runs the featurizer on FPGA." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "features = bwmodel.import_graph_def(input_tensor=image_tensors)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pre-compute features\n", - "Load the data set and compute the features. These can be precomputed because they don't change during training. This can take a while to run on CPU." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from tqdm import tqdm\n", - "\n", - "def chunks(l, n):\n", - " \"\"\"Yield successive n-sized chunks from l.\"\"\"\n", - " for i in range(0, len(l), n):\n", - " yield l[i:i + n]\n", - "\n", - "def read_files(files):\n", - " contents = []\n", - " for path in files:\n", - " with open(path, 'rb') as f:\n", - " contents.append(f.read())\n", - " return contents\n", - " \n", - "feature_list = []\n", - "with tf.Session() as sess:\n", - " for chunk in tqdm(chunks(image_paths, 5)):\n", - " contents = read_files(chunk)\n", - " result = sess.run([features], feed_dict={in_images: contents})\n", - " feature_list.extend(result[0])\n", - "\n", - "feature_results = np.array(feature_list)\n", - "print(feature_results.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Add and Train the classifier\n", - "We use Keras to define and train a simple classifier." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from keras.models import Sequential\n", - "from keras.layers import Dropout, Dense, Flatten\n", - "from keras import optimizers\n", - "\n", - "FC_SIZE = 1024\n", - "NUM_CLASSES = 2\n", - "\n", - "model = Sequential()\n", - "model.add(Dropout(0.2, input_shape=(1, 1, 2048,)))\n", - "model.add(Dense(FC_SIZE, activation='relu', input_dim=(1, 1, 2048,)))\n", - "model.add(Flatten())\n", - "model.add(Dense(NUM_CLASSES, activation='sigmoid', input_dim=FC_SIZE))\n", - "\n", - "model.compile(optimizer=optimizers.SGD(lr=1e-4,momentum=0.9), loss='binary_crossentropy', metrics=['accuracy'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Prepare the train and test data." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])\n", - "X_train, X_test, y_train, y_test = train_test_split(feature_results, onehot_labels, random_state=42, shuffle=True)\n", - "print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Train the classifier." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model.fit(X_train, y_train, epochs=16, batch_size=32)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test the Classifier\n", - "Let's test the classifier and see how well it does. Since we only trained on a few images, we are not expecting to win a Kaggle competition, but it will likely get most of the images correct. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from numpy import argmax\n", - "\n", - "y_probs = model.predict(X_test)\n", - "y_prob_max = np.argmax(y_probs, 1)\n", - "y_test_max = np.argmax(y_test, 1)\n", - "print(y_prob_max)\n", - "print(y_test_max)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score, precision_score, recall_score, f1_score\n", - "import itertools\n", - "import matplotlib\n", - "from matplotlib import pyplot as plt\n", - "\n", - "# compute a bunch of classification metrics \n", - "def classification_metrics(y_true, y_pred, y_prob):\n", - " cm_dict = {}\n", - " cm_dict['Accuracy'] = accuracy_score(y_true, y_pred)\n", - " cm_dict['Precision'] = precision_score(y_true, y_pred)\n", - " cm_dict['Recall'] = recall_score(y_true, y_pred)\n", - " cm_dict['F1'] = 
f1_score(y_true, y_pred) \n", - " cm_dict['AUC'] = roc_auc_score(y_true, y_prob[:,0])\n", - " cm_dict['Confusion Matrix'] = confusion_matrix(y_true, y_pred).tolist()\n", - " return cm_dict\n", - "\n", - "def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):\n", - " \"\"\"Plots a confusion matrix.\n", - " Source: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html\n", - " New BSD License - see appendix\n", - " \"\"\"\n", - " cm_max = cm.max()\n", - " cm_min = cm.min()\n", - " if cm_min > 0: cm_min = 0\n", - " if normalize:\n", - " cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n", - " cm_max = 1\n", - " plt.imshow(cm, interpolation='nearest', cmap=cmap)\n", - " plt.title(title)\n", - " plt.colorbar()\n", - " tick_marks = np.arange(len(classes))\n", - " plt.xticks(tick_marks, classes, rotation=45)\n", - " plt.yticks(tick_marks, classes)\n", - " thresh = cm_max / 2.\n", - " plt.clim(cm_min, cm_max)\n", - "\n", - " for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n", - " plt.text(j, i,\n", - " round(cm[i, j], 3), # round to 3 decimals if they are float\n", - " horizontalalignment=\"center\",\n", - " color=\"white\" if cm[i, j] > thresh else \"black\")\n", - " plt.ylabel('True label')\n", - " plt.xlabel('Predicted label')\n", - " plt.show()\n", - " \n", - "cm_dict = classification_metrics(y_test_max, y_prob_max, y_probs)\n", - "for m in cm_dict:\n", - " print(m, cm_dict[m])\n", - "cm = np.asarray(cm_dict['Confusion Matrix'])\n", - "plot_confusion_matrix(cm, ['fail','pass'], normalize=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Service Definition\n", - "Like in the QuickStart notebook our service definition pipeline consists of three stages. Because the preprocessing and featurizing stage don't contain any variables, we can use a default session.\n", - "Here we use the Keras classifier as the final stage." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage, KerasStage\n", - "\n", - "model_def = ModelDefinition()\n", - "model_def.pipeline.append(TensorflowStage(tf.Session(), in_images, image_tensors))\n", - "model_def.pipeline.append(BrainWaveStage(tf.Session(), bwmodel))\n", - "model_def.pipeline.append(KerasStage(model))\n", - "\n", - "model_def_path = os.path.join(datadir, 'save', 'model_def')\n", - "model_def.save(model_def_path)\n", - "print(model_def_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n", - "model_name = \"catsanddogs-model\"\n", - "service_name = \"modelbuild-service\"\n", - "\n", - "registered_model = Model.register(ws, model_def_path, model_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The first time the code below runs it will create a new service running your model. If you want to change the model you can make changes above in this notebook and save a new service definition. Then this code will update the running service in place to run the new model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "from azureml.exceptions import WebserviceException\n", - "from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n", - "try:\n", - " service = Webservice(ws, service_name)\n", - "except WebserviceException:\n", - " image_config = BrainwaveImage.image_configuration()\n", - " deployment_config = BrainwaveWebservice.deploy_configuration()\n", - " service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n", - " service.wait_for_deployment(true)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The service is now running in Azure and ready to serve requests. We can check the address and port." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.ipAddress + ':' + str(service.port))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Client\n", - "There is a simple test client at amlrealtimeai.PredictionClient which can be used for testing. We'll use this client to score an image with our new service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.brainwave.client import PredictionClient\n", - "client = PredictionClient(service.ipAddress, service.port)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can adapt the client [code](../../pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](../../sample-clients/csharp).\n", - "\n", - "The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Request\n", - "Let's see how our service does on a few images. It may get a few wrong." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Specify an image to classify\n", - "print('CATS')\n", - "for image_file in cat_files[:8]:\n", - " results = client.score_image(image_file)\n", - " result = 'CORRECT ' if results[0] > results[1] else 'WRONG '\n", - " print(result + str(results))\n", - "print('DOGS')\n", - "for image_file in dog_files[:8]:\n", - " results = client.score_image(image_file)\n", - " result = 'CORRECT ' if results[1] > results[0] else 'WRONG '\n", - " print(result + str(results))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cleanup\n", - "Run the cell below to delete your service. In the [next notebook](project-brainwave-custom-weights.ipynb) you will learn how to retrain all the weights of one of the models" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()\n", - " \n", - "registered_model.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Appendix" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "License for plot_confusion_matrix:\n", - "\n", - "New BSD License\n", - "\n", - "Copyright (c) 2007–2018 The scikit-learn developers.\n", - "All rights reserved.\n", - "\n", - "\n", - "Redistribution and use in source and binary forms, with or without\n", - "modification, are permitted provided that the following conditions are met:\n", - "\n", - " a. Redistributions of source code must retain the above copyright notice,\n", - " this list of conditions and the following disclaimer.\n", - " b. 
Redistributions in binary form must reproduce the above copyright\n", - " notice, this list of conditions and the following disclaimer in the\n", - " documentation and/or other materials provided with the distribution.\n", - " c. Neither the name of the Scikit-learn Developers nor the names of\n", - " its contributors may be used to endorse or promote products\n", - " derived from this software without specific prior written\n", - " permission. \n", - "\n", - "\n", - "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n", - "AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n", - "IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n", - "ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR\n", - "ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n", - "DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n", - "SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n", - "CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n", - "LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY\n", - "OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH\n", - "DAMAGE.\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/01.train-hyperparameter-tune-deploy-with-pytorch/01.train-hyperparameter-tune-deploy-with-pytorch.ipynb 
b/training/01.train-hyperparameter-tune-deploy-with-pytorch/01.train-hyperparameter-tune-deploy-with-pytorch.ipynb index be802667..16e153ae 100644 --- a/training/01.train-hyperparameter-tune-deploy-with-pytorch/01.train-hyperparameter-tune-deploy-with-pytorch.ipynb +++ b/training/01.train-hyperparameter-tune-deploy-with-pytorch/01.train-hyperparameter-tune-deploy-with-pytorch.ipynb @@ -1,759 +1,759 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 01. Train, hyperparameter tune, and deploy with PyTorch\n", - "\n", - "In this tutorial, you will train, hyperparameter tune, and deploy a PyTorch model using the Azure Machine Learning (AML) Python SDK.\n", - "\n", - "This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify ants and bees by first using a pretrained ResNet18 model that has been trained on the [ImageNet](http://image-net.org/index) dataset." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a remote compute target\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. 
In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", - "\n", - "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpucluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - " # Use the 'status' property to get a detailed status for the current cluster. \n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload training data\n", - "The dataset we will use consists of about 120 training images each for ants and bees, with 75 validation images for each class.\n", - "\n", - "First, download the dataset (located [here](https://download.pytorch.org/tutorial/hymenoptera_data.zip) as a zip file) locally to your current directory and extract the files. This will create a folder called `hymenoptera_data` with two subfolders `train` and `val` that contain the training and validation images, respectively. [Hymenoptera](https://en.wikipedia.org/wiki/Hymenoptera) is the order of insects that includes ants and bees." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib\n", - "from zipfile import ZipFile\n", - "\n", - "# download data\n", - "download_url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'\n", - "data_file = './hymenoptera_data.zip'\n", - "urllib.request.urlretrieve(download_url, filename=data_file)\n", - "\n", - "# extract files\n", - "with ZipFile(data_file, 'r') as zip:\n", - " print('extracting files...')\n", - " zip.extractall()\n", - " print('done')\n", - " \n", - "# delete zip file\n", - "os.remove(data_file)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. 
\n", - "\n", - "**Note: If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step.**\n", - "\n", - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code will upload the training data to the path `./hymenoptera_data` on the default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./hymenoptera_data', target_path='hymenoptera_data')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_on_datastore = 'hymenoptera_data'\n", - "ds_data = ds.path(path_on_datastore)\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. 
This includes the training script and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './pytorch-hymenoptera'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `pytorch_train.py`. In practice, you should be able to take any custom training script as is and run it with AML without having to modify your code.\n", - "\n", - "However, if you would like to use AML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of AML code inside your training script. \n", - "\n", - "In `pytorch_train.py`, we will log some metrics to our AML run. To do so, we will access the AML run object within the script:\n", - "```Python\n", - "from azureml.core.run import Run\n", - "run = Run.get_submitted_run()\n", - "```\n", - "Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n", - "```Python\n", - "run.log('lr', np.float(learning_rate))\n", - "run.log('momentum', np.float(momentum))\n", - "\n", - "run.log('best_val_acc', np.float(best_acc))\n", - "```\n", - "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `pytorch_train.py` into your project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('pytorch_train.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'pytorch-hymenoptera'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a PyTorch estimator\n", - "The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code will define a single-node PyTorch job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import PyTorch\n", - "\n", - "script_params = {\n", - " '--data_dir': ds_data,\n", - " '--num_epochs': 25,\n", - " '--output_dir': './outputs'\n", - "}\n", - "\n", - "estimator = PyTorch(source_directory=project_folder, \n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='pytorch_train.py',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. 
Please note the following:\n", - "- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `hymenoptera_data` on our datastore.\n", - "- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by AML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.\n", - "\n", - "To leverage the Azure VM's GPU for training, we set `use_gpu=True`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tune model hyperparameters\n", - "Now that we've seen how to do a simple PyTorch training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a hyperparameter sweep\n", - "First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate and the momentum parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`best_val_acc`).\n", - "\n", - "Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).\n", - "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import *\n", - "\n", - "param_sampling = RandomParameterSampling( {\n", - " 'learning_rate': uniform(0.0005, 0.005),\n", - " 'momentum': uniform(0.9, 0.99)\n", - " }\n", - ")\n", - "\n", - "early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n", - "\n", - "hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n", - " hyperparameter_sampling=param_sampling, \n", - " policy=early_termination_policy,\n", - " primary_metric_name='best_val_acc',\n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", - " max_total_runs=20,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, lauch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the HyperDrive run\n", - "hyperdrive_run = experiment.submit(hyperdrive_run_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor HyperDrive runs\n", - "You can monitor the progress of the runs with the following Jupyter widget. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "\n", - "RunDetails(hyperdrive_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find and register the best model\n", - "Once all the runs complete, we can find the run that produced the model with the highest accuracy." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = hyperdrive_run.get_best_run_by_primary_metric()\n", - "best_run_metrics = best_run.get_metrics()\n", - "print(best_run)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('Best Run is:\\n Validation accuracy: {0:.5f} \\n Learning rate: {1:.5f} \\n Momentum: {2:.5f}'.format(\n", - " best_run_metrics['best_val_acc'][-1],\n", - " best_run_metrics['lr'],\n", - " best_run_metrics['momentum'])\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, register the model from your best-performing run to your workspace. The `model_path` parameter takes in the relative path on the remote VM to the model file in your `outputs` directory. In the next section, we will deploy this registered model as a web service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name = 'pytorch-hymenoptera', model_path = 'outputs/model.pt')\n", - "print(model.name, model.id, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy model as web service\n", - "Once you have your trained model, you can deploy the model on Azure. In this tutorial, we will deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI). For more information on deploying models using Azure ML, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create scoring script\n", - "\n", - "First, we will create a scoring script that will be invoked by the web service call. 
Note that the scoring script must have two required functions:\n", - "* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once when the Docker container is started. \n", - "* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output typically use JSON as serialization and deserialization format, but you are not limited to that.\n", - "\n", - "Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is an ant or a bee. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create environment file\n", - "Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image by AML. In this case, we need to specify `torch`, `torchvision`, `pillow`, and `azureml-sdk`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile myenv.yml\n", - "name: myenv\n", - "channels:\n", - " - defaults\n", - "dependencies:\n", - " - pip:\n", - " - torch\n", - " - torchvision\n", - " - pillow\n", - " - azureml-core" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure the container image\n", - "Now configure the Docker image that you will use to build your ACI container." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script='pytorch_score.py', \n", - " runtime='python', \n", - " conda_file='myenv.yml',\n", - " description='Image with hymenoptera model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure the ACI container\n", - "We are almost ready to deploy. Create a deployment configuration file to specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of `1` core and `1` gigabyte of RAM is usually sufficient for many models." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'data': 'hymenoptera', 'method':'transfer learning', 'framework':'pytorch'},\n", - " description='Classify ants/bees using transfer learning with PyTorch')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the registered model\n", - "Finally, let's deploy a web service from our registered model. First, retrieve the model from your workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "\n", - "model = Model(ws, name='pytorch-hymenoptera')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then, deploy the web service using the ACI config and image config files created in the previous steps. We pass the `model` object in a list to the `models` parameter. If you would like to deploy more than one registered model, append the additional models to this list." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service_name = 'aci-hymenoptera'\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name=service_name,\n", - " models=[model],\n", - " image_config=image_config,\n", - " deployment_config=aciconfig,)\n", - "\n", - "service.wait_for_deployment(show_output=True)\n", - "print(service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If your deployment fails for any reason and you need to redeploy, make sure to delete the service before you do so: `service.delete()`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.get_logs()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get the web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the web service\n", - "Finally, let's test our deployed web service. We will send the data as a JSON string to the web service hosted in ACI and use the SDK's `run` API to invoke the service. Here we will take an arbitrary image from our validation data to predict on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os, json, base64\n", - "from io import BytesIO\n", - "from PIL import Image\n", - "import matplotlib.pyplot as plt\n", - "\n", - "def imgToBase64(img):\n", - " \"\"\"Convert pillow image to base64-encoded image\"\"\"\n", - " imgio = BytesIO()\n", - " img.save(imgio, 'JPEG')\n", - " img_str = base64.b64encode(imgio.getvalue())\n", - " return img_str.decode('utf-8')\n", - "\n", - "test_img = os.path.join('hymenoptera_data', 'val', 'bees', '10870992_eebeeb3a12.jpg') #arbitary image from val dataset\n", - "plt.imshow(Image.open(test_img))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "base64Img = imgToBase64(Image.open(test_img))\n", - "\n", - "result = service.run(input_data=json.dumps({'data': base64Img}))\n", - "print(json.loads(result))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Delete web service\n", - "Once you no longer need the web service, you should delete it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 01. 
Train, hyperparameter tune, and deploy with PyTorch\n", + "\n", + "In this tutorial, you will train, hyperparameter tune, and deploy a PyTorch model using the Azure Machine Learning (AML) Python SDK.\n", + "\n", + "This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify ants and bees by first using a pretrained ResNet18 model that has been trained on the [ImageNet](http://image-net.org/index) dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a remote compute target\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", + "\n", + "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpucluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " autoscale_enabled=True,\n", + " cluster_min_nodes=0, \n", + " cluster_max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + " # Use the 'status' property to get a detailed status for the current cluster. \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload training data\n", + "The dataset we will use consists of about 120 training images each for ants and bees, with 75 validation images for each class.\n", + "\n", + "First, download the dataset (located [here](https://download.pytorch.org/tutorial/hymenoptera_data.zip) as a zip file) locally to your current directory and extract the files. This will create a folder called `hymenoptera_data` with two subfolders `train` and `val` that contain the training and validation images, respectively. 
[Hymenoptera](https://en.wikipedia.org/wiki/Hymenoptera) is the order of insects that includes ants and bees." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import urllib\n", + "from zipfile import ZipFile\n", + "\n", + "# download data\n", + "download_url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'\n", + "data_file = './hymenoptera_data.zip'\n", + "urllib.request.urlretrieve(download_url, filename=data_file)\n", + "\n", + "# extract files\n", + "with ZipFile(data_file, 'r') as zip:\n", + " print('extracting files...')\n", + " zip.extractall()\n", + " print('done')\n", + " \n", + "# delete zip file\n", + "os.remove(data_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. \n", + "\n", + "**Note: If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step.**\n", + "\n", + "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code will upload the training data to the path `./hymenoptera_data` on the default datastore." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload(src_dir='./hymenoptera_data', target_path='hymenoptera_data')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "path_on_datastore = 'hymenoptera_data'\n", + "ds_data = ds.path(path_on_datastore)\n", + "print(ds_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './pytorch-hymenoptera'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare training script\n", + "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `pytorch_train.py`. 
In practice, you should be able to take any custom training script as is and run it with AML without having to modify your code.\n", + "\n", + "However, if you would like to use AML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of AML code inside your training script. \n", + "\n", + "In `pytorch_train.py`, we will log some metrics to our AML run. To do so, we will access the AML run object within the script:\n", + "```Python\n", + "from azureml.core.run import Run\n", + "run = Run.get_submitted_run()\n", + "```\n", + "Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n", + "```Python\n", + "run.log('lr', np.float(learning_rate))\n", + "run.log('momentum', np.float(momentum))\n", + "\n", + "run.log('best_val_acc', np.float(best_acc))\n", + "```\n", + "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once your script is ready, copy the training script `pytorch_train.py` into your project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('pytorch_train.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'pytorch-hymenoptera'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a PyTorch estimator\n", + "The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code will define a single-node PyTorch job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import PyTorch\n", + "\n", + "script_params = {\n", + " '--data_dir': ds_data,\n", + " '--num_epochs': 25,\n", + " '--output_dir': './outputs'\n", + "}\n", + "\n", + "estimator = PyTorch(source_directory=project_folder, \n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " entry_script='pytorch_train.py',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:\n", + "- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `hymenoptera_data` on our datastore.\n", + "- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by AML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. 
In this tutorial, we will save our trained model to this output directory.\n", + "\n", + "To leverage the Azure VM's GPU for training, we set `use_gpu=True`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tune model hyperparameters\n", + "Now that we've seen how to do a simple PyTorch training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Start a hyperparameter sweep\n", + "First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate and the momentum parameters. 
In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`best_val_acc`).\n", + "\n", + "Then, we specify the early termination policy used to terminate poorly performing runs early. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval=1`). Note that we delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).\n", + "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the `BanditPolicy` and other policies available." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive import *\n", + "\n", + "param_sampling = RandomParameterSampling( {\n", + " 'learning_rate': uniform(0.0005, 0.005),\n", + " 'momentum': uniform(0.9, 0.99)\n", + " }\n", + ")\n", + "\n", + "early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n", + "\n", + "hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n", + " hyperparameter_sampling=param_sampling, \n", + " policy=early_termination_policy,\n", + " primary_metric_name='best_val_acc',\n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", + " max_total_runs=20,\n", + " max_concurrent_runs=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, launch the hyperparameter tuning job."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# start the HyperDrive run\n", + "hyperdrive_run = experiment.submit(hyperdrive_run_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor HyperDrive runs\n", + "You can monitor the progress of the runs with the following Jupyter widget. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "\n", + "RunDetails(hyperdrive_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Find and register the best model\n", + "Once all the runs complete, we can find the run that produced the model with the highest accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run = hyperdrive_run.get_best_run_by_primary_metric()\n", + "best_run_metrics = best_run.get_metrics()\n", + "print(best_run)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('Best Run is:\\n Validation accuracy: {0:.5f} \\n Learning rate: {1:.5f} \\n Momentum: {2:.5f}'.format(\n", + " best_run_metrics['best_val_acc'][-1],\n", + " best_run_metrics['lr'],\n", + " best_run_metrics['momentum'])\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, register the model from your best-performing run to your workspace. The `model_path` parameter takes in the relative path on the remote VM to the model file in your `outputs` directory. In the next section, we will deploy this registered model as a web service." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = best_run.register_model(model_name = 'pytorch-hymenoptera', model_path = 'outputs/model.pt')\n", + "print(model.name, model.id, model.version, sep = '\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy model as web service\n", + "Once you have your trained model, you can deploy the model on Azure. In this tutorial, we will deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI). For more information on deploying models using Azure ML, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create scoring script\n", + "\n", + "First, we will create a scoring script that will be invoked by the web service call. Note that the scoring script must have two required functions:\n", + "* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once when the Docker container is started. \n", + "* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output typically use JSON as serialization and deserialization format, but you are not limited to that.\n", + "\n", + "Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is an ant or a bee. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create environment file\n", + "Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. 
AML uses this file to install all of those dependencies in the Docker image. In this case, we need to specify `torch`, `torchvision`, `pillow`, and `azureml-core`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile myenv.yml\n", + "name: myenv\n", + "channels:\n", + " - defaults\n", + "dependencies:\n", + " - pip:\n", + " - torch\n", + " - torchvision\n", + " - pillow\n", + " - azureml-core" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure the container image\n", + "Now configure the Docker image that you will use to build your ACI container." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script='pytorch_score.py', \n", + " runtime='python', \n", + " conda_file='myenv.yml',\n", + " description='Image with hymenoptera model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure the ACI container\n", + "We are almost ready to deploy. Create a deployment configuration to specify the number of CPU cores and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of `1` core and `1` gigabyte of RAM is sufficient for many models."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={'data': 'hymenoptera', 'method':'transfer learning', 'framework':'pytorch'},\n", + " description='Classify ants/bees using transfer learning with PyTorch')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy the registered model\n", + "Finally, let's deploy a web service from our registered model. First, retrieve the model from your workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model(ws, name='pytorch-hymenoptera')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, deploy the web service using the ACI config and image config files created in the previous steps. We pass the `model` object in a list to the `models` parameter. If you would like to deploy more than one registered model, append the additional models to this list." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "\n", + "service_name = 'aci-hymenoptera'\n", + "service = Webservice.deploy_from_model(workspace=ws,\n", + " name=service_name,\n", + " models=[model],\n", + " image_config=image_config,\n", + " deployment_config=aciconfig,)\n", + "\n", + "service.wait_for_deployment(show_output=True)\n", + "print(service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If your deployment fails for any reason and you need to redeploy, make sure to delete the service before you do so: `service.delete()`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.get_logs()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get the web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the web service\n", + "Finally, let's test our deployed web service. We will send the data as a JSON string to the web service hosted in ACI and use the SDK's `run` API to invoke the service. Here we will take an arbitrary image from our validation data to predict on." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os, json, base64\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "import matplotlib.pyplot as plt\n", + "\n", + "def imgToBase64(img):\n", + " \"\"\"Convert a Pillow image to a base64-encoded JPEG string\"\"\"\n", + " imgio = BytesIO()\n", + " img.save(imgio, 'JPEG')\n", + " img_str = base64.b64encode(imgio.getvalue())\n", + " return img_str.decode('utf-8')\n", + "\n", + "test_img = os.path.join('hymenoptera_data', 'val', 'bees', '10870992_eebeeb3a12.jpg') # arbitrary image from the validation set\n", + "plt.imshow(Image.open(test_img))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "base64Img = imgToBase64(Image.open(test_img))\n", + "\n", + "result = service.run(input_data=json.dumps({'data': base64Img}))\n", + "print(json.loads(result))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete web service\n", + "Once you no longer need the web service, you should delete it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/02.distributed-pytorch-with-horovod/02.distributed-pytorch-with-horovod.ipynb b/training/02.distributed-pytorch-with-horovod/02.distributed-pytorch-with-horovod.ipynb index 3e31658c..c0a41dfa 100644 --- a/training/02.distributed-pytorch-with-horovod/02.distributed-pytorch-with-horovod.ipynb +++ b/training/02.distributed-pytorch-with-horovod/02.distributed-pytorch-with-horovod.ipynb @@ -1,289 +1,289 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 02. Distributed PyTorch with Horovod\n", + "In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod)." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", + "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)\n", + "* Review the [tutorial](https://aka.ms/aml-notebook-pytorch) on single-node PyTorch training using the SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a remote compute target\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", + "\n", + "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpucluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " autoscale_enabled=True,\n", + " cluster_min_nodes=0, \n", + " cluster_max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + " # Use the 'status' property to get a detailed status for the current cluster. \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the cluster ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './pytorch-distr-hvd'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `pytorch_horovod_mnist.py` into this project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('pytorch_horovod_mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'pytorch-distr-hvd'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a PyTorch estimator\n", + "The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import PyTorch\n", + "\n", + "estimator = PyTorch(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " entry_script='pytorch_horovod_mnist.py',\n", + " node_count=2,\n", + " process_count_per_node=1,\n", + " distributed_backend='mpi',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True) # this provides a verbose log" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 02. Distributed PyTorch with Horovod\n", - "In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod)." 
- ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](https://aka.ms/aml-notebook-pytorch) on single-node PyTorch training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a remote compute target\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", - "\n", - "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpucluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - " # Use the 'status' property to get a detailed status for the current cluster. \n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the cluster ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './pytorch-distr-hvd'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `pytorch_horovod_mnist.py` into this project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('pytorch_horovod_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'pytorch-distr-hvd'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a PyTorch estimator\n", - "The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import PyTorch\n", - "\n", - "estimator = PyTorch(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='pytorch_horovod_mnist.py',\n", - " node_count=2,\n", - " process_count_per_node=1,\n", - " distributed_backend='mpi',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True) # this provides a verbose log" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/03.train-hyperparameter-tune-deploy-with-tensorflow/03.train-hyperparameter-tune-deploy-with-tensorflow.ipynb b/training/03.train-hyperparameter-tune-deploy-with-tensorflow/03.train-hyperparameter-tune-deploy-with-tensorflow.ipynb index 809da2ae..817b7f03 100644 --- a/training/03.train-hyperparameter-tune-deploy-with-tensorflow/03.train-hyperparameter-tune-deploy-with-tensorflow.ipynb +++ b/training/03.train-hyperparameter-tune-deploy-with-tensorflow/03.train-hyperparameter-tune-deploy-with-tensorflow.ipynb @@ -1,1624 +1,1624 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": { + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. 
All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" + } + }, + "source": [ + "# 03. Train on the MNIST dataset with hyperparameter tuning & deploy to ACI\n", + "\n", + "## Introduction\n", + "This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", + "\n", + "For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", + "\n", + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's get started. First let's import some Python libraries."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import numpy as np\n", + "import os\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" + } + }, + "outputs": [], + "source": [ + "import azureml\n", + "from azureml.core import Workspace, Run\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" + } + }, + "source": [ + "## Create an Azure ML experiment\n", + "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" + } + }, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "script_folder = './tf-mnist'\n", + "os.makedirs(script_folder, exist_ok=True)\n", + "\n", + "exp = Experiment(workspace=ws, name='tf-mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "defe921f-8097-44c3-8336-8af6700804a7" + } + }, + "source": [ + "## Download MNIST dataset\n", + "In order to train on the MNIST dataset, we first need to download it directly from Yann LeCun's website and save it in a local `data` folder." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import urllib.request\n", + "\n", + "os.makedirs('./data/mnist', exist_ok=True)\n", + "\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" + } + }, + "source": [ + "## Show some sample images\n", + "Let's load the downloaded compressed files into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbpresent": { + "id": "396d478b-34aa-4afa-9898-cdce8222a516" + } + }, + "outputs": [], + "source": [ + "from utils import load_data\n", + "\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", + "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", + "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", + "\n", + "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", + "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", + "\n", + "count = 0\n", + "sample_size = 30\n", + "plt.figure(figsize = (16, 6))\n", + "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", + " count = count + 1\n", + " plt.subplot(1, sample_size, count)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", + " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload MNIST dataset to default datastore \n", + "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored and then made accessible to a Run, either by mounting it on or copying it to the compute target. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future).
For simple data handling, each workspace provides a default datastore that you can use if your data is not already in Blob Storage or a File Share.\n", + "\n", + "In this next step, we upload the training and test sets into the workspace's default datastore, which we will later mount on the Batch AI cluster for training.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Batch AI cluster as compute target\n", + "[Batch AI](https://docs.microsoft.com/en-us/azure/batch-ai/overview) is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Batch AI cluster in the current workspace, if it doesn't already exist. We will then run the training script on this compute target." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next cell looks up the cluster by name and creates a new one only if no cluster with that name is found. It creates a Batch AI cluster of `STANDARD_NC6` GPU VMs that autoscales between 0 and 4 nodes. Creating a new cluster is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the Batch AI cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to its initial size (set by `cluster_min_nodes`, 0 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process.
Please make sure to wait until the call returns before moving to the next cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "batchai_cluster_name = \"gpucluster\"\n", + "\n", + "try:\n", + " # look for the existing cluster by name\n", + " compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n", + " if type(compute_target) is BatchAiCompute:\n", + " print('found compute target {}, just use it.'.format(batchai_cluster_name))\n", + " else:\n", + " print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n", + "except ComputeTargetException:\n", + " print('creating a new compute target...')\n", + " compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\", # GPU-based VM\n", + " #vm_priority='lowpriority', # optional\n", + " autoscale_enabled=True,\n", + " cluster_min_nodes=0, \n", + " cluster_max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n", + " \n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + " \n", + " # Use the 'status' property to get a detailed status for the current cluster. \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have created the compute target, let's see what the workspace's `compute_targets()` method returns. You should now see one entry named 'gpucluster' of type BatchAI."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for ct in ws.compute_targets():\n", + " print(ct.name, ct.type, ct.provisioning_state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Copy the training files into the script folder\n", + "The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "# the training logic is in the tf_mnist.py file.\n", + "shutil.copy('./tf_mnist.py', script_folder)\n", + "\n", + "# utils.py just helps load data from the downloaded MNIST dataset into numpy arrays.\n", + "shutil.copy('./utils.py', script_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nbpresent": { + "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" + } + }, + "source": [ + "## Construct neural network in TensorFlow\n", + "The training script `tf_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. In the first run below, the first hidden layer has 300 neurons and the second hidden layer has 100 neurons (both sizes are passed in as script arguments). The output layer has 10 neurons, each representing a target label from 0 to 9.\n", + "\n", + "![DNN](nn.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Azure ML concepts \n", + "Please note the following three things in the code below:\n", + "1. The script accepts arguments using the argparse package. One of them, `--data-folder`, specifies the file system folder in which the script can find the MNIST data\n", + "```\n", + " parser = argparse.ArgumentParser()\n", + " parser.add_argument('--data-folder')\n", + "```\n", + "2.
The script accesses the Azure ML `Run` object by executing `run = Run.get_submitted_run()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n", + "```\n", + " run.log('training_acc', np.float(acc_train))\n", + " run.log('validation_acc', np.float(acc_val))\n", + "```\n", + "3. When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. Azure ML tracks this folder specially: any files written to it during script execution on the remote target are picked up by Run History, and these files (known as artifacts) are available as part of the run history record." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next cell prints out the training code so you can inspect it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(os.path.join(script_folder, 'tf_mnist.py'), 'r') as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create TensorFlow estimator\n", + "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the Batch AI cluster as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", + "The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It automatically provides a Docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting Docker image."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params = {\n", + " '--data-folder': ws.get_default_datastore().as_mount(),\n", + " '--batch-size': 50,\n", + " '--first-layer-neurons': 300,\n", + " '--second-layer-neurons': 100,\n", + " '--learning-rate': 0.01\n", + "}\n", + "\n", + "est = TensorFlow(source_directory=script_folder,\n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " entry_script='tf_mnist.py', \n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit job to run\n", + "Calling `exp.submit()` with the estimator as its `config` submits the job to Azure ML for execution. Submitting the job should only take a few seconds." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(config=est)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor the Run\n", + "As the Run is executed, it will go through the following stages:\n", + "1. Preparing: A Docker image matching the Python environment specified by the TensorFlow estimator is created and uploaded to the workspace's Azure Container Registry. This step happens only once for each Python environment -- the image is then cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", + "\n", + "2. Scaling: If the compute needs to be scaled up (i.e. the Batch AI cluster requires more nodes to execute the run than are currently available), the Batch AI cluster will attempt to scale up to make the required number of nodes available. Scaling typically takes about **5 minutes**.\n", + "\n", + "3.
Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", + "\n", + "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", + "\n", + "There are multiple ways to check the progress of a running job. One of them is a Jupyter notebook widget. \n", + "\n", + "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The Run object\n", + "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of useful features, for instance:\n", + "* `run.get_details()`: Provides a rich set of properties of the run\n", + "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", + "* `run.get_file_names()`: Lists all the files that were uploaded to the run history for this Run.
This will include the `outputs` and `logs` folders, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`.\n", + "\n", + "Below are some examples -- please run through them and inspect their output. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_details()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_metrics()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.get_file_names()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Plot accuracy over epochs\n", + "Since we can retrieve the metrics from the run, we can easily make plots using `matplotlib` in the notebook. Then we can add the plotted image to the run using `run.log_image()`, so all information about the run is kept together." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.makedirs('./imgs', exist_ok = True)\n", + "metrics = run.get_metrics()\n", + "\n", + "plt.figure(figsize = (13,5))\n", + "plt.plot(metrics['validation_acc'], 'r-', lw = 4, alpha = .6)\n", + "plt.plot(metrics['training_acc'], 'b--', alpha = 0.5)\n", + "plt.legend(['Full evaluation set', 'Training set mini-batch'])\n", + "plt.xlabel('epochs', fontsize = 14)\n", + "plt.ylabel('accuracy', fontsize = 14)\n", + "plt.title('Accuracy over Epochs', fontsize = 16)\n", + "run.log_image(name = 'acc_over_epochs.png', plot = plt)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download the saved model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the training script, a TensorFlow `saver` object is used to persist the model in a local folder (local to the compute target).
The model is saved to the `./outputs` folder on the disk of the Batch AI cluster node where the job runs. Azure ML automatically uploads anything written to the `./outputs` folder into the run history file store. We can then use the `Run` object to download the model files that the `saver` object saved. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`. Note that the TensorFlow model consists of four binary files, which are not human-readable." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create a model folder in the current directory\n", + "os.makedirs('./model', exist_ok = True)\n", + "\n", + "for f in run.get_file_names():\n", + " if f.startswith('outputs/model'):\n", + " output_file_path = os.path.join('./model', f.split('/')[-1])\n", + " print('Downloading from {} to {} ...'.format(f, output_file_path))\n", + " run.download_file(name = f, output_file_path = output_file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Predict on the test set\n", + "Now load the saved TensorFlow graph, and list all operations under the `network` scope. This way we can discover the input tensor `network/X:0` and the output tensor `network/output/MatMul:0`, and use them in the scoring script in the next step.\n", + "\n", + "Note: if your local TensorFlow version is different from the version running in the cluster where the model is trained, you might see a \"compiletime version mismatch\" warning. You can ignore it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "tf.reset_default_graph()\n", + "\n", + "saver = tf.train.import_meta_graph(\"./model/mnist-tf.model.meta\")\n", + "graph = tf.get_default_graph()\n", + "\n", + "for op in graph.get_operations():\n", + " if op.name.startswith('network'):\n", + " print(op.name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feed the test dataset to the restored model to get predictions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# input tensor. each row is an array of 784 elements, each representing the intensity of a pixel in the digit image.\n", + "X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", + "# output tensor. each row is an array of 10 unnormalized scores (logits), one per digit from 0 to 9.\n", + "output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", + "\n", + "with tf.Session() as sess:\n", + " saver.restore(sess, './model/mnist-tf.model')\n", + " k = output.eval(feed_dict = {X : X_test})\n", + "# get the prediction, which is the index of the element that has the largest score.\n", + "y_hat = np.argmax(k, axis = 1)\n", + "\n", + "# print the first 30 labels and predictions\n", + "print('labels: \\t', y_test[:30])\n", + "print('predictions:\\t', y_hat[:30])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calculate the overall accuracy by comparing the predicted values against the test set labels."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Intelligent hyperparameter tuning\n", + "We have trained the model with one set of hyperparameters; now let's see how we can tune hyperparameters by launching multiple runs on the cluster. First let's define the parameter space using random sampling. The learning rate uses `loguniform(-6, -1)`, which draws `exp(uniform(-6, -1))`, so its logarithm is uniformly distributed between -6 and -1." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.hyperdrive import *\n", + "\n", + "ps = RandomParameterSampling(\n", + " {\n", + " '--batch-size': choice(25, 50, 100),\n", + " '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", + " '--second-layer-neurons': choice(10, 50, 200, 500),\n", + " '--learning-rate': loguniform(-6, -1)\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we create a new estimator without the above parameters, since HyperDrive will pass them in later. Note that we still need to keep the `--data-folder` parameter, since it is not a hyperparameter we will sweep." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "est = TensorFlow(source_directory=script_folder,\n", + " script_params={'--data-folder': ws.get_default_datastore().as_mount()},\n", + " compute_target=compute_target,\n", + " entry_script='tf_mnist.py', \n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we define an early termination policy. The `BanditPolicy` is applied every 2 evaluation intervals, where each time the training script logs the primary metric counts as one interval. Any run whose primary metric (defined later) falls outside a 10% slack of the best-performing run so far is terminated by Azure ML. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we are ready to create a HyperDrive run configuration that combines the estimator, the parameter space and the termination policy, and specifies the primary metric `validation_acc` that is recorded in your training runs. If you go back to the training script, you will notice that this value is logged after every epoch (a full pass over the training set). We also tell the service that we want to maximize this value. Finally, we cap the total number of runs at 20 and the number of concurrent runs at 4, which matches the maximum number of nodes in our compute cluster." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "htc = HyperDriveRunConfig(estimator=est, \n", + " hyperparameter_sampling=ps, \n", + " policy=policy, \n", + " primary_metric_name='validation_acc', \n", + " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", + " max_total_runs=20,\n", + " max_concurrent_runs=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's launch the hyperparameter tuning job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "htr = exp.submit(config=htc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use a run history widget to show the progress. Be patient as this might take a while to complete." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "RunDetails(htr).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Find and register best model\n", + "When all the jobs finish, we can find the run with the highest validation accuracy."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run = htr.get_best_run_by_primary_metric()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's list the model files uploaded during the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(best_run.get_file_names())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then register the `outputs/model` folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy the model in ACI\n", + "Now we are ready to deploy the model as a web service running in Azure Container Instances ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", + "### Create score.py\n", + "First, we will create a scoring script that will be invoked by the web service call. \n", + "\n", + "* The scoring script must have two functions, `init()` and `run(input_data)`. \n", + " * The `init()` function typically loads the model into a global object. It is executed only once, when the Docker container is started. \n", + " * The `run(input_data)` function uses the model to predict a value based on the input data. The input and output of `run` typically use JSON for serialization and deserialization, but you are not limited to that."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import os\n", + "import tensorflow as tf\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global X, output, sess\n", + " tf.reset_default_graph()\n", + " model_root = Model.get_model_path('tf-dnn-mnist')\n", + " saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n", + " X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", + " output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", + " \n", + " sess = tf.Session()\n", + " saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n", + "\n", + "def run(raw_data):\n", + " data = np.array(json.loads(raw_data)['data'])\n", + " # make prediction\n", + " out = output.eval(session = sess, feed_dict = {X: data})\n", + " y_hat = np.argmax(out, axis = 1)\n", + " return json.dumps(y_hat.tolist())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create myenv.yml\n", + "We also need to create an environment file so that Azure Machine Learning can install the packages your scoring script requires into the Docker image. In this case, we need `numpy` and `tensorflow`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import CondaDependencies\n", + "cd = CondaDependencies.create()\n", + "cd.add_conda_package('numpy')\n", + "cd.add_tensorflow_conda_package()\n", + "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", + "\n", + "print(cd.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy to ACI\n", + "We are almost ready to deploy.
Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for your ACI container. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", + " description='Tensorflow DNN on MNIST')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Deployment Process\n", + "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it will do the following:\n", + "1. **Use registered model** \n", + "The model has already been registered as `tf-dnn-mnist` from the best run above. `Webservice.deploy_from_model` takes the registered model(s) through its `models` parameter.\n", + "2. **Build Docker image** \n", + "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the files of the registered `tf-dnn-mnist` model. \n", + "3. **Register image** \n", + "Register that image under the workspace. \n", + "4. **Ship to ACI** \n", + "Finally, ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", + " runtime=\"python\", \n", + " conda_file=\"myenv.yml\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "\n", + "service = Webservice.deploy_from_model(workspace=ws,\n", + " name='tf-mnist-svc',\n", + " deployment_config=aciconfig,\n", + " models=[model],\n", + " image_config=imgconfig)\n", + "\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Tip: If something goes wrong with the deployment, the first thing to look at is the service logs, which you can print with the following command:**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.get_logs())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the scoring web service endpoint:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the deployed model\n", + "Let's test the deployed model. Pick 30 random samples from the test set and send them to the web service hosted in ACI. Note that here we are using the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool, such as curl.\n", + "\n", + "After the invocation, we print the returned predictions and plot them along with the input images. We use a red font color and an inverted image (white on black) to highlight the misclassified samples.
Note that since the model accuracy is high, you might have to run the cell below a few times before you see a misclassified sample." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "# find 30 random samples from test set\n", + "n = 30\n", + "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", + "\n", + "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", + "test_samples = bytes(test_samples, encoding = 'utf8')\n", + "\n", + "# predict using the deployed model\n", + "result = json.loads(service.run(input_data = test_samples))\n", + "\n", + "# compare actual value vs. the predicted values:\n", + "i = 0\n", + "plt.figure(figsize = (20, 1))\n", + "\n", + "for s in sample_indices:\n", + " plt.subplot(1, n, i + 1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " \n", + " # use different color for misclassified sample\n", + " font_color = 'red' if y_test[s] != result[i] else 'black'\n", + " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", + " \n", + " # label each image with the prediction returned by the web service\n", + " plt.text(x = 10, y = -10, s = result[i], fontsize = 18, color = font_color)\n", + " plt.imshow(X_test[s].reshape(28, 28), cmap = clr_map)\n", + " \n", + " i = i + 1\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also send a raw HTTP request to the service."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "\n", + "# send a random row from the test set to score (randint's upper bound is exclusive)\n", + "random_index = np.random.randint(0, len(X_test))\n", + "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", + "\n", + "headers = {'Content-Type':'application/json'}\n", + "\n", + "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", + "\n", + "print(\"POST to url\", service.scoring_uri)\n", + "#print(\"input data:\", input_data)\n", + "print(\"label:\", y_test[random_index])\n", + "print(\"prediction:\", resp.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at the workspace after the web service was deployed. You should see \n", + "* a registered model named 'tf-dnn-mnist' with an id such as 'tf-dnn-mnist:1'\n", + "* an image, also called 'tf-mnist-svc', with a Docker image location pointing to your workspace's Azure Container Registry (ACR) \n", + "* a web service called 'tf-mnist-svc' with its scoring URL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for model in ws.models():\n", + " print(\"Model:\", model.name, model.id)\n", + "\n", + "for image in ws.images():\n", + " print(\"Image:\", image.name, image.image_location)\n", + "\n", + "for webservice in ws.webservices():\n", + " print(\"Webservice:\", webservice.name, webservice.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up\n", + "You can delete the ACI deployment with a simple delete API call." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also delete the compute cluster.
But remember that if you set the `cluster_min_nodes` value to 0 when you created the cluster, all nodes are deleted automatically once the jobs finish. So you don't have to delete the cluster itself, since it won't incur any VM cost while it has no nodes. The next time you submit jobs to it, the cluster will automatically \"grow\" again, up to `cluster_max_nodes`, which is set to 4." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# delete the cluster if you need to.\n", + "compute_target.delete()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, "nbpresent": { - "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" + "slides": { + "05bb34ad-74b0-42b3-9654-8357d1ba9c99": { + "id": "05bb34ad-74b0-42b3-9654-8357d1ba9c99", + "prev": "851089af-9725-40c9-8f0b-9bf892b2b1fe", + "regions": { + "23fb396d-50f9-4770-adb3-0d6abcb40767": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "2039d2d5-aca6-4f25-a12f-df9ae6529cae", + "part": "whole" + }, + "id": "23fb396d-50f9-4770-adb3-0d6abcb40767" + } + } + }, + "11bebe14-d1dc-476d-a31a-5828b9c3adf0": { + "id": "11bebe14-d1dc-476d-a31a-5828b9c3adf0", + "prev": "502648cb-26fe-496b-899f-84c8fe1dcbc0", + "regions": { + "a42499db-623e-4414-bea2-ff3617fd8fc5": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "4788c040-27a2-4dc1-8ed0-378a99b3a255", + "part": "whole" + }, + "id": "a42499db-623e-4414-bea2-ff3617fd8fc5" + } + } + }, + "134f92d0-6389-4226-af51-1134ae8e8278": { + "id": "134f92d0-6389-4226-af51-1134ae8e8278", + "prev": "36b8728c-32ad-4941-be03-5cef51cdc430", +
"regions": { + "b6d82a77-2d58-4b9e-a375-3103214b826c": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "7ab0e6d0-1f1c-451b-8ac5-687da44a8287", + "part": "whole" + }, + "id": "b6d82a77-2d58-4b9e-a375-3103214b826c" + } + } + }, + "282a2421-697b-4fd0-9485-755abf5a0c18": { + "id": "282a2421-697b-4fd0-9485-755abf5a0c18", + "prev": "a8b9ceb9-b38f-4489-84df-b644c6fe28f2", + "regions": { + "522fec96-abe7-4a34-bd34-633733afecc8": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "d58e7785-c2ee-4a45-8e3d-4c538bf8075a", + "part": "whole" + }, + "id": "522fec96-abe7-4a34-bd34-633733afecc8" + } + } + }, + "2dfec088-8a70-411a-9199-904ef3fa2383": { + "id": "2dfec088-8a70-411a-9199-904ef3fa2383", + "prev": "282a2421-697b-4fd0-9485-755abf5a0c18", + "regions": { + "0535fcb6-3a2b-4b46-98a7-3ebb1a38c47e": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3", + "part": "whole" + }, + "id": "0535fcb6-3a2b-4b46-98a7-3ebb1a38c47e" + } + } + }, + "36a814c9-c540-4a6d-92d9-c03553d3d2c2": { + "id": "36a814c9-c540-4a6d-92d9-c03553d3d2c2", + "prev": "b52e4d09-5186-44e5-84db-3371c087acde", + "regions": { + "8bfba503-9907-43f0-b1a6-46a0b4311793": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "d5e4a56c-dfac-4346-be83-1c15b503deac", + "part": "whole" + }, + "id": "8bfba503-9907-43f0-b1a6-46a0b4311793" + } + } + }, + "36b8728c-32ad-4941-be03-5cef51cdc430": { + "id": "36b8728c-32ad-4941-be03-5cef51cdc430", + "prev": "05bb34ad-74b0-42b3-9654-8357d1ba9c99", + "regions": { + "a36a5bdf-7f62-49b0-8634-e155a98851dc": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "e33dfc47-e7df-4623-a7a6-ab6bcf944629", + "part": "whole" + }, + "id": "a36a5bdf-7f62-49b0-8634-e155a98851dc" + } + } + }, + 
"3f136f2a-f14c-4a4b-afea-13380556a79c": { + "id": "3f136f2a-f14c-4a4b-afea-13380556a79c", + "prev": "54cb8dfd-a89c-4922-867b-3c87d8b67cd3", + "regions": { + "80ecf237-d1b0-401e-83d2-6d04b7fcebd3": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "7debeb2b-ecea-414f-9b50-49657abb3e6a", + "part": "whole" + }, + "id": "80ecf237-d1b0-401e-83d2-6d04b7fcebd3" + } + } + }, + "502648cb-26fe-496b-899f-84c8fe1dcbc0": { + "id": "502648cb-26fe-496b-899f-84c8fe1dcbc0", + "prev": "3f136f2a-f14c-4a4b-afea-13380556a79c", + "regions": { + "4c83bb4d-2a52-41ba-a77f-0c6efebd83a6": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "dbd22f6b-6d49-4005-b8fe-422ef8ef1d42", + "part": "whole" + }, + "id": "4c83bb4d-2a52-41ba-a77f-0c6efebd83a6" + } + } + }, + "54cb8dfd-a89c-4922-867b-3c87d8b67cd3": { + "id": "54cb8dfd-a89c-4922-867b-3c87d8b67cd3", + "prev": "aa224267-f885-4c0c-95af-7bacfcc186d9", + "regions": { + "0848f0a7-032d-46c7-b35c-bfb69c83f961": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "3c32c557-d0e8-4bb3-a61a-aa51a767cd4e", + "part": "whole" + }, + "id": "0848f0a7-032d-46c7-b35c-bfb69c83f961" + } + } + }, + "636b563c-faee-4c9e-a6a3-f46a905bfa82": { + "id": "636b563c-faee-4c9e-a6a3-f46a905bfa82", + "prev": "c5f59b98-a227-4344-9d6d-03abdd01c6aa", + "regions": { + "9c64f662-05dc-4b14-9cdc-d450b96f4368": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "70640ac0-7041-47a8-9a7f-e871defd74b2", + "part": "whole" + }, + "id": "9c64f662-05dc-4b14-9cdc-d450b96f4368" + } + } + }, + "793cec2f-8413-484d-aa1e-388fd2b53a45": { + "id": "793cec2f-8413-484d-aa1e-388fd2b53a45", + "prev": "c66f3dfd-2d27-482b-be78-10ba733e826b", + "regions": { + "d08f9cfa-3b8d-4fb4-91ba-82d9858ea93e": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": 
"dd56113e-e3db-41ae-91b7-2472ed194308", + "part": "whole" + }, + "id": "d08f9cfa-3b8d-4fb4-91ba-82d9858ea93e" + } + } + }, + "83e912ff-260a-4391-8a12-331aba098506": { + "id": "83e912ff-260a-4391-8a12-331aba098506", + "prev": "fe5a0732-69f5-462a-8af6-851f84a9fdec", + "regions": { + "2fefcf5f-ea20-4604-a528-5e6c91bcb100": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea", + "part": "whole" + }, + "id": "2fefcf5f-ea20-4604-a528-5e6c91bcb100" + } + } + }, + "851089af-9725-40c9-8f0b-9bf892b2b1fe": { + "id": "851089af-9725-40c9-8f0b-9bf892b2b1fe", + "prev": "636b563c-faee-4c9e-a6a3-f46a905bfa82", + "regions": { + "31c9dda5-fdf4-45e2-bcb7-12aa0f30e1d8": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "8408b90e-6cdd-44d1-86d3-648c23f877ac", + "part": "whole" + }, + "id": "31c9dda5-fdf4-45e2-bcb7-12aa0f30e1d8" + } + } + }, + "87ab653d-e804-470f-bde9-c67caaa0f354": { + "id": "87ab653d-e804-470f-bde9-c67caaa0f354", + "prev": "a8c2d446-caee-42c8-886a-ed98f4935d78", + "regions": { + "bc3aeb56-c465-4868-a1ea-2de82584de98": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "59f52294-4a25-4c92-bab8-3b07f0f44d15", + "part": "whole" + }, + "id": "bc3aeb56-c465-4868-a1ea-2de82584de98" + } + } + }, + "8b887c97-83bc-4395-83ac-f6703cbe243d": { + "id": "8b887c97-83bc-4395-83ac-f6703cbe243d", + "prev": "36a814c9-c540-4a6d-92d9-c03553d3d2c2", + "regions": { + "9d0bc72a-cb13-483f-a572-2bf60d0d145f": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "75499c85-d0a1-43db-8244-25778b9b2736", + "part": "whole" + }, + "id": "9d0bc72a-cb13-483f-a572-2bf60d0d145f" + } + } + }, + "a8b9ceb9-b38f-4489-84df-b644c6fe28f2": { + "id": "a8b9ceb9-b38f-4489-84df-b644c6fe28f2", + "prev": null, + "regions": { + "f741ed94-3f24-4427-b615-3ab8753e5814": { + "attrs": { + 
"height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "bf74d2e9-2708-49b1-934b-e0ede342f475", + "part": "whole" + }, + "id": "f741ed94-3f24-4427-b615-3ab8753e5814" + } + } + }, + "a8c2d446-caee-42c8-886a-ed98f4935d78": { + "id": "a8c2d446-caee-42c8-886a-ed98f4935d78", + "prev": "2dfec088-8a70-411a-9199-904ef3fa2383", + "regions": { + "f03457d8-b2a7-4e14-9a73-cab80c5b815d": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "edaa7f2f-2439-4148-b57a-8c794c0945ec", + "part": "whole" + }, + "id": "f03457d8-b2a7-4e14-9a73-cab80c5b815d" + } + } + }, + "aa224267-f885-4c0c-95af-7bacfcc186d9": { + "id": "aa224267-f885-4c0c-95af-7bacfcc186d9", + "prev": "793cec2f-8413-484d-aa1e-388fd2b53a45", + "regions": { + "0d7ac442-5e1d-49a5-91b3-1432d72449d8": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "4d6826fe-2cb8-4468-85ed-a242a1ce7155", + "part": "whole" + }, + "id": "0d7ac442-5e1d-49a5-91b3-1432d72449d8" + } + } + }, + "b52e4d09-5186-44e5-84db-3371c087acde": { + "id": "b52e4d09-5186-44e5-84db-3371c087acde", + "prev": "134f92d0-6389-4226-af51-1134ae8e8278", + "regions": { + "7af7d997-80b2-497d-bced-ef8341763439": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "376882ec-d469-4fad-9462-18e4bbea64ca", + "part": "whole" + }, + "id": "7af7d997-80b2-497d-bced-ef8341763439" + } + } + }, + "c5f59b98-a227-4344-9d6d-03abdd01c6aa": { + "id": "c5f59b98-a227-4344-9d6d-03abdd01c6aa", + "prev": "83e912ff-260a-4391-8a12-331aba098506", + "regions": { + "7268abff-0540-4c06-aefc-c386410c0953": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "396d478b-34aa-4afa-9898-cdce8222a516", + "part": "whole" + }, + "id": "7268abff-0540-4c06-aefc-c386410c0953" + } + } + }, + "c66f3dfd-2d27-482b-be78-10ba733e826b": { + "id": "c66f3dfd-2d27-482b-be78-10ba733e826b", + 
"prev": "8b887c97-83bc-4395-83ac-f6703cbe243d", + "regions": { + "6cbe8e0e-8645-41a1-8a38-e44acb81be4b": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "7594c7c7-b808-48f7-9500-d7830a07968a", + "part": "whole" + }, + "id": "6cbe8e0e-8645-41a1-8a38-e44acb81be4b" + } + } + }, + "d22045e5-7e3e-452e-bc7b-c6c4a893da8e": { + "id": "d22045e5-7e3e-452e-bc7b-c6c4a893da8e", + "prev": "ec41f96a-63a3-4825-9295-f4657a440ddb", + "regions": { + "24e2a3a9-bf65-4dab-927f-0bf6ffbe581d": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "defe921f-8097-44c3-8336-8af6700804a7", + "part": "whole" + }, + "id": "24e2a3a9-bf65-4dab-927f-0bf6ffbe581d" + } + } + }, + "d24c958c-e419-4e4d-aa9c-d228a8ca55e4": { + "id": "d24c958c-e419-4e4d-aa9c-d228a8ca55e4", + "prev": "11bebe14-d1dc-476d-a31a-5828b9c3adf0", + "regions": { + "25312144-9faa-4680-bb8e-6307ea71370f": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "bed09a92-9a7a-473b-9464-90e479883a3e", + "part": "whole" + }, + "id": "25312144-9faa-4680-bb8e-6307ea71370f" + } + } + }, + "ec41f96a-63a3-4825-9295-f4657a440ddb": { + "id": "ec41f96a-63a3-4825-9295-f4657a440ddb", + "prev": "87ab653d-e804-470f-bde9-c67caaa0f354", + "regions": { + "22e8be98-c254-4d04-b0e4-b9b5ae46eefe": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "bc70f780-c240-4779-96f3-bc5ef9a37d59", + "part": "whole" + }, + "id": "22e8be98-c254-4d04-b0e4-b9b5ae46eefe" + } + } + }, + "fe5a0732-69f5-462a-8af6-851f84a9fdec": { + "id": "fe5a0732-69f5-462a-8af6-851f84a9fdec", + "prev": "d22045e5-7e3e-452e-bc7b-c6c4a893da8e", + "regions": { + "671b89f5-fa9c-4bc1-bdeb-6e0a4ce8939b": { + "attrs": { + "height": 0.8, + "width": 0.8, + "x": 0.1, + "y": 0.1 + }, + "content": { + "cell": "fd46e2ab-4ab6-4001-b536-1f323525d7d3", + "part": "whole" + }, + "id": 
"671b89f5-fa9c-4bc1-bdeb-6e0a4ce8939b" + } + } + } + }, + "themes": {} } - }, - "source": [ - "# 03. Training MNIST dataset with hyperparameter tuning & deploy to ACI\n", - "\n", - "## Introduction\n", - "This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing number from 0 to 9. The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", - "\n", - "For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", - "\n", - "## Prerequisite:\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's get started. First let's import some Python libraries." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" - } - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import os\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" - } - }, - "outputs": [], - "source": [ - "import azureml\n", - "from azureml.core import Workspace, Run\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" - } - }, - "source": [ - "## Create an Azure ML experiment\n", - "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" - } - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "script_folder = './tf-mnist'\n", - "os.makedirs(script_folder, exist_ok=True)\n", - "\n", - "exp = Experiment(workspace=ws, name='tf-mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "defe921f-8097-44c3-8336-8af6700804a7" - } - }, - "source": [ - "## Download MNIST dataset\n", - "In order to train on the MNIST dataset we will first need to download it from Yan LeCun's web site directly and save them in a `data` folder locally." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "## Show some sample images\n", - "Let's load the downloaded compressed file into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "from utils import load_data\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", - "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", - "\n", - "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", - "\n", - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload MNIST dataset to default datastore \n", - "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). 
For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share.\n", - "\n", - "In this next step, we will upload the training and test set into the workspace's default datastore, which we will then later be mount on a Batch AI cluster for training.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Batch AI cluster as compute target\n", - "[Batch AI](https://docs.microsoft.com/en-us/azure/batch-ai/overview) is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Batch AI cluster in the current workspace, if it doesn't already exist. We will then run the training script on this compute target." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If we could not find the cluster with the given name in the previous cell, then we will create a new cluster here. We will create a Batch AI Cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:\n", - "1. create the configuration (this step is local and only takes a second)\n", - "2. create the Batch AI cluster (this step will take about **20 seconds**)\n", - "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "batchai_cluster_name = \"gpucluster\"\n", - "\n", - "try:\n", - " # look for the existing cluster by name\n", - " compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n", - " if type(compute_target) is BatchAiCompute:\n", - " print('found compute target {}, just use it.'.format(batchai_cluster_name))\n", - " else:\n", - " print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n", - "except ComputeTargetException:\n", - " print('creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\", # GPU-based VM\n", - " #vm_priority='lowpriority', # optional\n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n", - " \n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - " \n", - " # Use the 'status' property to get a detailed status for the current cluster. \n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have created the compute target, let's see what the workspace's `compute_targets()` function returns. You should now see one entry named 'cpucluster' of type BatchAI." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for ct in ws.compute_targets():\n", - " print(ct.name, ct.type, ct.provisioning_state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Copy the training files into the script folder\n", - "The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "# the training logic is in the tf_mnist.py file.\n", - "shutil.copy('./tf_mnist.py', script_folder)\n", - "\n", - "# utils.py just helps load data from the downloaded MNIST dataset into numpy arrays.\n", - "shutil.copy('./utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" - } - }, - "source": [ - "## Construct neural network in TensorFlow\n", - "The training script `tf_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a target label from 0 to 9.\n", - "\n", - "![DNN](nn.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Azure ML concepts \n", - "Please note the following three things in the code below:\n", - "1. The script accepts arguments using the argparse package. For example, the `--data-folder` argument specifies the file system folder in which the script can find the MNIST data; hyperparameters such as `--batch-size` and `--learning-rate` are passed the same way.\n", - "```\n", - " parser = argparse.ArgumentParser()\n", - " parser.add_argument('--data-folder', type=str, dest='data_folder')\n", - "```\n", - "2. 
The script accesses the Azure ML `Run` object by executing `run = Run.get_submitted_run()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n", - "```\n", - " run.log('training_acc', np.float(acc_train))\n", - " run.log('validation_acc', np.float(acc_val))\n", - "```\n", - "3. When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. Azure ML tracks this folder specially: any files written to it during script execution on the remote target are picked up by Run History, and these files (known as artifacts) are available as part of the run history record." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The next cell prints the training code so you can inspect it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open(os.path.join(script_folder, 'tf_mnist.py'), 'r') as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create TensorFlow estimator\n", - "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the Batch AI cluster as the compute target, and pass the mount point of the datastore to the training code as a parameter.\n", - "The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It automatically provides a Docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting Docker image."
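Each `script_params` entry reaches `tf_mnist.py` as a command-line flag. The sketch below shows how a training script might parse the flags used in the next cell. It assumes the flags arrive as `--flag value` pairs. The defaults simply mirror the values in the estimator below, and the `parse_args` helper and the `/mnt/data` path are made up for illustration. None of this is copied from `tf_mnist.py`.

```python
import argparse

def parse_args(argv=None):
    """Parse the flags an estimator would pass via script_params (sketch)."""
    parser = argparse.ArgumentParser(description='MNIST DNN training (sketch)')
    # '--data-folder' receives the datastore mount point at run time
    parser.add_argument('--data-folder', type=str, dest='data_folder')
    # argparse maps '--batch-size' to args.batch_size, and so on
    parser.add_argument('--batch-size', type=int, default=50)
    parser.add_argument('--first-layer-neurons', type=int, default=300)
    parser.add_argument('--second-layer-neurons', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.01)
    return parser.parse_args(argv)

if __name__ == '__main__':
    # simulate the command line built from script_params; '/mnt/data' is a made-up path
    args = parse_args(['--data-folder', '/mnt/data', '--batch-size', '25',
                       '--learning-rate', '0.005'])
    print(args.data_folder, args.batch_size, args.first_layer_neurons, args.learning_rate)
    # -> /mnt/data 25 300 0.005
```

Because every hyperparameter is an ordinary flag, HyperDrive can later vary these values without any change to the script.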
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\n", - " '--data-folder': ws.get_default_datastore().as_mount(),\n", - " '--batch-size': 50,\n", - " '--first-layer-neurons': 300,\n", - " '--second-layer-neurons': 100,\n", - " '--learning-rate': 0.01\n", - "}\n", - "\n", - "est = TensorFlow(source_directory=script_folder,\n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='tf_mnist.py', \n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit job to run\n", - "Passing the estimator to the experiment's `submit` function submits the job to Azure ML for execution. Submitting the job should only take a few seconds." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(config=est)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the Run\n", - "As the Run executes, it goes through the following stages:\n", - "1. Preparing: A Docker image matching the Python environment specified by the TensorFlow estimator is created and uploaded to the workspace's Azure Container Registry. This step only happens once for each Python environment -- the image is then cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", - "\n", - "2. Scaling: If the compute needs to be scaled up (i.e. the Batch AI cluster requires more nodes to execute the run than are currently available), the Batch AI cluster will attempt to scale up in order to make the required number of nodes available. Scaling typically takes about **5 minutes**.\n", - "\n", - "3. 
Running: All scripts in the script folder are uploaded to the compute target, datastores are mounted or copied, and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", - "\n", - "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", - "\n", - "There are multiple ways to check the progress of a running job. One of them is a Jupyter notebook widget. \n", - "\n", - "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The Run object\n", - "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of useful features, for instance:\n", - "* `run.get_details()`: Provides a rich set of properties of the run\n", - "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", - "* `run.get_file_names()`: Lists all the files that were uploaded to the run history for this Run. 
This will include the `outputs` and `logs` folders, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`.\n", - "\n", - "Below are some examples -- please run through them and inspect their output. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_details()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Plot accuracy over epochs\n", - "Since we can retrieve the metrics from the run, we can easily make plots using `matplotlib` in the notebook. Then we can add the plotted image to the run using `run.log_image()`, so all information about the run is kept together." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import matplotlib.pyplot as plt\n", - "os.makedirs('./imgs', exist_ok = True)\n", - "metrics = run.get_metrics()\n", - "\n", - "plt.figure(figsize = (13,5))\n", - "plt.plot(metrics['validation_acc'], 'r-', lw = 4, alpha = .6)\n", - "plt.plot(metrics['training_acc'], 'b--', alpha = 0.5)\n", - "plt.legend(['Full evaluation set', 'Training set mini-batch'])\n", - "plt.xlabel('epochs', fontsize = 14)\n", - "plt.ylabel('accuracy', fontsize = 14)\n", - "plt.title('Accuracy over Epochs', fontsize = 16)\n", - "run.log_image(name = 'acc_over_epochs.png', plot = plt)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download the saved model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the training script, a TensorFlow `saver` object is used to persist the model in a local folder (local to the compute target). 
The model was saved to the `./outputs` folder on the disk of the Batch AI cluster node where the job runs. Azure ML automatically uploads anything written to the `./outputs` folder into the run history file store. Subsequently, we can use the `Run` object to download the model files the `saver` object saved. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`. Note that the TensorFlow model consists of four files in binary format, which are not human-readable." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a model folder in the current directory\n", - "os.makedirs('./model', exist_ok = True)\n", - "\n", - "for f in run.get_file_names():\n", - " if f.startswith('outputs/model'):\n", - " output_file_path = os.path.join('./model', f.split('/')[-1])\n", - " print('Downloading from {} to {} ...'.format(f, output_file_path))\n", - " run.download_file(name = f, output_file_path = output_file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Predict on the test set\n", - "Now load the saved TensorFlow graph, and list all operations under the `network` scope. This way we can discover the input tensor `network/X:0` and the output tensor `network/output/MatMul:0`, and use them in the scoring script in the next step.\n", - "\n", - "Note: if your local TensorFlow version is different from the version running in the cluster where the model is trained, you might see a \"compiletime version mismatch\" warning. You can ignore it."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "tf.reset_default_graph()\n", - "\n", - "saver = tf.train.import_meta_graph(\"./model/mnist-tf.model.meta\")\n", - "graph = tf.get_default_graph()\n", - "\n", - "for op in graph.get_operations():\n", - " if op.name.startswith('network'):\n", - " print(op.name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feed the test dataset to the persisted model to get predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# input tensor. this is an array of 784 elements, each representing the intensity of a pixel in the digit image.\n", - "X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", - "# output tensor. this is an array of 10 elements, each a raw (unnormalized) score for one digit from 0 to 9.\n", - "output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, './model/mnist-tf.model')\n", - " k = output.eval(feed_dict = {X : X_test})\n", - "# get the prediction, which is the index of the element that has the largest score.\n", - "y_hat = np.argmax(k, axis = 1)\n", - "\n", - "# print the first 30 labels and predictions\n", - "print('labels: \\t', y_test[:30])\n", - "print('predictions:\\t', y_hat[:30])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calculate the overall accuracy by comparing the predicted values against the test-set labels." 
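To make the accuracy computation concrete: `np.argmax(..., axis=1)` picks the highest-scoring class in each row. Comparing the predictions with the labels gives a boolean array, and `np.average` of that array is the fraction of correct predictions. A toy example with made-up scores for four images and three classes:

```python
import numpy as np

# made-up output scores: 4 images (rows) x 3 classes (columns)
k = np.array([[0.1, 2.0, -1.0],
              [3.0, 0.5, 0.2],
              [0.0, 0.1, 0.9],
              [1.5, 1.4, 0.0]])
y_true = np.array([1, 0, 2, 1])

# the index of the largest score in each row is the predicted class
y_hat = np.argmax(k, axis=1)           # -> [1, 0, 2, 0]
# True where prediction matches label; the mean of the booleans is the accuracy
accuracy = np.average(y_hat == y_true)
print(y_hat, accuracy)                 # -> [1 0 2 0] 0.75
```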
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Intelligent hyperparameter tuning\n", - "We have trained the model with one set of hyperparameters; now let's see how we can tune hyperparameters by launching multiple runs on the cluster. First, let's define the parameter space using random sampling. Here `loguniform(-6, -1)` draws learning rates whose natural logarithm is uniformly distributed between -6 and -1, i.e. roughly 0.0025 to 0.37." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import *\n", - "\n", - "ps = RandomParameterSampling(\n", - " {\n", - " '--batch-size': choice(25, 50, 100),\n", - " '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", - " '--second-layer-neurons': choice(10, 50, 200, 500),\n", - " '--learning-rate': loguniform(-6, -1)\n", - " }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we create a new estimator without the above parameters, since HyperDrive will pass them in for each run. Note that we still need to keep the `--data-folder` parameter, since it is not a hyperparameter we will sweep." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "est = TensorFlow(source_directory=script_folder,\n", - " script_params={'--data-folder': ws.get_default_datastore().as_mount()},\n", - " compute_target=compute_target,\n", - " entry_script='tf_mnist.py', \n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we will define an early termination policy. The `BanditPolicy` checks the jobs every 2 intervals (`evaluation_interval=2`). If a job's primary metric (defined later) falls outside the slack allowed by `slack_factor=0.1` relative to the best-performing job, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that show little promise of reaching our target metric." 
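The slack rule can be stated precisely. Our reading of the Azure ML documentation for a maximized metric is this: at an evaluation interval, a run is cancelled if its metric is below `best / (1 + slack_factor)`, where `best` is the best-performing run's metric. The plain-Python sketch below applies that rule to hypothetical accuracies. It does not call HyperDrive, and the helper name `bandit_survivors` is made up.

```python
def bandit_survivors(metrics, slack_factor):
    """Return the run ids a slack-factor Bandit rule would keep (sketch).

    `metrics` maps run id -> primary metric (to be maximized) reported
    at the current evaluation interval.
    """
    best = max(metrics.values())
    # runs below best / (1 + slack_factor) would be terminated
    cutoff = best / (1 + slack_factor)
    return sorted(run_id for run_id, m in metrics.items() if m >= cutoff)

# hypothetical validation_acc values at one evaluation interval
accs = {'run_1': 0.95, 'run_2': 0.90, 'run_3': 0.80}
print(bandit_survivors(accs, slack_factor=0.1))  # -> ['run_1', 'run_2']
```

With `slack_factor=0.1` the cutoff is 0.95 / 1.1, about 0.864. Only `run_3` falls below it. The rule is therefore relative to the current leader, not a fixed "top 10%" of runs.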
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we are ready to configure a HyperDrive run configuration object and specify the primary metric `validation_acc` that is recorded in your training runs. If you look back at the training script, you will notice that this value is logged after every epoch (one full pass over the training data). We also tell the service that we want to maximize this value. Finally, we set the total number of runs to 20 and the maximum number of concurrent runs to 4, which matches the maximum number of nodes in our compute cluster." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htc = HyperDriveRunConfig(estimator=est, \n", - " hyperparameter_sampling=ps, \n", - " primary_metric_name='validation_acc', \n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", - " max_total_runs=20,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, let's launch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htr = exp.submit(config=htc)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can use a run history widget to show the progress. Be patient, as this might take a while to complete." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(htr).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Find and register the best model\n", - "When all the jobs finish, we can find the one that has the highest accuracy." 
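`get_best_run_by_primary_metric()` in the next cell is the SDK call that does this selection. The sketch below only illustrates the idea over hypothetical per-epoch `validation_acc` lists. It assumes each run is scored by its last logged value, which is our assumption; this notebook does not say exactly how HyperDrive aggregates a run's logged values. The helper `best_run` is hypothetical.

```python
def best_run(run_metrics, metric='validation_acc', maximize=True):
    """Pick the run id whose final logged value of `metric` is best (sketch)."""
    # assumption: score each run by the last value it logged
    final = {run_id: values[metric][-1] for run_id, values in run_metrics.items()}
    pick = max if maximize else min
    return pick(final, key=final.get)

# hypothetical metrics, shaped like the dict returned by run.get_metrics()
runs = {
    'run_a': {'validation_acc': [0.90, 0.94, 0.95]},
    'run_b': {'validation_acc': [0.92, 0.96, 0.97]},
    'run_c': {'validation_acc': [0.85, 0.88, 0.89]},
}
print(best_run(runs))  # -> run_b
```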
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = htr.get_best_run_by_primary_metric()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's list the model files uploaded during the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(best_run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then register the folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the model in ACI\n", - "Now we are ready to deploy the model as a web service running in Azure Container Instances ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", - "### Create score.py\n", - "First, we will create a scoring script that will be invoked by the web service call. \n", - "\n", - "* Note that the scoring script must define two functions, `init()` and `run(input_data)`. \n", - " * In the `init()` function, you typically load the model into a global object. This function is executed only once, when the Docker container is started. \n", - " * In the `run(input_data)` function, the model is used to predict a value based on the input data. The input and output of `run` typically use JSON for serialization and deserialization, but you are not limited to that."
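The `init()`/`run()` contract described above can be tried out without TensorFlow or Azure. In the sketch below, a hypothetical NumPy weight matrix stands in for the TensorFlow graph. `init()` loads it once into a global, and `run()` maps a JSON request to a JSON list of predicted digits. The real `score.py` in the next cell keeps `sess`, `X`, and `output` as its globals instead.

```python
import json
import numpy as np

weights = None  # module-level model object, filled in once by init()

def init():
    """Runs once at container start-up: load the model into a global."""
    global weights
    # hypothetical stand-in model: pixel i (for i < 10) votes for digit i
    weights = np.zeros((784, 10))
    weights[:10, :10] = np.eye(10)

def run(raw_data):
    """Runs per request: JSON {"data": [[784 pixels], ...]} in, JSON list out."""
    data = np.array(json.loads(raw_data)['data'])  # shape (n_images, 784)
    scores = data @ weights                         # shape (n_images, 10)
    y_hat = np.argmax(scores, axis=1)
    return json.dumps(y_hat.tolist())

init()
images = np.zeros((2, 784))
images[0, 3] = 1.0
images[1, 7] = 1.0
print(run(json.dumps({'data': images.tolist()})))  # -> [3, 7]
```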
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "import tensorflow as tf\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global X, output, sess\n", - " tf.reset_default_graph()\n", - " model_root = Model.get_model_path('tf-dnn-mnist')\n", - " saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n", - " X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", - " output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - " \n", - " sess = tf.Session()\n", - " saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n", - "\n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " out = output.eval(session = sess, feed_dict = {X: data})\n", - " y_hat = np.argmax(out, axis = 1)\n", - " return json.dumps(y_hat.tolist())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create myenv.yml\n", - "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import CondaDependencies\n", - "cd = CondaDependencies.create()\n", - "cd.add_conda_package('numpy')\n", - "cd.add_tensorflow_conda_package()\n", - "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", - "\n", - "print(cd.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy to ACI\n", - "We are almost ready to deploy. 
Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for your ACI container. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", - " description='Tensorflow DNN on MNIST')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Deployment Process\n", - "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it will do the following:\n", - "1. **Use the registered model** \n", - "The model registered above as `tf-dnn-mnist` (the `outputs/model` folder of the best run) is passed to `Webservice.deploy_from_model` through the `models` parameter.\n", - "2. **Build Docker image** \n", - "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the registered TensorFlow model files. \n", - "3. **Register image** \n", - "Register that image under the workspace. \n", - "4. **Ship to ACI** \n", - "Finally, ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='tf-mnist-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=imgconfig)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the service logs, which you can retrieve with the following command:**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.get_logs())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the scoring web service endpoint:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the deployed model\n", - "Let's test the deployed model. Pick 30 random samples from the test set, and send them to the web service hosted in ACI. Note that here we use the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.\n", - "\n", - "After the invocation, we print the returned predictions and plot them along with the input images. Misclassified samples are highlighted with a red font and an inverted image (white on black). 
Note that since the model accuracy is high, you might have to run the cell below a few times before you see a misclassified sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", - "\n", - "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", - "test_samples = bytes(test_samples, encoding = 'utf8')\n", - "\n", - "# predict using the deployed model\n", - "result = json.loads(service.run(input_data = test_samples))\n", - "\n", - "# compare actual values vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - " plt.subplot(1, n, i + 1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " \n", - " # use different color for misclassified sample\n", - " font_color = 'red' if y_test[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", - " \n", - " # show the prediction returned by the web service\n", - " plt.text(x = 10, y = -10, s = result[i], fontsize = 18, color = font_color)\n", - " plt.imshow(X_test[s].reshape(28, 28), cmap = clr_map)\n", - " \n", - " i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also send a raw HTTP request to the service."
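Build the request body with `json.dumps` over `.tolist()` rather than by concatenating strings. `.tolist()` turns NumPy scalars into plain Python floats that `json` can encode. The string form of a list of NumPy scalars is not guaranteed to be valid JSON; in NumPy 2.x it reads like `[np.float32(0.0), ...]`. In the minimal sketch below, the helper `build_payload` is hypothetical. The actual POST is left as a comment because it needs a live service and the `requests` package.

```python
import json
import numpy as np

def build_payload(row):
    """Serialize one flattened 28x28 image as {"data": [[784 floats]]}."""
    # .tolist() yields plain Python floats, which json can encode
    return json.dumps({'data': [np.asarray(row, dtype=float).tolist()]})

# stand-in for X_test[random_index]: an all-zero image
row = np.zeros(784, dtype=np.float32)
body = build_payload(row)
print(body[:40])

# With a live service (assumes `requests` and the `service` object from above):
# resp = requests.post(service.scoring_uri, body,
#                      headers={'Content-Type': 'application/json'})
# print(resp.text)
```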
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "import json\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test))\n", - "input_data = json.dumps({\"data\": [X_test[random_index].tolist()]})\n", - "\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"POST to url\", service.scoring_uri)\n", - "#print(\"input data:\", input_data)\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at the workspace after the web service was deployed. You should see \n", - "* a registered model named 'tf-dnn-mnist', with an id such as 'tf-dnn-mnist:1'\n", - "* an image with a Docker image location pointing to your workspace's Azure Container Registry (ACR) \n", - "* a webservice called 'tf-mnist-svc' with a scoring URL" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for m in ws.models():\n", - " print(\"Model:\", m.name, m.id)\n", - "\n", - "for image in ws.images():\n", - " print(\"Image:\", image.name, image.image_location)\n", - "\n", - "for webservice in ws.webservices():\n", - " print(\"Webservice:\", webservice.name, webservice.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up\n", - "You can delete the ACI deployment with a simple delete API call." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also delete the compute cluster. 
But remember that if you set `cluster_min_nodes` to 0 when you created the cluster, the cluster automatically scales down to 0 nodes once the jobs finish, so you don't have to delete the cluster itself to avoid compute cost. The next time you submit jobs to it, the cluster will automatically \"grow\" as needed, up to `cluster_max_nodes`, which is set to 4." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# delete the cluster if you need to.\n", - "compute_target.delete()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "nbpresent": { - "slides": { - "05bb34ad-74b0-42b3-9654-8357d1ba9c99": { - "id": "05bb34ad-74b0-42b3-9654-8357d1ba9c99", - "prev": "851089af-9725-40c9-8f0b-9bf892b2b1fe", - "regions": { - "23fb396d-50f9-4770-adb3-0d6abcb40767": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "2039d2d5-aca6-4f25-a12f-df9ae6529cae", - "part": "whole" - }, - "id": "23fb396d-50f9-4770-adb3-0d6abcb40767" - } - } - }, - "11bebe14-d1dc-476d-a31a-5828b9c3adf0": { - "id": "11bebe14-d1dc-476d-a31a-5828b9c3adf0", - "prev": "502648cb-26fe-496b-899f-84c8fe1dcbc0", - "regions": { - "a42499db-623e-4414-bea2-ff3617fd8fc5": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "4788c040-27a2-4dc1-8ed0-378a99b3a255", - "part": "whole" - }, - "id": "a42499db-623e-4414-bea2-ff3617fd8fc5" - } - } - }, - "134f92d0-6389-4226-af51-1134ae8e8278": { - "id": "134f92d0-6389-4226-af51-1134ae8e8278", - "prev": "36b8728c-32ad-4941-be03-5cef51cdc430", - "regions": { - 
"b6d82a77-2d58-4b9e-a375-3103214b826c": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "7ab0e6d0-1f1c-451b-8ac5-687da44a8287", - "part": "whole" - }, - "id": "b6d82a77-2d58-4b9e-a375-3103214b826c" - } - } - }, - "282a2421-697b-4fd0-9485-755abf5a0c18": { - "id": "282a2421-697b-4fd0-9485-755abf5a0c18", - "prev": "a8b9ceb9-b38f-4489-84df-b644c6fe28f2", - "regions": { - "522fec96-abe7-4a34-bd34-633733afecc8": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "d58e7785-c2ee-4a45-8e3d-4c538bf8075a", - "part": "whole" - }, - "id": "522fec96-abe7-4a34-bd34-633733afecc8" - } - } - }, - "2dfec088-8a70-411a-9199-904ef3fa2383": { - "id": "2dfec088-8a70-411a-9199-904ef3fa2383", - "prev": "282a2421-697b-4fd0-9485-755abf5a0c18", - "regions": { - "0535fcb6-3a2b-4b46-98a7-3ebb1a38c47e": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3", - "part": "whole" - }, - "id": "0535fcb6-3a2b-4b46-98a7-3ebb1a38c47e" - } - } - }, - "36a814c9-c540-4a6d-92d9-c03553d3d2c2": { - "id": "36a814c9-c540-4a6d-92d9-c03553d3d2c2", - "prev": "b52e4d09-5186-44e5-84db-3371c087acde", - "regions": { - "8bfba503-9907-43f0-b1a6-46a0b4311793": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "d5e4a56c-dfac-4346-be83-1c15b503deac", - "part": "whole" - }, - "id": "8bfba503-9907-43f0-b1a6-46a0b4311793" - } - } - }, - "36b8728c-32ad-4941-be03-5cef51cdc430": { - "id": "36b8728c-32ad-4941-be03-5cef51cdc430", - "prev": "05bb34ad-74b0-42b3-9654-8357d1ba9c99", - "regions": { - "a36a5bdf-7f62-49b0-8634-e155a98851dc": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "e33dfc47-e7df-4623-a7a6-ab6bcf944629", - "part": "whole" - }, - "id": "a36a5bdf-7f62-49b0-8634-e155a98851dc" - } - } - }, - "3f136f2a-f14c-4a4b-afea-13380556a79c": { 
- "id": "3f136f2a-f14c-4a4b-afea-13380556a79c", - "prev": "54cb8dfd-a89c-4922-867b-3c87d8b67cd3", - "regions": { - "80ecf237-d1b0-401e-83d2-6d04b7fcebd3": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "7debeb2b-ecea-414f-9b50-49657abb3e6a", - "part": "whole" - }, - "id": "80ecf237-d1b0-401e-83d2-6d04b7fcebd3" - } - } - }, - "502648cb-26fe-496b-899f-84c8fe1dcbc0": { - "id": "502648cb-26fe-496b-899f-84c8fe1dcbc0", - "prev": "3f136f2a-f14c-4a4b-afea-13380556a79c", - "regions": { - "4c83bb4d-2a52-41ba-a77f-0c6efebd83a6": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "dbd22f6b-6d49-4005-b8fe-422ef8ef1d42", - "part": "whole" - }, - "id": "4c83bb4d-2a52-41ba-a77f-0c6efebd83a6" - } - } - }, - "54cb8dfd-a89c-4922-867b-3c87d8b67cd3": { - "id": "54cb8dfd-a89c-4922-867b-3c87d8b67cd3", - "prev": "aa224267-f885-4c0c-95af-7bacfcc186d9", - "regions": { - "0848f0a7-032d-46c7-b35c-bfb69c83f961": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "3c32c557-d0e8-4bb3-a61a-aa51a767cd4e", - "part": "whole" - }, - "id": "0848f0a7-032d-46c7-b35c-bfb69c83f961" - } - } - }, - "636b563c-faee-4c9e-a6a3-f46a905bfa82": { - "id": "636b563c-faee-4c9e-a6a3-f46a905bfa82", - "prev": "c5f59b98-a227-4344-9d6d-03abdd01c6aa", - "regions": { - "9c64f662-05dc-4b14-9cdc-d450b96f4368": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "70640ac0-7041-47a8-9a7f-e871defd74b2", - "part": "whole" - }, - "id": "9c64f662-05dc-4b14-9cdc-d450b96f4368" - } - } - }, - "793cec2f-8413-484d-aa1e-388fd2b53a45": { - "id": "793cec2f-8413-484d-aa1e-388fd2b53a45", - "prev": "c66f3dfd-2d27-482b-be78-10ba733e826b", - "regions": { - "d08f9cfa-3b8d-4fb4-91ba-82d9858ea93e": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "dd56113e-e3db-41ae-91b7-2472ed194308", - "part": 
"whole" - }, - "id": "d08f9cfa-3b8d-4fb4-91ba-82d9858ea93e" - } - } - }, - "83e912ff-260a-4391-8a12-331aba098506": { - "id": "83e912ff-260a-4391-8a12-331aba098506", - "prev": "fe5a0732-69f5-462a-8af6-851f84a9fdec", - "regions": { - "2fefcf5f-ea20-4604-a528-5e6c91bcb100": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea", - "part": "whole" - }, - "id": "2fefcf5f-ea20-4604-a528-5e6c91bcb100" - } - } - }, - "851089af-9725-40c9-8f0b-9bf892b2b1fe": { - "id": "851089af-9725-40c9-8f0b-9bf892b2b1fe", - "prev": "636b563c-faee-4c9e-a6a3-f46a905bfa82", - "regions": { - "31c9dda5-fdf4-45e2-bcb7-12aa0f30e1d8": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "8408b90e-6cdd-44d1-86d3-648c23f877ac", - "part": "whole" - }, - "id": "31c9dda5-fdf4-45e2-bcb7-12aa0f30e1d8" - } - } - }, - "87ab653d-e804-470f-bde9-c67caaa0f354": { - "id": "87ab653d-e804-470f-bde9-c67caaa0f354", - "prev": "a8c2d446-caee-42c8-886a-ed98f4935d78", - "regions": { - "bc3aeb56-c465-4868-a1ea-2de82584de98": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "59f52294-4a25-4c92-bab8-3b07f0f44d15", - "part": "whole" - }, - "id": "bc3aeb56-c465-4868-a1ea-2de82584de98" - } - } - }, - "8b887c97-83bc-4395-83ac-f6703cbe243d": { - "id": "8b887c97-83bc-4395-83ac-f6703cbe243d", - "prev": "36a814c9-c540-4a6d-92d9-c03553d3d2c2", - "regions": { - "9d0bc72a-cb13-483f-a572-2bf60d0d145f": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "75499c85-d0a1-43db-8244-25778b9b2736", - "part": "whole" - }, - "id": "9d0bc72a-cb13-483f-a572-2bf60d0d145f" - } - } - }, - "a8b9ceb9-b38f-4489-84df-b644c6fe28f2": { - "id": "a8b9ceb9-b38f-4489-84df-b644c6fe28f2", - "prev": null, - "regions": { - "f741ed94-3f24-4427-b615-3ab8753e5814": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 
0.1 - }, - "content": { - "cell": "bf74d2e9-2708-49b1-934b-e0ede342f475", - "part": "whole" - }, - "id": "f741ed94-3f24-4427-b615-3ab8753e5814" - } - } - }, - "a8c2d446-caee-42c8-886a-ed98f4935d78": { - "id": "a8c2d446-caee-42c8-886a-ed98f4935d78", - "prev": "2dfec088-8a70-411a-9199-904ef3fa2383", - "regions": { - "f03457d8-b2a7-4e14-9a73-cab80c5b815d": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "edaa7f2f-2439-4148-b57a-8c794c0945ec", - "part": "whole" - }, - "id": "f03457d8-b2a7-4e14-9a73-cab80c5b815d" - } - } - }, - "aa224267-f885-4c0c-95af-7bacfcc186d9": { - "id": "aa224267-f885-4c0c-95af-7bacfcc186d9", - "prev": "793cec2f-8413-484d-aa1e-388fd2b53a45", - "regions": { - "0d7ac442-5e1d-49a5-91b3-1432d72449d8": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "4d6826fe-2cb8-4468-85ed-a242a1ce7155", - "part": "whole" - }, - "id": "0d7ac442-5e1d-49a5-91b3-1432d72449d8" - } - } - }, - "b52e4d09-5186-44e5-84db-3371c087acde": { - "id": "b52e4d09-5186-44e5-84db-3371c087acde", - "prev": "134f92d0-6389-4226-af51-1134ae8e8278", - "regions": { - "7af7d997-80b2-497d-bced-ef8341763439": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "376882ec-d469-4fad-9462-18e4bbea64ca", - "part": "whole" - }, - "id": "7af7d997-80b2-497d-bced-ef8341763439" - } - } - }, - "c5f59b98-a227-4344-9d6d-03abdd01c6aa": { - "id": "c5f59b98-a227-4344-9d6d-03abdd01c6aa", - "prev": "83e912ff-260a-4391-8a12-331aba098506", - "regions": { - "7268abff-0540-4c06-aefc-c386410c0953": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "396d478b-34aa-4afa-9898-cdce8222a516", - "part": "whole" - }, - "id": "7268abff-0540-4c06-aefc-c386410c0953" - } - } - }, - "c66f3dfd-2d27-482b-be78-10ba733e826b": { - "id": "c66f3dfd-2d27-482b-be78-10ba733e826b", - "prev": "8b887c97-83bc-4395-83ac-f6703cbe243d", - 
"regions": { - "6cbe8e0e-8645-41a1-8a38-e44acb81be4b": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "7594c7c7-b808-48f7-9500-d7830a07968a", - "part": "whole" - }, - "id": "6cbe8e0e-8645-41a1-8a38-e44acb81be4b" - } - } - }, - "d22045e5-7e3e-452e-bc7b-c6c4a893da8e": { - "id": "d22045e5-7e3e-452e-bc7b-c6c4a893da8e", - "prev": "ec41f96a-63a3-4825-9295-f4657a440ddb", - "regions": { - "24e2a3a9-bf65-4dab-927f-0bf6ffbe581d": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "defe921f-8097-44c3-8336-8af6700804a7", - "part": "whole" - }, - "id": "24e2a3a9-bf65-4dab-927f-0bf6ffbe581d" - } - } - }, - "d24c958c-e419-4e4d-aa9c-d228a8ca55e4": { - "id": "d24c958c-e419-4e4d-aa9c-d228a8ca55e4", - "prev": "11bebe14-d1dc-476d-a31a-5828b9c3adf0", - "regions": { - "25312144-9faa-4680-bb8e-6307ea71370f": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "bed09a92-9a7a-473b-9464-90e479883a3e", - "part": "whole" - }, - "id": "25312144-9faa-4680-bb8e-6307ea71370f" - } - } - }, - "ec41f96a-63a3-4825-9295-f4657a440ddb": { - "id": "ec41f96a-63a3-4825-9295-f4657a440ddb", - "prev": "87ab653d-e804-470f-bde9-c67caaa0f354", - "regions": { - "22e8be98-c254-4d04-b0e4-b9b5ae46eefe": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "bc70f780-c240-4779-96f3-bc5ef9a37d59", - "part": "whole" - }, - "id": "22e8be98-c254-4d04-b0e4-b9b5ae46eefe" - } - } - }, - "fe5a0732-69f5-462a-8af6-851f84a9fdec": { - "id": "fe5a0732-69f5-462a-8af6-851f84a9fdec", - "prev": "d22045e5-7e3e-452e-bc7b-c6c4a893da8e", - "regions": { - "671b89f5-fa9c-4bc1-bdeb-6e0a4ce8939b": { - "attrs": { - "height": 0.8, - "width": 0.8, - "x": 0.1, - "y": 0.1 - }, - "content": { - "cell": "fd46e2ab-4ab6-4001-b536-1f323525d7d3", - "part": "whole" - }, - "id": "671b89f5-fa9c-4bc1-bdeb-6e0a4ce8939b" - } - } - } - }, - "themes": {} - } - }, 
- "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/04.distributed-tensorflow-with-horovod/04.distributed-tensorflow-with-horovod.ipynb b/training/04.distributed-tensorflow-with-horovod/04.distributed-tensorflow-with-horovod.ipynb index a360ba52..891e4a4d 100644 --- a/training/04.distributed-tensorflow-with-horovod/04.distributed-tensorflow-with-horovod.ipynb +++ b/training/04.distributed-tensorflow-with-horovod/04.distributed-tensorflow-with-horovod.ipynb @@ -1,360 +1,360 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 04. Distributed Tensorflow with Horovod\n", + "In this tutorial, you will train a word2vec model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", + "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)\n", + "* Review the [tutorial](https://aka.ms/aml-notebook-hyperdrive) on single-node TensorFlow training using the SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a remote compute target\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", + "\n", + "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpucluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " autoscale_enabled=True,\n", + " cluster_min_nodes=0, \n", + " cluster_max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + " # Use the 'status' property to get a detailed status for the current cluster. \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload data to datastore\n", + "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n", + "\n", + "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. 
For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, download the training data from [here](http://mattmahoney.net/dc/text8.zip) to your local machine:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import urllib\n", + "\n", + "os.makedirs('./data', exist_ok=True)\n", + "download_url = 'http://mattmahoney.net/dc/text8.zip'\n", + "urllib.request.urlretrieve(download_url, filename='./data/text8.zip')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore. The below code will upload the contents of the data directory to the path `./data` on the default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)\n", + "\n", + "ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For convenience, let's get a reference to the path on the datastore with the zip file of training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--input_data` argument. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "path_on_datastore = 'data/text8.zip'\n", + "ds_data = ds.path(path_on_datastore)\n", + "print(ds_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "project_folder = './tf-distr-hvd'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `tf_horovod_word2vec.py` into this project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('tf_horovod_word2vec.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'tf-distr-hvd'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a TensorFlow estimator\n", + "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params={\n", + " '--input_data': ds_data\n", + "}\n", + "\n", + "estimator= TensorFlow(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " script_params=script_params,\n", + " entry_script='tf_horovod_word2vec.py',\n", + " node_count=2,\n", + " process_count_per_node=1,\n", + " distributed_backend='mpi',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", + "\n", + "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 04. Distributed Tensorflow with Horovod\n", - "In this tutorial, you will train a word2vec model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)." 
- ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](https://aka.ms/aml-notebook-hyperdrive) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a remote compute target\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", - "\n", - "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpucluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - " # Use the 'status' property to get a detailed status for the current cluster. \n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload data to datastore\n", - "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n", - "\n", - "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. 
For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, download the training data from [here](http://mattmahoney.net/dc/text8.zip) to your local machine:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib\n", - "\n", - "os.makedirs('./data', exist_ok=True)\n", - "download_url = 'http://mattmahoney.net/dc/text8.zip'\n", - "urllib.request.urlretrieve(download_url, filename='./data/text8.zip')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore. The below code will upload the contents of the data directory to the path `./data` on the default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)\n", - "\n", - "ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For convenience, let's get a reference to the path on the datastore with the zip file of training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--input_data` argument. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_on_datastore = 'data/text8.zip'\n", - "ds_data = ds.path(path_on_datastore)\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "project_folder = './tf-distr-hvd'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_horovod_word2vec.py` into this project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('tf_horovod_word2vec.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-distr-hvd'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params={\n", - " '--input_data': ds_data\n", - "}\n", - "\n", - "estimator= TensorFlow(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_horovod_word2vec.py',\n", - " node_count=2,\n", - " process_count_per_node=1,\n", - " distributed_backend='mpi',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", - "\n", - "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/05.distributed-tensorflow-with-parameter-server/05.distributed-tensorflow-with-parameter-server.ipynb b/training/05.distributed-tensorflow-with-parameter-server/05.distributed-tensorflow-with-parameter-server.ipynb index a77eaf78..796ab6bb 100644 --- a/training/05.distributed-tensorflow-with-parameter-server/05.distributed-tensorflow-with-parameter-server.ipynb +++ b/training/05.distributed-tensorflow-with-parameter-server/05.distributed-tensorflow-with-parameter-server.ipynb @@ -1,286 +1,286 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 05. Distributed TensorFlow with parameter server\n", + "In this tutorial, you will train a TensorFlow model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using native [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", + "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)\n", + "* Review the [tutorial](https://aka.ms/aml-notebook-hyperdrive) on single-node TensorFlow training using the SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a remote compute target\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", + "\n", + "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpucluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " autoscale_enabled=True,\n", + " cluster_min_nodes=0, \n", + " cluster_max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + " # Use the 'status' property to get a detailed status for the current cluster. \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the cluster ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './tf-distr-ps'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `tf_mnist_replica.py` into this project directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('tf_mnist_replica.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'tf-distr-ps'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a TensorFlow estimator\n", + "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params={\n", + " '--num_gpus': 1\n", + "}\n", + "\n", + "estimator = TensorFlow(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " script_params=script_params,\n", + " entry_script='tf_mnist_replica.py',\n", + " node_count=2,\n", + " worker_count=2,\n", + " parameter_server_count=1, \n", + " distributed_backend='ps',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. 
In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_backend='ps'`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True) # this provides a verbose log" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "minxia" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 05. Distributed TensorFlow with parameter server\n", - "In this tutorial, you will train a TensorFlow model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using native [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](https://aka.ms/aml-notebook-hyperdrive) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a 
[Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a remote compute target\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", - "\n", - "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpucluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - " # Use the 'status' property to get a detailed status for the current cluster. \n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the cluster ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './tf-distr-ps'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_mnist_replica.py` into this project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('tf_mnist_replica.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-distr-ps'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params={\n", - " '--num_gpus': 1\n", - "}\n", - "\n", - "estimator = TensorFlow(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_mnist_replica.py',\n", - " node_count=2,\n", - " worker_count=2,\n", - " parameter_server_count=1, \n", - " distributed_backend='ps',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. 
In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_backend='ps'`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True) # this provides a verbose log" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "minxia" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/06.distributed-cntk-with-custom-docker/06.distributed-cntk-with-custom-docker.ipynb b/training/06.distributed-cntk-with-custom-docker/06.distributed-cntk-with-custom-docker.ipynb index 4769985a..1a8cf239 100644 --- a/training/06.distributed-cntk-with-custom-docker/06.distributed-cntk-with-custom-docker.ipynb +++ b/training/06.distributed-cntk-with-custom-docker/06.distributed-cntk-with-custom-docker.ipynb @@ -1,364 +1,364 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 06. Distributed CNTK using custom docker images\n", + "In this tutorial, you will train a CNTK model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using a custom docker image and distributed training." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n", + " * install the AML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize workspace\n", + "\n", + "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a remote compute target\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on.
In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", + "\n", + "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpucluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " autoscale_enabled=True,\n", + " cluster_min_nodes=0, \n", + " cluster_max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + " # Use the 'status' property to get a detailed status for the current cluster. \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload training data\n", + "For this tutorial, we will be using the MNIST dataset.\n", + "\n", + "First, let's download the dataset. We've included the `install_mnist.py` script to download the data and convert it to a CNTK-supported format. Our data files will get written to a directory named `'mnist'`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import install_mnist\n", + "\n", + "install_mnist.main('mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. \n", + "\n", + "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore, which we will then mount on the remote compute for training in the next section." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code will upload the training data to the path `./mnist` on the default datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds.upload(src_dir='./mnist', target_path='./mnist')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "path_on_datastore = 'mnist'\n", + "ds_data = ds.path(path_on_datastore)\n", + "print(ds_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train model on the remote compute\n", + "Now that we have the cluster ready to go, let's run our distributed training job." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a project directory\n", + "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './cntk-distr'\n", + "os.makedirs(project_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copy the training script `cntk_distr_mnist.py` into this project directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('cntk_distr_mnist.py', project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment\n", + "Create an [experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed CNTK tutorial. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'cntk-distr'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an Estimator\n", + "The AML SDK's base Estimator enables you to easily submit custom scripts for both single-node and distributed runs. You should use this generic estimator for training code using frameworks such as sklearn or CNTK that don't have corresponding custom estimators. For more information on using the generic estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.estimator import *\n", + "\n", + "script_params = {\n", + " '--num_epochs': 50,\n", + " '--data_dir': ds_data.as_mount(),\n", + " '--output_dir': './outputs'\n", + "}\n", + "\n", + "estimator = Estimator(source_directory=project_folder,\n", + " compute_target=compute_target,\n", + " entry_script='cntk_distr_mnist.py',\n", + " script_params=script_params,\n", + " node_count=2,\n", + " process_count_per_node=1,\n", + " distributed_backend='mpi', \n", + " pip_packages=['cntk-gpu==2.6'],\n", + " custom_docker_base_image='microsoft/mmlspark:gpu-0.12',\n", + " use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_base_image`. You can only provide images available in public docker repositories such as Docker Hub using this argument. To use an image from a private docker repository, use the constructor's `environment_definition` parameter instead.
Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n", + "\n", + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job\n", + "Run your experiment by submitting your estimator object. Note that this call is asynchronous." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = experiment.submit(estimator)\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can block until the script has completed training before running more code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 06. 
Distributed CNTK using custom docker images\n", - "In this tutorial, you will train a CNTK model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using a custom docker image and distributed training." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* Go through the [00.configuration.ipynb]() notebook to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://review.docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture?branch=release-ignite-aml#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a remote compute target\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) to execute your training script on. In this tutorial, you create an [Azure Batch AI](https://docs.microsoft.com/azure/batch-ai/overview) cluster as your training compute resource. This code creates a cluster for you if it does not already exist in your workspace.\n", - "\n", - "**Creation of the cluster takes approximately 5 minutes.** If the cluster is already in your workspace this code will skip the cluster creation process." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpucluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - " # Use the 'status' property to get a detailed status for the current cluster. \n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload training data\n", - "For this tutorial, we will be using the MNIST dataset.\n", - "\n", - "First, let's download the dataset. We've included the `install_mnist.py` script to download the data and convert it to a CNTK-supported format. Our data files will get written to a directory named `'mnist'`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import install_mnist\n", - "\n", - "install_mnist.main('mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). 
The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. \n", - "\n", - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore, which we will then mount on the remote compute for training in the next section." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code will upload the training data to the path `./mnist` on the default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./mnist', target_path='./mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_on_datastore = 'mnist'\n", - "ds_data = ds.path(path_on_datastore)\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the cluster ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './cntk-distr'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `cntk_distr_mnist.py` into this project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('cntk_distr_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed CNTK tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'cntk-distr'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an Estimator\n", - "The AML SDK's base Estimator enables you to easily submit custom scripts for both single-node and distributed runs. You should this generic estimator for training code using frameworks such as sklearn or CNTK that don't have corresponding custom estimators. For more information on using the generic estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.estimator import *\n", - "\n", - "script_params = {\n", - " '--num_epochs': 50,\n", - " '--data_dir': ds_data.as_mount(),\n", - " '--output_dir': './outputs'\n", - "}\n", - "\n", - "estimator = Estimator(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='cntk_distr_mnist.py',\n", - " script_params=script_params,\n", - " node_count=2,\n", - " process_count_per_node=1,\n", - " distributed_backend='mpi', \n", - " pip_packages=['cntk-gpu==2.6'],\n", - " custom_docker_base_image='microsoft/mmlspark:gpu-0.12',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_base_image`. You can only provide images available in public docker repositories such as Docker Hub using this argument. To use an image from a private docker repository, use the constructor's `environment_definition` parameter instead. Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n", - "\n", - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/07.tensorboard/07.tensorboard.ipynb b/training/07.tensorboard/07.tensorboard.ipynb index 55759993..cf0062e4 100644 --- a/training/07.tensorboard/07.tensorboard.ipynb +++ b/training/07.tensorboard/07.tensorboard.ipynb @@ -1,504 +1,504 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 40. Tensorboard Integration with Run History\n", + "\n", + "1. Run a Tensorflow job locally and view its TB output live.\n", + "2. The same, for a DSVM.\n", + "3. And once more, with Batch AI.\n", + "4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set experiment name and create project\n", + "Choose a name for your run history container in the workspace, and create a folder for the project." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from os import path, makedirs\n", + "experiment_name = 'tensorboard-demo'\n", + "\n", + "# experiment folder\n", + "exp_dir = './sample_projects/' + experiment_name\n", + "\n", + "if not path.exists(exp_dir):\n", + " makedirs(exp_dir)\n", + "\n", + "# runs we started in this session, for the finale\n", + "runs = []" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download Tensorflow Tensorboard demo code\n", + "\n", + "Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. We'll use it here for our purposes.\n", + "\n", + "Note that we don't need to make any code changes at all - the code works without modification from the Tensorflow repository." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import os\n", + "import tempfile\n", + "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", + "with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n", + " file.write(tf_code.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure and run locally\n", + "\n", + "We'll start by running this locally. While it might not initially seem that useful to use this for a local run - why not just run TB against the files generated locally? - even in this case there is some value to using this feature. Your local run will be registered in the run history, and your Tensorboard logs will be uploaded to the artifact store associated with this run. Later, you'll be able to restore the logs from any run, regardless of where it happened.\n", + "\n", + "Note that for this run, you will need to install Tensorflow on your local machine by yourself. 
Further, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, as the local machine is what runs Tensorboard." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "\n", + "# Create a run configuration.\n", + "run_config = RunConfiguration()\n", + "run_config.environment.python.user_managed_dependencies = True\n", + "\n", + "# You can choose a specific Python environment by pointing to a Python path \n", + "#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment, Run\n", + "from azureml.core.script_run_config import ScriptRunConfig\n", + "import tensorflow as tf\n", + "\n", + "logs_dir = os.curdir + os.sep + \"logs\"\n", + "tensorflow_logs_dir = os.path.join(logs_dir, \"tensorflow\")\n", + "\n", + "if not path.exists(tensorflow_logs_dir):\n", + " makedirs(tensorflow_logs_dir)\n", + "\n", + "os.environ[\"TEST_TMPDIR\"] = logs_dir\n", + "\n", + "# Writing logs to ./logs results in their being uploaded to Artifact Service,\n", + "# and thus, made accessible to our Tensorboard instance.\n", + "arguments_list = [\"--log_dir\", logs_dir]\n", + "\n", + "# Create an experiment\n", + "exp = Experiment(ws, experiment_name)\n", + "\n", + "script = ScriptRunConfig(exp_dir,\n", + " script=\"mnist_with_summaries.py\",\n", + " run_config=run_config)\n", + "\n", + "# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n", + "# arguments_list += [\"--max_steps\", \"5000\"]\n", + "kwargs = {}\n", + "kwargs['arguments_list'] = arguments_list\n", + "run = exp.submit(script, kwargs)\n", + "# You can also wait for the run to complete\n", + "# 
run.wait_for_completion(show_output=True)\n", + "runs.append(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start Tensorboard\n", + "\n", + "Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin streaming logs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.tensorboard import Tensorboard\n", + "\n", + "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", + "tb = Tensorboard([run])\n", + "\n", + "# If successful, start() returns a string with the URI of the instance.\n", + "tb.start()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stop Tensorboard\n", + "\n", + "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Now, with a DSVM\n", + "\n", + "Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.\n", + "Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n", + "\n", + "If you are unfamiliar with DSVM configuration, check [04. Train in a remote VM (Ubuntu DSVM)](04.train-on-remote-vm.ipynb) for a more detailed breakdown." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import DsvmCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "compute_target_name = 'cpu-dsvm'\n", + "\n", + "try:\n", + " compute_target = DsvmCompute(workspace = ws, name = compute_target_name)\n", + " print('found existing:', compute_target.name)\n", + "except ComputeTargetException:\n", + " print('creating new.')\n", + " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", + " compute_target = DsvmCompute.create(ws, name = compute_target_name, provisioning_configuration = dsvm_config)\n", + " compute_target.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit run using TensorFlow estimator\n", + "\n", + "Instead of manually configuring the DSVM environment, we can use the TensorFlow estimator and everything is set up automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import TensorFlow\n", + "\n", + "script_params = {\"--log_dir\": \"./logs\"}\n", + "\n", + "# If you want the run to go longer, set --max_steps to a higher number.\n", + "# script_params[\"--max_steps\"] = \"5000\"\n", + "\n", + "tf_estimator = TensorFlow(source_directory=exp_dir,\n", + " compute_target=compute_target,\n", + " entry_script='mnist_with_summaries.py',\n", + " script_params=script_params)\n", + "\n", + "run = exp.submit(tf_estimator)\n", + "\n", + "runs.append(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start Tensorboard with this run\n", + "\n", + "Just like before."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", + "tb = Tensorboard([run])\n", + "\n", + "# If successful, start() returns a string with the URI of the instance.\n", + "tb.start()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stop Tensorboard\n", + "\n", + "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Once more, with a Batch AI cluster\n", + "\n", + "Just to prove we can, let's create a Batch AI cluster using MLC, and run our demo there, as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import BatchAiCompute\n", + "\n", + "clust_name = ws.name + \"cpu\"\n", + "\n", + "try:\n", + " # If you already have a cluster named this, we don't need to make a new one.\n", + " cts = ws.compute_targets() \n", + " compute_target = cts[clust_name]\n", + " assert compute_target.type == 'BatchAI'\n", + "except:\n", + " # Let's make a new one here.\n", + " provisioning_config = BatchAiCompute.provisioning_configuration(cluster_max_nodes=2, \n", + " autoscale_enabled=True, \n", + " cluster_min_nodes=1,\n", + " vm_size='Standard_D11_V2')\n", + " \n", + " compute_target = BatchAiCompute.create(ws, clust_name, provisioning_config)\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)\n", + "print(compute_target.name)\n", + " # For a more detailed view of current BatchAI cluster status, use the 'status' property \n", + " # 
print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Submit run using TensorFlow estimator\n", + "\n", + "Again, we can use the TensorFlow estimator and everything is set up automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "script_params = {\"--log_dir\": \"./logs\"}\n", + "\n", + "# If you want the run to go longer, set --max_steps to a higher number.\n", + "# script_params[\"--max_steps\"] = \"5000\"\n", + "\n", + "tf_estimator = TensorFlow(source_directory=exp_dir,\n", + " compute_target=compute_target,\n", + " entry_script='mnist_with_summaries.py',\n", + " script_params=script_params)\n", + "\n", + "run = exp.submit(tf_estimator)\n", + "\n", + "runs.append(run)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start Tensorboard with this run\n", + "\n", + "Once more..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", + "tb = Tensorboard([run])\n", + "\n", + "# If successful, start() returns a string with the URI of the instance.\n", + "tb.start()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stop Tensorboard\n", + "\n", + "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Finale\n", + "\n", + "If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along.
We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The Tensorboard constructor takes an array of runs...\n", + "# and it turns out that we have been building one of those all along.\n", + "tb = Tensorboard(runs)\n", + "\n", + "# If successful, start() returns a string with the URI of the instance.\n", + "tb.start()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stop Tensorboard\n", + "\n", + "As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 40. Tensorboard Integration with Run History\n", - "\n", - "1. Run a Tensorflow job locally and view its TB output live.\n", - "2. The same, for a DSVM.\n", - "3. And once more, with Batch AI.\n", - "4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set experiment name and create project\n", - "Choose a name for your run history container in the workspace, and create a folder for the project." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from os import path, makedirs\n", - "experiment_name = 'tensorboard-demo'\n", - "\n", - "# experiment folder\n", - "exp_dir = './sample_projects/' + experiment_name\n", - "\n", - "if not path.exists(exp_dir):\n", - " makedirs(exp_dir)\n", - "\n", - "# runs we started in this session, for the finale\n", - "runs = []" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download Tensorflow Tensorboard demo code\n", - "\n", - "Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. 
We'll use it here for our purposes.\n", - "\n", - "Note that we don't need to make any code changes at all - the code works without modification from the Tensorflow repository." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "import os\n", - "import tempfile\n", - "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", - "with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n", - " file.write(tf_code.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure and run locally\n", - "\n", - "We'll start by running this locally. While it might not initially seem that useful to use this for a local run - why not just run TB against the files generated locally? - even in this case there is some value to using this feature. Your local run will be registered in the run history, and your Tensorboard logs will be uploaded to the artifact store associated with this run. Later, you'll be able to restore the logs from any run, regardless of where it happened.\n", - "\n", - "Note that for this run, you will need to install Tensorflow on your local machine by yourself. Further, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, as the local machine is what runs Tensorboard." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "\n", - "# Create a run configuration.\n", - "run_config = RunConfiguration()\n", - "run_config.environment.python.user_managed_dependencies = True\n", - "\n", - "# You can choose a specific Python environment by pointing to a Python path \n", - "#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment, Run\n", - "from azureml.core.script_run_config import ScriptRunConfig\n", - "import tensorflow as tf\n", - "\n", - "logs_dir = os.curdir + os.sep + \"logs\"\n", - "tensorflow_logs_dir = os.path.join(logs_dir, \"tensorflow\")\n", - "\n", - "if not path.exists(tensorflow_logs_dir):\n", - " makedirs(tensorflow_logs_dir)\n", - "\n", - "os.environ[\"TEST_TMPDIR\"] = logs_dir\n", - "\n", - "# Writing logs to ./logs results in their being uploaded to Artifact Service,\n", - "# and thus, made accessible to our Tensorboard instance.\n", - "arguments_list = [\"--log_dir\", logs_dir]\n", - "\n", - "# Create an experiment\n", - "exp = Experiment(ws, experiment_name)\n", - "\n", - "script = ScriptRunConfig(exp_dir,\n", - " script=\"mnist_with_summaries.py\",\n", - " run_config=run_config)\n", - "\n", - "# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n", - "# arguments_list += [\"--max_steps\", \"5000\"]\n", - "kwargs = {}\n", - "kwargs['arguments_list'] = arguments_list\n", - "run = exp.submit(script, kwargs)\n", - "# You can also wait for the run to complete\n", - "# run.wait_for_completion(show_output=True)\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard\n", - "\n", - "Now, while the run is in progress, 
we just need to start Tensorboard with the run as its target, and it will begin streaming logs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.tensorboard import Tensorboard\n", - "\n", - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Now, with a DSVM\n", - "\n", - "Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.\n", - "Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n", - "\n", - "If you are unfamiliar with DSVM configuration, check [04. Train in a remote VM (Ubuntu DSVM)](04.train-on-remote-vm.ipynb) for a more detailed breakdown." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import DsvmCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "compute_target_name = 'cpu-dsvm'\n", - "\n", - "try:\n", - " compute_target = DsvmCompute(workspace = ws, name = compute_target_name)\n", - " print('found existing:', compute_target.name)\n", - "except ComputeTargetException:\n", - " print('creating new.')\n", - " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", - " compute_target = DsvmCompute.create(ws, name = compute_target_name, provisioning_configuration = dsvm_config)\n", - " compute_target.wait_for_completion(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using TensorFlow estimator\n", - "\n", - "Instead of manually configuring the DSVM environment, we can use the TensorFlow estimator and everything is set up automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\"--log_dir\": \"./logs\"}\n", - "\n", - "# If you want the run to go longer, set --max-steps to a higher number.\n", - "# script_params[\"--max_steps\"] = \"5000\"\n", - "\n", - "tf_estimator = TensorFlow(source_directory=exp_dir,\n", - " compute_target=compute_target,\n", - " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", - "\n", - "run = exp.submit(tf_estimator)\n", - "\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard with this run\n", - "\n", - "Just like before." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Once more, with a Batch AI cluster\n", - "\n", - "Just to prove we can, let's create a Batch AI cluster using MLC, and run our demo there, as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import BatchAiCompute\n", - "\n", - "clust_name = ws.name + \"cpu\"\n", - "\n", - "try:\n", - " # If you already have a cluster named this, we don't need to make a new one.\n", - " cts = ws.compute_targets() \n", - " compute_target = cts[clust_name]\n", - " assert compute_target.type == 'BatchAI'\n", - "except:\n", - " # Let's make a new one here.\n", - " provisioning_config = BatchAiCompute.provisioning_configuration(cluster_max_nodes=2, \n", - " autoscale_enabled=True, \n", - " cluster_min_nodes=1,\n", - " vm_size='Standard_D11_V2')\n", - " \n", - " compute_target = BatchAiCompute.create(ws, clust_name, provisioning_config)\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)\n", - "print(compute_target.name)\n", - " # For a more detailed view of current BatchAI cluster status, use the 'status' property \n", - " # 
print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using TensorFlow estimator\n", - "\n", - "Again, we can use the TensorFlow estimator and everything is set up automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "script_params = {\"--log_dir\": \"./logs\"}\n", - "\n", - "# If you want the run to go longer, set --max-steps to a higher number.\n", - "# script_params[\"--max_steps\"] = \"5000\"\n", - "\n", - "tf_estimator = TensorFlow(source_directory=exp_dir,\n", - " compute_target=compute_target,\n", - " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", - "\n", - "run = exp.submit(tf_estimator)\n", - "\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard with this run\n", - "\n", - "Once more..." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Finale\n", - "\n", - "If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along. 
We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs...\n", - "# and it turns out that we have been building one of those all along.\n", - "tb = Tensorboard(runs)\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/training/08.export-run-history-to-tensorboard/08.export-run-history-to-tensorboard.ipynb b/training/08.export-run-history-to-tensorboard/08.export-run-history-to-tensorboard.ipynb index 410105a2..fbeaf55b 100644 --- a/training/08.export-run-history-to-tensorboard/08.export-run-history-to-tensorboard.ipynb +++ b/training/08.export-run-history-to-tensorboard/08.export-run-history-to-tensorboard.ipynb @@ -1,243 +1,243 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 41. Export Run History as Tensorboard logs\n", + "\n", + "1. Run some training and log some metrics into Run History\n", + "2. Export the run history to some directory as Tensorboard logs\n", + "3. Launch a local Tensorboard to view the run history" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace, Run, Experiment\n", + "\n", + "\n", + "ws = Workspace.from_config()\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set experiment name and start the run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment_name = 'export-to-tensorboard'\n", + "exp = Experiment(ws, experiment_name)\n", + "root_run = exp.start_logging()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n", + "from sklearn.datasets import load_diabetes\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True)\n", + "\n", + "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", + "\n", + "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", + "data = {\n", + " \"train\":{\"x\":x_train, \"y\":y_train}, \n", + " \"test\":{\"x\":x_test, \"y\":y_test}\n", + "}" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example experiment\n", + "from tqdm import tqdm\n", + "\n", + "alphas = [.1, .2, .3, .4, .5, .6, .7]\n", + "\n", + "# try several alpha values in a Ridge (L2-regularized linear regression) model\n", + "for alpha in tqdm(alphas):\n", + " # create one child run per alpha value\n", + " with root_run.child_run(\"alpha\" + str(alpha)) as run:\n", + " # train a Ridge model with this alpha and evaluate it on the test set\n", + " reg = Ridge(alpha=alpha)\n", + " reg.fit(data[\"train\"][\"x\"], data[\"train\"][\"y\"])\n", + " # TODO save model\n", + " preds = reg.predict(data[\"test\"][\"x\"])\n", + " mse = mean_squared_error(preds, data[\"test\"][\"y\"])\n", + " # End train and eval\n", + "\n", + " # log alpha and the mean squared error in the root run's history\n", + " root_run.log(\"alpha\", alpha)\n", + " root_run.log(\"mse\", mse)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Export Run History to Tensorboard logs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Export Run History to Tensorboard logs\n", + "from azureml.contrib.tensorboard.export import export_to_tensorboard\n", + "import os\n", + "import tensorflow as tf\n", + "\n", + "logdir = 'exportedTBlogs'\n", + "log_path = os.path.join(os.getcwd(), logdir)\n", + "try:\n", + " os.stat(log_path)\n", + "except os.error:\n", + " os.mkdir(log_path)\n", + "print(logdir)\n", + "\n", + "# export run history for the project\n", + "export_to_tensorboard(root_run, logdir)\n", + "\n", + "# or export a particular run\n", + "# export_to_tensorboard(run, logdir)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "root_run.complete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Start Tensorboard\n", + "\n", + "Start a local Tensorboard instance on the exported logs. 
Alternatively, you can start Tensorboard outside this notebook to view the results." + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.contrib.tensorboard import Tensorboard\n", + "\n", + "# The Tensorboard constructor takes an array of runs; here it is empty, and local_root points to the exported logs\n", + "tb = Tensorboard([], local_root=logdir, port=6006)\n", + "\n", + "# If successful, start() returns a string with the URI of the instance.\n", + "tb.start()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stop Tensorboard\n", + "\n", + "When you're done, make sure to call the `stop()` method of the Tensorboard object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tb.stop()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 41. Export Run History as Tensorboard logs\n", - "\n", - "1. Run some training and log some metrics into Run History\n", - "2. Export the run history to some directory as Tensorboard logs\n", - "3. Launch a local Tensorboard to view the run history" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace, Run, Experiment\n", - "\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set experiment name and start the run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'export-to-tensorboard'\n", - "exp = Experiment(ws, experiment_name)\n", - "root_run = exp.start_logging()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n", - "from sklearn.datasets import load_diabetes\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.metrics import mean_squared_error\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "X, y = load_diabetes(return_X_y=True)\n", - "\n", - "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n", - "\n", - "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n", - "data = {\n", - " \"train\":{\"x\":x_train, \"y\":y_train}, \n", - " \"test\":{\"x\":x_test, \"y\":y_test}\n", - "}" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Example experiment\n", - "from tqdm import tqdm\n", - "\n", - "alphas = [.1, .2, .3, .4, .5, .6 , .7]\n", - "\n", - "# try a bunch of alpha values in a Linear Regression (Ridge) model\n", - "for alpha in tqdm(alphas):\n", - " # create a bunch of child runs\n", - " with root_run.child_run(\"alpha\" + str(alpha)) as run:\n", - " # More data science stuff\n", - " reg = Ridge(alpha=alpha)\n", - " reg.fit(data[\"train\"][\"x\"], data[\"train\"][\"y\"])\n", - " # TODO save model\n", - " preds = reg.predict(data[\"test\"][\"x\"])\n", - " mse = mean_squared_error(preds, data[\"test\"][\"y\"])\n", - " # End train and eval\n", - "\n", - " # log alpha, mean_squared_error and feature names in run history\n", - " root_run.log(\"alpha\", alpha)\n", - " root_run.log(\"mse\", mse)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Export Run History to Tensorboard logs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Export Run History to Tensorboard logs\n", - "from azureml.contrib.tensorboard.export import export_to_tensorboard\n", - "import os\n", - "import tensorflow as tf\n", - "\n", - "logdir = 'exportedTBlogs'\n", - "log_path = os.path.join(os.getcwd(), logdir)\n", - "try:\n", - " os.stat(log_path)\n", - "except os.error:\n", - " os.mkdir(log_path)\n", - "print(logdir)\n", - "\n", - "# export run history for the project\n", - "export_to_tensorboard(root_run, logdir)\n", - "\n", - "# or export a particular run\n", - "# export_to_tensorboard(run, logdir)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "root_run.complete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard\n", - "\n", - "Or you can start the Tensorboard outside this notebook to view the result" - ] - }, 
- { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.contrib.tensorboard import Tensorboard\n", - "\n", - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([], local_root=logdir, port=6006)\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/tutorials/01.train-models.ipynb b/tutorials/01.train-models.ipynb index 41041dff..f459daaa 100644 --- a/tutorials/01.train-models.ipynb +++ b/tutorials/01.train-models.ipynb @@ -1,699 +1,721 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial #1: Train an image classification model with Azure Machine Learning\n", + "\n", + "In this tutorial, you train a machine learning model both locally and on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook. You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**. \n", + "\n", + "This tutorial trains a simple logistic regression using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n", + "\n", + "Learn how to:\n", + "\n", + "> * Set up your development environment\n", + "> * Access and examine the data\n", + "> * Train a simple logistic regression model locally using the popular scikit-learn machine learning library \n", + "> * Train multiple models on a remote cluster\n", + "> * Review training results, find and register the best model\n", + "\n", + "You'll learn how to select a model and deploy it in [part two of this tutorial](02.deploy-models.ipynb) later. \n", + "\n", + "## Prerequisites\n", + "\n", + "Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n", + "* Create a workspace and its configuration file (**config.json**) \n", + "* Save your **config.json** to the same folder as this notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up your development environment\n", + "\n", + "All the setup for your development work can be accomplished in a Python notebook. 
Setup includes:\n", + "\n", + "* Importing Python packages\n", + "* Connecting to a workspace to enable communication between your local computer and remote resources\n", + "* Creating an experiment to track all your runs\n", + "* Creating a remote compute target to use for training\n", + "\n", + "### Import packages\n", + "\n", + "Import Python packages you need in this session. Also display the Azure Machine Learning SDK version." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "check version" + ] + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import numpy as np\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import azureml\n", + "from azureml.core import Workspace, Run\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Connect to workspace\n", + "\n", + "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `ws`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "load workspace" + ] + }, + "outputs": [], + "source": [ + "# load workspace configuration from the config.json file in the current folder.\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.location, ws.resource_group, sep = '\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create experiment\n", + "\n", + "Create an experiment to track the runs in your workspace. A workspace can have multiple experiments. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create experiment" + ] + }, + "outputs": [], + "source": [ + "experiment_name = 'sklearn-mnist'\n", + "\n", + "from azureml.core import Experiment\n", + "exp = Experiment(workspace=ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create remote compute target\n", + "\n", + "Azure ML Managed Compute is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure ML Managed Compute cluster as your training environment. This code creates a cluster for you if it does not already exist in your workspace. \n", + "\n", + " **Creation of the cluster takes approximately 5 minutes.** If the cluster is already in the workspace, this code uses it and skips the creation process." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create mlc", + "batchai" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.compute import BatchAiCompute\n", + "from azureml.core.compute import ComputeTarget\n", + "import os\n", + "\n", + "# choose a name for your cluster (environment variables are strings, so numeric and boolean settings are converted)\n", + "batchai_cluster_name = os.environ.get(\"BATCHAI_CLUSTER_NAME\", ws.name + \"gpu\")\n", + "cluster_min_nodes = int(os.environ.get(\"BATCHAI_CLUSTER_MIN_NODES\", 1))\n", + "cluster_max_nodes = int(os.environ.get(\"BATCHAI_CLUSTER_MAX_NODES\", 3))\n", + "vm_size = os.environ.get(\"BATCHAI_CLUSTER_SKU\", \"STANDARD_NC6\")\n", + "autoscale_enabled = os.environ.get(\"BATCHAI_CLUSTER_AUTOSCALE_ENABLED\", \"True\").lower() == \"true\"\n", + "\n", + "\n", + "if batchai_cluster_name in ws.compute_targets():\n", + " compute_target = ws.compute_targets()[batchai_cluster_name]\n", + " if compute_target and type(compute_target) is BatchAiCompute:\n", + " print('found compute target. just use it.
' + batchai_cluster_name)\n", + "else:\n", + " print('creating a new compute target...')\n", + " provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n", + " vm_priority = 'lowpriority', # optional\n", + " autoscale_enabled = autoscale_enabled,\n", + " cluster_min_nodes = cluster_min_nodes, \n", + " cluster_max_nodes = cluster_max_nodes)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n", + " \n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it will use the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + " \n", + " # For a more detailed view of current BatchAI cluster status, use the 'status' property \n", + " print(compute_target.status.serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You now have the necessary packages and compute resources to train a model in the cloud. \n", + "\n", + "## Explore data\n", + "\n", + "Before you train a model, you need to understand the data that you are using to train it. You also need to copy the data into the cloud so it can be accessed by your cloud training environment. In this section you learn how to:\n", + "\n", + "* Download the MNIST dataset\n", + "* Display some sample images\n", + "* Upload data to the cloud\n", + "\n", + "### Download the MNIST dataset\n", + "\n", + "Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import urllib.request\n", + "\n", + "os.makedirs('./data', exist_ok = True)\n", + "\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/train-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/train-labels.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display some sample images\n", + "\n", + "Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note that this step requires a `load_data` function that's included in a `utils.py` file. This file is included in the sample folder. Make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# make sure utils.py is in the same directory as this code\n", + "from utils import load_data\n", + "\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1.
This helps the model converge faster.\n", + "X_train = load_data('./data/train-images.gz', False) / 255.0\n", + "y_train = load_data('./data/train-labels.gz', True).reshape(-1)\n", + "\n", + "X_test = load_data('./data/test-images.gz', False) / 255.0\n", + "y_test = load_data('./data/test-labels.gz', True).reshape(-1)\n", + "\n", + "# now let's show some randomly chosen images from the training set.\n", + "count = 0\n", + "sample_size = 30\n", + "plt.figure(figsize = (16, 6))\n", + "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", + " count = count + 1\n", + " plt.subplot(1, sample_size, count)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n", + " plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now you have an idea of what these images look like and the expected prediction outcome.\n", + "\n", + "### Upload data to the cloud\n", + "\n", + "Now make the data accessible remotely by uploading it from your local machine into Azure, where your remote training runs can read it. The datastore is a convenient construct associated with your workspace that lets you upload and download data and interact with it from your remote compute targets. It is backed by an Azure Blob storage account.\n", + "\n", + "The MNIST files are uploaded into a directory named `mnist` at the root of the datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "use datastore" + ] + }, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)\n", + "\n", + "ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You now have everything you need to start training a model. 
\n", + "\n", + "## Train a local model\n", + "\n", + "Train a simple logistic regression model using scikit-learn locally.\n", + "\n", + "**Training locally can take a minute or two** depending on your computer configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "clf = LogisticRegression()\n", + "clf.fit(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, make predictions using the test set and calculate the accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y_hat = clf.predict(X_test)\n", + "print(np.average(y_hat == y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With just a few lines of code, you have about 92% accuracy.\n", + "\n", + "## Train on a remote cluster\n", + "\n", + "Now you can expand on this simple model by building a model with a different regularization rate. This time you'll train the model on a remote resource. \n", + "\n", + "For this task, submit the job to the remote training cluster you set up earlier. To submit a job, you:\n", + "* Create a directory\n", + "* Create a training script\n", + "* Create an estimator object\n", + "* Submit the job \n", + "\n", + "### Create a directory\n", + "\n", + "Create a directory to deliver the necessary code from your computer to the remote resource." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "script_folder = './sklearn-mnist'\n", + "os.makedirs(script_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a training script\n", + "\n", + "To submit the job to the cluster, first create a training script. 
Run the following code to create the training script called `train.py` in the directory you just created. This script adds a regularization rate to the training algorithm, so it produces a slightly different model than the local version." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $script_folder/train.py\n", + "\n", + "import argparse\n", + "import os\n", + "import numpy as np\n", + "\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.externals import joblib\n", + "\n", + "from azureml.core import Run\n", + "from utils import load_data\n", + "\n", + "# let user feed in 2 parameters, the location of the data files (from datastore), and the regularization rate of the logistic regression model\n", + "parser = argparse.ArgumentParser()\n", + "parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')\n", + "parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n", + "args = parser.parse_args()\n", + "\n", + "data_folder = os.path.join(args.data_folder, 'mnist')\n", + "print('Data folder:', data_folder)\n", + "\n", + "# load train and test set into numpy arrays\n", + "# note we scale the pixel intensity values to 0-1 (by dividing by 255.0) so the model can converge faster.\n", + "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n", + "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", + "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n", + "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n", + "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n", + "\n", + "# get hold of the current run\n", + "run = Run.get_submitted_run()\n", + "\n", + "print('Train a logistic regression model with regularization rate
of', args.reg)\n", + "clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n", + "clf.fit(X_train, y_train)\n", + "\n", + "print('Predict the test set')\n", + "y_hat = clf.predict(X_test)\n", + "\n", + "# calculate accuracy on the prediction\n", + "acc = np.average(y_hat == y_test)\n", + "print('Accuracy is', acc)\n", + "\n", + "run.log('regularization rate', float(args.reg))\n", + "run.log('accuracy', float(acc))\n", + "\n", + "os.makedirs('outputs', exist_ok=True)\n", + "# note: files saved in the outputs folder are automatically uploaded into the experiment record\n", + "joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice how the script gets data and saves models:\n", + "\n", + "+ The training script reads an argument to find the directory containing the data. When you submit the job later, you point to the datastore for this argument:\n", + "`parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "+ The training script saves your model into a directory named `outputs`.
\n", + "`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`
\n", + "Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('utils.py', script_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an estimator\n", + "\n", + "An estimator object is used to submit the run. Create your estimator by running the following code to define:\n", + "\n", + "* The name of the estimator object, `est`\n", + "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", + "* The compute target. In this case you will use the Batch AI cluster you created\n", + "* The training script name, train.py\n", + "* Parameters required from the training script \n", + "* Python packages needed for training\n", + "\n", + "In this tutorial, this target is the Batch AI cluster. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure estimator" + ] + }, + "outputs": [], + "source": [ + "from azureml.train.estimator import Estimator\n", + "\n", + "script_params = {\n", + " '--data-folder': ds.as_mount(),\n", + " '--regularization': 0.8\n", + "}\n", + "\n", + "est = Estimator(source_directory=script_folder,\n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " entry_script='train.py',\n", + " conda_packages=['scikit-learn'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit the job to the cluster\n", + "\n", + "Run the experiment by submitting the estimator object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "remote run", + "batchai", + "scikit-learn" + ] + }, + "outputs": [], + "source": [ + "run = exp.submit(config=est)\n", + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.\n", + "\n", + "## Monitor a remote run\n", + "\n", + "In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the script dependencies don't change, the same image is reused and hence the container startup time is much shorter.\n", + "\n", + "Here is what's happening while you wait:\n", + "\n", + "- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**. \n", + "\n", + " This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. 
You can monitor the image creation progress using these logs.\n", + "\n", + "- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n", + "\n", + "- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n", + "\n", + "- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n", + "\n", + "\n", + "You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method. \n", + "\n", + "### Jupyter widget\n", + "\n", + "Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "use notebook widget" + ] + }, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get log results upon completion\n", + "\n", + "Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "remote run", + "batchai", + "scikit-learn" + ] + }, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=False) # specify True for a verbose log" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display run results\n", + "\n", + "You now have a model trained on a remote cluster. Retrieve the accuracy of the model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "get metrics" + ] + }, + "outputs": [], + "source": [ + "print(run.get_metrics())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the next tutorial you will explore this model in more detail.\n", + "\n", + "## Register model\n", + "\n", + "The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n", + "\n", + "You can see files associated with that run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history" + ] + }, + "outputs": [], + "source": [ + "print(run.get_file_names())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from history" + ] + }, + "outputs": [], + "source": [ + "# register model \n", + "model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n", + "print(model.name, model.id, model.version, sep = '\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "In this Azure Machine Learning tutorial, you used Python to:\n", + "\n", + "> * Set up your development environment\n", + "> * Access and examine the data\n", + "> * Train a simple logistic regression locally using the popular scikit-learn machine learning library\n", + "> * Train multiple models on a remote cluster\n", + "> * Review training details and register the best model\n", + "\n", + "You are ready to deploy this registered model using the instructions in the next part of the tutorial series:\n", + "\n", + "> [Tutorial 2 - Deploy models](02.deploy-models.ipynb)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "sgilley" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial #1: Train an image classification model with Azure Machine Learning\n", - "\n", - "In this tutorial, you train a machine learning model both locally and on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook. You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**. 
\n", - "\n", - "This tutorial trains a simple logistic regression using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n", - "\n", - "Learn how to:\n", - "\n", - "> * Set up your development environment\n", - "> * Access and examine the data\n", - "> * Train a simple logistic regression model locally using the popular scikit-learn machine learning library \n", - "> * Train multiple models on a remote cluster\n", - "> * Review training results, find and register the best model\n", - "\n", - "You'll learn how to select a model and deploy it in [part two of this tutorial](deploy-models.ipynb) later. \n", - "\n", - "## Prerequisites\n", - "\n", - "Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n", - "* Create a workspace and its configuration file (**config.json**) \n", - "* Save your **config.json** to the same folder as this notebook" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set up your development environment\n", - "\n", - "All the setup for your development work can be accomplished in a Python notebook. Setup includes:\n", - "\n", - "* Importing Python packages\n", - "* Connecting to a workspace to enable communication between your local computer and remote resources\n", - "* Creating an experiment to track all your runs\n", - "* Creating a remote compute target to use for training\n", - "\n", - "### Import packages\n", - "\n", - "Import Python packages you need in this session. Also display the Azure Machine Learning SDK version." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt\n", - "\n", - "import azureml\n", - "from azureml.core import Workspace, Run\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to workspace\n", - "\n", - "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `ws`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load workspace configuration from the config.json file in the current folder.\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create experiment\n", - "\n", - "Create an experiment to track the runs in your workspace. A workspace can have muliple experiments. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = 'sklearn-mnist'\n", - "\n", - "from azureml.core import Experiment\n", - "exp = Experiment(workspace=ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create remote compute target\n", - "\n", - "Azure Azure ML Managed Compute is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure Managed Compute cluster as your training environment. This code creates a cluster for you if it does not already exist in your workspace. 
\n", - "\n", - " **Creation of the cluster takes approximately 5 minutes.** If the cluster is already in the workspace this code uses it and skips the creation process." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create mlc", - "batchai" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, BatchAiCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "batchai_cluster_name = \"traincluster\"\n", - "\n", - "try:\n", - " # look for the existing cluster by name\n", - " compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n", - " if type(compute_target) is BatchAiCompute:\n", - " print('found compute target {}, just use it.'.format(batchai_cluster_name))\n", - " else:\n", - " print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n", - "except ComputeTargetException:\n", - " print('creating a new compute target...')\n", - " compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\", # small CPU-based VM\n", - " #vm_priority='lowpriority', # optional\n", - " autoscale_enabled=True,\n", - " cluster_min_nodes=0, \n", - " cluster_max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n", - " \n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - " \n", - " # Use the 'status' property to get a detailed status for the current cluster. 
\n", - " print(compute_target.status.serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You now have the necessary packages and compute resources to train a model in the cloud. \n", - "\n", - "## Explore data\n", - "\n", - "Before you train a model, you need to understand the data that you are using to train it. You also need to copy the data into the cloud so it can be accessed by your cloud training environment. In this section you learn how to:\n", - "\n", - "* Download the MNIST dataset\n", - "* Display some sample images\n", - "* Upload data to the cloud\n", - "\n", - "### Download the MNIST dataset\n", - "\n", - "Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib.request\n", - "\n", - "os.makedirs('./data', exist_ok = True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Display some sample images\n", - "\n", - "Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. 
Note this step requires a `load_data` function that's included in an `util.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compresse files into numpy arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# make sure utils.py is in the same directory as this code\n", - "from utils import load_data\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n", - "X_train = load_data('./data/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/train-labels.gz', True).reshape(-1)\n", - "\n", - "X_test = load_data('./data/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/test-labels.gz', True).reshape(-1)\n", - "\n", - "# now let's show some randomly chosen images from the traininng set.\n", - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you have an idea of what these images look like and the expected prediction outcome.\n", - "\n", - "### Upload data to the cloud\n", - "\n", - "Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. 
It is backed by Azure blob storage account.\n", - "\n", - "The MNIST files are uploaded into a directory named `mnist` at the root of the datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)\n", - "\n", - "ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You now have everything you need to start training a model. \n", - "\n", - "## Train a local model\n", - "\n", - "Train a simple logistic regression model using scikit-learn locally.\n", - "\n", - "**Training locally can take a minute or two** depending on your computer configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "clf = LogisticRegression()\n", - "clf.fit(X_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, make predictions using the test set and calculate the accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "y_hat = clf.predict(X_test)\n", - "print(np.average(y_hat == y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "With just a few lines of code, you have a 92% accuracy.\n", - "\n", - "## Train on a remote cluster\n", - "\n", - "Now you can expand on this simple model by building a model with a different regularization rate. This time you'll train the model on a remote resource. \n", - "\n", - "For this task, submit the job to the remote training cluster you set up earlier. 
To submit a job you:\n", - "* Create a directory\n", - "* Create a training script\n", - "* Create an estimator object\n", - "* Submit the job \n", - "\n", - "### Create a directory\n", - "\n", - "Create a directory to deliver the necessary code from your computer to the remote resource." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "script_folder = './sklearn-mnist'\n", - "os.makedirs(script_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a training script\n", - "\n", - "To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created. This training adds a regularization rate to the training algorithm, so produces a slightly different model than the local version." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $script_folder/train.py\n", - "\n", - "import argparse\n", - "import os\n", - "import numpy as np\n", - "\n", - "from sklearn.linear_model import LogisticRegression\n", - "from sklearn.externals import joblib\n", - "\n", - "from azureml.core import Run\n", - "from utils import load_data\n", - "\n", - "# let user feed in 2 parameters, the location of the data files (from datastore), and the regularization rate of the logistic regression model\n", - "parser = argparse.ArgumentParser()\n", - "parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')\n", - "parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n", - "args = parser.parse_args()\n", - "\n", - "data_folder = os.path.join(args.data_folder, 'mnist')\n", - "print('Data folder:', data_folder)\n", - "\n", - "# load train and test set into numpy arrays\n", - "# note we scale the 
pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.\n", - "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n", - "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", - "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n", - "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n", - "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n", - "\n", - "# get hold of the current run\n", - "run = Run.get_submitted_run()\n", - "\n", - "print('Train a logistic regression model with regularizaion rate of', args.reg)\n", - "clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n", - "clf.fit(X_train, y_train)\n", - "\n", - "print('Predict the test set')\n", - "y_hat = clf.predict(X_test)\n", - "\n", - "# calculate accuracy on the prediction\n", - "acc = np.average(y_hat == y_test)\n", - "print('Accuracy is', acc)\n", - "\n", - "run.log('regularization rate', np.float(args.reg))\n", - "run.log('accuracy', np.float(acc))\n", - "\n", - "os.makedirs('outputs', exist_ok=True)\n", - "# note file saved in the outputs folder is automatically uploaded into experiment record\n", - "joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notice how the script gets data and saves models:\n", - "\n", - "+ The training script reads an argument to find the directory containing the data. When you submit the job later, you point to the datastore for this argument:\n", - "`parser.add_argument('--data-folder', type=str, dest='data_folder', help='data directory mounting point')`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "+ The training script saves your model into a directory named outputs.
\n", - "`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`
\n", - "Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an estimator\n", - "\n", - "An estimator object is used to submit the run. Create your estimator by running the following code to define:\n", - "\n", - "* The name of the estimator object, `est`\n", - "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", - "* The compute target. In this case you will use the Batch AI cluster you created\n", - "* The training script name, train.py\n", - "* Parameters required from the training script \n", - "* Python packages needed for training\n", - "\n", - "In this tutorial, this target is the Batch AI cluster. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.estimator import Estimator\n", - "\n", - "script_params = {\n", - " '--data-folder': ds.as_mount(),\n", - " '--regularization': 0.8\n", - "}\n", - "\n", - "est = Estimator(source_directory=script_folder,\n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='train.py',\n", - " conda_packages=['scikit-learn'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit the job to the cluster\n", - "\n", - "Run the experiment by submitting the estimator object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(config=est)\n", - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.\n", - "\n", - "## Monitor a remote run\n", - "\n", - "In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the script dependencies don't change, the same image is reused and hence the container start up time is much faster.\n", - "\n", - "Here is what's happening while you wait:\n", - "\n", - "- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**. \n", - "\n", - " This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n", - "\n", - "- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. 
Scaling typically takes **about 5 minutes.**\n", - "\n", - "- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n", - "\n", - "- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n", - "\n", - "\n", - "You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method. \n", - "\n", - "### Jupyter widget\n", - "\n", - "Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "use notebook widget" - ] - }, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get log results upon completion\n", - "\n", - "Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=False) # specify True for a verbose log" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Display run results\n", - "\n", - "You now have a model trained on a remote cluster. 
Retrieve the accuracy of the model:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "get metrics" - ] - }, - "outputs": [], - "source": [ - "print(run.get_metrics())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the next tutorial you will explore this model in more detail.\n", - "\n", - "## Register model\n", - "\n", - "The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n", - "\n", - "You can see files associated with that run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# register model \n", - "model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n", - "print(model.name, model.id, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up resources\n", - "\n", - "If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. 
You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n", - "\n", - "You can also just delete the Azure Managed Compute cluster. But even if you don't delete it, since `autoscale_enabled` is set to `True`, and `cluster_min_nodes` is set to `0`, when the jobs are done, all cluster nodes will be shut down and you will not incur any additional compute charges. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# optionally, delete the Azure Managed Compute cluster\n", - "compute_target.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next steps\n", - "\n", - "In this Azure Machine Learning tutorial, you used Python to:\n", - "\n", - "> * Set up your development environment\n", - "> * Access and examine the data\n", - "> * Train a simple logistic regression locally using the popular scikit-learn machine learning library\n", - "> * Train multiple models on a remote cluster\n", - "> * Review training details and register the best model\n", - "\n", - "You are ready to deploy this registered model using the instructions in the next part of the tutorial series:\n", - "\n", - "> [Tutorial 2 - Deploy models](02.deploy-models.ipynb)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "sgilley" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/tutorials/02.deploy-models.ipynb b/tutorials/02.deploy-models.ipynb index 8c42cdf4..dbe31d17 100644 --- 
a/tutorials/02.deploy-models.ipynb +++ b/tutorials/02.deploy-models.ipynb @@ -1,610 +1,610 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial #2: Deploy an image classification model in Azure Container Instance (ACI)\n", + "\n", + "This tutorial is **part two of a two-part tutorial series**. In the [previous tutorial](01.train-models.ipynb), you trained machine learning models and then registered a model in your workspace on the cloud. \n", + "\n", + "Now, you're ready to deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/azure/container-instances/) (ACI). A web service is an image, in this case a Docker image, that encapsulates the scoring logic and the model itself. \n", + "\n", + "In this part of the tutorial, you use Azure Machine Learning service (Preview) to:\n", + "\n", + "> * Set up your testing environment\n", + "> * Retrieve the model from your workspace\n", + "> * Test the model locally\n", + "> * Deploy the model to ACI\n", + "> * Test the deployed model\n", + "\n", + "ACI is not ideal for production deployments, but it is great for testing and understanding the workflow. For scalable production deployments, consider using AKS.\n", + "\n", + "\n", + "## Prerequisites\n", + "\n", + "Complete the model training in the [Tutorial #1: Train an image classification model with Azure Machine Learning](01.train-models.ipynb) notebook. 
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "# If you did NOT complete the tutorial, you can instead run this cell \n", + "# This will register a model and download the data needed for this tutorial\n", + "# These prerequisites are created in the training tutorial\n", + "# Feel free to skip this cell if you completed the training tutorial \n", + "\n", + "# register a model\n", + "from azureml.core import Workspace\n", + "ws = Workspace.from_config()\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "model_name = \"sklearn_mnist\"\n", + "model = Model.register(model_path=\"sklearn_mnist_model.pkl\",\n", + " model_name=model_name,\n", + " tags={\"data\": \"mnist\", \"model\": \"classification\"},\n", + " description=\"Mnist handwriting recognition\",\n", + " workspace=ws)\n", + "\n", + "# download test data\n", + "import os\n", + "import urllib.request\n", + "\n", + "os.makedirs('./data', exist_ok=True)\n", + "\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/test-labels.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up the environment\n", + "\n", + "Start by setting up a testing environment.\n", + "\n", + "### Import packages\n", + "\n", + "Import the Python packages needed for this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "check version" + ] + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import numpy as np\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + " \n", + "import azureml\n", + "from azureml.core import Workspace, Run\n", + "\n", + "# display the core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the model\n", + "\n", + "You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "load workspace", + "download model" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "from azureml.core.model import Model\n", + "\n", + "ws = Workspace.from_config()\n", + "model=Model(ws, 'sklearn_mnist')\n", + "model.download(target_dir = '.')\n", + "import os \n", + "# verify the downloaded model file\n", + "os.stat('./sklearn_mnist_model.pkl')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test model locally\n", + "\n", + "Before deploying, make sure your model is working locally by:\n", + "* Loading test data\n", + "* Predicting test data\n", + "* Examining the confusion matrix\n", + "\n", + "### Load test data\n", + "\n", + "Load the test data from the **./data/** directory created during the training tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from utils import load_data\n", + "\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1. 
This helps the model converge faster\n", + "X_test = load_data('./data/test-images.gz', False) / 255.0\n", + "y_test = load_data('./data/test-labels.gz', True).reshape(-1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Predict test data\n", + "\n", + "Feed the test dataset to the model to get predictions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "from sklearn.externals import joblib\n", + "\n", + "clf = joblib.load('./sklearn_mnist_model.pkl')\n", + "y_hat = clf.predict(X_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Examine the confusion matrix\n", + "\n", + "Generate a confusion matrix to see how many samples from the test set are classified correctly. The off-diagonal entries count the incorrect predictions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.metrics import confusion_matrix\n", + "\n", + "conf_mx = confusion_matrix(y_test, y_hat)\n", + "print(conf_mx)\n", + "print('Overall accuracy:', np.average(y_hat==y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the predicted values, and the Y axis represents the actual values. The color of each cell represents the error rate: the lighter the color, the higher the error rate. For example, many 5's are misclassified as 3's, so you see a bright cell at row 5, column 3."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# normalize each row by its total and zero out the diagonal so that correct predictions don't overpower the errors when visualized\n", + "row_sums = conf_mx.sum(axis=1, keepdims=True)\n", + "norm_conf_mx = conf_mx / row_sums\n", + "np.fill_diagonal(norm_conf_mx, 0)\n", + "\n", + "fig = plt.figure(figsize=(8,5))\n", + "ax = fig.add_subplot(111)\n", + "cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n", + "ticks = np.arange(0, 10, 1)\n", + "ax.set_xticks(ticks)\n", + "ax.set_yticks(ticks)\n", + "ax.set_xticklabels(ticks)\n", + "ax.set_yticklabels(ticks)\n", + "fig.colorbar(cax)\n", + "plt.ylabel('true labels', fontsize=14)\n", + "plt.xlabel('predicted values', fontsize=14)\n", + "plt.savefig('conf.png')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy as web service\n", + "\n", + "Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. \n", + "\n", + "To build the correct environment for ACI, provide the following:\n", + "* A scoring script to show how to use the model\n", + "* An environment file to show what packages need to be installed\n", + "* A configuration file to build the ACI\n", + "* The model you trained before\n", + "\n", + "### Create scoring script\n", + "\n", + "Create the scoring script, called `score.py`, that the web service uses to run predictions with the model.\n", + "\n", + "The scoring script must include two functions:\n", + "* The `init()` function, which typically loads the model into a global object. This function runs only once, when the Docker container starts. \n", + "\n", + "* The `run(raw_data)` function uses the model to predict a value based on the input data.
Inputs and outputs to the run typically use JSON for serialization and deserialization, but other formats are supported.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import os\n", + "import pickle\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global model\n", + " # retrieve the path to the model file using the model name\n", + " model_path = Model.get_model_path('sklearn_mnist')\n", + " model = joblib.load(model_path)\n", + "\n", + "def run(raw_data):\n", + " data = np.array(json.loads(raw_data)['data'])\n", + " # make prediction\n", + " y_hat = model.predict(data)\n", + " return json.dumps(y_hat.tolist())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create environment file\n", + "\n", + "Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "set conda dependencies" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies()\n", + "myenv.add_conda_package(\"scikit-learn\")\n", + "myenv.add_pip_package(\"pynacl==1.2.1\")\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Review the content of the `myenv.yml` file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(\"myenv.yml\",\"r\") as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create configuration file\n", + "\n", + "Create a deployment configuration file and specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is sufficient for many models. If you need more later, you have to recreate the image and redeploy the service." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure web service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n", + " description='Predict MNIST with sklearn')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy in ACI\n", + "Estimated time to complete: **about 7-8 minutes**\n", + "\n", + "Configure the image and deploy. The following code goes through these steps:\n", + "\n", + "1. Build an image using:\n", + " * The scoring file (`score.py`)\n", + " * The environment file (`myenv.yml`)\n", + " * The model file\n", + "1. Register that image under the workspace. \n", + "1. Send the image to the ACI container.\n", + "1. Start up a container in ACI using the image.\n", + "1. Get the web service HTTP endpoint."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure image", + "create image", + "deploy web service", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.image import ContainerImage\n", + "\n", + "# configure the image\n", + "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n", + " runtime=\"python\", \n", + " conda_file=\"myenv.yml\")\n", + "\n", + "service = Webservice.deploy_from_model(workspace=ws,\n", + " name='sklearn-mnist-svc',\n", + " deployment_config=aciconfig,\n", + " models=[model],\n", + " image_config=image_config)\n", + "\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "get scoring uri" + ] + }, + "outputs": [], + "source": [ + "print(service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test deployed service\n", + "\n", + "Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n", + "\n", + "The following code goes through these steps:\n", + "1. Send the data as a JSON object with a `data` array to the web service hosted in ACI. \n", + "\n", + "1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n", + "\n", + "1. Print the returned predictions and plot them along with the input images. Red font and an inverse image (white on black) are used to highlight the misclassified samples.
\n", + "\n", + " Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "score web service" + ] + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "# find 30 random samples from test set\n", + "n = 30\n", + "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", + "\n", + "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", + "test_samples = bytes(test_samples, encoding = 'utf8')\n", + "\n", + "# predict using the deployed model\n", + "result = json.loads(service.run(input_data=test_samples))\n", + "\n", + "# compare actual value vs. the predicted values:\n", + "i = 0\n", + "plt.figure(figsize = (20, 1))\n", + "\n", + "for s in sample_indices:\n", + " plt.subplot(1, n, i + 1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " \n", + " # use different color for misclassified sample\n", + " font_color = 'red' if y_test[s] != result[i] else 'black'\n", + " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", + " \n", + " plt.text(x=10, y =-10, s=result[i], fontsize=18, color=font_color)\n", + " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", + " \n", + " i = i + 1\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also send a raw HTTP request to test the web service."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "score web service" + ] + }, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "\n", + "# send a random row from the test set to score\n", + "random_index = np.random.randint(0, len(X_test))\n", + "input_data = json.dumps({\"data\": [X_test[random_index].tolist()]})\n", + "\n", + "headers = {'Content-Type':'application/json'}\n", + "\n", + "# for AKS deployment you'd need to include the service key in the header as well\n", + "# api_key = service.get_key()\n", + "# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n", + "\n", + "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", + "\n", + "print(\"POST to url\", service.scoring_uri)\n", + "#print(\"input data:\", input_data)\n", + "print(\"label:\", y_test[random_index])\n", + "print(\"prediction:\", resp.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up resources\n", + "\n", + "To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "delete web service" + ] + }, + "outputs": [], + "source": [ + "service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you're not going to use what you've created here, delete the resources you created in this tutorial so you don't incur any charges. In the Azure portal, select and delete your resource group.
You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n", + "\n", + "\n", + "## Next steps\n", + "\n", + "In this Azure Machine Learning tutorial, you used Python to:\n", + "\n", + "> * Set up your testing environment\n", + "> * Retrieve the model from your workspace\n", + "> * Test the model locally\n", + "> * Deploy the model to ACI\n", + "> * Test the deployed model\n", + " \n", + "You can also try out the [Automatic algorithm selection tutorial](03.auto-train-models.ipynb) to see how Azure Machine Learning can auto-select and tune the best algorithm for your model and build that model for you." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "sgilley" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial #2: Deploy an image classification model in Azure Container Instance (ACI)\n", - "\n", - "This tutorial is **part two of a two-part tutorial series**. In the [previous tutorial](01.train-models.ipynb), you trained machine learning models and then registered a model in your workspace on the cloud. \n", - "\n", - "Now, you're ready to deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/azure/container-instances/) (ACI). A web service is an image, in this case a Docker image, that encapsulates the scoring logic and the model itself. 
\n", - "\n", - "In this part of the tutorial, you use Azure Machine Learning service (Preview) to:\n", - "\n", - "> * Set up your testing environment\n", - "> * Retrieve the model from your workspace\n", - "> * Test the model locally\n", - "> * Deploy the model to ACI\n", - "> * Test the deployed model\n", - "\n", - "ACI is not ideal for production deployments, but it is great for testing and understanding the workflow. For scalable production deployments, consider using AKS.\n", - "\n", - "\n", - "## Prerequisites\n", - "\n", - "Complete the model training in the [Tutorial #1: Train an image classification model with Azure Machine Learning](01.train-models.ipynb) notebook. \n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "# If you did NOT complete the tutorial, you can instead run this cell \n", - "# This will register a model and download the data needed for this tutorial\n", - "# These prerequisites are created in the training tutorial\n", - "# Feel free to skip this cell if you completed the training tutorial \n", - "\n", - "# register a model\n", - "from azureml.core import Workspace\n", - "ws = Workspace.from_config()\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "model_name = \"sklearn_mnist\"\n", - "model = Model.register(model_path=\"sklearn_mnist_model.pkl\",\n", - " model_name=model_name,\n", - " tags={\"data\": \"mnist\", \"model\": \"classification\"},\n", - " description=\"Mnist handwriting recognition\",\n", - " workspace=ws)\n", - "\n", - "# download test data\n", - "import os\n", - "import urllib.request\n", - "\n", - "os.makedirs('./data', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', 
filename='./data/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set up the environment\n", - "\n", - "Start by setting up a testing environment.\n", - "\n", - "### Import packages\n", - "\n", - "Import the Python packages needed for this tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "check version" - ] - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt\n", - " \n", - "import azureml\n", - "from azureml.core import Workspace, Run\n", - "\n", - "# display the core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the model\n", - "\n", - "You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "load workspace", - "download model" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.model import Model\n", - "\n", - "ws = Workspace.from_config()\n", - "model=Model(ws, 'sklearn_mnist')\n", - "model.download(target_dir = '.')\n", - "import os \n", - "# verify the downloaded model file\n", - "os.stat('./sklearn_mnist_model.pkl')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test model locally\n", - "\n", - "Before deploying, make sure your model is working locally by:\n", - "* Loading test data\n", - "* Predicting test data\n", - "* Examining the confusion matrix\n", - "\n", - "### Load test data\n", - "\n", - "Load the test data from the **./data/** directory created during the training tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from utils import load_data\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster\n", - "X_test = load_data('./data/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/test-labels.gz', True).reshape(-1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Predict test data\n", - "\n", - "Feed the test dataset to the model to get predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pickle\n", - "from sklearn.externals import joblib\n", - "\n", - "clf = joblib.load('./sklearn_mnist_model.pkl')\n", - "y_hat = clf.predict(X_test)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Examine the confusion matrix\n", - "\n", - "Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import confusion_matrix\n", - "\n", - "conf_mx = confusion_matrix(y_test, y_hat)\n", - "print(conf_mx)\n", - "print('Overall accuracy:', np.average(y_hat==y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the actual values, and the Y axis represents the predicted values. The color in each grid represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright grid at (5,3)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# normalize the diagnal cells so that they don't overpower the rest of the cells when visualized\n", - "row_sums = conf_mx.sum(axis=1, keepdims=True)\n", - "norm_conf_mx = conf_mx / row_sums\n", - "np.fill_diagonal(norm_conf_mx, 0)\n", - "\n", - "fig = plt.figure(figsize=(8,5))\n", - "ax = fig.add_subplot(111)\n", - "cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n", - "ticks = np.arange(0, 10, 1)\n", - "ax.set_xticks(ticks)\n", - "ax.set_yticks(ticks)\n", - "ax.set_xticklabels(ticks)\n", - "ax.set_yticklabels(ticks)\n", - "fig.colorbar(cax)\n", - "plt.ylabel('true labels', fontsize=14)\n", - "plt.xlabel('predicted values', fontsize=14)\n", - "plt.savefig('conf.png')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy as web service\n", - "\n", - "Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. \n", - "\n", - "To build the correct environment for ACI, provide the following:\n", - "* A scoring script to show how to use the model\n", - "* An environment file to show what packages need to be installed\n", - "* A configuration file to build the ACI\n", - "* The model you trained before\n", - "\n", - "### Create scoring script\n", - "\n", - "Create the scoring script, called score.py, used by the web service call to show how to use the model.\n", - "\n", - "You must include two required functions into the scoring script:\n", - "* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started. \n", - "\n", - "* The `run(input_data)` function uses the model to predict a value based on the input data. 
Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "import pickle\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " # retreive the path to the model file using the model name\n", - " model_path = Model.get_model_path('sklearn_mnist')\n", - " model = joblib.load(model_path)\n", - "\n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " y_hat = model.predict(data)\n", - " return json.dumps(y_hat.tolist())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create environment file\n", - "\n", - "Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "set conda dependencies" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies()\n", - "myenv.add_conda_package(\"scikit-learn\")\n", - "myenv.add_pip_package(\"pynacl==1.2.1\")\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Review the content of the `myenv.yml` file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open(\"myenv.yml\",\"r\") as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create configuration file\n", - "\n", - "Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "configure web service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n", - " description='Predict MNIST with sklearn')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy in ACI\n", - "Estimated time to complete: **about 7-8 minutes**\n", - "\n", - "Configure the image and deploy. The following code goes through these steps:\n", - "\n", - "1. Build an image using:\n", - " * The scoring file (`score.py`)\n", - " * The environment file (`myenv.yml`)\n", - " * The model file\n", - "1. Register that image under the workspace. \n", - "1. Send the image to the ACI container.\n", - "1. Start up a container in ACI using the image.\n", - "1. Get the web service HTTP endpoint." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "configure image", - "create image", - "deploy web service", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "from azureml.core.image import ContainerImage\n", - "\n", - "# configure the image\n", - "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")\n", - "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='sklearn-mnist-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=image_config)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "get scoring uri" - ] - }, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test deployed service\n", - "\n", - "Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n", - "\n", - "The following code goes through these steps:\n", - "1. Send the data as a JSON array to the web service hosted in ACI. \n", - "\n", - "1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n", - "\n", - "1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. 
\n", - "\n", - " Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "score web service" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", - "\n", - "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", - "test_samples = bytes(test_samples, encoding = 'utf8')\n", - "\n", - "# predict using the deployed model\n", - "result = json.loads(service.run(input_data=test_samples))\n", - "\n", - "# compare actual value vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - " plt.subplot(1, n, i + 1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " \n", - " # use different color for misclassified sample\n", - " font_color = 'red' if y_test[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", - " \n", - " plt.text(x=10, y =-10, s=result[i], fontsize=18, color=font_color)\n", - " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", - " \n", - " i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also send raw HTTP request to test the web service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "score web service" - ] - }, - "outputs": [], - "source": [ - "import requests\n", - "import json\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test)-1)\n", - "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", - "\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "# for AKS deployment you'd need to the service key in the header as well\n", - "# api_key = service.get_key()\n", - "# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"POST to url\", service.scoring_uri)\n", - "#print(\"input data:\", input_data)\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up resources\n", - "\n", - "To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "delete web service" - ] - }, - "outputs": [], - "source": [ - "service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. 
You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n", - "\n", - "\n", - "## Next steps\n", - "\n", - "In this Azure Machine Learning tutorial, you used Python to:\n", - "\n", - "> * Set up your testing environment\n", - "> * Retrieve the model from your workspace\n", - "> * Test the model locally\n", - "> * Deploy the model to ACI\n", - "> * Test the deployed model\n", - " \n", - "You can also try out the [Automatic algorithm selection tutorial](03.auto-train-models.ipynb) to see how Azure Machine Learning can auto-select and tune the best algorithm for your model and build that model for you." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "sgilley" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/tutorials/03.auto-train-models.ipynb b/tutorials/03.auto-train-models.ipynb index ad45f30b..5bb6882c 100644 --- a/tutorials/03.auto-train-models.ipynb +++ b/tutorials/03.auto-train-models.ipynb @@ -1,403 +1,422 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial: Train a classification model with automated machine learning\n", + "\n", + "In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform data preprocessing, algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n", + "\n", + "[flow diagram](./imgs/flow2.png)\n", + "\n", + "Similar to the [train models tutorial](01.train-models.ipynb), this tutorial classifies handwritten images of digits (0-9) from the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. But this time you don't need to specify an algorithm or tune hyperparameters. The automated ML technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n", + "\n", + "You'll learn how to:\n", + "\n", + "> * Set up your development environment\n", + "> * Access and examine the data\n", + "> * Train using an automated classifier locally with custom parameters\n", + "> * Explore the results\n", + "> * Review training results\n", + "> * Register the best model\n", + "\n", + "## Prerequisites\n", + "\n", + "Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n", + "* Create a workspace and its configuration file (**config.json**) \n", + "* Upload your **config.json** to the same folder as this notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Start a notebook\n", + "\n", + "To follow along, start a new notebook from the same directory as **config.json** and copy the code from the sections below.\n", + "\n", + "\n", + "## Set up your development environment\n", + "\n", + "All the setup for your development work can be accomplished in the Python notebook.
Setup includes:\n", + "\n", + "* Import Python packages\n", + "* Configure a workspace to enable communication between your local computer and remote resources\n", + "* Create a directory to store training scripts\n", + "\n", + "### Import packages\n", + "Import Python packages you need in this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "import pandas as pd\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.train.automl.run import AutoMLRun\n", + "import time\n", + "import logging\n", + "from sklearn import datasets\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib.pyplot import imshow\n", + "import random\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure workspace\n", + "\n", + "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **aml_config/config.json** and loads the details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial.\n", + "\n", + "Once you have a workspace object, specify a name for the experiment and create and register a local directory with the workspace. The history of all runs is recorded under the specified experiment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "# choose a name for the run history container in the workspace\n", + "experiment_name = 'automl-classifier'\n", + "# project folder\n", + "project_folder = './automl-classifier'\n", + "\n", + "import os\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Project Directory'] = project_folder\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.DataFrame(data=output, index=['']).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore data\n", + "\n", + "The initial training tutorial used a high-resolution version of the MNIST dataset (28x28 pixels). Since auto training requires many iterations, this tutorial uses a smaller resolution version of the images (8x8 pixels) to demonstrate the concepts while speeding up the time needed for each iteration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn import datasets\n", + "\n", + "digits = datasets.load_digits()\n", + "\n", + "# Exclude the first 100 rows from training so that they can be used for test.\n", + "X_digits = digits.data[100:,:]\n", + "y_digits = digits.target[100:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display some sample images\n", + "\n", + "Load the data into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "count = 0\n", + "sample_size = 30\n", + "plt.figure(figsize = (16, 6))\n", + "for i in np.random.permutation(X_digits.shape[0])[:sample_size]:\n", + " count = count + 1\n", + " plt.subplot(1, sample_size, count)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " plt.text(x = 2, y = -2, s = y_digits[i], fontsize = 18)\n", + " plt.imshow(X_digits[i].reshape(8, 8), cmap = plt.cm.Greys)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You now have the necessary packages and data ready for auto training for your model. \n", + "\n", + "## Auto train a model \n", + "\n", + "To auto train a model, first define settings for autogeneration and tuning and then run the automatic classifier.\n", + "\n", + "\n", + "### Define settings for autogeneration and tuning\n", + "\n", + "Define the experiment parameters and models settings for autogeneration and tuning. \n", + "\n", + "\n", + "|Property| Value in this tutorial |Description|\n", + "|----|----|---|\n", + "|**primary_metric**|AUC Weighted | Metric that you want to optimize.|\n", + "|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n", + "|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n", + "|**n_cross_validations**|3|Number of cross validation splits|\n", + "|**preprocess**|False| *True/False* Enables experiment to perform preprocessing on the input. Preprocessing handles *missing data*, and performs some common *feature extraction*|\n", + "|**exit_score**|0.995|*double* value indicating the target for *primary_metric*. 
Once the target is surpassed the run terminates|\n", + "|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure automl" + ] + }, + "outputs": [], + "source": [ + "from azureml.train.automl import AutoMLConfig\n", + "\n", + "##Local compute \n", + "Automl_config = AutoMLConfig(task = 'classification',\n", + " primary_metric = 'AUC_weighted',\n", + " max_time_sec = 12000,\n", + " iterations = 20,\n", + " n_cross_validations = 3,\n", + " preprocess = False,\n", + " exit_score = 0.9985,\n", + " blacklist_algos = ['kNN','LinearSVM'],\n", + " X = X_digits,\n", + " y = y_digits,\n", + " path=project_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run the automatic classifier\n", + "\n", + "Start the experiment to run locally. Define the compute target as local and set the output to true to view progress on the experiment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "local submitted run", + "automl" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.experiment import Experiment\n", + "experiment=Experiment(ws, experiment_name)\n", + "local_run = experiment.submit(Automl_config, show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the results\n", + "\n", + "Explore the results of automatic training with a Jupyter widget or by examining the experiment history.\n", + "\n", + "### Jupyter widget\n", + "\n", + "Use the Jupyter notebook widget to see a graph and a table of all results." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "use notebook widget" + ] + }, + "outputs": [], + "source": [ + "from azureml.train.widgets import RunDetails\n", + "RunDetails(local_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve all iterations\n", + "\n", + "View the experiment history and see individual metrics for each iteration run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "get metrics", + "query history" + ] + }, + "outputs": [], + "source": [ + "children = list(local_run.get_children())\n", + "metricslist = {}\n", + "for run in children:\n", + " properties = run.get_properties()\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metricslist[int(properties['iteration'])] = metrics\n", + "\n", + "import pandas as pd\n", + "rundata = pd.DataFrame(metricslist).sort_index(1)\n", + "rundata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register the best model \n", + "\n", + "Use the `local_run` object to get the best model and register it into the workspace. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history", + "register model from history" + ] + }, + "outputs": [], + "source": [ + "# find the run with the highest accuracy value.\n", + "best_run, fitted_model = local_run.get_output()\n", + "\n", + "# register model in workspace\n", + "description = 'Automated Machine Learning Model'\n", + "tags = None\n", + "local_run.register_model(description=description, tags=tags)\n", + "local_run.model_id # Use this id to deploy the model as a web service in Azure" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test the best model\n", + "\n", + "Use the model to predict a few random digits. Display the predicted and the image. 
Red font and inverse image (white on black) are used to highlight the misclassified samples.\n", + "\n", + "Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# find 30 random samples from test set\n", + "n = 30\n", + "sample_indices = np.random.permutation(X_digits.shape[0])[0:n]\n", + "test_samples = X_digits[sample_indices]\n", + "\n", + "\n", + "# predict using the model\n", + "result = fitted_model.predict(test_samples)\n", + "\n", + "# compare actual value vs. the predicted values:\n", + "i = 0\n", + "plt.figure(figsize = (20, 1))\n", + "\n", + "for s in sample_indices:\n", + " plt.subplot(1, n, i + 1)\n", + " plt.axhline('')\n", + " plt.axvline('')\n", + " \n", + " # use different color for misclassified sample\n", + " font_color = 'red' if y_digits[s] != result[i] else 'black'\n", + " clr_map = plt.cm.gray if y_digits[s] != result[i] else plt.cm.Greys\n", + " \n", + " plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n", + " plt.imshow(X_digits[s].reshape(8, 8), cmap = clr_map)\n", + " \n", + " i = i + 1\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "In this Azure Machine Learning tutorial, you used Python to:\n", + "\n", + "> * Set up your development environment\n", + "> * Access and examine the data\n", + "> * Train using an automated classifier locally with custom parameters\n", + "> * Explore the results\n", + "> * Review training results\n", + "> * Register the best model\n", + "\n", + "Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "sgilley" }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial: Train a classification model with automated machine learning\n", - "\n", - "In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform data preprocessing, algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n", - "\n", - "[flow diagram](./imgs/flow2.png)\n", - "\n", - "Similar to the [train models tutorial](01.train-models.ipynb), this tutorial classifies handwritten images of digits (0-9) from the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. But this time you don't to specify an algorithm or tune hyperparameters. 
The automated ML technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n", - "\n", - "You'll learn how to:\n", - "\n", - "> * Set up your development environment\n", - "> * Access and examine the data\n", - "> * Train using an automated classifier locally with custom parameters\n", - "> * Explore the results\n", - "> * Review training results\n", - "> * Register the best model\n", - "\n", - "## Prerequisites\n", - "\n", - "Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n", - "* Create a workspace and its configuration file (**config.json**) \n", - "* Upload your **config.json** to the same folder as this notebook" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a notebook\n", - "\n", - "To follow along, start a new notebook from the same directory as **config.json** and copy the code from the sections below.\n", - "\n", - "\n", - "## Set up your development environment\n", - "\n", - "All the setup for your development work can be accomplished in the Python notebook. Setup includes:\n", - "\n", - "* Import Python packages\n", - "* Configure a workspace to enable communication between your local computer and remote resources\n", - "* Create a directory to store training scripts\n", - "\n", - "### Import packages\n", - "Import Python packages you need in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "import pandas as pd\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl.run import AutoMLRun\n", - "import time\n", - "import logging\n", - "from sklearn import datasets\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import random\n", - "import numpy as np" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure workspace\n", - "\n", - "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **aml_config/config.json** and loads the details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial.\n", - "\n", - "Once you have a workspace object, specify a name for the experiment and create and register a local directory with the workspace. The history of all runs is recorded under the specified experiment." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "# choose a name for the run history container in the workspace\n", - "experiment_name = 'automl-classifier'\n", - "# project folder\n", - "project_folder = './automl-classifier'\n", - "\n", - "import os\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data=output, index=['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explore data\n", - "\n", - "The initial training tutorial used a high-resolution version of the MNIST dataset (28x28 pixels). Since auto training requires many iterations, this tutorial uses a smaller resolution version of the images (8x8 pixels) to demonstrate the concepts while speeding up the time needed for each iteration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn import datasets\n", - "\n", - "digits = datasets.load_digits()\n", - "\n", - "# only take the first 100 rows if you want the training steps to run faster\n", - "X_digits = digits.data[:100,:]\n", - "y_digits = digits.target[:100]\n", - "\n", - "# use full dataset\n", - "#X_digits = digits.data\n", - "#y_digits = digits.target" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Display some sample images\n", - "\n", - "Load the data into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_digits.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 2, y = -2, s = y_digits[i], fontsize = 18)\n", - " plt.imshow(X_digits[i].reshape(8, 8), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You now have the necessary packages and data ready for auto training for your model. \n", - "\n", - "## Auto train a model \n", - "\n", - "To auto train a model, first define settings for autogeneration and tuning and then run the automatic classifier.\n", - "\n", - "\n", - "### Define settings for autogeneration and tuning\n", - "\n", - "Define the experiment parameters and models settings for autogeneration and tuning. \n", - "\n", - "\n", - "|Property| Value in this tutorial |Description|\n", - "|----|----|---|\n", - "|**primary_metric**|AUC Weighted | Metric that you want to optimize.|\n", - "|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n", - "|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n", - "|**n_cross_validations**|3|Number of cross validation splits|\n", - "|**preprocess**|False| *True/False* Enables experiment to perform preprocessing on the input. Preprocessing handles *missing data*, and performs some common *feature extraction*|\n", - "|**exit_score**|0.995|*double* value indicating the target for *primary_metric*. 
Once the target is surpassed the run terminates|\n", - "|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.automl import AutoMLConfig\n", - "\n", - "##Local compute \n", - "Automl_config = AutoMLConfig(task = 'classification',\n", - " primary_metric = 'AUC_weighted',\n", - " max_time_sec = 12000,\n", - " iterations = 20,\n", - " n_cross_validations = 3,\n", - " preprocess = False,\n", - " exit_score = 0.995,\n", - " blacklist_algos = ['kNN','LinearSVM'],\n", - " X = X_digits,\n", - " y = y_digits,\n", - " path=project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run the automatic classifier\n", - "\n", - "Start the experiment to run locally. Define the compute target as local and set the output to true to view progress on the experiment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.experiment import Experiment\n", - "experiment=Experiment(ws, experiment_name)\n", - "local_run = experiment.submit(Automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explore the results\n", - "\n", - "Explore the results of automatic training with a Jupyter widget or by examining the experiment history.\n", - "\n", - "### Jupyter widget\n", - "\n", - "Use the Jupyter notebook widget to see a graph and a table of all results." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.widgets import RunDetails\n", - "RunDetails(local_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve all iterations\n", - "\n", - "View the experiment history and see individual metrics for each iteration run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "import pandas as pd\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register the best model \n", - "\n", - "Use the `local_run` object to get the best model and register it into the workspace. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# find the run with the highest accuracy value.\n", - "best_run, fitted_model = local_run.get_output()\n", - "\n", - "# register model in workspace\n", - "description = 'Automated Machine Learning Model'\n", - "tags = None\n", - "local_run.register_model(description=description, tags=tags)\n", - "local_run.model_id # Use this id to deploy the model as a web service in Azure" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test the best model\n", - "\n", - "Use the model to predict a few random digits. Display the predicted and the image. Red font and inverse image (white on black) is used to highlight the misclassified samples.\n", - "\n", - "Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_digits.shape[0])[0:n]\n", - "test_samples = X_digits[sample_indices]\n", - "\n", - "\n", - "# predict using the model\n", - "result = fitted_model.predict(test_samples)\n", - "\n", - "# compare actual value vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - " plt.subplot(1, n, i + 1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " \n", - " # use different color for misclassified sample\n", - " font_color = 'red' if y_digits[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_digits[s] != result[i] else plt.cm.Greys\n", - " \n", - " plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n", - " plt.imshow(X_digits[s].reshape(8, 8), cmap = clr_map)\n", - " \n", - " i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next steps\n", - "\n", - "In this Azure Machine Learning tutorial, you used Python to:\n", - "\n", - "> * Set up your development environment\n", - "> * Access and examine the data\n", - "> * Train using an automated classifier locally with custom parameters\n", - "> * Explore the results\n", - "> * Review training results\n", - "> * Register the best model\n", - "\n", - "Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "sgilley" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file From b4df74c72ee3845216ecf635fcb8d735093dcf1a Mon Sep 17 00:00:00 2001 From: Roope Astala Date: Mon, 1 Oct 2018 13:47:58 -0400 Subject: [PATCH 2/2] adding nb 13 for app insights --- ...e-app-insights-in-production-service.ipynb | 410 ++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 01.getting-started/13.enable-app-insights/13.enable-app-insights-in-production-service.ipynb diff --git a/01.getting-started/13.enable-app-insights/13.enable-app-insights-in-production-service.ipynb b/01.getting-started/13.enable-app-insights/13.enable-app-insights-in-production-service.ipynb new file mode 100644 index 00000000..11ce5383 --- /dev/null +++ b/01.getting-started/13.enable-app-insights/13.enable-app-insights-in-production-service.ipynb @@ -0,0 +1,410 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Enabling App Insights for Services in Production\n", + "With this notebook, you can learn how to enable App Insights for standard service monitoring. It also provides examples of custom logging within a model's scoring file. \n", + "\n", + "\n", + "## What does Application Insights monitor?\n", + "It monitors request rates, response times, failure rates, etc.
For more information, visit the [App Insights docs.](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-overview)\n", + "\n", + "\n", + "## What is different compared to standard production deployment process?\n", + "If you want to enable generic App Insights for a service, run:\n", + "```python\n", + "aks_service= Webservice(ws, \"aks-w-dc2\")\n", + "aks_service.update(enable_app_insights=True)```\n", + "Where \"aks-w-dc2\" is your service name. You can also do this from the Azure Portal under your Workspace--> deployments--> Select deployment--> Edit--> Advanced Settings--> Select \"Enable AppInsights diagnostics\"\n", + "\n", + "If you want to log custom traces, follow the standard deployment process for AKS and:\n", + "1. Update the scoring file.\n", + "2. Update the AKS configuration.\n", + "3. Build a new image and deploy it. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Import your dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace, Run\n", + "from azureml.core.compute import AksCompute, ComputeTarget\n", + "from azureml.core.webservice import Webservice, AksWebservice\n", + "from azureml.core.image import Image\n", + "from azureml.core.model import Model\n", + "\n", + "import azureml.core\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Set up your configuration and create a workspace\n", + "Follow Notebook 00 instructions to do this.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.
Register Model\n", + "Register an existing trained model and add a description and tags." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Register the model\n", + "from azureml.core.model import Model\n", + "model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", + " model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", + " description = \"Ridge regression model to predict diabetes\",\n", + " workspace = ws)\n", + "\n", + "print(model.name, model.description, model.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. *Update your scoring file with custom print statements*\n", + "Here is an example:\n", + "### a. In your init function, add:\n", + "```python\n", + "print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))```\n", + "\n", + "### b. 
In your run function, add:\n", + "```python\n", + "print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n", + "print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy \n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "from azureml.monitoring import ModelDataCollector\n", + "import time\n", + "\n", + "def init():\n", + " global model\n", + " # Print statement for App Insights custom traces:\n", + " print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n", + " \n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n", + " # this call should return the path to the model.pkl file on the local disk.\n", + " model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n", + " \n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + " \n", + " global inputs_dc, prediction_dc\n", + " \n", + " # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n", + " inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n", + " \n", + " # this setup will help us save our predictions under the \"predictions\" path in our Azure Blob\n", + " prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n", + " \n", + "# note you can pass in multiple rows for scoring\n", + "def run(raw_data):\n", + " global inputs_dc, prediction_dc\n", + " try:\n", + " data = json.loads(raw_data)['data']\n", + " data = numpy.array(data)\n", + " result = model.predict(data)\n", + " \n", + 
" # Print statement for App Insights custom traces:\n", + " print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n", + " \n", + " # this call saves our input data to our blob\n", + " inputs_dc.collect(data) \n", + " # this call saves our prediction data to our blob\n", + " prediction_dc.collect(result)\n", + " \n", + " # Print statement for App Insights custom traces:\n", + " print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n", + " \n", + " return json.dumps({\"result\": result.tolist()})\n", + " \n", + " except Exception as e:\n", + " result = str(e)\n", + " print (result + time.strftime(\"%H:%M:%S\"))\n", + " return json.dumps({\"error\": result})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. *Create myenv.yml file*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "# azureml-monitoring provides ModelDataCollector, which score.py imports\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-monitoring'])\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. 
Create your new image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.image import ContainerImage\n", + "\n", + "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", + " runtime = \"python\",\n", + " conda_file = \"myenv.yml\",\n", + " description = \"Image with ridge regression model\",\n", + " tags = {'area': \"diabetes\", 'type': \"regression\"}\n", + " )\n", + "\n", + "image = ContainerImage.create(name = \"myimage1\",\n", + " # this is the model object\n", + " models = [model],\n", + " image_config = image_config,\n", + " workspace = ws)\n", + "\n", + "image.wait_for_creation(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Deploy to an AKS service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an AKS compute target if you haven't already (see Notebook 11)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the default configuration (can also provide parameters to customize)\n", + "prov_config = AksCompute.provisioning_configuration()\n", + "\n", + "aks_name = 'my-aks-test1' \n", + "# Create the cluster\n", + "aks_target = ComputeTarget.create(workspace = ws, \n", + " name = aks_name, \n", + " provisioning_configuration = prov_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_target.wait_for_completion(show_output = True)\n", + "print(aks_target.provisioning_state)\n", + "print(aks_target.provisioning_errors)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you already have an AKS cluster, you can attach it to your workspace instead:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python \n", + "%%time\n", + "resource_id = 
'/subscriptions//resourcegroups//providers/Microsoft.ContainerService/managedClusters/'\n", + "create_name = 'myaks4'\n", + "aks_target = AksCompute.attach(workspace = ws, \n", + " name = create_name, \n", + " resource_id = resource_id)\n", + "# Wait for the operation to complete\n", + "aks_target.wait_for_provisioning(True)```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### a. *Activate App Insights by updating the AKS Webservice configuration*\n", + "To enable App Insights in your service, set `enable_app_insights=True` in the AKS deployment configuration:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set the web service configuration\n", + "aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### b. Deploy your service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service_name = 'aks-w-dc3'\n", + "\n", + "aks_service = Webservice.deploy_from_image(workspace = ws, \n", + " name = aks_service_name,\n", + " image = image,\n", + " deployment_config = aks_config,\n", + " deployment_target = aks_target\n", + " )\n", + "aks_service.wait_for_deployment(show_output = True)\n", + "print(aks_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. 
Test your service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "import json\n", + "\n", + "test_sample = json.dumps({'data': [\n", + " [1,28,13,45,54,6,57,8,8,10], \n", + " [101,9,8,37,6,45,4,3,2,41]\n", + "]})\n", + "test_sample = bytes(test_sample,encoding = 'utf8')\n", + "\n", + "prediction = aks_service.run(input_data = test_sample)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. See your service telemetry in App Insights\n", + "1. Go to the [Azure Portal](https://portal.azure.com/).\n", + "2. All resources --> select the subscription/resource group where you created your Workspace --> select the App Insights type.\n", + "3. Click the App Insights resource. You'll see a high-level dashboard with information on requests, server response time, and availability.\n", + "4. Click \"Analytics\" in the top banner.\n", + "5. In the \"Schema\" section, select \"traces\" and run your query.\n", + "6. Voila! All your custom traces should be there." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10. Disable App Insights" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "aks_service.update(enable_app_insights=False)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file
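The print-based tracing pattern from step 4 can be exercised locally before building an image. What follows is a minimal sketch, not the notebook's `score.py`. It assumes the request/response JSON shape used in the test cell (`{"data": [...]}` in, `{"result": [...]}` or `{"error": ...}` out). It also swaps the registered Ridge model for a hypothetical `StandInModel`, so the script runs without azureml, numpy, or scikit-learn. Data collection (`ModelDataCollector`) is deliberately left out.

```python
import json
import time


class StandInModel:
    """Hypothetical stand-in for the registered Ridge model (not part of the notebook)."""

    def predict(self, rows):
        # Placeholder for model.predict: one number per input row (the row sum)
        return [float(sum(row)) for row in rows]


model = None


def init():
    global model
    # Custom trace, mirroring the print statements added to score.py in step 4
    print("model initialized " + time.strftime("%H:%M:%S"))
    model = StandInModel()


def run(raw_data):
    try:
        # Same request contract as the test cell: {"data": [[...], [...]]}
        data = json.loads(raw_data)["data"]
        result = model.predict(data)
        print("prediction done " + time.strftime("%H:%M:%S"))
        return json.dumps({"result": result})
    except Exception as e:
        # Errors are returned as JSON and also traced, as in score.py
        error = str(e)
        print(error + " " + time.strftime("%H:%M:%S"))
        return json.dumps({"error": error})


if __name__ == "__main__":
    init()
    sample = json.dumps({"data": [[1, 28, 13, 45, 54, 6, 57, 8, 8, 10],
                                  [101, 9, 8, 37, 6, 45, 4, 3, 2, 41]]})
    # bytes input, as the test cell sends; json.loads accepts bytes on Python 3.6+
    print(run(bytes(sample, encoding="utf8")))
    print(run("not json"))
```

Running it as a script prints each timestamped trace followed by the JSON response. Once the real `score.py` is deployed with App Insights enabled, the notebook expects the same print output to appear under "traces" in App Insights Analytics.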