Files
MachineLearningNotebooks/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb

587 lines
24 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and explain models remotely via Azure Machine Learning Compute and deploy model and scoring explainer\n",
"\n",
"\n",
"_**This notebook illustrates how to use the Azure Machine Learning Interpretability SDK to train and explain a classification model remotely on an Azure Machine Leanrning Compute Target (AMLCompute), and use Azure Container Instances (ACI) for deploying your model and its corresponding scoring explainer as a web service.**_\n",
"\n",
"Problem: IBM employee attrition classification with scikit-learn (train a model and run an explainer remotely via AMLCompute, and deploy model and its corresponding explainer.)\n",
"\n",
"---\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Run model explainer locally at training time](#Explain)\n",
" 1. Apply feature transformations\n",
" 1. Train a binary classification model\n",
" 1. Explain the model on raw features\n",
" 1. Generate global explanations\n",
" 1. Generate local explanations\n",
"1. [Visualize results](#Visualize)\n",
"1. [Deploy model and scoring explainer](#Deploy)\n",
"1. [Next steps](#Next)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"This notebook showcases how to train and explain a classification model remotely via Azure Machine Learning Compute (AMLCompute), download the calculated explanations locally for visualization and inspection, and deploy the final model and its corresponding explainer to Azure Container Instances (ACI).\n",
"It demonstrates the API calls that you need to make to submit a run for training and explaining a model to AMLCompute, download the compute explanations remotely, and visualizing the global and local explanations via a visualization dashboard that provides an interactive way of discovering patterns in model predictions and downloaded explanations, and using Azure Machine Learning MLOps capabilities to deploy your model and its corresponding explainer.\n",
"\n",
"We will showcase one of the tabular data explainers: TabularExplainer (SHAP) and follow these steps:\n",
"1.\tDevelop a machine learning script in Python which involves the training script and the explanation script.\n",
"2.\tCreate and configure a compute target.\n",
"3.\tSubmit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to datastore. And the records of execution (e.g., model, metrics, prediction explanations) are saved as runs in the workspace and grouped under experiments.\n",
"4.\tQuery the experiment for logged metrics and explanations from the current and past runs. Use the interpretability toolkit\u00e2\u20ac\u2122s visualization dashboard to visualize predictions and their explanation. If the metrics and explanations don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.\n",
"5.\tAfter a satisfactory run is found, create a scoring explainer and register the persisted model and its corresponding explainer in the model registry.\n",
"6.\tDevelop a scoring script.\n",
"7.\tCreate an image and register it in the image registry.\n",
"8.\tDeploy the image as a web service in Azure.\n",
"\n",
"| ![azure-machine-learning-cycle](./img/azure-machine-learning-cycle.png) |\n",
"|:--:|"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"Make sure you go through the [configuration notebook](../../../../configuration.ipynb) first if you haven't."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize a Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain\n",
"\n",
"Create An Experiment: **Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"experiment_name = 'explainer-remote-run-on-amlcompute'\n",
"experiment = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to AmlCompute\n",
"\n",
"Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user. \n",
"\n",
"Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. \n",
"\n",
"For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)\n",
"\n",
"If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement)\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n",
"\n",
"\n",
"The training script `run_explainer.py` is already created for you. Let's have a look."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit an AmlCompute run\n",
"\n",
"First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since AmlCompute is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n",
"\n",
"You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"\n",
"AmlCompute.supported_vmsizes(workspace=ws)\n",
"# AmlCompute.supported_vmsizes(workspace=ws, location='southcentralus')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create project directory\n",
"\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import shutil\n",
"\n",
"project_folder = './explainer-remote-run-on-amlcompute'\n",
"os.makedirs(project_folder, exist_ok=True)\n",
"shutil.copy('train_explain.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Provision a compute target\n",
"\n",
"You can provision an AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
"\n",
"* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() above\n",
"* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=4)\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"# Create a new runconfig object\n",
"run_config = RunConfiguration()\n",
"\n",
"# Set compute target to AmlCompute target created in previous step\n",
"run_config.target = cpu_cluster.name\n",
"\n",
"# Set Docker base image to the default CPU-based image\n",
"run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n",
"\n",
"# Use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-telemetry', 'azureml-interpret'\n",
"]\n",
" \n",
"\n",
"\n",
"# Note: this is to pin the scikit-learn version to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# Specify CondaDependencies obj\n",
"# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"azureml_pip_packages.extend(['pyyaml', sklearn_dep, pandas_dep])\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=azureml_pip_packages)\n",
"# Now submit a run on AmlCompute\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory=project_folder,\n",
" script='train_explain.py',\n",
" run_config=run_config)\n",
"\n",
"run = experiment.submit(script_run_config)\n",
"\n",
"# Show run details\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
"# 'cpucluster' in this case but use a different VM family for instance.\n",
"\n",
"# cpu_cluster.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download Model Explanation, Model, and Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve model for visualization and deployment\n",
"from azureml.core.model import Model\n",
"import joblib\n",
"original_model = Model(ws, 'amlcompute_deploy_model')\n",
"model_path = original_model.download(exist_ok=True)\n",
"original_svm_model = joblib.load(model_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve global explanation for visualization\n",
"from azureml.interpret import ExplanationClient\n",
"\n",
"# get model explanation data\n",
"client = ExplanationClient.from_run(run)\n",
"global_explanation = client.download_model_explanation()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve x_test for visualization\n",
"import joblib\n",
"x_test_path = './x_test.pkl'\n",
"run.download_file('x_test_ibm.pkl', output_file_path=x_test_path)\n",
"x_test = joblib.load(x_test_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize\n",
"Visualize the explanations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from interpret_community.widget import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, original_svm_model, datasetX=x_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy\n",
"Deploy Model and ScoringExplainer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"# WARNING: to install this, g++ needs to be available on the Docker image and is not by default (look at the next cell)\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-interpret'\n",
"]\n",
" \n",
"\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# Specify CondaDependencies obj\n",
"# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"azureml_pip_packages.extend(['pyyaml', sklearn_dep, pandas_dep])\n",
"myenv = CondaDependencies.create(pip_packages=azureml_pip_packages)\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())\n",
"\n",
"with open(\"myenv.yml\",\"r\") as f:\n",
" print(f.read())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve scoring explainer for deployment\n",
"scoring_explainer_model = Model(ws, 'IBM_attrition_explainer')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.core.model import Model\n",
"from azureml.core.environment import Environment\n",
"from azureml.exceptions import WebserviceException\n",
"\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
" tags={\"data\": \"IBM_Attrition\", \n",
" \"method\" : \"local_explanation\"}, \n",
" description='Get local explanations for IBM Employee Attrition data')\n",
"\n",
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
"inference_config = InferenceConfig(entry_script=\"score_remote_explain.py\", environment=myenv)\n",
"\n",
"# Use configs and models generated above\n",
"service = Model.deploy(ws, 'model-scoring-service', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
"try:\n",
" service.wait_for_deployment(show_output=True)\n",
"except WebserviceException as e:\n",
" print(e.message)\n",
" print(service.get_logs())\n",
" raise"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# Create data to test service with\n",
"examples = x_test[:4]\n",
"input_data = examples.to_json()\n",
"\n",
"headers = {'Content-Type':'application/json'}\n",
"\n",
"# Send request to service\n",
"print(\"POST to url\", service.scoring_uri)\n",
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"\n",
"# Can covert back to Python objects from json string if desired\n",
"print(\"prediction:\", resp.text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"Learn about other use cases of the explain package on a:\n",
"1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n",
"1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
"1. [Inferencing time: deploy a locally-trained model and explainer](./train-explain-model-locally-and-deploy.ipynb)\n",
"1. [Inferencing time: deploy a locally-trained keras model and explainer](./train-explain-model-keras-locally-and-deploy.ipynb)"
]
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}