{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Explain tree-based models on GPU using GPUTreeExplainer\n", "\n", "\n", "_**This notebook illustrates how to use shap's GPUTreeExplainer on an Azure GPU machine.**_\n", "\n", "\n", "\n", "\n", "\n", "Problem: Train a tree-based model and explain the model on an Azure GPU machine using the GPUTreeExplainer.\n", "\n", "---\n", "\n", "## Table of Contents\n", "\n", "1. [Introduction](#Introduction)\n", "1. [Setup](#Setup)\n", "1. [Run model explainer locally at training time](#Explain)\n", " 1. Apply feature transformations\n", " 1. Train a binary classification model\n", " 1. Explain the model on raw features\n", " 1. Generate global explanations\n", " 1. Generate local explanations\n", "1. [Visualize explanations](#Visualize)\n", "1. [Deploy model and scoring explainer](#Deploy)\n", "1. [Next steps](#Next)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This notebook demonstrates how to use the GPUTreeExplainer on some simple datasets. Like the TreeExplainer, the GPUTreeExplainer is specifically designed for tree-based machine learning models, but it is designed to accelerate the computations using NVIDIA GPUs.\n", "\n", "\n", "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n", "\n", "Notebook synopsis:\n", "\n", "1. Creating an Experiment in an existing Workspace\n", "2. Configuration and remote run with a GPU machine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "import os\n", "import shutil\n", "\n", "import pandas as pd\n", "\n", "import azureml.core\n", "from azureml.core.experiment import Experiment\n", "from azureml.core.workspace import Workspace\n", "from azureml.core.dataset import Dataset\n", "from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget\n", "from azureml.core.run import Run\n", "from azureml.core.model import Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"This notebook was created using version 1.37.0 of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As part of the setup you have already created a Workspace. To run the script, you also need to create an Experiment. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ws = Workspace.from_config()\n", "\n", "# Choose an experiment name.\n", "experiment_name = 'gpu-tree-explainer'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", "outputDf.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create project directory\n", "\n", "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import shutil\n", "\n", "project_folder = './azureml-shap-gpu-tree-explainer'\n", "os.makedirs(project_folder, exist_ok=True)\n", "shutil.copy('gpu_tree_explainer.py', project_folder)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up a compute cluster\n", "This section uses a user-provided compute cluster (named \"gpu-shap-cluster\" in this example). If a cluster with this name does not exist in the user's workspace, the below code will create a new cluster. You can choose the parameters of the cluster as mentioned in the comments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.compute import ComputeTarget, AmlCompute\n", "from azureml.core.compute_target import ComputeTargetException\n", "\n", "num_nodes = 1\n", "\n", "# Choose a name for your cluster.\n", "amlcompute_cluster_name = \"gpu-shap-cluster\"\n", "\n", "# Verify that cluster does not exist already\n", "try:\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", " print('Found existing cluster, use it.')\n", "except ComputeTargetException:\n", " compute_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n", " # To use GPUTreeExplainer, select a GPU such as \"STANDARD_NC6\" \n", " # or similar GPU option\n", " # available in your workspace\n", " max_nodes = num_nodes)\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", "compute_target.wait_for_completion(show_output=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure & Run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.runconfig import RunConfiguration\n", "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", "# Create a new RunConfig object\n", "run_config = RunConfiguration(framework=\"python\")\n", "\n", "# Set compute target to AmlCompute target created in previous step\n", "run_config.target = amlcompute_cluster_name\n", "\n", "from azureml.core import Environment\n", "\n", "environment_name = \"shap-gpu-tree\"\n", "\n", "env = Environment(environment_name)\n", "\n", "env.docker.enabled = True\n", "env.docker.base_image = None\n", "env.docker.base_dockerfile = \"\"\"\n", "FROM rapidsai/rapidsai:cuda10.0-devel-ubuntu18.04\n", "RUN apt-get update && \\\n", "apt-get install -y fuse && \\\n", "apt-get install -y 
build-essential && \\\n", "apt-get install -y python3-dev && \\\n", "source activate rapids && \\\n", "apt-get install -y g++ && \\\n", "printenv && \\\n", "echo \"which nvcc: \" && \\\n", "which nvcc && \\\n", "pip install azureml-defaults && \\\n", "pip install azureml-telemetry && \\\n", "cd /usr/local/src && \\\n", "git clone https://github.com/slundberg/shap && \\\n", "cd shap && \\\n", "mkdir build && \\\n", "python setup.py install --user && \\\n", "pip uninstall -y xgboost && \\\n", "rm /conda/envs/rapids/lib/libxgboost.so && \\\n", "pip install xgboost==1.4.2\n", "\"\"\"\n", "\n", "env.python.user_managed_dependencies = True\n", "\n", "from azureml.core import Run\n", "from azureml.core import ScriptRunConfig\n", "\n", "src = ScriptRunConfig(source_directory=project_folder, \n", " script='gpu_tree_explainer.py', \n", " compute_target=amlcompute_cluster_name,\n", " environment=env) \n", "run = experiment.submit(config=src)\n", "run" ] } ], "metadata": { "authors": [ { "name": "ilmat" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }