Compare commits
128 Commits
release_up
...
shwinne1
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
cf0490ab92 | ||
|
|
9f0e817c70 | ||
|
|
a4d713d19b | ||
|
|
91a20a0ff9 | ||
|
|
a0c510bf42 | ||
|
|
116d57c012 | ||
|
|
660708db63 | ||
|
|
206df82f9b | ||
|
|
7cfb2da5b8 | ||
|
|
e5adb4af3a | ||
|
|
b849267220 | ||
|
|
9891080b70 | ||
|
|
2974e86aa0 | ||
|
|
0a18161193 | ||
|
|
c676cc9969 | ||
|
|
50f4bc9643 | ||
|
|
f3c7072735 | ||
|
|
44295d9e16 | ||
|
|
710fc0bb4b | ||
|
|
c44dba427f | ||
|
|
8066a9263c | ||
|
|
054aadffed | ||
|
|
8f418b216d | ||
|
|
2d549ecad3 | ||
|
|
4dbb024529 | ||
|
|
142a1a510e | ||
|
|
2522486c26 | ||
|
|
6d5226e47c | ||
|
|
e7676d7cdc | ||
|
|
a84f6636f1 | ||
|
|
41be10d1c1 | ||
|
|
429eb43914 | ||
|
|
c0dae0c645 | ||
|
|
e4d9a2b4c5 | ||
|
|
7648e8f516 | ||
|
|
b5ed94b4eb | ||
|
|
85e487f74f | ||
|
|
c0a5b2de79 | ||
|
|
0a9e076e5f | ||
|
|
e3b974811d | ||
|
|
381d1a6f35 | ||
|
|
adaa55675e | ||
|
|
5e3c592d4b | ||
|
|
9c6f1e2571 | ||
|
|
bd1bedd563 | ||
|
|
9716f3614e | ||
|
|
d2c72ca149 | ||
|
|
4f62f64207 | ||
|
|
16473eb33e | ||
|
|
d10474c249 | ||
|
|
6389cc16f9 | ||
|
|
bc0a8e0152 | ||
|
|
39384aea52 | ||
|
|
5bf4b0bafe | ||
|
|
f22adb7949 | ||
|
|
8409ab7133 | ||
|
|
32acd55774 | ||
|
|
7f65c1a255 | ||
|
|
bc7ccc7ef3 | ||
|
|
1cc79a71e9 | ||
|
|
c0bec5f110 | ||
|
|
77e5664482 | ||
|
|
e2eb64372a | ||
|
|
03cbb6a3a2 | ||
|
|
44d3d998a8 | ||
|
|
c626f37057 | ||
|
|
0175574864 | ||
|
|
f6e8d57da3 | ||
|
|
01cd31ce44 | ||
|
|
eb2024b3e0 | ||
|
|
6bce41b3d7 | ||
|
|
bbdabbb552 | ||
|
|
65343fc263 | ||
|
|
b6b27fded6 | ||
|
|
7e492cbeb6 | ||
|
|
4cc8f4c6af | ||
|
|
9fba46821b | ||
|
|
a45954a58f | ||
|
|
f16dfb0e5b | ||
|
|
edabbf9031 | ||
|
|
63d1d57dfb | ||
|
|
10f7004161 | ||
|
|
86ba4e7406 | ||
|
|
33bda032b8 | ||
|
|
0fd4bfbc56 | ||
|
|
3fe08c944e | ||
|
|
d587ea5676 | ||
|
|
edd8562102 | ||
|
|
5ac2c63336 | ||
|
|
1f4e4cdda2 | ||
|
|
2e245c1691 | ||
|
|
e1b09f71fa | ||
|
|
8e2220d397 | ||
|
|
f74ccf5048 | ||
|
|
97a6d9ca43 | ||
|
|
a0ff1c6b64 | ||
|
|
08f15ef4cf | ||
|
|
7160416c0b | ||
|
|
218fed3d65 | ||
|
|
b8499dfb98 | ||
|
|
6bfd472cc2 | ||
|
|
ecefb229e9 | ||
|
|
883ad806ba | ||
|
|
848b5bc302 | ||
|
|
58087b53a0 | ||
|
|
ff4d5450a7 | ||
|
|
e2b2b89842 | ||
|
|
390be2ba24 | ||
|
|
cd1258f81d | ||
|
|
8a0b48ea48 | ||
|
|
b0dc904189 | ||
|
|
82bede239a | ||
|
|
774517e173 | ||
|
|
c3ce2bc7fe | ||
|
|
5dd09a1f7c | ||
|
|
ee1da0ee19 | ||
|
|
ddfce6b24c | ||
|
|
31dfc3dc55 | ||
|
|
168c45b188 | ||
|
|
159948db67 | ||
|
|
d842731a3b | ||
|
|
7822fd4c13 | ||
|
|
d9fbe4cd87 | ||
|
|
a64f4d331a | ||
|
|
c41f449208 | ||
|
|
4fe8c1702d | ||
|
|
b025816c92 | ||
|
|
c75e820107 |
30
.github/ISSUE_TEMPLATE/bug_report.md
vendored
Normal file
30
.github/ISSUE_TEMPLATE/bug_report.md
vendored
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
name: Bug report
|
||||
about: Create a report to help us improve
|
||||
title: "[Notebook issue]"
|
||||
labels: ''
|
||||
assignees: ''
|
||||
|
||||
---
|
||||
|
||||
**Describe the bug**
|
||||
A clear and concise description of what the bug is.
|
||||
|
||||
Provide the following if applicable:
|
||||
+ Your Python & SDK version
|
||||
+ Python Scripts or the full notebook name
|
||||
+ Pipeline definition
|
||||
+ Environment definition
|
||||
+ Example data
|
||||
+ Any log files.
|
||||
+ Run and Workspace Id
|
||||
|
||||
**To Reproduce**
|
||||
Steps to reproduce the behavior:
|
||||
1.
|
||||
|
||||
**Expected behavior**
|
||||
A clear and concise description of what you expected to happen.
|
||||
|
||||
**Additional context**
|
||||
Add any other context about the problem here.
|
||||
30
.github/ISSUE_TEMPLATE/notebook-issue.md
vendored
Normal file
30
.github/ISSUE_TEMPLATE/notebook-issue.md
vendored
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
name: Notebook issue
|
||||
about: Create a report to help us improve
|
||||
title: "[Notebook issue]"
|
||||
labels: ''
|
||||
assignees: ''
|
||||
|
||||
---
|
||||
|
||||
**Describe the bug**
|
||||
A clear and concise description of what the bug is.
|
||||
|
||||
Provide the following if applicable:
|
||||
+ Your Python & SDK version
|
||||
+ Python Scripts or the full notebook name
|
||||
+ Pipeline definition
|
||||
+ Environment definition
|
||||
+ Example data
|
||||
+ Any log files.
|
||||
+ Run and Workspace Id
|
||||
|
||||
**To Reproduce**
|
||||
Steps to reproduce the behavior:
|
||||
1.
|
||||
|
||||
**Expected behavior**
|
||||
A clear and concise description of what you expected to happen.
|
||||
|
||||
**Additional context**
|
||||
Add any other context about the problem here.
|
||||
@@ -38,6 +38,7 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
|
||||
- [Machine Learning Pipelines](./how-to-use-azureml/machine-learning-pipelines) - Examples showing how to create and use reusable pipelines for training and batch scoring
|
||||
- [Deployment](./how-to-use-azureml/deployment) - Examples showing how to deploy and manage machine learning models and solutions
|
||||
- [Azure Databricks](./how-to-use-azureml/azure-databricks) - Examples showing how to use Azure ML with Azure Databricks
|
||||
- [Monitor Models](./how-to-use-azureml/monitor-models) - Examples showing how to enable model monitoring services such as DataDrift
|
||||
|
||||
---
|
||||
## Documentation
|
||||
@@ -52,6 +53,7 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
|
||||
|
||||
Visit following repos to see projects contributed by Azure ML users:
|
||||
|
||||
- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
|
||||
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
|
||||
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
|
||||
|
||||
|
||||
215
build_nb_index.py
Normal file
215
build_nb_index.py
Normal file
@@ -0,0 +1,215 @@
|
||||
# ---------------------------------------------------------
|
||||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# ---------------------------------------------------------
|
||||
|
||||
### USAGE
|
||||
#
|
||||
# 1. Add following metadata elements to the notebook
|
||||
#
|
||||
# "friendly_name": "string", friendly name for notebook
|
||||
# "exclude_from_index": true/false, setting true excludes the notebook from index
|
||||
# "order_index": integer, smaller value moves notebook closer to beginning
|
||||
# "category": "starter", "tutorial", "training", "deployment" or "other"
|
||||
# "tags": [ "featured" ], optional, only supported tag to highlight notebook with :star: symbol
|
||||
# "task": "string", description of notebook task
|
||||
# "datasets": [ "dataset 1", "dataset 2"], list of datasets, can be ["None"]
|
||||
# "compute": [ "compute 1", "compute 2" ], list of computes, can be ["None"]
|
||||
# "deployment": ["deployment 1", "deployment 2"], list of deployment targets, can be ["None"]
|
||||
# "framework": ["fw 1", "fw2"], list of ml framework, can be ["None"]
|
||||
#
|
||||
# 2. Then run
|
||||
#
|
||||
# build_nb_index.py <root folder of notebooks>
|
||||
#
|
||||
# 3. The script should produce index.md file with tables of notebook indices
|
||||
|
||||
### Example metadata section
|
||||
|
||||
'''
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "cforbe"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.7"
|
||||
},
|
||||
"msauthor": "trbye",
|
||||
"friendly_name": "Prepare data for regression modeling",
|
||||
"exclude_from_index": false,
|
||||
"order_index": 1,
|
||||
"category": "tutorial",
|
||||
"tags": [
|
||||
"featured"
|
||||
],
|
||||
"task": "Regression",
|
||||
"datasets": [
|
||||
"NYC Taxi"
|
||||
],
|
||||
"compute": [
|
||||
"local"
|
||||
],
|
||||
"deployment": [
|
||||
"None"
|
||||
],
|
||||
"framework": [
|
||||
"Azure ML AutoML"
|
||||
]
|
||||
}
|
||||
'''
|
||||
|
||||
import os, json, sys
|
||||
from shutil import copyfile, copytree, rmtree
|
||||
|
||||
|
||||
# Index building walk over notebook folder
|
||||
def post_process(notebooks_dir):
|
||||
indexer = NotebookIndex()
|
||||
n_dest = len(notebooks_dir)
|
||||
for r, d, f in os.walk(notebooks_dir):
|
||||
for file in f:
|
||||
# Handle only notebooks
|
||||
if file.endswith(".ipynb") and not file.endswith('checkpoint.ipynb'):
|
||||
try:
|
||||
file_path = os.path.join(r, file)
|
||||
with open(file_path, 'r') as fin:
|
||||
content = json.load(fin)
|
||||
print(file)
|
||||
indexer.add_to_index(os.path.join(r[n_dest:],file), content["metadata"])
|
||||
except Exception as e:
|
||||
print("Problem: ",str(e))
|
||||
indexer.write_index("./index.md")
|
||||
|
||||
### Customize these make index look different
|
||||
|
||||
index_template = '''
|
||||
# Index
|
||||
Azure Machine Learning is a cloud service that you use to train, deploy, automate, and manage machine learning models. This index should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content.
|
||||
|
||||
|
||||
## Getting Started
|
||||
|
||||
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|
||||
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
|
||||
GETTING_STARTED_NBS
|
||||
|
||||
## Tutorials
|
||||
|
||||
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|
||||
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
|
||||
TUTORIAL_NBS
|
||||
|
||||
## Training
|
||||
|
||||
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|
||||
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
|
||||
TRAINING_NBS
|
||||
|
||||
## Deployment
|
||||
|
||||
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|
||||
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
|
||||
DEPLOYMENT_NBS
|
||||
|
||||
## Other Notebooks
|
||||
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|
||||
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
|
||||
OTHER_NBS
|
||||
'''
|
||||
|
||||
index_row = '''| NB_SYMBOL[NB_NAME](NB_PATH) | NB_TASK | NB_DATASET | NB_COMPUTE | NB_DEPLOYMENT | NB_FRAMEWORK |'''
|
||||
|
||||
index_file = "index.md"
|
||||
|
||||
nb_types = ["starter", "tutorial", "training", "deployment", "other"]
|
||||
replace_strings = ["GETTING_STARTED_NBS", "TUTORIAL_NBS", "TRAINING_NBS", "DEPLOYMENT_NBS", "OTHER_NBS"]
|
||||
|
||||
class NotebookIndex:
|
||||
def __init__(self):
|
||||
self.index = index_template
|
||||
self.nb_rows = {}
|
||||
for elem in nb_types:
|
||||
self.nb_rows[elem] = []
|
||||
|
||||
def add_to_index(self, path_to_notebook, metadata):
|
||||
repo_url = "https://github.com/Azure/MachineLearningNotebooks/blob/master/"
|
||||
|
||||
if "exclude_from_index" in metadata:
|
||||
if metadata["exclude_from_index"]:
|
||||
return
|
||||
|
||||
if "friendly_name" in metadata:
|
||||
this_row = index_row.replace("NB_NAME",metadata["friendly_name"])
|
||||
else:
|
||||
this_name = os.path.basename(path_to_notebook)
|
||||
this_row = index_row.replace("NB_NAME", this_name[:-6])
|
||||
|
||||
path_to_notebook = path_to_notebook.replace("\\","/")
|
||||
this_row = this_row.replace("NB_PATH", repo_url + path_to_notebook)
|
||||
|
||||
if "task" in metadata:
|
||||
this_row = this_row.replace("NB_TASK", metadata["task"])
|
||||
if "datasets" in metadata:
|
||||
this_row = this_row.replace("NB_DATASET", ", ".join(metadata["datasets"]))
|
||||
if "compute" in metadata:
|
||||
this_row = this_row.replace("NB_COMPUTE", ", ".join(metadata["compute"]))
|
||||
if "deployment" in metadata:
|
||||
this_row = this_row.replace("NB_DEPLOYMENT", ", ".join(metadata["deployment"]))
|
||||
if "framework" in metadata:
|
||||
this_row = this_row.replace("NB_FRAMEWORK", ", ".join(metadata["framework"]))
|
||||
## Fall back
|
||||
this_row = this_row.replace("NB_TASK","")
|
||||
this_row = this_row.replace("NB_DATASET","")
|
||||
this_row = this_row.replace("NB_COMPUTE","")
|
||||
this_row = this_row.replace("NB_DEPLOYMENT","")
|
||||
this_row = this_row.replace("NB_FRAMEWORK","")
|
||||
|
||||
if "tags" in metadata:
|
||||
if "featured" in metadata["tags"]:
|
||||
this_row = this_row.replace("NB_SYMBOL",":star:")
|
||||
## Fall back
|
||||
this_row =this_row.replace("NB_SYMBOL","")
|
||||
|
||||
index_order = 9999999
|
||||
if "index_order" in metadata:
|
||||
index_order = metadata["index_order"]
|
||||
|
||||
if "category" in metadata:
|
||||
self.nb_rows[metadata["category"]].append((index_order, this_row))
|
||||
else:
|
||||
self.nb_rows["other"].append((index_order, this_row))
|
||||
|
||||
def sort_and_stringify(self,section):
|
||||
sorted_index = sorted(self.nb_rows[section], key = lambda x: x[0])
|
||||
sorted_index = [x[1] for x in sorted_index]
|
||||
## TODO: Make this portable
|
||||
return "\n".join(sorted_index)
|
||||
|
||||
def write_index(self, index_file):
|
||||
for nb_type, replace_string in zip(nb_types, replace_strings):
|
||||
nb_string = self.sort_and_stringify(nb_type)
|
||||
self.index = self.index.replace(replace_string, nb_string)
|
||||
with open(index_file,"w") as fin:
|
||||
fin.write(self.index)
|
||||
|
||||
try:
|
||||
dest_repo = sys.argv[1]
|
||||
except:
|
||||
dest_repo = "./MachineLearningNotebooks"
|
||||
|
||||
post_process(dest_repo)
|
||||
@@ -103,7 +103,7 @@
|
||||
"source": [
|
||||
"import azureml.core\n",
|
||||
"\n",
|
||||
"print(\"This notebook was created using version 1.0.48\r\n of the Azure ML SDK\")\n",
|
||||
"print(\"This notebook was created using version 1.0.55 of the Azure ML SDK\")\n",
|
||||
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
||||
]
|
||||
},
|
||||
@@ -214,7 +214,8 @@
|
||||
"* You do not have permission to create a resource group if it's non-existing.\n",
|
||||
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
||||
"\n",
|
||||
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
|
||||
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.\n",
|
||||
"To learn more about the Enterprise SKU, please visit the Pricing and SKU details page."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -230,11 +231,14 @@
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"# Create the workspace using the specified parameters\n",
|
||||
"# To create an Enterprise workspace, please specify the sku = enterprise\n",
|
||||
|
||||
"ws = Workspace.create(name = workspace_name,\n",
|
||||
" subscription_id = subscription_id,\n",
|
||||
" resource_group = resource_group, \n",
|
||||
" location = workspace_region,\n",
|
||||
" create_resource_group = True,\n",
|
||||
" create_resource_group = True,\n",
|
||||
" sku = basic,\n",
|
||||
" exist_ok = True)\n",
|
||||
"ws.get_details()\n",
|
||||
"\n",
|
||||
@@ -380,4 +384,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
}
|
||||
|
||||
@@ -175,10 +175,19 @@ jupyter notebook
|
||||
- Example of training an automated ML forecasting model on multiple time-series
|
||||
|
||||
- [auto-ml-classification-with-onnx.ipynb](classification-with-onnx/auto-ml-classification-with-onnx.ipynb)
|
||||
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
|
||||
- Dataset: scikit learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)
|
||||
- Simple example of using automated ML for classification with ONNX models
|
||||
- Uses local compute for training
|
||||
|
||||
- [auto-ml-remote-amlcompute-with-onnx.ipynb](remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb)
|
||||
- Dataset: scikit learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)
|
||||
- Example of using automated ML for classification using remote AmlCompute for training
|
||||
- Train the models with ONNX compatible config on
|
||||
- Parallel execution of iterations
|
||||
- Async tracking of progress
|
||||
- Cancelling individual iterations or entire run
|
||||
- Retrieving the ONNX models and do the inference with them
|
||||
|
||||
- [auto-ml-bank-marketing-subscribers-with-deployment.ipynb](bank-marketing-subscribers-with-deployment/auto-ml-bank-marketing-with-deployment.ipynb)
|
||||
- Dataset: UCI's [bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
|
||||
- Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
|
||||
|
||||
@@ -2,6 +2,7 @@ name: azure_automl
|
||||
dependencies:
|
||||
# The python interpreter version.
|
||||
# Currently Azure ML only supports 3.5.2 and later.
|
||||
- pip
|
||||
- nomkl
|
||||
- python>=3.5.2,<3.6.8
|
||||
- nb_conda
|
||||
|
||||
@@ -1,15 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -20,38 +11,40 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Automated Machine Learning\n",
|
||||
"_**Classification with Deployment using a Bank Marketing Dataset**_\n",
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"## Contents\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Train](#Train)\n",
|
||||
"1. [Results](#Results)\n",
|
||||
"1. [Deploy](#Deploy)\n",
|
||||
"1. [Test](#Test)\n",
|
||||
"1. [Acknowledgements](#Acknowledgements)"
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"# Unique Descriptive Title\n",
|
||||
"_**Unique Subtitle**_\n",
|
||||
"\n",
|
||||
"In this example we use the UCI Bank Marketing dataset to showcase how you can use AutoML for a classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if the client will subscribe to a term deposit with the bank.\n",
|
||||
"Introduction that describes in a customer friendly language, what they will do and accomplish.\n".
|
||||
"## Contents\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Prerequisites](#Prerequisites)\n",
|
||||
"1. [Configuration and Setup](#Setup)\n",
|
||||
"1. [Working with Data](#Working with Data)\n",
|
||||
"1. [Training](#Training)\n",
|
||||
"1. [Productionizing](#Productionizing)\n",
|
||||
"1. [Model Monitoring](#Model Monitoring)\n",
|
||||
"1. [Clean up resources](#Clean up resources)\n",
|
||||
"1. [Next Steps](#Next Steps)\n",
|
||||
"1. [Acknowledgements](#Acknowledgements)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Configuration\n",
|
||||
"\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||
"\n",
|
||||
"In this notebook you will learn how to:\n",
|
||||
"1. Create an experiment using an existing workspace.\n",
|
||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||
"3. Train the model using local compute.\n",
|
||||
"4. Explore the results.\n",
|
||||
"5. Register the model.\n",
|
||||
"6. Create a container image.\n",
|
||||
"7. Create an Azure Container Instance (ACI) service.\n",
|
||||
"8. Test the ACI service."
|
||||
"If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||
"Please note that a Basic edition workspace is created by default in the configuration.ipynb file.\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -60,8 +53,7 @@
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||
]
|
||||
"As part of the setup you have already created an Azure ML `Workspace` object....\n",
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
@@ -69,22 +61,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"import logging\n",
|
||||
"\n",
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"import os\n",
|
||||
"from sklearn import datasets\n",
|
||||
"import azureml.dataprep as dprep\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
"from azureml.core.workspace import Workspace\n",
|
||||
"from azureml.train.automl import AutoMLConfig\n",
|
||||
"from azureml.train.automl.run import AutoMLRun"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -93,26 +70,21 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"tenant_id = os.environ['TENANT_ID’]\n",
|
||||
"client_id = os.environ['CLIENT_ID’]\n",
|
||||
"run = Run.get_context()\n",
|
||||
"secret_name = “{0}-secret”.format(client_id)\n",
|
||||
"secret = run.get_secret(name=secret_name)\n",
|
||||
"sp_auth = ServicePrincipalAuthentication(tenant_id, client_id, secret)\n",
|
||||
"ws = Workspace.from_config(auth=sp_auth)\n",
|
||||
"\n",
|
||||
"# choose a name for experiment\n",
|
||||
"experiment_name = 'automl-classification-bmarketing'\n",
|
||||
"# choose a unique name for experiment\n",
|
||||
"experiment_name = 'unique-name'\n",
|
||||
"# project folder\n",
|
||||
"project_folder = './sample_projects/automl-classification-bankmarketing'\n",
|
||||
"project_folder = './sample_projects/test'\n",
|
||||
"\n",
|
||||
"experiment=Experiment(ws, experiment_name)\n",
|
||||
"\n",
|
||||
"output = {}\n",
|
||||
"output['SDK version'] = azureml.core.VERSION\n",
|
||||
"output['Subscription ID'] = ws.subscription_id\n",
|
||||
"output['Workspace'] = ws.name\n",
|
||||
"output['Resource Group'] = ws.resource_group\n",
|
||||
"output['Location'] = ws.location\n",
|
||||
"output['Project Directory'] = project_folder\n",
|
||||
"output['Experiment Name'] = experiment.name\n",
|
||||
"pd.set_option('display.max_colwidth', -1)\n",
|
||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||
"outputDf.T"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -120,583 +92,77 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create or Attach existing AmlCompute\n",
|
||||
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||
"You will need to create a compute target for your run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import AmlCompute\n",
|
||||
"from azureml.core.compute import ComputeTarget\n",
|
||||
"# Working with Data\n",
|
||||
"\n",
|
||||
"# Choose a name for your cluster.\n",
|
||||
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||
"\n",
|
||||
"found = False\n",
|
||||
"# Check if this compute target already exists in the workspace.\n",
|
||||
"cts = ws.compute_targets\n",
|
||||
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||
" found = True\n",
|
||||
" print('Found existing compute target.')\n",
|
||||
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||
" \n",
|
||||
"if not found:\n",
|
||||
" print('Creating a new compute target...')\n",
|
||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||
" #vm_priority = 'lowpriority', # optional\n",
|
||||
" max_nodes = 6)\n",
|
||||
"\n",
|
||||
" # Create the cluster.\n",
|
||||
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||
" \n",
|
||||
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||
" \n",
|
||||
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||
"Here you would learn how to perform Data labeling and use Open Datasets etc..\n",
|
||||
"To do this first load....\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Data\n",
|
||||
"## Training\n",
|
||||
"\n",
|
||||
"Here load the data in the get_data() script to be utilized in azure compute. To do this first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_Run_config."
|
||||
"Here you would learn how to train a DNN using...\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"if not os.path.isdir('data'):\n",
|
||||
" os.mkdir('data')\n",
|
||||
" \n",
|
||||
"if not os.path.exists(project_folder):\n",
|
||||
" os.makedirs(project_folder)"
|
||||
"# Productionizing\n",
|
||||
"\n",
|
||||
"Here you would learn how to deploy your model to ACI to perform...\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.runconfig import RunConfiguration\n",
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"import pkg_resources\n",
|
||||
"# Model Monitoring\n",
|
||||
"\n",
|
||||
"# create a new RunConfig object\n",
|
||||
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||
"Here you would learn how to detect datadrift etc...\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Clean up resources\n",
|
||||
"\n",
|
||||
"# Set compute target to AmlCompute\n",
|
||||
"conda_run_config.target = compute_target\n",
|
||||
"conda_run_config.environment.docker.enabled = True\n",
|
||||
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||
"Now, let's clean up the resources we created...\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next Steps\n",
|
||||
"\n",
|
||||
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
|
||||
"\n",
|
||||
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||
"In this notebook, you’ve done x, y, z. You can learn more with these resources:\n",
|
||||
"+ [SDK reference documentation for `MyClass`]()\n",
|
||||
"+ [About this feature](https://docs.microsoft.com/azure/machine-learning/service/thisfeature)\n",
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Data\n",
|
||||
"\n",
|
||||
"Here we create the script to be run in azure comput for loading the data, we load the bank marketing dataset into X_train and y_train. Next X_train and y_train is returned for training the model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
|
||||
"dflow = dprep.auto_read_file(data)\n",
|
||||
"dflow.get_profile()\n",
|
||||
"X_train = dflow.drop_columns(columns=['y'])\n",
|
||||
"y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
|
||||
"dflow.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train\n",
|
||||
"\n",
|
||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
||||
"\n",
|
||||
"|Property|Description|\n",
|
||||
"|-|-|\n",
|
||||
"|**task**|classification or regression|\n",
|
||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||
"\n",
|
||||
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"automl_settings = {\n",
|
||||
" \"iteration_timeout_minutes\": 5,\n",
|
||||
" \"iterations\": 10,\n",
|
||||
" \"n_cross_validations\": 2,\n",
|
||||
" \"primary_metric\": 'AUC_weighted',\n",
|
||||
" \"preprocess\": True,\n",
|
||||
" \"max_concurrent_iterations\": 5,\n",
|
||||
" \"verbosity\": logging.INFO,\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||
" debug_log = 'automl_errors.log',\n",
|
||||
" path = project_folder,\n",
|
||||
" run_configuration=conda_run_config,\n",
|
||||
" X = X_train,\n",
|
||||
" y = y_train,\n",
|
||||
" **automl_settings\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"remote_run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Widget for Monitoring Runs\n",
|
||||
"\n",
|
||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||
"\n",
|
||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.widgets import RunDetails\n",
|
||||
"RunDetails(remote_run).show() "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy\n",
|
||||
"\n",
|
||||
"### Retrieve the Best Model\n",
|
||||
"\n",
|
||||
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"best_run, fitted_model = remote_run.get_output()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Register the Fitted Model for Deployment\n",
|
||||
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"description = 'AutoML Model trained on bank marketing data to predict if a client will subscribe to a term deposit'\n",
|
||||
"tags = None\n",
|
||||
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||
"\n",
|
||||
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Scoring Script\n",
|
||||
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile score.py\n",
|
||||
"import pickle\n",
|
||||
"import json\n",
|
||||
"import numpy\n",
|
||||
"import azureml.train.automl\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def init():\n",
|
||||
" global model\n",
|
||||
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||
" # deserialize the model file back into a sklearn model\n",
|
||||
" model = joblib.load(model_path)\n",
|
||||
"\n",
|
||||
"def run(rawdata):\n",
|
||||
" try:\n",
|
||||
" data = json.loads(rawdata)['data']\n",
|
||||
" data = numpy.array(data)\n",
|
||||
" result = model.predict(data)\n",
|
||||
" except Exception as e:\n",
|
||||
" result = str(e)\n",
|
||||
" return json.dumps({\"error\": result})\n",
|
||||
" return json.dumps({\"result\":result.tolist()})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a YAML File for the Environment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"\n",
|
||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
|
||||
" pip_packages=['azureml-sdk[automl]'])\n",
|
||||
"\n",
|
||||
"conda_env_file_name = 'myenv.yml'\n",
|
||||
"myenv.save_to_file('.', conda_env_file_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Substitute the actual version number in the environment file.\n",
|
||||
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||
"\n",
|
||||
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||
" content = cefr.read()\n",
|
||||
"\n",
|
||||
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||
"\n",
|
||||
"# Substitute the actual model id in the script file.\n",
|
||||
"\n",
|
||||
"script_file_name = 'score.py'\n",
|
||||
"\n",
|
||||
"with open(script_file_name, 'r') as cefr:\n",
|
||||
" content = cefr.read()\n",
|
||||
"\n",
|
||||
"with open(script_file_name, 'w') as cefw:\n",
|
||||
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a Container Image\n",
|
||||
"\n",
|
||||
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||
"or when testing a model that is under development."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.image import Image, ContainerImage\n",
|
||||
"\n",
|
||||
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||
" execution_script = script_file_name,\n",
|
||||
" conda_file = conda_env_file_name,\n",
|
||||
" tags = {'area': \"bmData\", 'type': \"automl_classification\"},\n",
|
||||
" description = \"Image for automl classification sample\")\n",
|
||||
"\n",
|
||||
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||
" # this is the model object \n",
|
||||
" models = [model],\n",
|
||||
" image_config = image_config, \n",
|
||||
" workspace = ws)\n",
|
||||
"\n",
|
||||
"image.wait_for_creation(show_output = True)\n",
|
||||
"\n",
|
||||
"if image.creation_state == 'Failed':\n",
|
||||
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||
"\n",
|
||||
"Deploy an image that contains the model and other assets needed by the service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||
" memory_gb = 1, \n",
|
||||
" tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n",
|
||||
" description = 'sample service for Automl Classification')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import Webservice\n",
|
||||
"\n",
|
||||
"aci_service_name = 'automl-sample-bankmarketing'\n",
|
||||
"print(aci_service_name)\n",
|
||||
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||
" image = image,\n",
|
||||
" name = aci_service_name,\n",
|
||||
" workspace = ws)\n",
|
||||
"aci_service.wait_for_deployment(True)\n",
|
||||
"print(aci_service.state)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete a Web Service\n",
|
||||
"\n",
|
||||
"Deletes the specified web service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#aci_service.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Get Logs from a Deployed Web Service\n",
|
||||
"\n",
|
||||
"Gets logs from a deployed web service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#aci_service.get_logs()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test\n",
|
||||
"\n",
|
||||
"Now that the model is trained split our data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load the bank marketing datasets.\n",
|
||||
"from sklearn.datasets import load_diabetes\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"from numpy import array"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
|
||||
"dflow = dprep.auto_read_file(data)\n",
|
||||
"dflow.get_profile()\n",
|
||||
"X_test = dflow.drop_columns(columns=['y'])\n",
|
||||
"y_test = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
|
||||
"dflow.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_test = X_test.to_pandas_dataframe()\n",
|
||||
"y_test = y_test.to_pandas_dataframe()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = fitted_model.predict(X_test)\n",
|
||||
"actual = array(y_test)\n",
|
||||
"actual = actual[:,0]\n",
|
||||
"print(y_pred.shape, \" \", actual.shape)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Calculate metrics for the prediction\n",
|
||||
"\n",
|
||||
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||
"from the trained model that was returned."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib notebook\n",
|
||||
"test_pred = plt.scatter(actual, y_pred, color='b')\n",
|
||||
"test_test = plt.scatter(actual, actual, color='g')\n",
|
||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Acknowledgements"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This Bank Marketing dataset is made available under the Creative Commons (CCO: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/ and is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
|
||||
"This dataset is made available under the Creative Commons (CCO: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/ and is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
|
||||
"\n",
|
||||
"_**Acknowledgements**_\n",
|
||||
"This data set is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
|
||||
"This dataset is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
|
||||
"\n",
|
||||
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
|
||||
]
|
||||
@@ -705,9 +171,32 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "v-rasav"
|
||||
"name": "YOUR ALIAS"
|
||||
}
|
||||
],
|
||||
"category": "tutorial",
|
||||
"compute": [
|
||||
"AML Compute"
|
||||
],
|
||||
"datasets": [
|
||||
"MNIST"
|
||||
],
|
||||
"deployment": [
|
||||
"AKS"
|
||||
],
|
||||
"exclude_from_index": false,
|
||||
"framework": [
|
||||
"PyTorch"
|
||||
],
|
||||
"friendly_name": "How to use ModuleStep with AML Pipelines",
|
||||
},
|
||||
"order_index": 14,
|
||||
"star_tag": [],
|
||||
"tags": [
|
||||
"Pipeline Builder"
|
||||
],
|
||||
"task": "Demonstrates the use of ModuleStep"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
@@ -728,4 +217,8 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -221,7 +221,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
|
||||
"dflow = dprep.auto_read_file(data)\n",
|
||||
"dflow = dprep.read_csv(data, infer_column_types=True)\n",
|
||||
"dflow.get_profile()\n",
|
||||
"X = dflow.drop_columns(columns=['Class'])\n",
|
||||
"y = dflow.keep_columns(columns=['Class'], validate_column_exists=True)\n",
|
||||
|
||||
@@ -41,6 +41,8 @@
|
||||
"\n",
|
||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||
"\n",
|
||||
"An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace#upgrade).\n",
|
||||
|
||||
"In this notebook you will learn how to:\n",
|
||||
"1. Create an experiment using an existing workspace.\n",
|
||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||
@@ -61,61 +63,13 @@
|
||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"import logging\n",
|
||||
"\n",
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from sklearn import datasets\n",
|
||||
"\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
"from azureml.core.workspace import Workspace\n",
|
||||
"from azureml.train.automl import AutoMLConfig\n",
|
||||
"from azureml.train.automl.run import AutoMLRun"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"# choose a name for experiment\n",
|
||||
"experiment_name = 'automl-classification-deployment'\n",
|
||||
"# project folder\n",
|
||||
"project_folder = './sample_projects/automl-classification-deployment'\n",
|
||||
"\n",
|
||||
"experiment=Experiment(ws, experiment_name)\n",
|
||||
"\n",
|
||||
"output = {}\n",
|
||||
"output['SDK version'] = azureml.core.VERSION\n",
|
||||
"output['Subscription ID'] = ws.subscription_id\n",
|
||||
"output['Workspace'] = ws.name\n",
|
||||
"output['Resource Group'] = ws.resource_group\n",
|
||||
"output['Location'] = ws.location\n",
|
||||
"output['Project Directory'] = project_folder\n",
|
||||
"output['Experiment Name'] = experiment.name\n",
|
||||
"pd.set_option('display.max_colwidth', -1)\n",
|
||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||
"outputDf.T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train\n",
|
||||
"\n",
|
||||
"The following steps require an Enterprise workspace to gain access to these features.To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace#upgrade).\n",
|
||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
||||
"\n",
|
||||
"|Property|Description|\n",
|
||||
@@ -484,7 +438,7 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "savitam"
|
||||
"name": "shwinne"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
@@ -507,4 +461,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
}
|
||||
|
||||
@@ -29,7 +29,6 @@
|
||||
"1. [Data](#Data)\n",
|
||||
"1. [Train](#Train)\n",
|
||||
"1. [Results](#Results)\n",
|
||||
"1. [Test](#Test)\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
@@ -39,7 +38,7 @@
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||
"In this example we use the scikit-learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||
"\n",
|
||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||
"\n",
|
||||
@@ -49,7 +48,8 @@
|
||||
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||
"3. Train the model using local compute with ONNX compatible config on.\n",
|
||||
"4. Explore the results and save the ONNX model."
|
||||
"4. Explore the results and save the ONNX model.\n",
|
||||
"5. Inference with the ONNX model."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -156,11 +156,11 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train with enable ONNX compatible models config on\n",
|
||||
"## Train\n",
|
||||
"\n",
|
||||
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||
"\n",
|
||||
"Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n",
|
||||
"**Note:** Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n",
|
||||
"\n",
|
||||
"|Property|Description|\n",
|
||||
"|-|-|\n",
|
||||
|
||||
@@ -41,7 +41,7 @@
|
||||
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||
"\n",
|
||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||
"This notebooks shows how can automl can be trained on a a selected list of models,see the readme.md for the models.\n",
|
||||
"This notebooks shows how can automl can be trained on a selected list of models, see the readme.md for the models.\n",
|
||||
"This trains the model exclusively on tensorflow based models.\n",
|
||||
"\n",
|
||||
"In this notebook you will learn how to:\n",
|
||||
|
||||
@@ -258,7 +258,11 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"widget-rundetails-sample"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.widgets import RunDetails\n",
|
||||
@@ -475,7 +479,27 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
}
|
||||
},
|
||||
"friendly_name": "Testing index",
|
||||
"exclude_from_index": false,
|
||||
"order_index": 1,
|
||||
"category": "tutorial",
|
||||
"tags": [
|
||||
"featured"
|
||||
],
|
||||
"task": "Regression",
|
||||
"datasets": [
|
||||
"NYC Taxi"
|
||||
],
|
||||
"compute": [
|
||||
"local"
|
||||
],
|
||||
"deployment": [
|
||||
"None"
|
||||
],
|
||||
"framework": [
|
||||
"Azure ML AutoML"
|
||||
],
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
||||
@@ -128,7 +128,7 @@
|
||||
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
|
||||
"# and convert column types manually.\n",
|
||||
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
|
||||
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
|
||||
"dflow = dprep.read_csv(example_data, infer_column_types=True)\n",
|
||||
"dflow.get_profile()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -197,12 +197,12 @@
|
||||
"display(HTML('<h3>Iterations</h3>'))\n",
|
||||
"RunDetails(ml_run).show() \n",
|
||||
"\n",
|
||||
"children = list(ml_run.get_children())\n",
|
||||
"all_metrics = ml_run.get_metrics(recursive=True)\n",
|
||||
"metricslist = {}\n",
|
||||
"for run in children:\n",
|
||||
" properties = run.get_properties()\n",
|
||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||
"for run_id, metrics in all_metrics.items():\n",
|
||||
" iteration = int(run_id.split('_')[-1])\n",
|
||||
" float_metrics = {k: v for k, v in metrics.items() if isinstance(v, float)}\n",
|
||||
" metricslist[iteration] = float_metrics\n",
|
||||
"\n",
|
||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||
"display(HTML('<h3>Metrics</h3>'))\n",
|
||||
|
||||
@@ -345,7 +345,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()"
|
||||
"# Get the featurization summary as a list of JSON\n",
|
||||
"featurization_summary = fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()\n",
|
||||
"# View the featurization summary as a pandas dataframe\n",
|
||||
"pd.DataFrame.from_records(featurization_summary)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -522,7 +525,7 @@
|
||||
"print('MAPE: %.2f' % MAPE(df_all[target_column_name], df_all['predicted']))\n",
|
||||
"\n",
|
||||
"# Plot outputs\n",
|
||||
"%matplotlib notebook\n",
|
||||
"%matplotlib inline\n",
|
||||
"test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')\n",
|
||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||
@@ -564,7 +567,7 @@
|
||||
"df_all_APE = df_all.assign(APE=APE(df_all[target_column_name], df_all['predicted']))\n",
|
||||
"APEs = [df_all_APE[df_all['horizon_origin'] == h].APE.values for h in range(1, max_horizon + 1)]\n",
|
||||
"\n",
|
||||
"%matplotlib notebook\n",
|
||||
"%matplotlib inline\n",
|
||||
"plt.boxplot(APEs)\n",
|
||||
"plt.yscale('log')\n",
|
||||
"plt.xlabel('horizon')\n",
|
||||
@@ -578,7 +581,7 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "xiaga@microsoft.com, tosingli@microsoft.com, erwright@microsoft.com"
|
||||
"name": "erwright"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
|
||||
@@ -432,7 +432,7 @@
|
||||
"print('MAPE: %.2f' % MAPE(df_all[target_column_name], df_all['predicted']))\n",
|
||||
"\n",
|
||||
"# Plot outputs\n",
|
||||
"%matplotlib notebook\n",
|
||||
"%matplotlib inline\n",
|
||||
"pred, = plt.plot(df_all[time_column_name], df_all['predicted'], color='b')\n",
|
||||
"actual, = plt.plot(df_all[time_column_name], df_all[target_column_name], color='g')\n",
|
||||
"plt.xticks(fontsize=8)\n",
|
||||
@@ -543,7 +543,7 @@
|
||||
"print('MAPE: %.2f' % MAPE(df_lags[target_column_name], df_lags['predicted']))\n",
|
||||
"\n",
|
||||
"# Plot outputs\n",
|
||||
"%matplotlib notebook\n",
|
||||
"%matplotlib inline\n",
|
||||
"pred, = plt.plot(df_lags[time_column_name], df_lags['predicted'], color='b')\n",
|
||||
"actual, = plt.plot(df_lags[time_column_name], df_lags[target_column_name], color='g')\n",
|
||||
"plt.xticks(fontsize=8)\n",
|
||||
@@ -587,7 +587,7 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "xiaga, tosingli, erwright"
|
||||
"name": "erwright"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
|
||||
@@ -463,7 +463,7 @@
|
||||
"# Plot outputs\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"\n",
|
||||
"%matplotlib notebook\n",
|
||||
"%matplotlib inline\n",
|
||||
"test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')\n",
|
||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||
@@ -829,7 +829,7 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "erwright, tosingli"
|
||||
"name": "erwright"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
|
||||
@@ -360,7 +360,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
|
||||
"# Get the featurization summary as a list of JSON\n",
|
||||
"featurization_summary = fitted_model.named_steps['datatransformer'].get_featurization_summary()\n",
|
||||
"# View the featurization summary as a pandas dataframe\n",
|
||||
"pd.DataFrame.from_records(featurization_summary)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -216,7 +216,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
|
||||
"dflow = dprep.auto_read_file(data)\n",
|
||||
"dflow = dprep.read_csv(data, infer_column_types=True)\n",
|
||||
"dflow.get_profile()\n",
|
||||
"X = dflow.drop_columns(columns=['CONCRETE'])\n",
|
||||
"y = dflow.keep_columns(columns=['CONCRETE'], validate_column_exists=True)\n",
|
||||
|
||||
@@ -216,7 +216,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
|
||||
"dflow = dprep.auto_read_file(data)\n",
|
||||
"dflow = dprep.read_csv(data, infer_column_types=True)\n",
|
||||
"dflow.get_profile()\n",
|
||||
"X = dflow.drop_columns(columns=['ERP'])\n",
|
||||
"y = dflow.keep_columns(columns=['ERP'], validate_column_exists=True)\n",
|
||||
|
||||
@@ -0,0 +1,554 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Automated Machine Learning\n",
|
||||
"_**Remote Execution using AmlCompute**_\n",
|
||||
"\n",
|
||||
"## Contents\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Data](#Data)\n",
|
||||
"1. [Train](#Train)\n",
|
||||
"1. [Results](#Results)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"In this example we use the scikit-learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||
"\n",
|
||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||
"\n",
|
||||
"In this notebook you would see\n",
|
||||
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||
"2. Create or Attach existing AmlCompute to a workspace.\n",
|
||||
"3. Configure AutoML using `AutoMLConfig`.\n",
|
||||
"4. Train the model using AmlCompute with ONNX compatible config on.\n",
|
||||
"5. Explore the results and save the ONNX model.\n",
|
||||
"6. Inference with the ONNX model.\n",
|
||||
"\n",
|
||||
"In addition this notebook showcases the following features\n",
|
||||
"- **Parallel** executions for iterations\n",
|
||||
"- **Asynchronous** tracking of progress\n",
|
||||
"- **Cancellation** of individual iterations or the entire run\n",
|
||||
"- Retrieving models for any iteration or logged metric\n",
|
||||
"- Specifying AutoML settings as `**kwargs`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import logging\n",
|
||||
"import os\n",
|
||||
"import csv\n",
|
||||
"\n",
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from sklearn import datasets\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
"from azureml.core.workspace import Workspace\n",
|
||||
"from azureml.train.automl import AutoMLConfig\n",
|
||||
"import azureml.dataprep as dprep"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"# Choose a name for the run history container in the workspace.\n",
|
||||
"experiment_name = 'automl-remote-amlcompute-with-onnx'\n",
|
||||
"project_folder = './project'\n",
|
||||
"\n",
|
||||
"experiment = Experiment(ws, experiment_name)\n",
|
||||
"\n",
|
||||
"output = {}\n",
|
||||
"output['SDK version'] = azureml.core.VERSION\n",
|
||||
"output['Subscription ID'] = ws.subscription_id\n",
|
||||
"output['Workspace Name'] = ws.name\n",
|
||||
"output['Resource Group'] = ws.resource_group\n",
|
||||
"output['Location'] = ws.location\n",
|
||||
"output['Project Directory'] = project_folder\n",
|
||||
"output['Experiment Name'] = experiment.name\n",
|
||||
"pd.set_option('display.max_colwidth', -1)\n",
|
||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||
"outputDf.T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create or Attach existing AmlCompute\n",
|
||||
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
|
||||
"\n",
|
||||
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||
"\n",
|
||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import AmlCompute\n",
|
||||
"from azureml.core.compute import ComputeTarget\n",
|
||||
"\n",
|
||||
"# Choose a name for your cluster.\n",
|
||||
"amlcompute_cluster_name = \"cpu-cluster\"\n",
|
||||
"\n",
|
||||
"found = False\n",
|
||||
"# Check if this compute target already exists in the workspace.\n",
|
||||
"cts = ws.compute_targets\n",
|
||||
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||
" found = True\n",
|
||||
" print('Found existing compute target.')\n",
|
||||
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||
"\n",
|
||||
"if not found:\n",
|
||||
" print('Creating a new compute target...')\n",
|
||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||
" #vm_priority = 'lowpriority', # optional\n",
|
||||
" max_nodes = 6)\n",
|
||||
"\n",
|
||||
" # Create the cluster.\\n\",\n",
|
||||
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||
"\n",
|
||||
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||
"\n",
|
||||
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data\n",
|
||||
"For remote executions, you need to make the data accessible from the remote compute.\n",
|
||||
"This can be done by uploading the data to DataStore.\n",
|
||||
"In this example, we upload scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"iris = datasets.load_iris()\n",
|
||||
"\n",
|
||||
"if not os.path.isdir('data'):\n",
|
||||
" os.mkdir('data')\n",
|
||||
"\n",
|
||||
"if not os.path.exists(project_folder):\n",
|
||||
" os.makedirs(project_folder)\n",
|
||||
"\n",
|
||||
"X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
|
||||
" iris.target, \n",
|
||||
" test_size=0.2, \n",
|
||||
" random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Ensure the x_train and x_test are pandas DataFrame."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
|
||||
"# This is needed for initializing the input variable names of ONNX model, \n",
|
||||
"# and the prediction with the ONNX model using the inference helper.\n",
|
||||
"X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
|
||||
"X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])\n",
|
||||
"y_train = pd.DataFrame(y_train, columns=['label'])\n",
|
||||
"\n",
|
||||
"X_train.to_csv(\"data/X_train.csv\", index=False)\n",
|
||||
"y_train.to_csv(\"data/y_train.csv\", index=False)\n",
|
||||
"\n",
|
||||
"ds = ws.get_default_datastore()\n",
|
||||
"ds.upload(src_dir='./data', target_path='irisdata', overwrite=True, show_progress=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.runconfig import RunConfiguration\n",
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"import pkg_resources\n",
|
||||
"\n",
|
||||
"# create a new RunConfig object\n",
|
||||
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||
"\n",
|
||||
"# Set compute target to AmlCompute\n",
|
||||
"conda_run_config.target = compute_target\n",
|
||||
"conda_run_config.environment.docker.enabled = True\n",
|
||||
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||
"\n",
|
||||
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
|
||||
"\n",
|
||||
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Dprep reference\n",
|
||||
"\n",
|
||||
"Defined X and y as dprep references, which are passed to automated machine learning in the AutoMLConfig."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = dprep.read_csv(path=ds.path('irisdata/X_train.csv'), infer_column_types=True)\n",
|
||||
"y = dprep.read_csv(path=ds.path('irisdata/y_train.csv'), infer_column_types=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train\n",
|
||||
"\n",
|
||||
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n",
|
||||
"\n",
|
||||
"**Note:** Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n",
|
||||
"\n",
|
||||
"**Note:** When using AmlCompute, you can't pass Numpy arrays directly to the fit method.\n",
|
||||
"\n",
|
||||
"|Property|Description|\n",
|
||||
"|-|-|\n",
|
||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n",
|
||||
"|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Set the preprocess=True, currently the InferenceHelper only supports this mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"automl_settings = {\n",
|
||||
" \"iteration_timeout_minutes\": 10,\n",
|
||||
" \"iterations\": 10,\n",
|
||||
" \"n_cross_validations\": 5,\n",
|
||||
" \"primary_metric\": 'AUC_weighted',\n",
|
||||
" \"preprocess\": True,\n",
|
||||
" \"max_concurrent_iterations\": 5,\n",
|
||||
" \"verbosity\": logging.INFO\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||
" debug_log = 'automl_errors.log',\n",
|
||||
" path = project_folder,\n",
|
||||
" run_configuration=conda_run_config,\n",
|
||||
" X = X,\n",
|
||||
" y = y,\n",
|
||||
" enable_onnx_compatible_models=True, # This will generate ONNX compatible models.\n",
|
||||
" **automl_settings\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
|
||||
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"remote_run = experiment.submit(automl_config, show_output = False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"remote_run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Results\n",
|
||||
"\n",
|
||||
"#### Loading executed runs\n",
|
||||
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Widget for Monitoring Runs\n",
|
||||
"\n",
|
||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||
"\n",
|
||||
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
|
||||
"\n",
|
||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"remote_run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.widgets import RunDetails\n",
|
||||
"RunDetails(remote_run).show() "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Wait until the run finishes.\n",
|
||||
"remote_run.wait_for_completion(show_output = True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Cancelling Runs\n",
|
||||
"\n",
|
||||
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
||||
"# remote_run.cancel()\n",
|
||||
"\n",
|
||||
"# Cancel iteration 1 and move onto iteration 2.\n",
|
||||
"# remote_run.cancel_iteration(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieve the Best ONNX Model\n",
|
||||
"\n",
|
||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
|
||||
"\n",
|
||||
"Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"best_run, onnx_mdl = remote_run.get_output(return_onnx_model=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Save the best ONNX model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
||||
"onnx_fl_path = \"./best_model.onnx\"\n",
|
||||
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Predict with the ONNX model, using onnxruntime package"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"import json\n",
|
||||
"from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
|
||||
"from azureml.train.automl import constants\n",
|
||||
"\n",
|
||||
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
|
||||
" python_version_compatible = True\n",
|
||||
"else:\n",
|
||||
" python_version_compatible = False\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" import onnxruntime\n",
|
||||
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
|
||||
" onnxrt_present = True\n",
|
||||
"except ImportError:\n",
|
||||
" onnxrt_present = False\n",
|
||||
"\n",
|
||||
"def get_onnx_res(run):\n",
|
||||
" res_path = 'onnx_resource.json'\n",
|
||||
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
||||
" with open(res_path) as f:\n",
|
||||
" onnx_res = json.load(f)\n",
|
||||
" return onnx_res\n",
|
||||
"\n",
|
||||
"if onnxrt_present and python_version_compatible: \n",
|
||||
" mdl_bytes = onnx_mdl.SerializeToString()\n",
|
||||
" onnx_res = get_onnx_res(best_run)\n",
|
||||
"\n",
|
||||
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
|
||||
" pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n",
|
||||
"\n",
|
||||
" print(pred_onnx)\n",
|
||||
" print(pred_prob_onnx)\n",
|
||||
"else:\n",
|
||||
" if not python_version_compatible:\n",
|
||||
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
||||
" if not onnxrt_present:\n",
|
||||
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "savitam"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,9 @@
|
||||
name: auto-ml-remote-amlcompute-with-onnx
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-train-automl
|
||||
- azureml-widgets
|
||||
- matplotlib
|
||||
- pandas_ml
|
||||
- onnxruntime
|
||||
@@ -233,8 +233,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = dprep.auto_read_file(path=ds.path('digitsdata/X_train.csv'))\n",
|
||||
"y = dprep.auto_read_file(path=ds.path('digitsdata/y_train.csv'))"
|
||||
"X = dprep.read_csv(path=ds.path('digitsdata/X_train.csv'), infer_column_types=True)\n",
|
||||
"y = dprep.read_csv(path=ds.path('digitsdata/y_train.csv'), infer_column_types=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -87,7 +87,7 @@ These instruction setup the integration for SQL Server 2017 on Windows.
|
||||
sudo /opt/mssql/mlservices/bin/python/python -m pip install --upgrade sklearn
|
||||
```
|
||||
7. Start SQL Server.
|
||||
8. Execute the files aml_model.sql, aml_connection.sql, AutoMLGetMetrics.sql, AutoMLPredict.sql and AutoMLTrain.sql in SQL Server Management Studio.
|
||||
8. Execute the files aml_model.sql, aml_connection.sql, AutoMLGetMetrics.sql, AutoMLPredict.sql, AutoMLForecast.sql and AutoMLTrain.sql in SQL Server Management Studio.
|
||||
9. Create an Azure Machine Learning Workspace. You can use the instructions at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace)
|
||||
10. Create a config.json file file using the subscription id, resource group name and workspace name that you use to create the workspace. The file is described at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace)
|
||||
11. Create an Azure service principal. You can do this with the commands:
|
||||
@@ -109,5 +109,5 @@ First you need to load the sample data in the database.
|
||||
|
||||
You can then run the queries in the energy-demand folder:
|
||||
* TrainEnergyDemand.sql runs AutoML, trains multiple models on data and selects the best model.
|
||||
* PredictEnergyDemand.sql predicts based on the most recent training run.
|
||||
* ForecastEnergyDemand.sql forecasts based on the most recent training run.
|
||||
* GetMetrics.sql returns all the metrics for each model in the most recent training run.
|
||||
|
||||
@@ -593,7 +593,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
|
||||
"# Get the featurization summary as a list of JSON\n",
|
||||
"featurization_summary = fitted_model.named_steps['datatransformer'].get_featurization_summary()\n",
|
||||
"# View the featurization summary as a pandas dataframe\n",
|
||||
"pd.DataFrame.from_records(featurization_summary)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -13,7 +13,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Using Databricks as a Compute Target from Azure Machine Learning Pipeline\n",
|
||||
"To use Databricks as a compute target from [Azure Machine Learning Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines), a [DatabricksStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py) is used. This notebook demonstrates the use of DatabricksStep in Azure Machine Learning Pipeline.\n",
|
||||
"To use Databricks as a compute target from [Azure Machine Learning Pipeline](https://aka.ms/pl-concept), a [DatabricksStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py) is used. This notebook demonstrates the use of DatabricksStep in Azure Machine Learning Pipeline.\n",
|
||||
"\n",
|
||||
"The notebook will show:\n",
|
||||
"1. Running an arbitrary Databricks notebook that the customer has in Databricks workspace\n",
|
||||
@@ -675,7 +675,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next: ADLA as a Compute Target\n",
|
||||
"To use ADLA as a compute target from Azure Machine Learning Pipeline, a AdlaStep is used. This [notebook](./aml-pipelines-use-adla-as-compute-target.ipynb) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
|
||||
"To use ADLA as a compute target from Azure Machine Learning Pipeline, a AdlaStep is used. This [notebook](https://aka.ms/pl-adla) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
## Using data drift APIs
|
||||
|
||||
1. [Detect data drift for a model](azure-ml-datadrift.ipynb): Detect data drift for a deployed model.
|
||||
@@ -77,7 +77,7 @@
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -108,11 +108,11 @@
|
||||
"source": [
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n",
|
||||
" model_name = \"sklearn_regression_model.pkl\",\n",
|
||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||
" workspace = ws)"
|
||||
"model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n",
|
||||
" model_name=\"sklearn_regression_model.pkl\",\n",
|
||||
" tags={'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||
" description=\"Ridge regression model to predict diabetes\",\n",
|
||||
" workspace=ws)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -177,7 +177,7 @@
|
||||
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
||||
"from azureml.exceptions import WebserviceException\n",
|
||||
"\n",
|
||||
"deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)\n",
|
||||
"deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n",
|
||||
"aci_service_name = 'aciservice1'\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
@@ -215,7 +215,7 @@
|
||||
" [10,9,8,7,6,5,4,3,2,1]\n",
|
||||
"]})\n",
|
||||
"\n",
|
||||
"test_sample_encoded = bytes(test_sample,encoding = 'utf8')\n",
|
||||
"test_sample_encoded = bytes(test_sample, encoding='utf8')\n",
|
||||
"prediction = service.run(input_data=test_sample_encoded)\n",
|
||||
"print(prediction)"
|
||||
]
|
||||
@@ -247,15 +247,38 @@
|
||||
"source": [
|
||||
"### Model Profiling\n",
|
||||
"\n",
|
||||
"you can also take advantage of profiling feature for model\n",
|
||||
"You can also take advantage of the profiling feature to estimate CPU and memory requirements for models.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"\n",
|
||||
"profile = model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n",
|
||||
"profile = Model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n",
|
||||
"profile.wait_for_profiling(True)\n",
|
||||
"profiling_results = profile.get_results()\n",
|
||||
"print(profiling_results)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Model Packaging\n",
|
||||
"\n",
|
||||
"If you want to build a Docker image that encapsulates your model and its dependencies, you can use the model packaging option. The output image will be pushed to your workspace's ACR.\n",
|
||||
"\n",
|
||||
"You must include an Environment object in your inference configuration to use `Model.package()`.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"package = Model.package(ws, [model], inference_config)\n",
|
||||
"package.wait_for_creation(show_output=True) # Or show_output=False to hide the Docker build logs.\n",
|
||||
"package.pull()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Instead of a fully-built image, you can also generate a Dockerfile and download all the assets needed to build an image on top of your Environment.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"package = Model.package(ws, [model], inference_config, generate_dockerfile=True)\n",
|
||||
"package.wait_for_creation(show_output=True)\n",
|
||||
"package.save(\"./local_context_dir\")\n",
|
||||
"```"
|
||||
]
|
||||
}
|
||||
|
||||
@@ -72,7 +72,7 @@
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -103,11 +103,11 @@
|
||||
"source": [
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n",
|
||||
" model_name = \"sklearn_regression_model.pkl\",\n",
|
||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||
" workspace = ws)"
|
||||
"model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n",
|
||||
" model_name=\"sklearn_regression_model.pkl\",\n",
|
||||
" tags={'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||
" description=\"Ridge regression model to predict diabetes\",\n",
|
||||
" workspace=ws)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -127,10 +127,10 @@
|
||||
"\n",
|
||||
"source_directory = \"C:/abc\"\n",
|
||||
"\n",
|
||||
"os.makedirs(source_directory, exist_ok = True)\n",
|
||||
"os.makedirs(\"C:/abc/x/y\", exist_ok = True)\n",
|
||||
"os.makedirs(\"C:/abc/env\", exist_ok = True)\n",
|
||||
"os.makedirs(\"C:/abc/dockerstep\", exist_ok = True)"
|
||||
"os.makedirs(source_directory, exist_ok=True)\n",
|
||||
"os.makedirs(\"C:/abc/x/y\", exist_ok=True)\n",
|
||||
"os.makedirs(\"C:/abc/env\", exist_ok=True)\n",
|
||||
"os.makedirs(\"C:/abc/dockerstep\", exist_ok=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -253,7 +253,7 @@
|
||||
"from azureml.core.model import InferenceConfig\n",
|
||||
"\n",
|
||||
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
|
||||
" runtime= \"python\", \n",
|
||||
" runtime=\"python\", \n",
|
||||
" entry_script=\"x/y/score.py\",\n",
|
||||
" conda_file=\"env/myenv.yml\", \n",
|
||||
" extra_docker_file_steps=\"dockerstep/customDockerStep.txt\")"
|
||||
@@ -271,15 +271,10 @@
|
||||
"\n",
|
||||
"NOTE:\n",
|
||||
"\n",
|
||||
"we require docker running with linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running\n",
|
||||
"The Docker image runs as a Linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running:\n",
|
||||
"\n",
|
||||
" powershell command to switch to linux engine\n",
|
||||
" & 'C:\\Program Files\\Docker\\Docker\\DockerCli.exe' -SwitchLinuxEngine\n",
|
||||
"\n",
|
||||
"and c drive is shared https://docs.docker.com/docker-for-windows/#shared-drives\n",
|
||||
"sometimes you have to reshare c drive as docker \n",
|
||||
"\n",
|
||||
"<img src=\"./dockerSharedDrive.JPG\" align=\"left\"/>"
|
||||
" # PowerShell command to switch to Linux engine\n",
|
||||
" & 'C:\\Program Files\\Docker\\Docker\\DockerCli.exe' -SwitchLinuxEngine"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -295,7 +290,7 @@
|
||||
"source": [
|
||||
"from azureml.core.webservice import LocalWebservice\n",
|
||||
"\n",
|
||||
"#this is optional, if not provided we choose random port\n",
|
||||
"# This is optional, if not provided Docker will choose a random unused port.\n",
|
||||
"deployment_config = LocalWebservice.deploy_configuration(port=6789)\n",
|
||||
"\n",
|
||||
"local_service = Model.deploy(ws, \"test\", [model], inference_config, deployment_config)\n",
|
||||
@@ -427,9 +422,8 @@
|
||||
"local_service.reload()\n",
|
||||
"print(\"--------------------------------------------------------------\")\n",
|
||||
"\n",
|
||||
"# after reload now if you call run this will return updated return message\n",
|
||||
"\n",
|
||||
"print(local_service.run(input_data=sample_input))"
|
||||
"# After calling reload(), run() will return the updated message.\n",
|
||||
"local_service.run(input_data=sample_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -442,9 +436,9 @@
|
||||
"\n",
|
||||
"```python\n",
|
||||
"\n",
|
||||
"local_service.update(models = [SomeOtherModelObject],\n",
|
||||
" deployment_config = local_config,\n",
|
||||
" inference_config = inference_config)\n",
|
||||
"local_service.update(models=[SomeOtherModelObject],\n",
|
||||
" deployment_config=local_config,\n",
|
||||
" inference_config=inference_config)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
@@ -468,7 +462,7 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "raymondl"
|
||||
"name": "keriehm"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
|
||||
@@ -68,7 +68,7 @@
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -99,11 +99,31 @@
|
||||
"source": [
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n",
|
||||
" model_name = \"sklearn_regression_model.pkl\",\n",
|
||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||
" workspace = ws)"
|
||||
"model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n",
|
||||
" model_name=\"sklearn_regression_model.pkl\",\n",
|
||||
" tags={'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||
" description=\"Ridge regression model to predict diabetes\",\n",
|
||||
" workspace=ws)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create Environment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"from azureml.core.environment import Environment\n",
|
||||
"\n",
|
||||
"environment = Environment(\"LocalDeploy\")\n",
|
||||
"environment.python.conda_dependencies = CondaDependencies(\"myenv.yml\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -121,9 +141,8 @@
|
||||
"source": [
|
||||
"from azureml.core.model import InferenceConfig\n",
|
||||
"\n",
|
||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
||||
" entry_script=\"score.py\",\n",
|
||||
" conda_file=\"myenv.yml\")"
|
||||
"inference_config = InferenceConfig(entry_script=\"score.py\",\n",
|
||||
" environment=environment)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -138,15 +157,10 @@
|
||||
"\n",
|
||||
"NOTE:\n",
|
||||
"\n",
|
||||
"we require docker running with linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running\n",
|
||||
"The Docker image runs as a Linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running:\n",
|
||||
"\n",
|
||||
" powershell command to switch to linux engine\n",
|
||||
" & 'C:\\Program Files\\Docker\\Docker\\DockerCli.exe' -SwitchLinuxEngine\n",
|
||||
"\n",
|
||||
"and c drive is shared https://docs.docker.com/docker-for-windows/#shared-drives\n",
|
||||
"sometimes you have to reshare c drive as docker \n",
|
||||
"\n",
|
||||
"<img src=\"./dockerSharedDrive.JPG\" align=\"left\"/>"
|
||||
" # PowerShell command to switch to Linux engine\n",
|
||||
" & 'C:\\Program Files\\Docker\\Docker\\DockerCli.exe' -SwitchLinuxEngine"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -157,7 +171,7 @@
|
||||
"source": [
|
||||
"from azureml.core.webservice import LocalWebservice\n",
|
||||
"\n",
|
||||
"#this is optional, if not provided we choose random port\n",
|
||||
"# This is optional, if not provided Docker will choose a random unused port.\n",
|
||||
"deployment_config = LocalWebservice.deploy_configuration(port=6789)\n",
|
||||
"\n",
|
||||
"local_service = Model.deploy(ws, \"test\", [model], inference_config, deployment_config)\n",
|
||||
@@ -221,7 +235,7 @@
|
||||
"\n",
|
||||
"sample_input = bytes(sample_input, encoding='utf-8')\n",
|
||||
"\n",
|
||||
"print(local_service.run(input_data=sample_input))"
|
||||
"local_service.run(input_data=sample_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -282,9 +296,8 @@
|
||||
"local_service.reload()\n",
|
||||
"print(\"--------------------------------------------------------------\")\n",
|
||||
"\n",
|
||||
"# after reload now if you call run this will return updated return message\n",
|
||||
"\n",
|
||||
"print(local_service.run(input_data=sample_input))"
|
||||
"# After calling reload(), run() will return the updated message.\n",
|
||||
"local_service.run(input_data=sample_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -296,10 +309,9 @@
|
||||
"If you want to change your model(s), Conda dependencies, or deployment configuration, call `update()` to rebuild the Docker image.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"\n",
|
||||
"local_service.update(models = [SomeOtherModelObject],\n",
|
||||
" deployment_config = local_config,\n",
|
||||
" inference_config = inference_config)\n",
|
||||
"local_service.update(models=[SomeOtherModelObject],\n",
|
||||
" inference_config=inference_config,\n",
|
||||
" deployment_config=local_config)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
@@ -323,7 +335,7 @@
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "raymondl"
|
||||
"name": "keriehm"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
|
||||
@@ -12,7 +12,7 @@ Easily create and train a model using various deep neural networks (DNNs) as a f
|
||||
To learn more about the azureml-accel-model classes, see the section [Model Classes](#model-classes) below or the [Azure ML Accel Models SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py).
|
||||
|
||||
### Step 1: Create an Azure ML workspace
|
||||
Follow [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python) to install the Azure ML SDK on your local machine, create an Azure ML workspace, and set up your notebook environment, which is required for the next step.
|
||||
Follow [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace) to install the Azure ML SDK on your local machine, create an Azure ML workspace, and set up your notebook environment, which is required for the next step.
|
||||
|
||||
### Step 2: Check your FPGA quota
|
||||
Use the Azure CLI to check whether you have quota.
|
||||
|
||||
@@ -1,5 +1,12 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -230,11 +237,14 @@
|
||||
"\n",
|
||||
"# Convert model\n",
|
||||
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors_str)\n",
|
||||
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
|
||||
"convert_request.wait_for_completion(show_output=False)\n",
|
||||
"converted_model = convert_request.result\n",
|
||||
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
|
||||
" converted_model.id, converted_model.created_time, '\\n')\n",
|
||||
"if convert_request.wait_for_completion(show_output = False):\n",
|
||||
" # If the above call succeeded, get the converted model\n",
|
||||
" converted_model = convert_request.result\n",
|
||||
" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
|
||||
" converted_model.id, converted_model.created_time, '\\n')\n",
|
||||
"else:\n",
|
||||
" print(\"Model conversion failed. Showing output.\")\n",
|
||||
" convert_request.wait_for_completion(show_output = True)\n",
|
||||
"\n",
|
||||
"# Package into AccelContainerImage\n",
|
||||
"image_config = AccelContainerImage.image_configuration()\n",
|
||||
@@ -298,6 +308,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"aks_target.wait_for_completion(show_output = True)\n",
|
||||
"print(aks_target.provisioning_state)\n",
|
||||
"print(aks_target.provisioning_errors)"
|
||||
@@ -316,6 +327,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||
"\n",
|
||||
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
|
||||
@@ -342,10 +354,9 @@
|
||||
"## 5. Test the service\n",
|
||||
"<a id=\"create-client\"></a>\n",
|
||||
"### 5.a. Create Client\n",
|
||||
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions. \n",
|
||||
"\n",
|
||||
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
|
||||
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
|
||||
"\n",
|
||||
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
|
||||
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
|
||||
]
|
||||
},
|
||||
@@ -356,18 +367,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Using the grpc client in AzureML Accelerated Models SDK\n",
|
||||
"from azureml.accel.client import PredictionClient\n",
|
||||
"\n",
|
||||
"address = aks_service.scoring_uri\n",
|
||||
"ssl_enabled = address.startswith(\"https\")\n",
|
||||
"address = address[address.find('/')+2:].strip('/')\n",
|
||||
"port = 443 if ssl_enabled else 80\n",
|
||||
"from azureml.accel import client_from_service\n",
|
||||
"\n",
|
||||
"# Initialize AzureML Accelerated Models client\n",
|
||||
"client = PredictionClient(address=address,\n",
|
||||
" port=port,\n",
|
||||
" use_ssl=ssl_enabled,\n",
|
||||
" service_name=aks_service.name)"
|
||||
"client = client_from_service(aks_service)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -486,7 +489,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.0"
|
||||
"version": "3.5.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -0,0 +1,8 @@
|
||||
name: accelerated-models-object-detection
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-accel-models
|
||||
- tensorflow
|
||||
- opencv-python
|
||||
- matplotlib
|
||||
@@ -1,5 +1,12 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -270,12 +277,15 @@
|
||||
"from azureml.accel import AccelOnnxConverter\n",
|
||||
"\n",
|
||||
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
|
||||
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
|
||||
"convert_request.wait_for_completion(show_output = False)\n",
|
||||
"# If the above call succeeded, get the converted model\n",
|
||||
"converted_model = convert_request.result\n",
|
||||
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
|
||||
" converted_model.id, converted_model.created_time, '\\n')"
|
||||
"\n",
|
||||
"if convert_request.wait_for_completion(show_output = False):\n",
|
||||
" # If the above call succeeded, get the converted model\n",
|
||||
" converted_model = convert_request.result\n",
|
||||
" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
|
||||
" converted_model.id, converted_model.created_time, '\\n')\n",
|
||||
"else:\n",
|
||||
" print(\"Model conversion failed. Showing output.\")\n",
|
||||
" convert_request.wait_for_completion(show_output = True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -366,6 +376,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"aks_target.wait_for_completion(show_output = True)\n",
|
||||
"print(aks_target.provisioning_state)\n",
|
||||
"print(aks_target.provisioning_errors)"
|
||||
@@ -384,9 +395,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||
"\n",
|
||||
"#Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
|
||||
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
|
||||
"# Authentication is enabled by default, but for testing we specify False\n",
|
||||
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
|
||||
" num_replicas=1,\n",
|
||||
@@ -415,10 +427,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 7.a. Create Client\n",
|
||||
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions.\n",
|
||||
"\n",
|
||||
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
|
||||
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
|
||||
"\n",
|
||||
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
|
||||
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
|
||||
]
|
||||
},
|
||||
@@ -429,18 +440,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Using the grpc client in AzureML Accelerated Models SDK\n",
|
||||
"from azureml.accel.client import PredictionClient\n",
|
||||
"\n",
|
||||
"address = aks_service.scoring_uri\n",
|
||||
"ssl_enabled = address.startswith(\"https\")\n",
|
||||
"address = address[address.find('/')+2:].strip('/')\n",
|
||||
"port = 443 if ssl_enabled else 80\n",
|
||||
"from azureml.accel import client_from_service\n",
|
||||
"\n",
|
||||
"# Initialize AzureML Accelerated Models client\n",
|
||||
"client = PredictionClient(address=address,\n",
|
||||
" port=port,\n",
|
||||
" use_ssl=ssl_enabled,\n",
|
||||
" service_name=aks_service.name)"
|
||||
"client = client_from_service(aks_service)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -540,7 +543,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.0"
|
||||
"version": "3.5.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -0,0 +1,6 @@
|
||||
name: accelerated-models-quickstart
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-accel-models
|
||||
- tensorflow
|
||||
@@ -1,5 +1,12 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -410,6 +417,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"# Launch the training\n",
|
||||
"tf.reset_default_graph()\n",
|
||||
"sess = tf.Session(graph=tf.get_default_graph())\n",
|
||||
@@ -582,11 +590,14 @@
|
||||
"\n",
|
||||
"# Convert model\n",
|
||||
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
|
||||
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
|
||||
"convert_request.wait_for_completion(show_output=False)\n",
|
||||
"converted_model = convert_request.result\n",
|
||||
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
|
||||
" converted_model.id, converted_model.created_time, '\\n')\n",
|
||||
"if convert_request.wait_for_completion(show_output = False):\n",
|
||||
" # If the above call succeeded, get the converted model\n",
|
||||
" converted_model = convert_request.result\n",
|
||||
" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
|
||||
" converted_model.id, converted_model.created_time, '\\n')\n",
|
||||
"else:\n",
|
||||
" print(\"Model conversion failed. Showing output.\")\n",
|
||||
" convert_request.wait_for_completion(show_output = True)\n",
|
||||
"\n",
|
||||
"# Package into AccelContainerImage\n",
|
||||
"image_config = AccelContainerImage.image_configuration()\n",
|
||||
@@ -655,6 +666,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"aks_target.wait_for_completion(show_output = True)\n",
|
||||
"print(aks_target.provisioning_state)\n",
|
||||
"print(aks_target.provisioning_errors)"
|
||||
@@ -673,6 +685,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||
"\n",
|
||||
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
|
||||
@@ -700,10 +713,9 @@
|
||||
"\n",
|
||||
"<a id=\"create-client\"></a>\n",
|
||||
"### 9.a. Create Client\n",
|
||||
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions. \n",
|
||||
"\n",
|
||||
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
|
||||
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
|
||||
"\n",
|
||||
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
|
||||
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
|
||||
]
|
||||
},
|
||||
@@ -714,18 +726,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Using the grpc client in AzureML Accelerated Models SDK\n",
|
||||
"from azureml.accel.client import PredictionClient\n",
|
||||
"\n",
|
||||
"address = aks_service.scoring_uri\n",
|
||||
"ssl_enabled = address.startswith(\"https\")\n",
|
||||
"address = address[address.find('/')+2:].strip('/')\n",
|
||||
"port = 443 if ssl_enabled else 80\n",
|
||||
"from azureml.accel import client_from_service\n",
|
||||
"\n",
|
||||
"# Initialize AzureML Accelerated Models client\n",
|
||||
"client = PredictionClient(address=address,\n",
|
||||
" port=port,\n",
|
||||
" use_ssl=ssl_enabled,\n",
|
||||
" service_name=aks_service.name)"
|
||||
"client = client_from_service(aks_service)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -854,7 +858,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.0"
|
||||
"version": "3.5.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -0,0 +1,9 @@
|
||||
name: accelerated-models-training
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-accel-models
|
||||
- tensorflow
|
||||
- keras
|
||||
- tqdm
|
||||
- sklearn
|
||||
@@ -470,7 +470,27 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
}
|
||||
},
|
||||
"friendly_name": "Prepare data for regression modeling",
|
||||
"exclude_from_index": false,
|
||||
"order_index": 1,
|
||||
"category": "deployment",
|
||||
"tags": [
|
||||
"featured"
|
||||
],
|
||||
"task": "Regression",
|
||||
"datasets": [
|
||||
"test"
|
||||
],
|
||||
"compute": [
|
||||
"localtest"
|
||||
],
|
||||
"deployment": [
|
||||
"AKS"
|
||||
],
|
||||
"framework": [
|
||||
"test1"
|
||||
]
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
||||
@@ -1,8 +1,11 @@
|
||||
## Using explain model APIs
|
||||
|
||||
<a name="samples"></a>
|
||||
# Explain Model SDK Sample Notebooks
|
||||
|
||||
Follow these sample notebooks to learn:
|
||||
|
||||
1. [Explain tabular data locally](explain-tabular-data-local): Basic example of explaining model trained on tabular data.
|
||||
4. [Explain on remote AMLCompute](explain-on-amlcompute): Explain a model on a remote AMLCompute target.
|
||||
5. [Explain tabular data with Run History](explain-tabular-data-run-history): Explain a model with Run History.
|
||||
7. [Explain raw features](explain-tabular-data-raw-features): Explain the raw features of a trained model.
|
||||
1. [Explain tabular data locally](tabular-data): Basic examples of explaining model trained on tabular data.
|
||||
2. [Explain on remote AMLCompute](azure-integration/remote-explanation): Explain a model on a remote AMLCompute target.
|
||||
3. [Explain tabular data with Run History](azure-integration/run-history): Explain a model with Run History.
|
||||
4. [Operationalize model explanation](azure-integration/scoring-time): Operationalize model explanation as a web service.
|
||||
|
||||
@@ -0,0 +1,645 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Save and retrieve explanations via Azure Machine Learning Run History\n",
|
||||
"\n",
|
||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to save and retrieve classification model explanations to/from Azure Machine Learning Run History.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Apply feature transformations\n",
|
||||
" 1. Train a binary classification model\n",
|
||||
" 1. Explain the model on raw features\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Upload model explanations to Azure Machine Learning Run History](#Upload)\n",
|
||||
"1. [Download model explanations from Azure Machine Learning Run History](#Download)\n",
|
||||
"1. [Visualize explanations](#Visualize)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook showcases how to explain a classification model predictions locally at training time, upload explanations to the Azure Machine Learning's run history, and download previously-uploaded explanations from the Run History.\n",
|
||||
"It demonstrates the API calls that you need to make to upload/download the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and downloaded explanations.\n",
|
||||
"\n",
|
||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Problem: IBM employee attrition classification with scikit-learn (run model explainer locally and upload explanation to the Azure Machine Learning Run History)\n",
|
||||
"\n",
|
||||
"1. Train a SVM classification model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with AML Run History, which leverages run history service to store and manage the explanation data\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"```\n",
|
||||
"Or\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using Jupyter Labs run the following commands instead:\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"### Run model explainer locally at training time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.pipeline import Pipeline\n",
|
||||
"from sklearn.impute import SimpleImputer\n",
|
||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
||||
"from sklearn.svm import SVC\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# Explainers:\n",
|
||||
"# 1. SHAP Tabular Explainer\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 2. Mimic Explainer\n",
|
||||
"from azureml.explain.model.mimic.mimic_explainer import MimicExplainer\n",
|
||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
||||
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import LinearExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import SGDExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.tree_model import DecisionTreeExplainableModel\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 3. PFI Explainer\n",
|
||||
"from azureml.explain.model.permutation.permutation_importance import PFIExplainer "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the IBM employee attrition data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get the IBM employee attrition dataset\n",
|
||||
"outdirname = 'dataset.6.21.19'\n",
|
||||
"try:\n",
|
||||
" from urllib import urlretrieve\n",
|
||||
"except ImportError:\n",
|
||||
" from urllib.request import urlretrieve\n",
|
||||
"import zipfile\n",
|
||||
"zipfilename = outdirname + '.zip'\n",
|
||||
"urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
|
||||
"with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
|
||||
" unzip.extractall('.')\n",
|
||||
"attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')\n",
|
||||
"\n",
|
||||
"# Dropping Employee count as all values are 1 and hence attrition is independent of this feature\n",
|
||||
"attritionData = attritionData.drop(['EmployeeCount'], axis=1)\n",
|
||||
"# Dropping Employee Number since it is merely an identifier\n",
|
||||
"attritionData = attritionData.drop(['EmployeeNumber'], axis=1)\n",
|
||||
"\n",
|
||||
"attritionData = attritionData.drop(['Over18'], axis=1)\n",
|
||||
"\n",
|
||||
"# Since all values are 80\n",
|
||||
"attritionData = attritionData.drop(['StandardHours'], axis=1)\n",
|
||||
"\n",
|
||||
"# Converting target variables from string to numerical values\n",
|
||||
"target_map = {'Yes': 1, 'No': 0}\n",
|
||||
"attritionData[\"Attrition_numerical\"] = attritionData[\"Attrition\"].apply(lambda x: target_map[x])\n",
|
||||
"target = attritionData[\"Attrition_numerical\"]\n",
|
||||
"\n",
|
||||
"attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(attritionXData, \n",
|
||||
" target, \n",
|
||||
" test_size = 0.2,\n",
|
||||
" random_state=0,\n",
|
||||
" stratify=target)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Creating dummy columns for each categorical feature\n",
|
||||
"categorical = []\n",
|
||||
"for col, value in attritionXData.iteritems():\n",
|
||||
" if value.dtype == 'object':\n",
|
||||
" categorical.append(col)\n",
|
||||
" \n",
|
||||
"# Store the numerical columns in a list numerical\n",
|
||||
"numerical = attritionXData.columns.difference(categorical) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Transform raw features"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.compose import ColumnTransformer\n",
|
||||
"\n",
|
||||
"# We create the preprocessing pipelines for both numeric and categorical data.\n",
|
||||
"numeric_transformer = Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())])\n",
|
||||
"\n",
|
||||
"categorical_transformer = Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),\n",
|
||||
" ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n",
|
||||
"\n",
|
||||
"transformations = ColumnTransformer(\n",
|
||||
" transformers=[\n",
|
||||
" ('num', numeric_transformer, numerical),\n",
|
||||
" ('cat', categorical_transformer, categorical)])\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||
" ('classifier', SVC(kernel='linear', C = 1.0, probability=True))])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"'''\n",
|
||||
"# Uncomment below if sklearn-pandas is not installed\n",
|
||||
"#!pip install sklearn-pandas\n",
|
||||
"from sklearn_pandas import DataFrameMapper\n",
|
||||
"\n",
|
||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"numeric_transformations = [([f], Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])) for f in numerical]\n",
|
||||
"\n",
|
||||
"categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]\n",
|
||||
"\n",
|
||||
"transformations = numeric_transformations + categorical_transformations\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||
" ('classifier', SVC(kernel='linear', C = 1.0, probability=True))]) \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"'''"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1. Using SHAP TabularExplainer\n",
|
||||
"# clf.steps[-1][1] returns the trained classification model\n",
|
||||
"explainer = TabularExplainer(clf.steps[-1][1], \n",
|
||||
" initialization_examples=x_train, \n",
|
||||
" features=attritionXData.columns, \n",
|
||||
" classes=[\"Not leaving\", \"leaving\"], \n",
|
||||
" transformations=transformations)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 2. Using MimicExplainer\n",
|
||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
||||
"# explainer = MimicExplainer(clf.steps[-1][1], \n",
|
||||
"# x_train, \n",
|
||||
"# LGBMExplainableModel, \n",
|
||||
"# augment_data=True, \n",
|
||||
"# max_num_of_augmentations=10, \n",
|
||||
"# features=attritionXData.columns, \n",
|
||||
"# classes=[\"Not leaving\", \"leaving\"], \n",
|
||||
"# transformations=transformations)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Using PFIExplainer\n",
|
||||
"\n",
|
||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
||||
"# Default metrics: \n",
|
||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
||||
"# Mean absolute error for regression\n",
|
||||
"\n",
|
||||
"# explainer = PFIExplainer(clf.steps[-1][1], \n",
|
||||
"# features=x_train.columns, \n",
|
||||
"# transformations=transformations,\n",
|
||||
"# classes=[\"Not leaving\", \"leaving\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate global explanations\n",
|
||||
"Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = explainer.explain_global(x_test)\n",
|
||||
"\n",
|
||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# Feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"\n",
|
||||
"# Note: PFIExplainer does not support per class explanations\n",
|
||||
"# Per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# Per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate local explanations\n",
|
||||
"Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first data point in the test set\n",
|
||||
"instance_num = 1\n",
|
||||
"local_explanation = explainer.explain_local(x_test[:instance_num])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Upload\n",
|
||||
"Upload explanations to Azure Machine Learning Run History"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import azureml.core\n",
|
||||
"from azureml.core import Workspace, Experiment, Run\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
|
||||
"# Check core SDK version number\n",
|
||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print('Workspace name: ' + ws.name, \n",
|
||||
" 'Azure region: ' + ws.location, \n",
|
||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"experiment_name = 'explain_model'\n",
|
||||
"experiment = Experiment(ws, experiment_name)\n",
|
||||
"run = experiment.start_logging()\n",
|
||||
"client = ExplanationClient.from_run(run)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Uploading model explanation data for storage or visualization in webUX\n",
|
||||
"# The explanation can then be downloaded on any compute\n",
|
||||
"# Multiple explanations can be uploaded\n",
|
||||
"client.upload_model_explanation(global_explanation, comment='global explanation: all features')\n",
|
||||
"# Or you can only upload the explanation object with the top k feature info\n",
|
||||
"#client.upload_model_explanation(global_explanation, top_k=2, comment='global explanation: Only top 2 features')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Uploading model explanation data for storage or visualization in webUX\n",
|
||||
"# The explanation can then be downloaded on any compute\n",
|
||||
"# Multiple explanations can be uploaded\n",
|
||||
"client.upload_model_explanation(local_explanation, comment='local explanation for test point 1: all features')\n",
|
||||
"\n",
|
||||
"# Alterntively, you can only upload the local explanation object with the top k feature info\n",
|
||||
"#client.upload_model_explanation(local_explanation, top_k=2, comment='local explanation: top 2 features')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Download\n",
|
||||
"Download explanations from Azure Machine Learning Run History"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# List uploaded explanations\n",
|
||||
"client.list_model_explanations()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for explanation in client.list_model_explanations():\n",
|
||||
" \n",
|
||||
" if explanation['comment'] == 'local explanation for test point 1: all features':\n",
|
||||
" downloaded_local_explanation = client.download_model_explanation(explanation_id=explanation['id'])\n",
|
||||
" # You can pass a k value to only download the top k feature importance values\n",
|
||||
" downloaded_local_explanation_top2 = client.download_model_explanation(top_k=2, explanation_id=explanation['id'])\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" elif explanation['comment'] == 'global explanation: all features':\n",
|
||||
" downloaded_global_explanation = client.download_model_explanation(explanation_id=explanation['id'])\n",
|
||||
" # You can pass a k value to only download the top k feature importance values\n",
|
||||
" downloaded_global_explanation_top2 = client.download_model_explanation(top_k=2, explanation_id=explanation['id'])\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Load the visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(downloaded_global_explanation, model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n",
|
||||
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. Explain models with engineered features:\n",
|
||||
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n",
|
||||
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
||||
" 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
name: explain-run-history-sklearn-regression
|
||||
name: save-retrieve-explanations-run-history
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
@@ -0,0 +1,33 @@
|
||||
import json
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import os
|
||||
import pickle
|
||||
from sklearn.externals import joblib
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from azureml.core.model import Model
|
||||
|
||||
|
||||
def init():
|
||||
|
||||
global original_model
|
||||
global scoring_explainer
|
||||
|
||||
# Retrieve the path to the model file using the model name
|
||||
# Assume original model is named original_prediction_model
|
||||
original_model_path = Model.get_model_path('original_model')
|
||||
scoring_explainer_path = Model.get_model_path('IBM_attrition_explainer')
|
||||
|
||||
original_model = joblib.load(original_model_path)
|
||||
scoring_explainer = joblib.load(scoring_explainer_path)
|
||||
|
||||
|
||||
def run(raw_data):
|
||||
# Get predictions and explanations for each data point
|
||||
data = pd.read_json(raw_data)
|
||||
# Make prediction
|
||||
predictions = original_model.predict(data)
|
||||
# Retrieve model explanations
|
||||
local_importance_values = scoring_explainer.explain(data)
|
||||
# You can return any data type as long as it is JSON-serializable
|
||||
return {'predictions': predictions.tolist(), 'local_importance_values': local_importance_values}
|
||||
@@ -0,0 +1,513 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Train and explain models locally and deploy model and scoring explainer\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"_**This notebook illustrates how to use the Azure Machine Learning Interpretability SDK to deploy a locally-trained model and its corresponding scoring explainer to Azure Container Instances (ACI) as a web service.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Problem: IBM employee attrition classification with scikit-learn (train and explain a model locally and use Azure Container Instances (ACI) for deploying your model and its corresponding scoring explainer as a web service.)\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Apply feature transformations\n",
|
||||
" 1. Train a binary classification model\n",
|
||||
" 1. Explain the model on raw features\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize explanations](#Visualize)\n",
|
||||
"1. [Deploy model and scoring explainer](#Deploy)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This notebook showcases how to train and explain a classification model locally, and deploy the trained model and its corresponding explainer to Azure Container Instances (ACI).\n",
|
||||
"It demonstrates the API calls that you need to make to submit a run for training and explaining a model to AMLCompute, download the compute explanations remotely, and visualizing the global and local explanations via a visualization dashboard that provides an interactive way of discovering patterns in model predictions and downloaded explanations. It also demonstrates how to use Azure Machine Learning MLOps capabilities to deploy your model and its corresponding explainer.\n",
|
||||
"\n",
|
||||
"We will showcase one of the tabular data explainers: TabularExplainer (SHAP) and follow these steps:\n",
|
||||
"1.\tDevelop a machine learning script in Python which involves the training script and the explanation script.\n",
|
||||
"2.\tRun the script locally.\n",
|
||||
"3.\tUse the interpretability toolkit\u00e2\u20ac\u2122s visualization dashboard to visualize predictions and their explanation. If the metrics and explanations don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.\n",
|
||||
"5.\tAfter a satisfactory run is found, create a scoring explainer and register the persisted model and its corresponding explainer in the model registry.\n",
|
||||
"6.\tDevelop a scoring script.\n",
|
||||
"7.\tCreate an image and register it in the image registry.\n",
|
||||
"8.\tDeploy the image as a web service in Azure.\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"Make sure you go through the [configuration notebook](../../../../configuration.ipynb) first if you haven't."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Check core SDK version number\n",
|
||||
"import azureml.core\n",
|
||||
"\n",
|
||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialize a Workspace\n",
|
||||
"\n",
|
||||
"Initialize a workspace object from persisted configuration"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"create workspace"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"Create An Experiment: **Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import Experiment\n",
|
||||
"experiment_name = 'explain_model_at_scoring_time'\n",
|
||||
"experiment = Experiment(workspace=ws, name=experiment_name)\n",
|
||||
"run = experiment.start_logging()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get IBM attrition data\n",
|
||||
"import os\n",
|
||||
"import pandas as pd\n",
|
||||
"\n",
|
||||
"outdirname = 'dataset.6.21.19'\n",
|
||||
"try:\n",
|
||||
" from urllib import urlretrieve\n",
|
||||
"except ImportError:\n",
|
||||
" from urllib.request import urlretrieve\n",
|
||||
"import zipfile\n",
|
||||
"zipfilename = outdirname + '.zip'\n",
|
||||
"urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
|
||||
"with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
|
||||
" unzip.extractall('.')\n",
|
||||
"attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
||||
"from sklearn.impute import SimpleImputer\n",
|
||||
"from sklearn.pipeline import Pipeline\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"from sklearn.ensemble import RandomForestClassifier\n",
|
||||
"from sklearn_pandas import DataFrameMapper\n",
|
||||
"\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"os.makedirs('./outputs', exist_ok=True)\n",
|
||||
"\n",
|
||||
"# Dropping Employee count as all values are 1 and hence attrition is independent of this feature\n",
|
||||
"attritionData = attritionData.drop(['EmployeeCount'], axis=1)\n",
|
||||
"# Dropping Employee Number since it is merely an identifier\n",
|
||||
"attritionData = attritionData.drop(['EmployeeNumber'], axis=1)\n",
|
||||
"attritionData = attritionData.drop(['Over18'], axis=1)\n",
|
||||
"# Since all values are 80\n",
|
||||
"attritionData = attritionData.drop(['StandardHours'], axis=1)\n",
|
||||
"\n",
|
||||
"# Converting target variables from string to numerical values\n",
|
||||
"target_map = {'Yes': 1, 'No': 0}\n",
|
||||
"attritionData[\"Attrition_numerical\"] = attritionData[\"Attrition\"].apply(lambda x: target_map[x])\n",
|
||||
"target = attritionData[\"Attrition_numerical\"]\n",
|
||||
"\n",
|
||||
"attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)\n",
|
||||
"\n",
|
||||
"# Creating dummy columns for each categorical feature\n",
|
||||
"categorical = []\n",
|
||||
"for col, value in attritionXData.iteritems():\n",
|
||||
" if value.dtype == 'object':\n",
|
||||
" categorical.append(col)\n",
|
||||
"\n",
|
||||
"# Store the numerical columns in a list numerical\n",
|
||||
"numerical = attritionXData.columns.difference(categorical)\n",
|
||||
"\n",
|
||||
"numeric_transformations = [([f], Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())])) for f in numerical]\n",
|
||||
"\n",
|
||||
"categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]\n",
|
||||
"\n",
|
||||
"transformations = numeric_transformations + categorical_transformations\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
||||
" ('classifier', RandomForestClassifier())])\n",
|
||||
"\n",
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(attritionXData,\n",
|
||||
" target,\n",
|
||||
" test_size = 0.2,\n",
|
||||
" random_state=0,\n",
|
||||
" stratify=target)\n",
|
||||
"\n",
|
||||
"# preprocess the data and fit the classification model\n",
|
||||
"clf.fit(x_train, y_train)\n",
|
||||
"model = clf.steps[-1][1]\n",
|
||||
"\n",
|
||||
"model_file_name = 'log_reg.pkl'\n",
|
||||
"\n",
|
||||
"# save model in the outputs folder so it automatically get uploaded\n",
|
||||
"with open(model_file_name, 'wb') as file:\n",
|
||||
" joblib.dump(value=clf, filename=os.path.join('./outputs/',\n",
|
||||
" model_file_name))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Explain predictions on your local machine\n",
|
||||
"tabular_explainer = TabularExplainer(model, \n",
|
||||
" initialization_examples=x_train, \n",
|
||||
" features=attritionXData.columns, \n",
|
||||
" classes=[\"Not leaving\", \"leaving\"], \n",
|
||||
" transformations=transformations)\n",
|
||||
"\n",
|
||||
"# Explain overall model predictions (global explanation)\n",
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations it will\n",
|
||||
"# take longer although they may be more accurate\n",
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer, save\n",
|
||||
"# ScoringExplainer\n",
|
||||
"scoring_explainer = TreeScoringExplainer(tabular_explainer)\n",
|
||||
"# Pickle scoring explainer locally\n",
|
||||
"save(scoring_explainer, exist_ok=True)\n",
|
||||
"\n",
|
||||
"# Register original model\n",
|
||||
"run.upload_file('original_model.pkl', os.path.join('./outputs/', model_file_name))\n",
|
||||
"original_model = run.register_model(model_name='original_model', model_path='original_model.pkl')\n",
|
||||
"\n",
|
||||
"# Register scoring explainer\n",
|
||||
"run.upload_file('IBM_attrition_explainer.pkl', 'scoring_explainer.pkl')\n",
|
||||
"scoring_explainer_model = run.register_model(model_name='IBM_attrition_explainer', model_path='IBM_attrition_explainer.pkl')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Visualize the explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, clf, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy \n",
|
||||
"\n",
|
||||
"Deploy Model and ScoringExplainer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||
" memory_gb=1, \n",
|
||||
" tags={\"data\": \"IBM_Attrition\", \n",
|
||||
" \"method\" : \"local_explanation\"}, \n",
|
||||
" description='Get local explanations for IBM Employee Attrition data')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||
"\n",
|
||||
"# WARNING: to install this, g++ needs to be available on the Docker image and is not by default (look at the next cell)\n",
|
||||
"azureml_pip_packages = [\n",
|
||||
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
|
||||
" 'azureml-explain-model'\n",
|
||||
"]\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"# specify CondaDependencies obj\n",
|
||||
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'],\n",
|
||||
" pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
|
||||
" pin_sdk_version=False)\n",
|
||||
"\n",
|
||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||
" f.write(myenv.serialize_to_string())\n",
|
||||
"\n",
|
||||
"with open(\"myenv.yml\",\"r\") as f:\n",
|
||||
" print(f.read())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile dockerfile\n",
|
||||
"RUN apt-get update && apt-get install -y g++ "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.model import Model\n",
|
||||
"# retrieve scoring explainer for deployment\n",
|
||||
"scoring_explainer_model = Model(ws, 'IBM_attrition_explainer')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import Webservice\n",
|
||||
"from azureml.core.image import ContainerImage\n",
|
||||
"\n",
|
||||
"# Use the custom scoring, docker, and conda files we created above\n",
|
||||
"image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
||||
" docker_file=\"dockerfile\", \n",
|
||||
" runtime=\"python\", \n",
|
||||
" conda_file=\"myenv.yml\")\n",
|
||||
"\n",
|
||||
"# Use configs and models generated above\n",
|
||||
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||
" name='model-scoring',\n",
|
||||
" deployment_config=aciconfig,\n",
|
||||
" models=[scoring_explainer_model, original_model],\n",
|
||||
" image_config=image_config)\n",
|
||||
"\n",
|
||||
"service.wait_for_deployment(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import requests\n",
|
||||
"import json\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Create data to test service with\n",
|
||||
"sample_data = '{\"Age\":{\"899\":49},\"BusinessTravel\":{\"899\":\"Travel_Rarely\"},\"DailyRate\":{\"899\":1098},\"Department\":{\"899\":\"Research & Development\"},\"DistanceFromHome\":{\"899\":4},\"Education\":{\"899\":2},\"EducationField\":{\"899\":\"Medical\"},\"EnvironmentSatisfaction\":{\"899\":1},\"Gender\":{\"899\":\"Male\"},\"HourlyRate\":{\"899\":85},\"JobInvolvement\":{\"899\":2},\"JobLevel\":{\"899\":5},\"JobRole\":{\"899\":\"Manager\"},\"JobSatisfaction\":{\"899\":3},\"MaritalStatus\":{\"899\":\"Married\"},\"MonthlyIncome\":{\"899\":18711},\"MonthlyRate\":{\"899\":12124},\"NumCompaniesWorked\":{\"899\":2},\"OverTime\":{\"899\":\"No\"},\"PercentSalaryHike\":{\"899\":13},\"PerformanceRating\":{\"899\":3},\"RelationshipSatisfaction\":{\"899\":3},\"StockOptionLevel\":{\"899\":1},\"TotalWorkingYears\":{\"899\":23},\"TrainingTimesLastYear\":{\"899\":2},\"WorkLifeBalance\":{\"899\":4},\"YearsAtCompany\":{\"899\":1},\"YearsInCurrentRole\":{\"899\":0},\"YearsSinceLastPromotion\":{\"899\":0},\"YearsWithCurrManager\":{\"899\":0}}'\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"headers = {'Content-Type':'application/json'}\n",
|
||||
"\n",
|
||||
"# send request to service\n",
|
||||
"resp = requests.post(service.scoring_uri, sample_data, headers=headers)\n",
|
||||
"\n",
|
||||
"print(\"POST to url\", service.scoring_uri)\n",
|
||||
"# can covert back to Python objects from json string if desired\n",
|
||||
"print(\"prediction:\", resp.text)\n",
|
||||
"result = json.loads(resp.text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#plot the feature importance for the prediction\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt; plt.rcdefaults()\n",
|
||||
"\n",
|
||||
"labels = json.loads(sample_data)\n",
|
||||
"labels = labels.keys()\n",
|
||||
"objects = labels\n",
|
||||
"y_pos = np.arange(len(objects))\n",
|
||||
"performance = result[\"local_importance_values\"][0][0]\n",
|
||||
"\n",
|
||||
"plt.bar(y_pos, performance, align='center', alpha=0.5)\n",
|
||||
"plt.xticks(y_pos, objects)\n",
|
||||
"locs, labels = plt.xticks()\n",
|
||||
"plt.setp(labels, rotation=90)\n",
|
||||
"plt.ylabel('Feature impact - leaving vs not leaving')\n",
|
||||
"plt.title('Local feature importance for prediction')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"service.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n",
|
||||
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. Explain models with engineered features:\n",
|
||||
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n",
|
||||
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. [Inferencing time: deploy a remotely-trained model and explainer](./train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
name: explain-sklearn-raw-features
|
||||
name: train-explain-model-locally-and-deploy
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
@@ -13,33 +13,66 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Train using Azure Machine Learning Compute\n",
|
||||
"# Train and explain models remotely via Azure Machine Learning Compute and deploy model and scoring explainer\n",
|
||||
"\n",
|
||||
"* Initialize a Workspace\n",
|
||||
"* Create an Experiment\n",
|
||||
"* Introduction to AmlCompute\n",
|
||||
"* Submit an AmlCompute run in a few different ways\n",
|
||||
" - Provision as a run based compute target \n",
|
||||
" - Provision as a persistent compute target (Basic)\n",
|
||||
" - Provision as a persistent compute target (Advanced)\n",
|
||||
"* Additional operations to perform on AmlCompute\n",
|
||||
"* Download model explanation data from the Run History Portal\n",
|
||||
"* Print the explanation data"
|
||||
"\n",
|
||||
"_**This notebook illustrates how to use the Azure Machine Learning Interpretability SDK to train and explain a classification model remotely on an Azure Machine Leanrning Compute Target (AMLCompute), and use Azure Container Instances (ACI) for deploying your model and its corresponding scoring explainer as a web service.**_\n",
|
||||
"\n",
|
||||
"Problem: IBM employee attrition classification with scikit-learn (train a model and run an explainer remotely via AMLCompute, and deploy model and its corresponding explainer.)\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Apply feature transformations\n",
|
||||
" 1. Train a binary classification model\n",
|
||||
" 1. Explain the model on raw features\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize results](#Visualize)\n",
|
||||
"1. [Deploy model and scoring explainer](#Deploy)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook showcases how to train and explain a classification model remotely via Azure Machine Learning Compute (AMLCompute), download the calculated explanations locally for visualization and inspection, and deploy the final model and its corresponding explainer to Azure Container Instances (ACI).\n",
|
||||
"It demonstrates the API calls that you need to make to submit a run for training and explaining a model to AMLCompute, download the compute explanations remotely, and visualizing the global and local explanations via a visualization dashboard that provides an interactive way of discovering patterns in model predictions and downloaded explanations, and using Azure Machine Learning MLOps capabilities to deploy your model and its corresponding explainer.\n",
|
||||
"\n",
|
||||
"We will showcase one of the tabular data explainers: TabularExplainer (SHAP) and follow these steps:\n",
|
||||
"1.\tDevelop a machine learning script in Python which involves the training script and the explanation script.\n",
|
||||
"2.\tCreate and configure a compute target.\n",
|
||||
"3.\tSubmit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to datastore. And the records of execution (e.g., model, metrics, prediction explanations) are saved as runs in the workspace and grouped under experiments.\n",
|
||||
"4.\tQuery the experiment for logged metrics and explanations from the current and past runs. Use the interpretability toolkit\u00e2\u20ac\u2122s visualization dashboard to visualize predictions and their explanation. If the metrics and explanations don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.\n",
|
||||
"5.\tAfter a satisfactory run is found, create a scoring explainer and register the persisted model and its corresponding explainer in the model registry.\n",
|
||||
"6.\tDevelop a scoring script.\n",
|
||||
"7.\tCreate an image and register it in the image registry.\n",
|
||||
"8.\tDeploy the image as a web service in Azure.\n",
|
||||
"\n",
|
||||
"|  |\n",
|
||||
"|:--:|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"Make sure you go through the [configuration notebook](../../../../configuration.ipynb) first if you haven't."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -83,9 +116,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create An Experiment\n",
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
|
||||
"Create An Experiment: **Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -162,7 +195,7 @@
|
||||
"\n",
|
||||
"project_folder = './explainer-remote-run-on-amlcompute'\n",
|
||||
"os.makedirs(project_folder, exist_ok=True)\n",
|
||||
"shutil.copy('run_explainer.py', project_folder)"
|
||||
"shutil.copy('train_explain.py', project_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -203,20 +236,25 @@
|
||||
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
|
||||
"run_config.environment.python.user_managed_dependencies = False\n",
|
||||
"\n",
|
||||
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
|
||||
"run_config.auto_prepare_environment = True\n",
|
||||
"\n",
|
||||
"azureml_pip_packages = [\n",
|
||||
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
|
||||
" 'azureml-explain-model'\n",
|
||||
"]\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# specify CondaDependencies obj\n",
|
||||
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
|
||||
" pip_packages=azureml_pip_packages)\n",
|
||||
"\n",
|
||||
" pip_packages=['sklearn_pandas', 'pyyaml'] + azureml_pip_packages,\n",
|
||||
" pin_sdk_version=False)\n",
|
||||
"# Now submit a run on AmlCompute\n",
|
||||
"from azureml.core.script_run_config import ScriptRunConfig\n",
|
||||
"\n",
|
||||
"script_run_config = ScriptRunConfig(source_directory=project_folder,\n",
|
||||
" script='run_explainer.py',\n",
|
||||
" script='train_explain.py',\n",
|
||||
" run_config=run_config)\n",
|
||||
"\n",
|
||||
"run = experiment.submit(script_run_config)\n",
|
||||
@@ -243,262 +281,6 @@
|
||||
"run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Provision as a persistent compute target (Basic)\n",
|
||||
"\n",
|
||||
"You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
|
||||
"\n",
|
||||
"* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() above\n",
|
||||
"* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||
"\n",
|
||||
"# Choose a name for your CPU cluster\n",
|
||||
"cpu_cluster_name = \"cpu-cluster\"\n",
|
||||
"\n",
|
||||
"# Verify that cluster does not exist already\n",
|
||||
"try:\n",
|
||||
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
|
||||
" print('Found existing cluster, use it.')\n",
|
||||
"except ComputeTargetException:\n",
|
||||
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
|
||||
" max_nodes=4)\n",
|
||||
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
|
||||
"\n",
|
||||
"cpu_cluster.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure & Run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.runconfig import RunConfiguration\n",
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"\n",
|
||||
"# create a new RunConfig object\n",
|
||||
"run_config = RunConfiguration(framework=\"python\")\n",
|
||||
"\n",
|
||||
"# Set compute target to AmlCompute target created in previous step\n",
|
||||
"run_config.target = cpu_cluster.name\n",
|
||||
"\n",
|
||||
"# enable Docker \n",
|
||||
"run_config.environment.docker.enabled = True\n",
|
||||
"\n",
|
||||
"azureml_pip_packages = [\n",
|
||||
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
|
||||
" 'azureml-explain-model'\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"# specify CondaDependencies obj\n",
|
||||
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
|
||||
" pip_packages=azureml_pip_packages)\n",
|
||||
"\n",
|
||||
"from azureml.core import Run\n",
|
||||
"from azureml.core import ScriptRunConfig\n",
|
||||
"\n",
|
||||
"src = ScriptRunConfig(source_directory=project_folder, \n",
|
||||
" script='run_explainer.py', \n",
|
||||
" run_config=run_config) \n",
|
||||
"run = experiment.submit(config=src)\n",
|
||||
"run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"# Shows output of the run on stdout.\n",
|
||||
"run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run.get_metrics()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Provision as a persistent compute target (Advanced)\n",
|
||||
"\n",
|
||||
"You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. This is useful when you want a dedicated cluster of 4 nodes (for example you can set the min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n",
|
||||
"\n",
|
||||
"In addition to `vm_size` and `max_nodes`, you can specify:\n",
|
||||
"* `min_nodes`: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n",
|
||||
"* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
|
||||
"* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
|
||||
"* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n",
|
||||
"* `vnet_name`: Name of VNet\n",
|
||||
"* `subnet_name`: Name of SubNet within the VNet"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||
"\n",
|
||||
"# Choose a name for your CPU cluster\n",
|
||||
"cpu_cluster_name = \"cpu-cluster\"\n",
|
||||
"\n",
|
||||
"# Verify that cluster does not exist already\n",
|
||||
"try:\n",
|
||||
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
|
||||
" print('Found existing cluster, use it.')\n",
|
||||
"except ComputeTargetException:\n",
|
||||
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
|
||||
" vm_priority='lowpriority',\n",
|
||||
" min_nodes=2,\n",
|
||||
" max_nodes=4,\n",
|
||||
" idle_seconds_before_scaledown='300',\n",
|
||||
" vnet_resourcegroup_name='<my-resource-group>',\n",
|
||||
" vnet_name='<my-vnet-name>',\n",
|
||||
" subnet_name='<my-subnet-name>')\n",
|
||||
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
|
||||
"\n",
|
||||
"cpu_cluster.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure & Run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.runconfig import RunConfiguration\n",
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"\n",
|
||||
"# create a new RunConfig object\n",
|
||||
"run_config = RunConfiguration(framework=\"python\")\n",
|
||||
"\n",
|
||||
"# Set compute target to AmlCompute target created in previous step\n",
|
||||
"run_config.target = cpu_cluster.name\n",
|
||||
"\n",
|
||||
"# enable Docker \n",
|
||||
"run_config.environment.docker.enabled = True\n",
|
||||
"\n",
|
||||
"azureml_pip_packages = [\n",
|
||||
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
|
||||
" 'azureml-explain-model'\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"# specify CondaDependencies obj\n",
|
||||
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
|
||||
" pip_packages=azureml_pip_packages)\n",
|
||||
"\n",
|
||||
"from azureml.core import Run\n",
|
||||
"from azureml.core import ScriptRunConfig\n",
|
||||
"\n",
|
||||
"src = ScriptRunConfig(source_directory=project_folder, \n",
|
||||
" script='run_explainer.py', \n",
|
||||
" run_config=run_config) \n",
|
||||
"run = experiment.submit(config=src)\n",
|
||||
"run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"# Shows output of the run on stdout.\n",
|
||||
"run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run.get_metrics()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
|
||||
"\n",
|
||||
"client = ExplanationClient.from_run(run)\n",
|
||||
"# Get the top k (e.g., 4) most important features with their importance values\n",
|
||||
"explanation = client.download_model_explanation(top_k=4)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Additional operations to perform on AmlCompute\n",
|
||||
"\n",
|
||||
"You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get_status () gets the latest status of the AmlCompute target\n",
|
||||
"cpu_cluster.get_status().serialize()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Update () takes in the min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target\n",
|
||||
"# cpu_cluster.update(min_nodes=1)\n",
|
||||
"# cpu_cluster.update(max_nodes=10)\n",
|
||||
"cpu_cluster.update(idle_seconds_before_scaledown=300)\n",
|
||||
"# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -506,7 +288,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
|
||||
"# 'cpu-cluster' in this case but use a different VM family for instance.\n",
|
||||
"# 'cpucluster' in this case but use a different VM family for instance.\n",
|
||||
"\n",
|
||||
"# cpu_cluster.delete()"
|
||||
]
|
||||
@@ -515,7 +297,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Download Model Explanation Data"
|
||||
"## Download Model Explanation, Model, and Data"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -524,13 +306,26 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# retrieve model for visualization and deployment\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"original_model = Model(ws, 'original_model')\n",
|
||||
"model_path = original_model.download(exist_ok=True)\n",
|
||||
"original_svm_model = joblib.load(model_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# retrieve global explanation for visualization\n",
|
||||
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
|
||||
"\n",
|
||||
"# Get model explanation data\n",
|
||||
"# get model explanation data\n",
|
||||
"client = ExplanationClient.from_run(run)\n",
|
||||
"explanation = client.download_model_explanation()\n",
|
||||
"local_importance_values = explanation.local_importance_values\n",
|
||||
"expected_values = explanation.expected_values\n"
|
||||
"global_explanation = client.download_model_explanation()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -539,42 +334,189 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Or you can use the saved run.id to retrive the feature importance values\n",
|
||||
"client = ExplanationClient.from_run_id(ws, experiment_name, run.id)\n",
|
||||
"explanation = client.download_model_explanation()\n",
|
||||
"local_importance_values = explanation.local_importance_values\n",
|
||||
"expected_values = explanation.expected_values"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the top k (e.g., 4) most important features with their importance values\n",
|
||||
"explanation = client.download_model_explanation(top_k=4)\n",
|
||||
"global_importance_values = explanation.get_ranked_global_values()\n",
|
||||
"global_importance_names = explanation.get_ranked_global_names()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('global importance values: {}'.format(global_importance_values))\n",
|
||||
"print('global importance names: {}'.format(global_importance_names))"
|
||||
"# retrieve x_test for visualization\n",
|
||||
"from sklearn.externals import joblib\n",
|
||||
"x_test_path = './x_test.pkl'\n",
|
||||
"run.download_file('x_test_ibm.pkl', output_file_path=x_test_path)\n",
|
||||
"x_test = joblib.load(x_test_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Success!\n",
|
||||
"Great, you are ready to move on to the remaining notebooks."
|
||||
"## Visualize\n",
|
||||
"Visualize the explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, original_svm_model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy\n",
|
||||
"Deploy Model and ScoringExplainer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||
" memory_gb=1, \n",
|
||||
" tags={\"data\": \"IBM_Attrition\", \n",
|
||||
" \"method\" : \"local_explanation\"}, \n",
|
||||
" description='Get local explanations for IBM Employee Attrition data')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||
"\n",
|
||||
"# WARNING: to install this, g++ needs to be available on the Docker image and is not by default (look at the next cell)\n",
|
||||
"azureml_pip_packages = [\n",
|
||||
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
|
||||
" 'azureml-explain-model'\n",
|
||||
"]\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"# specify CondaDependencies obj\n",
|
||||
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'],\n",
|
||||
" pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
|
||||
" pin_sdk_version=False)\n",
|
||||
"\n",
|
||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||
" f.write(myenv.serialize_to_string())\n",
|
||||
"\n",
|
||||
"with open(\"myenv.yml\",\"r\") as f:\n",
|
||||
" print(f.read())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile dockerfile\n",
|
||||
"RUN apt-get update && apt-get install -y g++ "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# retrieve scoring explainer for deployment\n",
|
||||
"scoring_explainer_model = Model(ws, 'IBM_attrition_explainer')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import Webservice\n",
|
||||
"from azureml.core.image import ContainerImage\n",
|
||||
"\n",
|
||||
"# Use the custom scoring, docker, and conda files we created above\n",
|
||||
"image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
||||
" docker_file=\"dockerfile\", \n",
|
||||
" runtime=\"python\", \n",
|
||||
" conda_file=\"myenv.yml\")\n",
|
||||
"\n",
|
||||
"# Use configs and models generated above\n",
|
||||
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||
" name='model-scoring-service',\n",
|
||||
" deployment_config=aciconfig,\n",
|
||||
" models=[scoring_explainer_model, original_model],\n",
|
||||
" image_config=image_config)\n",
|
||||
"\n",
|
||||
"service.wait_for_deployment(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"# create data to test service with\n",
|
||||
"examples = x_test[:4]\n",
|
||||
"input_data = examples.to_json()\n",
|
||||
"\n",
|
||||
"headers = {'Content-Type':'application/json'}\n",
|
||||
"\n",
|
||||
"# send request to service\n",
|
||||
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
||||
"\n",
|
||||
"print(\"POST to url\", service.scoring_uri)\n",
|
||||
"# can covert back to Python objects from json string if desired\n",
|
||||
"print(\"prediction:\", resp.text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"service.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n",
|
||||
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. Explain models with engineered features:\n",
|
||||
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n",
|
||||
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. [Inferencing time: deploy a locally-trained model and explainer](./train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -0,0 +1,8 @@
|
||||
name: train-explain-model-on-amlcompute-and-deploy
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-explain-model
|
||||
- azureml-contrib-explain-model
|
||||
- azureml-dataprep
|
||||
- sklearn-pandas
|
||||
@@ -0,0 +1,128 @@
|
||||
# ---------------------------------------------------------
|
||||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# ---------------------------------------------------------
|
||||
|
||||
import os
|
||||
import pandas as pd
|
||||
import zipfile
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.externals import joblib
|
||||
from sklearn.preprocessing import StandardScaler, OneHotEncoder
|
||||
from sklearn.impute import SimpleImputer
|
||||
from sklearn.pipeline import Pipeline
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn_pandas import DataFrameMapper
|
||||
|
||||
from azureml.core.run import Run
|
||||
from azureml.explain.model.tabular_explainer import TabularExplainer
|
||||
from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient
|
||||
from azureml.explain.model.scoring.scoring_explainer import LinearScoringExplainer, save
|
||||
|
||||
OUTPUT_DIR = './outputs/'
|
||||
os.makedirs(OUTPUT_DIR, exist_ok=True)
|
||||
|
||||
# get the IBM employee attrition dataset
|
||||
outdirname = 'dataset.6.21.19'
|
||||
try:
|
||||
from urllib import urlretrieve
|
||||
except ImportError:
|
||||
from urllib.request import urlretrieve
|
||||
zipfilename = outdirname + '.zip'
|
||||
urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)
|
||||
with zipfile.ZipFile(zipfilename, 'r') as unzip:
|
||||
unzip.extractall('.')
|
||||
attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')
|
||||
|
||||
# dropping Employee count as all values are 1 and hence attrition is independent of this feature
|
||||
attritionData = attritionData.drop(['EmployeeCount'], axis=1)
|
||||
# dropping Employee Number since it is merely an identifier
|
||||
attritionData = attritionData.drop(['EmployeeNumber'], axis=1)
|
||||
attritionData = attritionData.drop(['Over18'], axis=1)
|
||||
# since all values are 80
|
||||
attritionData = attritionData.drop(['StandardHours'], axis=1)
|
||||
|
||||
# converting target variables from string to numerical values
|
||||
target_map = {'Yes': 1, 'No': 0}
|
||||
attritionData["Attrition_numerical"] = attritionData["Attrition"].apply(lambda x: target_map[x])
|
||||
target = attritionData["Attrition_numerical"]
|
||||
|
||||
attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)
|
||||
|
||||
# creating dummy columns for each categorical feature
|
||||
categorical = []
|
||||
for col, value in attritionXData.iteritems():
|
||||
if value.dtype == 'object':
|
||||
categorical.append(col)
|
||||
|
||||
# store the numerical columns
|
||||
numerical = attritionXData.columns.difference(categorical)
|
||||
|
||||
numeric_transformations = [([f], Pipeline(steps=[
|
||||
('imputer', SimpleImputer(strategy='median')),
|
||||
('scaler', StandardScaler())])) for f in numerical]
|
||||
|
||||
categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]
|
||||
|
||||
transformations = numeric_transformations + categorical_transformations
|
||||
|
||||
# append classifier to preprocessing pipeline
|
||||
clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),
|
||||
('classifier', LogisticRegression(solver='lbfgs'))])
|
||||
|
||||
# get the run this was submitted from to interact with run history
|
||||
run = Run.get_context()
|
||||
|
||||
# create an explanation client to store the explanation (contrib API)
|
||||
client = ExplanationClient.from_run(run)
|
||||
|
||||
# Split data into train and test
|
||||
x_train, x_test, y_train, y_test = train_test_split(attritionXData,
|
||||
target,
|
||||
test_size=0.2,
|
||||
random_state=0,
|
||||
stratify=target)
|
||||
|
||||
# write x_test out as a pickle file for later visualization
|
||||
x_test_pkl = 'x_test.pkl'
|
||||
with open(x_test_pkl, 'wb') as file:
|
||||
joblib.dump(value=x_test, filename=os.path.join(OUTPUT_DIR, x_test_pkl))
|
||||
run.upload_file('x_test_ibm.pkl', os.path.join(OUTPUT_DIR, x_test_pkl))
|
||||
|
||||
# preprocess the data and fit the classification model
|
||||
clf.fit(x_train, y_train)
|
||||
model = clf.steps[-1][1]
|
||||
|
||||
# save model for use outside the script
|
||||
model_file_name = 'log_reg.pkl'
|
||||
with open(model_file_name, 'wb') as file:
|
||||
joblib.dump(value=clf, filename=os.path.join(OUTPUT_DIR, model_file_name))
|
||||
|
||||
# register the model with the model management service for later use
|
||||
run.upload_file('original_model.pkl', os.path.join(OUTPUT_DIR, model_file_name))
|
||||
original_model = run.register_model(model_name='original_model', model_path='original_model.pkl')
|
||||
|
||||
# create an explainer to validate or debug the model
|
||||
tabular_explainer = TabularExplainer(model,
|
||||
initialization_examples=x_train,
|
||||
features=attritionXData.columns,
|
||||
classes=["Not leaving", "leaving"],
|
||||
transformations=transformations)
|
||||
|
||||
# explain overall model predictions (global explanation)
|
||||
# passing in test dataset for evaluation examples - note it must be a representative sample of the original data
|
||||
# more data (e.g. x_train) will likely lead to higher accuracy, but at a time cost
|
||||
global_explanation = tabular_explainer.explain_global(x_test)
|
||||
|
||||
# uploading model explanation data for storage or visualization
|
||||
comment = 'Global explanation on classification model trained on IBM employee attrition dataset'
|
||||
client.upload_model_explanation(global_explanation, comment=comment)
|
||||
|
||||
# also create a lightweight explainer for scoring time
|
||||
scoring_explainer = LinearScoringExplainer(tabular_explainer)
|
||||
# pickle scoring explainer locally
|
||||
save(scoring_explainer, directory=OUTPUT_DIR, exist_ok=True)
|
||||
|
||||
# register scoring explainer
|
||||
run.upload_file('IBM_attrition_explainer.pkl', os.path.join(OUTPUT_DIR, 'scoring_explainer.pkl'))
|
||||
scoring_explainer_model = run.register_model(model_name='IBM_attrition_explainer',
|
||||
model_path='IBM_attrition_explainer.pkl')
|
||||
@@ -1,52 +0,0 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from sklearn import datasets
|
||||
from sklearn.linear_model import Ridge
|
||||
from azureml.explain.model.tabular_explainer import TabularExplainer
|
||||
from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient
|
||||
from sklearn.model_selection import train_test_split
|
||||
from azureml.core.run import Run
|
||||
from sklearn.externals import joblib
|
||||
import os
|
||||
import numpy as np
|
||||
|
||||
os.makedirs('./outputs', exist_ok=True)
|
||||
|
||||
boston_data = datasets.load_boston()
|
||||
|
||||
run = Run.get_context()
|
||||
client = ExplanationClient.from_run(run)
|
||||
|
||||
X_train, X_test, y_train, y_test = train_test_split(boston_data.data,
|
||||
boston_data.target,
|
||||
test_size=0.2,
|
||||
random_state=0)
|
||||
|
||||
alpha = 0.5
|
||||
# Use Ridge algorithm to create a regression model
|
||||
reg = Ridge(alpha)
|
||||
model = reg.fit(X_train, y_train)
|
||||
|
||||
preds = reg.predict(X_test)
|
||||
run.log('alpha', alpha)
|
||||
|
||||
model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
|
||||
# save model in the outputs folder so it automatically get uploaded
|
||||
with open(model_file_name, 'wb') as file:
|
||||
joblib.dump(value=reg, filename=os.path.join('./outputs/',
|
||||
model_file_name))
|
||||
|
||||
# Explain predictions on your local machine
|
||||
tabular_explainer = TabularExplainer(model, X_train, features=boston_data.feature_names)
|
||||
|
||||
# Explain overall model predictions (global explanation)
|
||||
# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data
|
||||
# x_train can be passed as well, but with more examples explanations it will
|
||||
# take longer although they may be more accurate
|
||||
global_explanation = tabular_explainer.explain_global(X_test)
|
||||
|
||||
# Uploading model explanation data for storage or visualization in webUX
|
||||
# The explanation can then be downloaded on any compute
|
||||
comment = 'Global explanation on regression model trained on boston dataset'
|
||||
client.upload_model_explanation(global_explanation, comment=comment)
|
||||
@@ -1,279 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Breast cancer diagnosis classification with scikit-learn (run model explainer locally)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Explain a model with the AML explain-model package\n",
|
||||
"\n",
|
||||
"1. Train a SVM classification model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n",
|
||||
"3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services\n",
|
||||
"4. Visualize the global and local explanations with the visualization dashboard."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.datasets import load_breast_cancer\n",
|
||||
"from sklearn import svm\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 1. Run model explainer locally with full data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the breast cancer diagnosis data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"breast_cancer_data = load_breast_cancer()\n",
|
||||
"classes = breast_cancer_data.target_names.tolist()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tabular_explainer = TabularExplainer(model, x_train, features=breast_cancer_data.feature_names, classes=classes)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"# per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# explain the first member of the test set\n",
|
||||
"instance_num = 0\n",
|
||||
"local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"dict(zip(sorted_local_importance_names, sorted_local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 2. Load visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
|
||||
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"# Or, in Jupyter Labs, uncomment below\n",
|
||||
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"# jupyter labextension install microsoft-mli-widget"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,6 +0,0 @@
|
||||
name: explain-local-sklearn-binary-classification
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-explain-model
|
||||
- azureml-contrib-explain-model
|
||||
@@ -1,280 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Iris flower classification with scikit-learn (run model explainer locally)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Explain a model with the AML explain-model package\n",
|
||||
"\n",
|
||||
"1. Train a SVM classification model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n",
|
||||
"3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services\n",
|
||||
"4. Visualize the global and local explanations with the visualization dashboard."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.datasets import load_iris\n",
|
||||
"from sklearn import svm\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 1. Run model explainer locally with full data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the breast cancer diagnosis data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"iris = load_iris()\n",
|
||||
"X = iris['data']\n",
|
||||
"y = iris['target']\n",
|
||||
"classes = iris['target_names']\n",
|
||||
"feature_names = iris['feature_names']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tabular_explainer = TabularExplainer(model, x_train, features = feature_names, classes=classes)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"# per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# explain the first member of the test set\n",
|
||||
"instance_num = 0\n",
|
||||
"local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"dict(zip(sorted_local_importance_names, sorted_local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
|
||||
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"# Or, in Jupyter Labs, uncomment below\n",
|
||||
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"# jupyter labextension install microsoft-mli-widget"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,272 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Boston Housing Price Prediction with scikit-learn (run model explainer locally)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Explain a model with the AML explain-model package\n",
|
||||
"\n",
|
||||
"1. Train a GradientBoosting regression model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"4. Visualize the global and local explanations with the visualization dashboard."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn import datasets\n",
|
||||
"from sklearn.ensemble import GradientBoostingRegressor\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 1. Run model explainer locally with full data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the Boston house price data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"boston_data = datasets.load_boston()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train a GradientBoosting Regression model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n",
|
||||
" learning_rate=0.1, loss='huber',\n",
|
||||
" random_state=1)\n",
|
||||
"model = reg.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tabular_explainer = TabularExplainer(model, x_train, features = boston_data.feature_names)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values \n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"local_explanation = tabular_explainer.explain_local(x_test[0,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# sorted local feature importance information; reflects the original feature order\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()\n",
|
||||
"\n",
|
||||
"print('sorted local importance names: {}'.format(sorted_local_importance_names))\n",
|
||||
"print('sorted local importance values: {}'.format(sorted_local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
|
||||
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"# Or, in Jupyter Labs, uncomment below\n",
|
||||
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"# jupyter labextension install microsoft-mli-widget"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,337 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Summary\n",
|
||||
"From raw data that is a mixture of categoricals and numeric, featurize the categoricals using one hot encoding. Use tabular explainer to get explain object and then get raw feature importances"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Explain a model with the AML explain-model package on raw features\n",
|
||||
"\n",
|
||||
"1. Train a Logistic Regression model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"4. Visualize the global and local explanations with the visualization dashboard."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.pipeline import Pipeline\n",
|
||||
"from sklearn.impute import SimpleImputer\n",
|
||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"titanic_url = ('https://raw.githubusercontent.com/amueller/'\n",
|
||||
" 'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n",
|
||||
"data = pd.read_csv(titanic_url)\n",
|
||||
"# fill missing values\n",
|
||||
"data = data.fillna(method=\"ffill\")\n",
|
||||
"data = data.fillna(method=\"bfill\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 1. Run model explainer locally with full data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Similar to example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"\n",
|
||||
"numeric_features = ['age', 'fare']\n",
|
||||
"categorical_features = ['embarked', 'sex', 'pclass']\n",
|
||||
"\n",
|
||||
"y = data['survived'].values\n",
|
||||
"X = data[categorical_features + numeric_features]\n",
|
||||
"\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"sklearn imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.pipeline import Pipeline\n",
|
||||
"from sklearn.impute import SimpleImputer\n",
|
||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.compose import ColumnTransformer\n",
|
||||
"\n",
|
||||
"transformations = ColumnTransformer([\n",
|
||||
" (\"age_fare\", Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())\n",
|
||||
" ]), [\"age\", \"fare\"]),\n",
|
||||
" (\"embarked\", Pipeline(steps=[\n",
|
||||
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
||||
" (\"encoder\", OneHotEncoder(sparse=False))]), [\"embarked\"]),\n",
|
||||
" (\"sex_pclass\", OneHotEncoder(sparse=False), [\"sex\", \"pclass\"]) \n",
|
||||
"])\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||
" ('classifier', LogisticRegression(solver='lbfgs'))])\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"'''\n",
|
||||
"# Uncomment below if sklearn-pandas is not installed\n",
|
||||
"#!pip install sklearn-pandas\n",
|
||||
"from sklearn_pandas import DataFrameMapper\n",
|
||||
"\n",
|
||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
||||
"\n",
|
||||
"transformations = [\n",
|
||||
" ([\"age\", \"fare\"], Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())\n",
|
||||
" ])),\n",
|
||||
" ([\"embarked\"], Pipeline(steps=[\n",
|
||||
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
||||
" (\"encoder\", OneHotEncoder(sparse=False))])),\n",
|
||||
" ([\"sex\", \"pclass\"], OneHotEncoder(sparse=False)) \n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
||||
" ('classifier', LogisticRegression(solver='lbfgs'))])\n",
|
||||
"'''"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train a Logistic Regression model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tabular_explainer = TabularExplainer(clf.steps[-1][1], initialization_examples=x_train, features=x_train.columns, transformations=transformations)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sorted_global_importance_values = global_explanation.get_ranked_global_values()\n",
|
||||
"sorted_global_importance_names = global_explanation.get_ranked_global_names()\n",
|
||||
"dict(zip(sorted_global_importance_names, sorted_global_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# explain the first member of the test set\n",
|
||||
"local_explanation = tabular_explainer.explain_local(x_test[:1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[0]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"# Sorted local SHAP values\n",
|
||||
"print('ranked local importance values: {}'.format(sorted_local_importance_values))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked local importance names: {}'.format(sorted_local_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 2. Load visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
|
||||
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"# Or, in Jupyter Labs, uncomment below\n",
|
||||
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"# jupyter labextension install microsoft-mli-widget"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,262 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Breast cancer diagnosis classification with scikit-learn (save model explanations via AML Run History)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Explain a model with the AML explain-model package\n",
|
||||
"\n",
|
||||
"1. Train a SVM classification model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with AML Run History, which leverages run history service to store and manage the explanation data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.datasets import load_breast_cancer\n",
|
||||
"from sklearn import svm\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 1. Run model explainer locally with full data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the breast cancer diagnosis data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"breast_cancer_data = load_breast_cancer()\n",
|
||||
"classes = breast_cancer_data.target_names.tolist()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tabular_explainer = TabularExplainer(model, x_train, features=breast_cancer_data.feature_names, classes=classes)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 2. Save Model Explanation With AML Run History"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import azureml.core\n",
|
||||
"from azureml.core import Workspace, Experiment, Run\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
|
||||
"# Check core SDK version number\n",
|
||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print('Workspace name: ' + ws.name, \n",
|
||||
" 'Azure region: ' + ws.location, \n",
|
||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"experiment_name = 'explain_model'\n",
|
||||
"experiment = Experiment(ws, experiment_name)\n",
|
||||
"run = experiment.start_logging()\n",
|
||||
"client = ExplanationClient.from_run(run)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Uploading model explanation data for storage or visualization in webUX\n",
|
||||
"# The explanation can then be downloaded on any compute\n",
|
||||
"client.upload_model_explanation(global_explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get model explanation data\n",
|
||||
"explanation = client.download_model_explanation()\n",
|
||||
"local_importance_values = explanation.local_importance_values\n",
|
||||
"expected_values = explanation.expected_values"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the top k (e.g., 4) most important features with their importance values\n",
|
||||
"explanation = client.download_model_explanation(top_k=4)\n",
|
||||
"global_importance_values = explanation.get_ranked_global_values()\n",
|
||||
"global_importance_names = explanation.get_ranked_global_names()\n",
|
||||
"per_class_names = explanation.get_ranked_per_class_names()[0]\n",
|
||||
"per_class_values = explanation.get_ranked_per_class_values()[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('per class feature importance values: {}'.format(per_class_values))\n",
|
||||
"print('per class feature importance names: {}'.format(per_class_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dict(zip(per_class_names, per_class_values))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,276 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Boston Housing Price Prediction with scikit-learn (save model explanations via AML Run History)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Explain a model with the AML explain-model package\n",
|
||||
"\n",
|
||||
"1. Train a GradientBoosting regression model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' with AML Run History, which leverages run history service to store and manage the explanation data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Save Model Explanation With AML Run History"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Import Iris dataset\n",
|
||||
"from sklearn import datasets\n",
|
||||
"from sklearn.ensemble import GradientBoostingRegressor\n",
|
||||
"\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core import Workspace, Experiment, Run\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
|
||||
"# Check core SDK version number\n",
|
||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print('Workspace name: ' + ws.name, \n",
|
||||
" 'Azure region: ' + ws.location, \n",
|
||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"experiment_name = 'explain_model'\n",
|
||||
"experiment = Experiment(ws, experiment_name)\n",
|
||||
"run = experiment.start_logging()\n",
|
||||
"client = ExplanationClient.from_run(run)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the Boston house price data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"boston_data = datasets.load_boston()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train a GradientBoosting Regression model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n",
|
||||
" learning_rate=0.1, loss='huber',\n",
|
||||
" random_state=1)\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tabular_explainer = TabularExplainer(model, x_train, features=boston_data.feature_names)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = tabular_explainer.explain_global(x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Uploading model explanation data for storage or visualization in webUX\n",
|
||||
"# The explanation can then be downloaded on any compute\n",
|
||||
"client.upload_model_explanation(global_explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get model explanation data\n",
|
||||
"explanation = client.download_model_explanation()\n",
|
||||
"local_importance_values = explanation.local_importance_values\n",
|
||||
"expected_values = explanation.expected_values"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print the values\n",
|
||||
"print('expected values: {}'.format(expected_values))\n",
|
||||
"print('local importance values: {}'.format(local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the top k (e.g., 4) most important features with their importance values\n",
|
||||
"explanation = client.download_model_explanation(top_k=4)\n",
|
||||
"global_importance_values = explanation.get_ranked_global_values()\n",
|
||||
"global_importance_names = explanation.get_ranked_global_names()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('global importance values: {}'.format(global_importance_values))\n",
|
||||
"print('global importance names: {}'.format(global_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain individual instance predictions (local explanation) ##### needs to get updated with the new build"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"local_explanation = tabular_explainer.explain_local(x_test[0,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# local feature importance information\n",
|
||||
"local_importance_values = local_explanation.local_importance_values\n",
|
||||
"print('local importance values: {}'.format(local_importance_values))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,523 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Explain binary classification model predictions with raw feature transformations\n",
|
||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a binary classification model that uses advanced many to one or many to many feature transformations.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Apply feature transformations\n",
|
||||
" 1. Train a binary classification model\n",
|
||||
" 1. Explain the model on raw features\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize results](#Visualize)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook illustrates creating explanations for a binary classification model, Titanic passenger data classification, that uses many to one and many to many feature transformations from raw data to engineered features. For the many to one transformation, we sum 2 features `age` and `fare`. For many to many transformations two features are computed: one that is product of `age` and `fare` and another that is square of this product. Our tabular data explainer is then used to get the explanation object with the flag `allow_all_transformations` passed. The object is then used to get raw feature importances.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"We will showcase raw feature transformations with three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
||||
"\n",
|
||||
"|  |\n",
|
||||
"|:--:|\n",
|
||||
"| *Interpretability Toolkit Architecture* |\n",
|
||||
"\n",
|
||||
"Problem: Titanic passenger data classification with scikit-learn (run model explainer locally)\n",
|
||||
"\n",
|
||||
"1. Transform raw features to engineered features\n",
|
||||
"2. Train a Logistic Regression model using Scikit-learn\n",
|
||||
"3. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"4. Visualize the global and local explanations with the visualization dashboard.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"```\n",
|
||||
"Or\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using Jupyter Labs run the following commands instead:\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"### Run model explainer locally at training time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.pipeline import Pipeline\n",
|
||||
"from sklearn.impute import SimpleImputer\n",
|
||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# Explainers:\n",
|
||||
"# 1. SHAP Tabular Explainer\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 2. Mimic Explainer\n",
|
||||
"from azureml.explain.model.mimic.mimic_explainer import MimicExplainer\n",
|
||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
||||
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import LinearExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import SGDExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.tree_model import DecisionTreeExplainableModel\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 3. PFI Explainer\n",
|
||||
"from azureml.explain.model.permutation.permutation_importance import PFIExplainer "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the Titanic passenger data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"titanic_url = ('https://raw.githubusercontent.com/amueller/'\n",
|
||||
" 'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n",
|
||||
"data = pd.read_csv(titanic_url)\n",
|
||||
"# fill missing values\n",
|
||||
"data = data.fillna(method=\"ffill\")\n",
|
||||
"data = data.fillna(method=\"bfill\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Similar to example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"\n",
|
||||
"numeric_features = ['age', 'fare']\n",
|
||||
"categorical_features = ['embarked', 'sex', 'pclass']\n",
|
||||
"\n",
|
||||
"y = data['survived'].values\n",
|
||||
"X = data[categorical_features + numeric_features]\n",
|
||||
"\n",
|
||||
"# Split data into train and test\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Transform raw features"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# We add many to one and many to many transformations for illustration purposes.\n",
|
||||
"# The support for raw feature explanations with many to one and many to many transformations are only supported \n",
|
||||
"# When allow_all_transformations is set to True on explainer creation\n",
|
||||
"from sklearn.preprocessing import FunctionTransformer\n",
|
||||
"many_to_one_transformer = FunctionTransformer(lambda x: x.sum(axis=1).reshape(-1, 1))\n",
|
||||
"many_to_many_transformer = FunctionTransformer(lambda x: np.hstack(\n",
|
||||
" (np.prod(x, axis=1).reshape(-1, 1), (np.prod(x, axis=1)**2).reshape(-1, 1))\n",
|
||||
"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.compose import ColumnTransformer\n",
|
||||
"\n",
|
||||
"transformations = ColumnTransformer([\n",
|
||||
" (\"age_fare_1\", Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())\n",
|
||||
" ]), [\"age\", \"fare\"]),\n",
|
||||
" (\"age_fare_2\", many_to_one_transformer, [\"age\", \"fare\"]),\n",
|
||||
" (\"age_fare_3\", many_to_many_transformer, [\"age\", \"fare\"]),\n",
|
||||
" (\"embarked\", Pipeline(steps=[\n",
|
||||
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
||||
" (\"encoder\", OneHotEncoder(sparse=False))]), [\"embarked\"]),\n",
|
||||
" (\"sex_pclass\", OneHotEncoder(sparse=False), [\"sex\", \"pclass\"]) \n",
|
||||
"])\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"'''\n",
|
||||
"# Uncomment below if sklearn-pandas is not installed\n",
|
||||
"#!pip install sklearn-pandas\n",
|
||||
"from sklearn_pandas import DataFrameMapper\n",
|
||||
"\n",
|
||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
||||
"\n",
|
||||
"transformations = [\n",
|
||||
" ([\"age\", \"fare\"], Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())\n",
|
||||
" ])),\n",
|
||||
" ([\"age\", \"fare\"], many_to_one_transformer),\n",
|
||||
" ([\"age\", \"fare\"], many_to_many_transformer),\n",
|
||||
" ([\"embarked\"], Pipeline(steps=[\n",
|
||||
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
||||
" (\"encoder\", OneHotEncoder(sparse=False))])),\n",
|
||||
" ([\"sex\", \"pclass\"], OneHotEncoder(sparse=False)) \n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
||||
" ('classifier', LogisticRegression(solver='lbfgs'))])\n",
|
||||
"'''"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train a Logistic Regression model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||
" ('classifier', LogisticRegression(solver='lbfgs'))])\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1. Using SHAP TabularExplainer\n",
|
||||
"# When the last parameter allow_all_transformations is passed, we handle many to one and many to many transformations to \n",
|
||||
"# generate approximations to raw feature importances. When this flag is passed, for transformations not recognized as one to \n",
|
||||
"# many, we distribute feature importances evenly to raw features generating them.\n",
|
||||
"# clf.steps[-1][1] returns the trained classification model\n",
|
||||
"explainer = TabularExplainer(clf.steps[-1][1], \n",
|
||||
" initialization_examples=x_train, \n",
|
||||
" features=x_train.columns, \n",
|
||||
" transformations=transformations, \n",
|
||||
" allow_all_transformations=True)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 2. Using MimicExplainer\n",
|
||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
||||
"# explainer = MimicExplainer(clf.steps[-1][1], \n",
|
||||
"# x_train, \n",
|
||||
"# LGBMExplainableModel, \n",
|
||||
"# augment_data=True, \n",
|
||||
"# max_num_of_augmentations=10, \n",
|
||||
"# features=x_train.columns, \n",
|
||||
"# transformations=transformations, \n",
|
||||
"# allow_all_transformations=True)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Using PFIExplainer\n",
|
||||
"\n",
|
||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
||||
"# Default metrics: \n",
|
||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
||||
"# Mean absolute error for regression\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# explainer = PFIExplainer(clf.steps[-1][1], \n",
|
||||
"# features=x_train.columns, \n",
|
||||
"# transformations=transformations)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate global explanations\n",
|
||||
"Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"\n",
|
||||
"global_explanation = explainer.explain_global(x_test)\n",
|
||||
"\n",
|
||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# Feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"# Per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# Per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate local explanations\n",
|
||||
"Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first data point in the test set\n",
|
||||
"instance_num = 1\n",
|
||||
"local_explanation = explainer.explain_local(x_test[:instance_num])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Load the visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
" \n",
|
||||
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
|
||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. [Explain models with simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,7 @@
|
||||
name: advanced-feature-transformations-explain-local
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-explain-model
|
||||
- azureml-contrib-explain-model
|
||||
- sklearn-pandas
|
||||
@@ -0,0 +1,404 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Explain binary classification model predictions\n",
|
||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a binary classification model predictions.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Train a binary classification model\n",
|
||||
" 1. Explain the model\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize results](#Visualize)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook illustrates how to explain a binary classification model predictions locally at training time without contacting any Azure services.\n",
|
||||
"It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.\n",
|
||||
"\n",
|
||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
||||
"\n",
|
||||
"|  |\n",
|
||||
"|:--:|\n",
|
||||
"| *Interpretability Toolkit Architecture* |\n",
|
||||
"\n",
|
||||
"Problem: Breast cancer diagnosis classification with scikit-learn (run model explainer locally)\n",
|
||||
"\n",
|
||||
"1. Train a SVM classification model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"3. Visualize the global and local explanations with the visualization dashboard.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"```\n",
|
||||
"Or\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using Jupyter Labs run the following commands instead:\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"### Run model explainer locally at training time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.datasets import load_breast_cancer\n",
|
||||
"from sklearn import svm\n",
|
||||
"\n",
|
||||
"# Explainers:\n",
|
||||
"# 1. SHAP Tabular Explainer\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 2. Mimic Explainer\n",
|
||||
"from azureml.explain.model.mimic.mimic_explainer import MimicExplainer\n",
|
||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
||||
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import LinearExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import SGDExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.tree_model import DecisionTreeExplainableModel\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 3. PFI Explainer\n",
|
||||
"from azureml.explain.model.permutation.permutation_importance import PFIExplainer "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the breast cancer diagnosis data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"breast_cancer_data = load_breast_cancer()\n",
|
||||
"classes = breast_cancer_data.target_names.tolist()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1. Using SHAP TabularExplainer\n",
|
||||
"explainer = TabularExplainer(model, \n",
|
||||
" x_train, \n",
|
||||
" features=breast_cancer_data.feature_names, \n",
|
||||
" classes=classes)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 2. Using MimicExplainer\n",
|
||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
||||
"# explainer = MimicExplainer(model, \n",
|
||||
"# x_train, \n",
|
||||
"# LGBMExplainableModel, \n",
|
||||
"# augment_data=True, \n",
|
||||
"# max_num_of_augmentations=10, \n",
|
||||
"# features=breast_cancer_data.feature_names, \n",
|
||||
"# classes=classes)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Using PFIExplainer\n",
|
||||
"\n",
|
||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
||||
"# Default metrics: \n",
|
||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
||||
"# Mean absolute error for regression\n",
|
||||
"\n",
|
||||
"# explainer = PFIExplainer(model, \n",
|
||||
"# features=breast_cancer_data.feature_names, \n",
|
||||
"# classes=classes)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate global explanations\n",
|
||||
"Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = explainer.explain_global(x_test)\n",
|
||||
"\n",
|
||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# Feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"\n",
|
||||
"# Note: PFIExplainer does not support per class explanations\n",
|
||||
"# Per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# Per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate local explanations\n",
|
||||
"Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first data point in the test set\n",
|
||||
"instance_num = 0\n",
|
||||
"local_explanation = explainer.explain_local(x_test[instance_num,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Load the visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
" \n",
|
||||
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. Explain models with engineered features:\n",
|
||||
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
||||
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
name: explain-local-sklearn-regression
|
||||
name: explain-binary-classification-local
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
@@ -0,0 +1,402 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Explain multiclass classification model's predictions\n",
|
||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a multiclass classification model predictions.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Train a multiclass classification model\n",
|
||||
" 1. Explain the model\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize results](#Visualize)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook illustrates how to explain a multiclass classification model predictions locally at training time without contacting any Azure services.\n",
|
||||
"It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.\n",
|
||||
"\n",
|
||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
||||
"\n",
|
||||
"|  |\n",
|
||||
"|:--:|\n",
|
||||
"| *Interpretability Toolkit Architecture* |\n",
|
||||
"\n",
|
||||
"Problem: Iris flower classification with scikit-learn (run model explainer locally)\n",
|
||||
"\n",
|
||||
"1. Train a SVM classification model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"3. Visualize the global and local explanations with the visualization dashboard.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"```\n",
|
||||
"Or\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using Jupyter Labs run the following commands instead:\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"### Run model explainer locally at training time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.datasets import load_iris\n",
|
||||
"from sklearn import svm\n",
|
||||
"\n",
|
||||
"# Explainers:\n",
|
||||
"# 1. SHAP Tabular Explainer\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 2. Mimic Explainer\n",
|
||||
"from azureml.explain.model.mimic.mimic_explainer import MimicExplainer\n",
|
||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
||||
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import LinearExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import SGDExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.tree_model import DecisionTreeExplainableModel\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 3. PFI Explainer\n",
|
||||
"from azureml.explain.model.permutation.permutation_importance import PFIExplainer "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the Iris flower dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"iris = load_iris()\n",
|
||||
"X = iris['data']\n",
|
||||
"y = iris['target']\n",
|
||||
"classes = iris['target_names']\n",
|
||||
"feature_names = iris['feature_names']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1. Using SHAP TabularExplainer\n",
|
||||
"explainer = TabularExplainer(model, \n",
|
||||
" x_train, \n",
|
||||
" features=feature_names, \n",
|
||||
" classes=classes)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 2. Using MimicExplainer\n",
|
||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
||||
"# explainer = MimicExplainer(model, \n",
|
||||
"# x_train, \n",
|
||||
"# LGBMExplainableModel, \n",
|
||||
"# augment_data=True, \n",
|
||||
"# max_num_of_augmentations=10, \n",
|
||||
"# features=feature_names, \n",
|
||||
"# classes=classes)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Using PFIExplainer\n",
|
||||
"\n",
|
||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
||||
"# Default metrics: \n",
|
||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
||||
"# Mean absolute error for regression\n",
|
||||
"\n",
|
||||
"# explainer = PFIExplainer(model, \n",
|
||||
"# features=feature_names, \n",
|
||||
"# classes=classes)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate global explanations\n",
|
||||
"Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = explainer.explain_global(x_test)\n",
|
||||
"\n",
|
||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# Feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"\n",
|
||||
"# Note: PFIExplainer does not support per class explanations\n",
|
||||
"# Per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# Per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate local explanations\n",
|
||||
"Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first data point in the test set\n",
|
||||
"instance_num = 0\n",
|
||||
"local_explanation = explainer.explain_local(x_test[instance_num,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Load the visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
"\n",
|
||||
"1. [Training time: regression problem](./explain-regression-local.ipynb) \n",
|
||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
||||
"1. Explain models with engineered features:\n",
|
||||
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
||||
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)\n",
|
||||
"\u00e2\u20ac\u2039\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
name: explain-run-history-sklearn-classification
|
||||
name: explain-multiclass-classification-local
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
@@ -0,0 +1,397 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Explain regression model predictions\n",
|
||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a regression model predictions.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Train a regressor model\n",
|
||||
" 1. Explain the model\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize results](#Visualize)\n",
|
||||
"1. [Next steps](#Next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook illustrates how to explain regression model predictions locally at training time without contacting any Azure services.\n",
|
||||
"It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.\n",
|
||||
"\n",
|
||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
||||
"\n",
|
||||
"|  |\n",
|
||||
"|:--:|\n",
|
||||
"| *Interpretability Toolkit Architecture* |\n",
|
||||
"\n",
|
||||
"Problem: Boston Housing Price Prediction with scikit-learn (run model explainer locally)\n",
|
||||
"\n",
|
||||
"1. Train a GradientBoosting regression model using Scikit-learn\n",
|
||||
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"3. Visualize the global and local explanations with the visualization dashboard.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"```\n",
|
||||
"Or\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using Jupyter Labs run the following commands instead:\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"### Run model explainer locally at training time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn import datasets\n",
|
||||
"from sklearn.ensemble import GradientBoostingRegressor\n",
|
||||
"\n",
|
||||
"# Explainers:\n",
|
||||
"# 1. SHAP Tabular Explainer\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 2. Mimic Explainer\n",
|
||||
"from azureml.explain.model.mimic.mimic_explainer import MimicExplainer\n",
|
||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
||||
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import LinearExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import SGDExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.tree_model import DecisionTreeExplainableModel\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 3. PFI Explainer\n",
|
||||
"from azureml.explain.model.permutation.permutation_importance import PFIExplainer "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the Boston house price data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"boston_data = datasets.load_boston()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train a GradientBoosting regression model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n",
|
||||
" learning_rate=0.1, loss='huber',\n",
|
||||
" random_state=1)\n",
|
||||
"model = reg.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1. Using SHAP TabularExplainer\n",
|
||||
"explainer = TabularExplainer(model, \n",
|
||||
" x_train, \n",
|
||||
" features = boston_data.feature_names)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 2. Using MimicExplainer\n",
|
||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
||||
"# explainer = MimicExplainer(model, \n",
|
||||
"# x_train, \n",
|
||||
"# LGBMExplainableModel, \n",
|
||||
"# augment_data=True, \n",
|
||||
"# max_num_of_augmentations=10, \n",
|
||||
"# features=boston_data.feature_names)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Using PFIExplainer\n",
|
||||
"\n",
|
||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
||||
"# Default metrics: \n",
|
||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
||||
"# Mean absolute error for regression\n",
|
||||
"\n",
|
||||
"# explainer = PFIExplainer(model, \n",
|
||||
"# features=boston_data.feature_names)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate global explanations\n",
|
||||
"Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = explainer.explain_global(x_test)\n",
|
||||
"\n",
|
||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values \n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# Feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate local explanations\n",
|
||||
"Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first data point in the test set\n",
|
||||
"local_explanation = explainer.explain_local(x_test[0,:])\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first five data points in the test set\n",
|
||||
"# local_explanation_group = explainer.explain_local(x_test[0:4,:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted local feature importance information; reflects the original feature order\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()\n",
|
||||
"\n",
|
||||
"print('sorted local importance names: {}'.format(sorted_local_importance_names))\n",
|
||||
"print('sorted local importance values: {}'.format(sorted_local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Load the visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
" \n",
|
||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. Explain models with engineered features:\n",
|
||||
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
||||
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
name: regression-sklearn-on-amlcompute
|
||||
name: explain-regression-local
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
@@ -0,0 +1,531 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Explain binary classification model predictions with raw feature transformations\n",
|
||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a binary classification model that uses one to one and one to many feature transformations.**_\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
"1. [Introduction](#Introduction)\n",
|
||||
"1. [Setup](#Setup)\n",
|
||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
||||
" 1. Apply feature transformations\n",
|
||||
" 1. Train a binary classification model\n",
|
||||
" 1. Explain the model on raw features\n",
|
||||
" 1. Generate global explanations\n",
|
||||
" 1. Generate local explanations\n",
|
||||
"1. [Visualize results](#Visualize)\n",
|
||||
"1. [Next steps](#Next%20steps)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This notebook illustrates creating explanations for a binary classification model, IBM employee attrition classification, that uses one to one and one to many feature transformations from raw data to engineered features. The one to many feature transformations include one hot encoding on categorical features. The one to one feature transformations apply standard scaling on numeric features. Our tabular data explainer is then used to get raw feature importances.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"We will showcase raw feature transformations with three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
||||
"\n",
|
||||
"|  |\n",
|
||||
"|:--:|\n",
|
||||
"| *Interpretability Toolkit Architecture* |\n",
|
||||
"\n",
|
||||
"Problem: IBM employee attrition classification with scikit-learn (run model explainer locally)\n",
|
||||
"\n",
|
||||
"1. Transform raw features to engineered features\n",
|
||||
"2. Train a SVC classification model using Scikit-learn\n",
|
||||
"3. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
||||
"4. Visualize the global and local explanations with the visualization dashboard.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
|
||||
"```\n",
|
||||
"Or\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using Jupyter Labs run the following commands instead:\n",
|
||||
"```\n",
|
||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
||||
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Explain\n",
|
||||
"\n",
|
||||
"### Run model explainer locally at training time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.pipeline import Pipeline\n",
|
||||
"from sklearn.impute import SimpleImputer\n",
|
||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
||||
"from sklearn.svm import SVC\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# Explainers:\n",
|
||||
"# 1. SHAP Tabular Explainer\n",
|
||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 2. Mimic Explainer\n",
|
||||
"from azureml.explain.model.mimic.mimic_explainer import MimicExplainer\n",
|
||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
||||
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import LinearExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.linear_model import SGDExplainableModel\n",
|
||||
"from azureml.explain.model.mimic.models.tree_model import DecisionTreeExplainableModel\n",
|
||||
"\n",
|
||||
"# OR\n",
|
||||
"\n",
|
||||
"# 3. PFI Explainer\n",
|
||||
"from azureml.explain.model.permutation.permutation_importance import PFIExplainer "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the IBM employee attrition data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get the IBM employee attrition dataset\n",
|
||||
"outdirname = 'dataset.6.21.19'\n",
|
||||
"try:\n",
|
||||
" from urllib import urlretrieve\n",
|
||||
"except ImportError:\n",
|
||||
" from urllib.request import urlretrieve\n",
|
||||
"import zipfile\n",
|
||||
"zipfilename = outdirname + '.zip'\n",
|
||||
"urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
|
||||
"with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
|
||||
" unzip.extractall('.')\n",
|
||||
"attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')\n",
|
||||
"\n",
|
||||
"# Dropping Employee count as all values are 1 and hence attrition is independent of this feature\n",
|
||||
"attritionData = attritionData.drop(['EmployeeCount'], axis=1)\n",
|
||||
"# Dropping Employee Number since it is merely an identifier\n",
|
||||
"attritionData = attritionData.drop(['EmployeeNumber'], axis=1)\n",
|
||||
"\n",
|
||||
"attritionData = attritionData.drop(['Over18'], axis=1)\n",
|
||||
"\n",
|
||||
"# Since all values are 80\n",
|
||||
"attritionData = attritionData.drop(['StandardHours'], axis=1)\n",
|
||||
"\n",
|
||||
"# Converting target variables from string to numerical values\n",
|
||||
"target_map = {'Yes': 1, 'No': 0}\n",
|
||||
"attritionData[\"Attrition_numerical\"] = attritionData[\"Attrition\"].apply(lambda x: target_map[x])\n",
|
||||
"target = attritionData[\"Attrition_numerical\"]\n",
|
||||
"\n",
|
||||
"attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split data into train and test\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"x_train, x_test, y_train, y_test = train_test_split(attritionXData, \n",
|
||||
" target, \n",
|
||||
" test_size = 0.2,\n",
|
||||
" random_state=0,\n",
|
||||
" stratify=target)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Creating dummy columns for each categorical feature\n",
|
||||
"categorical = []\n",
|
||||
"for col, value in attritionXData.iteritems():\n",
|
||||
" if value.dtype == 'object':\n",
|
||||
" categorical.append(col)\n",
|
||||
" \n",
|
||||
"# Store the numerical columns in a list numerical\n",
|
||||
"numerical = attritionXData.columns.difference(categorical) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Transform raw features"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.compose import ColumnTransformer\n",
|
||||
"\n",
|
||||
"# We create the preprocessing pipelines for both numeric and categorical data.\n",
|
||||
"numeric_transformer = Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||
" ('scaler', StandardScaler())])\n",
|
||||
"\n",
|
||||
"categorical_transformer = Pipeline(steps=[\n",
|
||||
" ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),\n",
|
||||
" ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n",
|
||||
"\n",
|
||||
"transformations = ColumnTransformer(\n",
|
||||
" transformers=[\n",
|
||||
" ('num', numeric_transformer, numerical),\n",
|
||||
" ('cat', categorical_transformer, categorical)])\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||
" ('classifier', SVC(kernel='linear', C = 1.0, probability=True))])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"'''\n",
|
||||
"# Uncomment below if sklearn-pandas is not installed\n",
|
||||
"#!pip install sklearn-pandas\n",
|
||||
"from sklearn_pandas import DataFrameMapper\n",
|
||||
"\n",
|
||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"numeric_transformations = [([f], Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])) for f in numerical]\n",
|
||||
"\n",
|
||||
"categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]\n",
|
||||
"\n",
|
||||
"transformations = numeric_transformations + categorical_transformations\n",
|
||||
"\n",
|
||||
"# Append classifier to preprocessing pipeline.\n",
|
||||
"# Now we have a full prediction pipeline.\n",
|
||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||
" ('classifier', SVC(kernel='linear', C = 1.0, probability=True))]) \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"'''"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train a SVM classification model, which you want to explain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = clf.fit(x_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain predictions on your local machine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1. Using SHAP TabularExplainer\n",
|
||||
"# clf.steps[-1][1] returns the trained classification model\n",
|
||||
"explainer = TabularExplainer(clf.steps[-1][1], \n",
|
||||
" initialization_examples=x_train, \n",
|
||||
" features=attritionXData.columns, \n",
|
||||
" classes=[\"Not leaving\", \"leaving\"], \n",
|
||||
" transformations=transformations)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 2. Using MimicExplainer\n",
|
||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
||||
"# explainer = MimicExplainer(clf.steps[-1][1], \n",
|
||||
"# x_train, \n",
|
||||
"# LGBMExplainableModel, \n",
|
||||
"# augment_data=True, \n",
|
||||
"# max_num_of_augmentations=10, \n",
|
||||
"# features=attritionXData.columns, \n",
|
||||
"# classes=[\"Not leaving\", \"leaving\"], \n",
|
||||
"# transformations=transformations)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Using PFIExplainer\n",
|
||||
"\n",
|
||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
||||
"# Default metrics: \n",
|
||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
||||
"# Mean absolute error for regression\n",
|
||||
"\n",
|
||||
"# explainer = PFIExplainer(clf.steps[-1][1], \n",
|
||||
"# features=x_train.columns, \n",
|
||||
"# transformations=transformations,\n",
|
||||
"# classes=[\"Not leaving\", \"leaving\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate global explanations\n",
|
||||
"Explain overall model predictions (global explanation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
||||
"global_explanation = explainer.explain_global(x_test)\n",
|
||||
"\n",
|
||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Sorted SHAP values\n",
|
||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
||||
"# Corresponding feature names\n",
|
||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
||||
"# Feature ranks (based on original order of features)\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
||||
"\n",
|
||||
"# Note: PFIExplainer does not support per class explanations\n",
|
||||
"# Per class feature names\n",
|
||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
||||
"# Per class feature importance values\n",
|
||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# feature shap values for all features and all data points in the training data\n",
|
||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate local explanations\n",
|
||||
"Explain local data points (individual instances)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: PFIExplainer does not support local explanations\n",
|
||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
||||
"\n",
|
||||
"# E.g., Explain the first data point in the test set\n",
|
||||
"instance_num = 1\n",
|
||||
"local_explanation = explainer.explain_local(x_test[:instance_num])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
||||
"\n",
|
||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
||||
"\n",
|
||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize\n",
|
||||
"Load the visualization dashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next\n",
|
||||
"Learn about other use cases of the explain package on a:\n",
|
||||
" \n",
|
||||
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
|
||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
||||
"1. [Explain models with advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "mesameki"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,6 +1,7 @@
|
||||
name: explain-local-sklearn-multiclass-classification
|
||||
name: simple-feature-transformations-explain-local
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-explain-model
|
||||
- azureml-contrib-explain-model
|
||||
- sklearn-pandas
|
||||
@@ -12,7 +12,9 @@ These notebooks below are designed to go in sequence.
|
||||
7. [aml-pipelines-how-to-use-estimatorstep.ipynb](https://aka.ms/pl-estimator): This notebook shows how to use the EstimatorStep.
|
||||
8. [aml-pipelines-parameter-tuning-with-hyperdrive.ipynb](https://aka.ms/pl-hyperdrive): HyperDriveStep in Pipelines shows how you can do hyper parameter tuning using Pipelines.
|
||||
9. [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb](https://aka.ms/pl-azbatch): AzureBatchStep can be used to run your custom code in AzureBatch cluster.
|
||||
10. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule): Once you publish a Pipeline, you can schedule it to trigger based on an interval or on data change in a defined datastore.
|
||||
10. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule): Once you publish a Pipeline, you can schedule it to trigger based on an interval or on data change in a defined datastore.
|
||||
11. [aml-pipelines-with-automated-machine-learning-step.ipynb](https://aka.ms/pl-automl): AutoMLStep in Pipelines shows how you can do automated machine learning using Pipelines.
|
||||
12. [aml-pipelines-setup-versioned-pipeline-endpoints.ipynb](https://aka.ms/pl-ver-endpoint): This notebook shows how you can setup PipelineEndpoint and submit a Pipeline using the PipelineEndpoint.
|
||||
|
||||
|
||||

|
||||
|
||||
@@ -22,9 +22,19 @@
|
||||
"# Azure Machine Learning Pipeline with DataTranferStep\n",
|
||||
"This notebook is used to demonstrate the use of DataTranferStep in Azure Machine Learning Pipeline.\n",
|
||||
"\n",
|
||||
"In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Files storage and you may want to move it to Blob storage. Or, if your data is in an ADLS account and you want to make it available in the Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n",
|
||||
"In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Azure SQL Database and you may want to move it to Azure Data Lake storage. Or, your data is in an ADLS account and you want to make it available in the Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n",
|
||||
"\n",
|
||||
"The below example shows how to move data between an ADLS account, Blob storage, SQL Server, PostgreSQL server. "
|
||||
"The below examples show how to move data between an ADLS account, Blob storage, SQL Server, PostgreSQL server. \n",
|
||||
"\n",
|
||||
"## Data transfer currently supports following storage types:\n",
|
||||
"\n",
|
||||
"| Data store | Supported as a source | Supported as a sink |\n",
|
||||
"| --- | --- | --- |\n",
|
||||
"| Azure Blob Storage | Yes | Yes |\n",
|
||||
"| Azure Data Lake Storage Gen 1 | Yes | Yes |\n",
|
||||
"| Azure Data Lake Storage Gen 2 | Yes | Yes |\n",
|
||||
"| Azure SQL Database | Yes | Yes |\n",
|
||||
"| Azure Database for PostgreSQL | Yes | No |"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -62,8 +72,7 @@
|
||||
"\n",
|
||||
"Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json\n",
|
||||
"\n",
|
||||
"If you don't have a config.json file, please go through the configuration Notebook located here:\n",
|
||||
"https://github.com/Azure/MachineLearningNotebooks. \n",
|
||||
"If you don't have a config.json file, please go through the [configuration Notebook](https://aka.ms/pl-config) first.\n",
|
||||
"\n",
|
||||
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
|
||||
]
|
||||
@@ -86,15 +95,58 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Register Datastores\n",
|
||||
"\n",
|
||||
"In the code cell below, you will need to fill in the appropriate values for the workspace name, datastore name, subscription id, resource group, store name, tenant id, client id, and client secret that are associated with your ADLS datastore. \n",
|
||||
"## Register Datastores and create DataReferences\n",
|
||||
"\n",
|
||||
"For background on registering your data store, consult this article:\n",
|
||||
"\n",
|
||||
"https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory\n",
|
||||
"\n",
|
||||
"### register datastores for Azure Data Lake and Azure Blob storage"
|
||||
"> Please make sure to update the following code examples with appropriate values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Azure Blob Storage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from msrest.exceptions import HttpOperationError\n",
|
||||
"\n",
|
||||
"blob_datastore_name='MyBlobDatastore'\n",
|
||||
"account_name=os.getenv(\"BLOB_ACCOUNTNAME_62\", \"<my-account-name>\") # Storage account name\n",
|
||||
"container_name=os.getenv(\"BLOB_CONTAINER_62\", \"<my-container-name>\") # Name of Azure blob container\n",
|
||||
"account_key=os.getenv(\"BLOB_ACCOUNT_KEY_62\", \"<my-account-key>\") # Storage account key\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" blob_datastore = Datastore.get(ws, blob_datastore_name)\n",
|
||||
" print(\"found blob datastore with name: %s\" % blob_datastore_name)\n",
|
||||
"except HttpOperationError:\n",
|
||||
" blob_datastore = Datastore.register_azure_blob_container(\n",
|
||||
" workspace=ws,\n",
|
||||
" datastore_name=blob_datastore_name,\n",
|
||||
" account_name=account_name, # Storage account name\n",
|
||||
" container_name=container_name, # Name of Azure blob container\n",
|
||||
" account_key=account_key) # Storage account key\n",
|
||||
" print(\"registered blob datastore with name: %s\" % blob_datastore_name)\n",
|
||||
"\n",
|
||||
"blob_data_ref = DataReference(\n",
|
||||
" datastore=blob_datastore,\n",
|
||||
" data_reference_name=\"blob_test_data\",\n",
|
||||
" path_on_datastore=\"testdata\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Azure Data Lake Storage Gen1"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -128,34 +180,57 @@
|
||||
" client_secret=client_secret) # the secret of service principal\n",
|
||||
" print(\"registered datastore with name: %s\" % datastore_name)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"blob_datastore_name='MyBlobDatastore'\n",
|
||||
"account_name=os.getenv(\"BLOB_ACCOUNTNAME_62\", \"<my-account-name>\") # Storage account name\n",
|
||||
"container_name=os.getenv(\"BLOB_CONTAINER_62\", \"<my-container-name>\") # Name of Azure blob container\n",
|
||||
"account_key=os.getenv(\"BLOB_ACCOUNT_KEY_62\", \"<my-account-key>\") # Storage account key\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" blob_datastore = Datastore.get(ws, blob_datastore_name)\n",
|
||||
" print(\"found blob datastore with name: %s\" % blob_datastore_name)\n",
|
||||
"except HttpOperationError:\n",
|
||||
" blob_datastore = Datastore.register_azure_blob_container(\n",
|
||||
" workspace=ws,\n",
|
||||
" datastore_name=blob_datastore_name,\n",
|
||||
" account_name=account_name, # Storage account name\n",
|
||||
" container_name=container_name, # Name of Azure blob container\n",
|
||||
" account_key=account_key) # Storage account key\"\n",
|
||||
" print(\"registered blob datastore with name: %s\" % blob_datastore_name)\n",
|
||||
"\n",
|
||||
"# CLI:\n",
|
||||
"# az ml datastore attach-blob -n <datastore-name> -a <account-name> -c <container-name> -k <account-key> [-t <sas-token>]"
|
||||
"adls_data_ref = DataReference(\n",
|
||||
" datastore=adls_datastore,\n",
|
||||
" data_reference_name=\"adls_test_data\",\n",
|
||||
" path_on_datastore=\"testdata\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### register datastores for Azure SQL Server and Azure database for PostgreSQL"
|
||||
"### Azure Data Lake Storage Gen2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"adlsgen2_datastore_name = 'myadlsgen2datastore'\n",
|
||||
"account_name=os.getenv(\"ADLSGEN2_ACCOUNTNAME_62\", \"<my-account-name>\") # ADLS Gen2 account name\n",
|
||||
"tenant_id=os.getenv(\"ADLSGEN2_TENANT_62\", \"<my-tenant-id>\") # tenant id of service principal\n",
|
||||
"client_id=os.getenv(\"ADLSGEN2_CLIENTID_62\", \"<my-client-id>\") # client id of service principal\n",
|
||||
"client_secret=os.getenv(\"ADLSGEN2_CLIENT_SECRET_62\", \"<my-client-secret>\") # the secret of service principal\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" adlsgen2_datastore = Datastore.get(ws, adlsgen2_datastore_name)\n",
|
||||
" print(\"found ADLS Gen2 datastore with name: %s\" % adlsgen2_datastore_name)\n",
|
||||
"except:\n",
|
||||
" adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(\n",
|
||||
" workspace=ws,\n",
|
||||
" datastore_name=adlsgen2_datastore_name,\n",
|
||||
" filesystem='test', # Name of ADLS Gen2 filesystem\n",
|
||||
" account_name=account_name, # ADLS Gen2 account name\n",
|
||||
" tenant_id=tenant_id, # tenant id of service principal\n",
|
||||
" client_id=client_id, # client id of service principal\n",
|
||||
" client_secret=client_secret) # the secret of service principal\n",
|
||||
" print(\"registered datastore with name: %s\" % adlsgen2_datastore_name)\n",
|
||||
"\n",
|
||||
"adlsgen2_data_ref = DataReference(\n",
|
||||
" datastore=adlsgen2_datastore,\n",
|
||||
" data_reference_name='adlsgen2_test_data',\n",
|
||||
" path_on_datastore='testdata')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Azure SQL Database"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -186,7 +261,28 @@
|
||||
" tenant_id=tenant_id)\n",
|
||||
" print(\"registered sql databse datastore with name: %s\" % sql_datastore_name)\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"from azureml.data.sql_data_reference import SqlDataReference\n",
|
||||
"\n",
|
||||
"sql_query_data_ref = SqlDataReference(\n",
|
||||
" datastore=sql_datastore,\n",
|
||||
" data_reference_name=\"sql_query_data_ref\",\n",
|
||||
" sql_query=\"select top 1 * from TestData\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Azure Database for PostgreSQL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"psql_datastore_name=\"MyPostgreSqlDatastore\"\n",
|
||||
"server_name=os.getenv(\"PSQL_SERVERNAME_62\", \"<my-server-name>\") # Name of PostgreSQL server \n",
|
||||
"database_name=os.getenv(\"PSQL_DATBASENAME_62\", \"<my-database-name>\") # Name of PostgreSQL database\n",
|
||||
@@ -205,73 +301,13 @@
|
||||
" user_id=user_id,\n",
|
||||
" user_password=user_password)\n",
|
||||
" print(\"registered PostgreSQL databse datastore with name: %s\" % psql_datastore_name)\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create DataReferences\n",
|
||||
"### create DataReferences for Azure Data Lake and Azure Blob storage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"adls_datastore = Datastore(workspace=ws, name=\"MyAdlsDatastore\")\n",
|
||||
"\n",
|
||||
"# adls\n",
|
||||
"adls_data_ref = DataReference(\n",
|
||||
" datastore=adls_datastore,\n",
|
||||
" data_reference_name=\"adls_test_data\",\n",
|
||||
" path_on_datastore=\"testdata\")\n",
|
||||
"\n",
|
||||
"blob_datastore = Datastore(workspace=ws, name=\"MyBlobDatastore\")\n",
|
||||
"\n",
|
||||
"# blob data\n",
|
||||
"blob_data_ref = DataReference(\n",
|
||||
" datastore=blob_datastore,\n",
|
||||
" data_reference_name=\"blob_test_data\",\n",
|
||||
" path_on_datastore=\"testdata\")\n",
|
||||
"\n",
|
||||
"print(\"obtained adls, blob data references\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### create DataReferences for Azure SQL Server and Azure database for PostgreSQL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.data.sql_data_reference import SqlDataReference\n",
|
||||
"\n",
|
||||
"sql_datastore = Datastore(workspace=ws, name=\"MySqlDatastore\")\n",
|
||||
"\n",
|
||||
"sql_query_data_ref = SqlDataReference(\n",
|
||||
" datastore=sql_datastore,\n",
|
||||
" data_reference_name=\"sql_query_data_ref\",\n",
|
||||
" sql_query=\"select top 1 * from TestData\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"psql_datastore = Datastore(workspace=ws, name=\"MyPostgreSqlDatastore\")\n",
|
||||
"\n",
|
||||
"psql_query_data_ref = SqlDataReference(\n",
|
||||
" datastore=psql_datastore,\n",
|
||||
" data_reference_name=\"psql_query_data_ref\",\n",
|
||||
" sql_query=\"SELECT * FROM testtable\")\n",
|
||||
"\n",
|
||||
"print(\"obtained Sql server, PostgreSQL data references\")"
|
||||
" sql_query=\"SELECT * FROM testtable\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -304,11 +340,7 @@
|
||||
" \n",
|
||||
"data_factory_compute = get_or_create_data_factory(ws, data_factory_name)\n",
|
||||
"\n",
|
||||
"print(\"setup data factory account complete\")\n",
|
||||
"\n",
|
||||
"# CLI:\n",
|
||||
"# Create: az ml computetarget setup datafactory -n <name>\n",
|
||||
"# BYOC: az ml computetarget attach datafactory -n <name> -i <resource-id>"
|
||||
"print(\"setup data factory account complete\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -357,6 +389,13 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"transfer_adlsgen2_to_blob = DataTransferStep(\n",
|
||||
" name='transfer_adlsgen2_to_blob',\n",
|
||||
" source_data_reference=adlsgen2_data_ref,\n",
|
||||
" destination_data_reference=blob_data_ref,\n",
|
||||
" compute_target=data_factory_compute)\n",
|
||||
"\n",
|
||||
"transfer_sql_to_blob = DataTransferStep(\n",
|
||||
" name=\"transfer_sql_to_blob\",\n",
|
||||
" source_data_reference=sql_query_data_ref,\n",
|
||||
@@ -405,7 +444,7 @@
|
||||
"pipeline_02 = Pipeline(\n",
|
||||
" description=\"data_transfer_02\",\n",
|
||||
" workspace=ws,\n",
|
||||
" steps=[transfer_sql_to_blob,transfer_psql_to_blob])\n",
|
||||
" steps=[transfer_sql_to_blob,transfer_psql_to_blob, transfer_adlsgen2_to_blob])\n",
|
||||
"\n",
|
||||
"pipeline_run_02 = Experiment(ws, \"Data_Transfer_example_02\").submit(pipeline_02)\n",
|
||||
"pipeline_run_02.wait_for_completion()"
|
||||
@@ -439,11 +478,12 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next: Databricks as a Compute Target\n",
|
||||
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. This [notebook](./aml-pipelines-use-databricks-as-compute-target.ipynb) demonstrates the use of a DatabricksStep in an Azure Machine Learning Pipeline."
|
||||
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. This [notebook](https://aka.ms/pl-databricks) demonstrates the use of a DatabricksStep in an Azure Machine Learning Pipeline."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -33,7 +33,7 @@
|
||||
"\n",
|
||||
"Azure's Machine Learning pipelines give you a way to combine multiple steps like these into one configurable workflow, so that multiple agents/users can share and/or reuse this workflow. Machine learning pipelines thus provide a consistent, reproducible mechanism for building, evaluating, deploying, and running ML systems.\n",
|
||||
"\n",
|
||||
"To get more information about Azure machine learning pipelines, please read our [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) overview, or the [readme article](../README.md).\n",
|
||||
"To get more information about Azure machine learning pipelines, please read our [Azure Machine Learning Pipelines](https://aka.ms/pl-concept) overview, or the [readme article](https://aka.ms/pl-readme).\n",
|
||||
"\n",
|
||||
"In this notebook, we provide a gentle introduction to Azure machine learning pipelines. We build a pipeline that runs jobs unattended on different compute clusters; in this notebook, you'll see how to use the basic Azure ML SDK APIs for constructing this pipeline.\n",
|
||||
" "
|
||||
@@ -44,7 +44,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites and Azure Machine Learning Basics\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n"
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -127,7 +127,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Required data and script files for the the tutorial\n",
|
||||
"Sample files required to finish this tutorial are already copied to the corresponding source_directory locations. Even though the .py provided in the samples don't have much \"ML work,\" as a data scientist, you will work on this extensively as part of your work. To complete this tutorial, the contents of these files are not very important. The one-line files are for demostration purpose only."
|
||||
"Sample files required to finish this tutorial are already copied to the corresponding source_directory locations. Even though the .py provided in the samples does not have much \"ML work\" as a data scientist, you will work on this extensively as part of your work. To complete this tutorial, the contents of these files are not very important. The one-line files are for demostration purpose only."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -597,7 +597,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next: Pipelines with data dependency\n",
|
||||
"The next [notebook](./aml-pipelines-with-data-dependency-steps.ipynb) demostrates how to construct a pipeline with data dependency."
|
||||
"The next [notebook](https://aka.ms/pl-data-dep) demostrates how to construct a pipeline with data dependency."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -74,7 +74,7 @@
|
||||
"source": [
|
||||
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
|
||||
"\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, If you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, If you don't have a config.json file, please go through the [configuration Notebook](https://aka.ms/pl-config) located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
|
||||
"\n",
|
||||
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
|
||||
]
|
||||
|
||||
@@ -27,7 +27,7 @@
|
||||
"\n",
|
||||
"## Prerequisite:\n",
|
||||
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
||||
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n",
|
||||
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](https://aka.ms/pl-config) to:\n",
|
||||
" * install the AML SDK\n",
|
||||
" * create a workspace and its configuration file (`config.json`)"
|
||||
]
|
||||
@@ -150,7 +150,9 @@
|
||||
"> Estimator object initialization involves specifying a list of DataReference objects in its 'inputs' parameter.\n",
|
||||
" In Pipelines, a step can take another step's output or DataReferences as input. So when creating an EstimatorStep,\n",
|
||||
" the parameters 'inputs' and 'outputs' need to be set explicitly and that will override 'inputs' parameter\n",
|
||||
" specified in the Estimator object."
|
||||
" specified in the Estimator object.\n",
|
||||
" \n",
|
||||
"> The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -170,7 +172,9 @@
|
||||
" data_reference_name=\"input_data\",\n",
|
||||
" path_on_datastore=\"20newsgroups/20news.pkl\")\n",
|
||||
"\n",
|
||||
"output = PipelineData(\"output\", datastore=def_blob_store)"
|
||||
"output = PipelineData(\"output\", datastore=def_blob_store)\n",
|
||||
"\n",
|
||||
"source_directory = 'estimator_train'"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -181,7 +185,7 @@
|
||||
"source": [
|
||||
"from azureml.train.estimator import Estimator\n",
|
||||
"\n",
|
||||
"est = Estimator(source_directory='.', \n",
|
||||
"est = Estimator(source_directory=source_directory, \n",
|
||||
" compute_target=cpu_cluster, \n",
|
||||
" entry_script='dummy_train.py', \n",
|
||||
" conda_packages=['scikit-learn'])"
|
||||
|
||||
@@ -30,7 +30,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites and Azure Machine Learning Basics\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||
"\n",
|
||||
"## Azure Machine Learning and Pipeline SDK-specific imports"
|
||||
]
|
||||
@@ -88,7 +88,11 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create an Azure ML experiment\n",
|
||||
"Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n"
|
||||
"Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. \n",
|
||||
"\n",
|
||||
"> The best practice is to use separate folders for scripts and its dependent files for each step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step. \n",
|
||||
"\n",
|
||||
"> The script runs will be recorded under the experiment in Azure."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -28,7 +28,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites and Azure Machine Learning Basics\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||
"\n",
|
||||
"### Initialization Steps"
|
||||
]
|
||||
@@ -57,10 +57,8 @@
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
|
||||
"\n",
|
||||
"# Default datastore (Azure file storage)\n",
|
||||
"def_file_store = ws.get_default_datastore() \n",
|
||||
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
|
||||
"\n",
|
||||
"# Default datastore (Azure blob storage)\n",
|
||||
"# def_blob_store = ws.get_default_datastore()\n",
|
||||
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
||||
"print(\"Blobstore's name: {}\".format(def_blob_store.name))"
|
||||
]
|
||||
@@ -147,7 +145,9 @@
|
||||
"#### Define a Step that consumes a datasource and produces intermediate data.\n",
|
||||
"In this step, we define a step that consumes a datasource and produces intermediate data.\n",
|
||||
"\n",
|
||||
"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
||||
"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** \n",
|
||||
"\n",
|
||||
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -158,13 +158,16 @@
|
||||
"source": [
|
||||
"# trainStep consumes the datasource (Datareference) in the previous step\n",
|
||||
"# and produces processed_data1\n",
|
||||
"\n",
|
||||
"source_directory = \"publish_run_train\"\n",
|
||||
"\n",
|
||||
"trainStep = PythonScriptStep(\n",
|
||||
" script_name=\"train.py\", \n",
|
||||
" arguments=[\"--input_data\", blob_input_data, \"--output_train\", processed_data1],\n",
|
||||
" inputs=[blob_input_data],\n",
|
||||
" outputs=[processed_data1],\n",
|
||||
" compute_target=aml_compute, \n",
|
||||
" source_directory='.'\n",
|
||||
" source_directory=source_directory\n",
|
||||
")\n",
|
||||
"print(\"trainStep created\")"
|
||||
]
|
||||
@@ -188,6 +191,7 @@
|
||||
"# extractStep to use the intermediate data produced by step4\n",
|
||||
"# This step also produces an output processed_data2\n",
|
||||
"processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
|
||||
"source_directory = \"publish_run_extract\"\n",
|
||||
"\n",
|
||||
"extractStep = PythonScriptStep(\n",
|
||||
" script_name=\"extract.py\",\n",
|
||||
@@ -195,7 +199,7 @@
|
||||
" inputs=[processed_data1],\n",
|
||||
" outputs=[processed_data2],\n",
|
||||
" compute_target=aml_compute, \n",
|
||||
" source_directory='.')\n",
|
||||
" source_directory=source_directory)\n",
|
||||
"print(\"extractStep created\")"
|
||||
]
|
||||
},
|
||||
@@ -247,8 +251,7 @@
|
||||
"source": [
|
||||
"# Now define step6 that takes two inputs (both intermediate data), and produce an output\n",
|
||||
"processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"source_directory = \"publish_run_compare\"\n",
|
||||
"\n",
|
||||
"compareStep = PythonScriptStep(\n",
|
||||
" script_name=\"compare.py\",\n",
|
||||
@@ -256,7 +259,7 @@
|
||||
" inputs=[processed_data1, processed_data2],\n",
|
||||
" outputs=[processed_data3], \n",
|
||||
" compute_target=aml_compute, \n",
|
||||
" source_directory='.')\n",
|
||||
" source_directory=source_directory)\n",
|
||||
"print(\"compareStep created\")"
|
||||
]
|
||||
},
|
||||
@@ -390,7 +393,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next: Data Transfer\n",
|
||||
"The next [notebook](./aml-pipelines-data-transfer.ipynb) will showcase data transfer steps between different types of data stores."
|
||||
"The next [notebook](https://aka.ms/pl-data-trans) will showcase data transfer steps between different types of data stores."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -28,7 +28,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites and AML Basics\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n",
|
||||
"\n",
|
||||
"### Initialization Steps"
|
||||
]
|
||||
@@ -103,7 +103,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define a pipeline step\n",
|
||||
"Define a single step pipeline for demonstration purpose."
|
||||
"Define a single step pipeline for demonstration purpose. The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,11 +114,13 @@
|
||||
"source": [
|
||||
"from azureml.pipeline.steps import PythonScriptStep\n",
|
||||
"\n",
|
||||
"source_directory = \"publish_run_train\"\n",
|
||||
"\n",
|
||||
"trainStep = PythonScriptStep(\n",
|
||||
" name=\"Training_Step\",\n",
|
||||
" script_name=\"train.py\", \n",
|
||||
" compute_target=aml_compute_target, \n",
|
||||
" source_directory='.'\n",
|
||||
" source_directory=source_directory\n",
|
||||
")\n",
|
||||
"print(\"TrainStep created\")"
|
||||
]
|
||||
|
||||
@@ -32,7 +32,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites and AML Basics\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://github.com/Azure/MachineLearningNotebooks) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n"
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -76,7 +76,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Initialization, Steps to create a Pipeline"
|
||||
"#### Initialization, Steps to create a Pipeline\n",
|
||||
"\n",
|
||||
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -105,7 +107,7 @@
|
||||
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
||||
"\n",
|
||||
"# source_directory\n",
|
||||
"source_directory = '.'\n",
|
||||
"source_directory = 'publish_run_train'\n",
|
||||
"# define a single step pipeline for demonstration purpose.\n",
|
||||
"trainStep = PythonScriptStep(\n",
|
||||
" name=\"Training_Step\",\n",
|
||||
|
||||
@@ -290,7 +290,9 @@
|
||||
"- **priority:** the priority value to use for the current job *(optional)*\n",
|
||||
"- **runtime_version:** the runtime version of the Data Lake Analytics engine *(optional)*\n",
|
||||
"- **source_directory:** folder that contains the script, assemblies etc. *(optional)*\n",
|
||||
"- **hash_paths:** list of paths to hash to detect a change (script file is always hashed) *(optional)*"
|
||||
"- **hash_paths:** list of paths to hash to detect a change (script file is always hashed) *(optional)*\n",
|
||||
"\n",
|
||||
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -20,7 +20,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Using Databricks as a Compute Target from Azure Machine Learning Pipeline\n",
|
||||
"To use Databricks as a compute target from [Azure Machine Learning Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines), a [DatabricksStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py) is used. This notebook demonstrates the use of DatabricksStep in Azure Machine Learning Pipeline.\n",
|
||||
"To use Databricks as a compute target from [Azure Machine Learning Pipeline](https://aka.ms/pl-concept), a [DatabricksStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py) is used. This notebook demonstrates the use of DatabricksStep in Azure Machine Learning Pipeline.\n",
|
||||
"\n",
|
||||
"The notebook will show:\n",
|
||||
"1. Running an arbitrary Databricks notebook that the customer has in Databricks workspace\n",
|
||||
@@ -175,7 +175,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data Connections with Inputs and Outputs\n",
|
||||
"The DatabricksStep supports Azure Blob and ADLS for inputs and outputs. You also will need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n",
|
||||
"The DatabricksStep supports DBFS, Azure Blob and ADLS for inputs and outputs. You also will need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n",
|
||||
"\n",
|
||||
"- Databricks documentation on [Azure Blob](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html)\n",
|
||||
"- Databricks documentation on [ADLS](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html)\n",
|
||||
@@ -290,7 +290,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use Databricks from Azure Machine Learning Pipeline\n",
|
||||
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. Let's define a datasource (via DataReference) and intermediate data (via PipelineData) to be used in DatabricksStep."
|
||||
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. Let's define a datasource (via DataReference), intermediate data (via PipelineData) and a pipeline parameter (via PipelineParameter) to be used in DatabricksStep."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -299,10 +299,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.pipelince.core import PipelineParameter\n",
|
||||
"\n",
|
||||
"# Use the default blob storage\n",
|
||||
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
||||
"print('Datastore {} will be used'.format(def_blob_store.name))\n",
|
||||
"\n",
|
||||
"pipeline_param = PipelineParameter(name=\"my_pipeline_param\", default_value=\"pipeline_param1\")\n",
|
||||
"\n",
|
||||
"# We are uploading a sample file in the local directory to be used as a datasource\n",
|
||||
"def_blob_store.upload_files(files=[\"./testdata.txt\"], target_path=\"dbtest\", overwrite=False)\n",
|
||||
"\n",
|
||||
@@ -406,7 +410,9 @@
|
||||
"### 1. Running the demo notebook already added to the Databricks workspace\n",
|
||||
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable\u00c2\u00a0notebook_path\u00c2\u00a0when you run the code cell below:\n",
|
||||
"\n",
|
||||
"your notebook's path in Azure Databricks UI by hovering over to notebook's title. A typical path of notebook looks like this `/Users/example@databricks.com/example`. See [Databricks Workspace](https://docs.azuredatabricks.net/user-guide/workspace.html) to learn about the folder structure."
|
||||
"your notebook's path in Azure Databricks UI by hovering over to notebook's title. A typical path of notebook looks like this `/Users/example@databricks.com/example`. See [Databricks Workspace](https://docs.azuredatabricks.net/user-guide/workspace.html) to learn about the folder structure.\n",
|
||||
"\n",
|
||||
"Note: DataPath `PipelineParameter` should be provided in list of inputs. Such parameters can be accessed by the datapath `name`."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -423,7 +429,8 @@
|
||||
" outputs=[step_1_output],\n",
|
||||
" num_workers=1,\n",
|
||||
" notebook_path=notebook_path,\n",
|
||||
" notebook_params={'myparam': 'testparam'},\n",
|
||||
" notebook_params={'myparam': 'testparam', \n",
|
||||
" 'myparam2': pipeline_param},\n",
|
||||
" run_name='DB_Notebook_demo',\n",
|
||||
" compute_target=databricks_compute,\n",
|
||||
" allow_reuse=True\n",
|
||||
@@ -434,7 +441,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Build and submit the Experiment"
|
||||
"#### Build and submit the Experiment\n",
|
||||
"\n",
|
||||
"Note: Default value of `pipeline_param` will be used if different value is not specified in pipeline parameters during submission"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -479,7 +488,9 @@
|
||||
"dbfs cp ./train-db-dbfs.py dbfs:/train-db-dbfs.py\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The code in the below cell assumes that you have completed the previous step of uploading the script `train-db-dbfs.py` to the root folder in DBFS."
|
||||
"The code in the below cell assumes that you have completed the previous step of uploading the script `train-db-dbfs.py` to the root folder in DBFS.\n",
|
||||
"\n",
|
||||
"Note: `pipeline_param` will add two values in the python_script_params, a name followed by value. the name will be in this format `--MY_PIPELINE_PARAM`. For example, in the given case, python_script_params will be `[\"arg1\", \"--MY_PIPELINE_PARAM\", \"pipeline_param1\", \"arg2\"]`"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -495,7 +506,7 @@
|
||||
" inputs=[step_1_input],\n",
|
||||
" num_workers=1,\n",
|
||||
" python_script_path=python_script_path,\n",
|
||||
" python_script_params={'--input_data'},\n",
|
||||
" python_script_params={'arg1', pipeline_param, 'arg2},\n",
|
||||
" run_name='DB_Python_demo',\n",
|
||||
" compute_target=databricks_compute,\n",
|
||||
" allow_reuse=True\n",
|
||||
@@ -545,7 +556,9 @@
|
||||
"### 3. Running a Python script in Databricks that currenlty is in local computer\n",
|
||||
"To run a Python script that is currently in your local computer, follow the instructions below. \n",
|
||||
"\n",
|
||||
"The commented out code below code assumes that you have `train-db-local.py` in the `scripts` subdirectory under the current working directory.\n",
|
||||
"The commented out code below code assumes that you have `train-db-local.py` in the `source_directory` subdirectory under the current working directory. \n",
|
||||
"\n",
|
||||
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step.\n",
|
||||
"\n",
|
||||
"In this case, the Python script will be uploaded first to DBFS, and then the script will be run in Databricks."
|
||||
]
|
||||
@@ -557,7 +570,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"python_script_name = \"train-db-local.py\"\n",
|
||||
"source_directory = \".\"\n",
|
||||
"source_directory = \"./databricks_train\"\n",
|
||||
"\n",
|
||||
"dbPythonInLocalMachineStep = DatabricksStep(\n",
|
||||
" name=\"DBPythonInLocalMachine\",\n",
|
||||
@@ -618,7 +631,9 @@
|
||||
"\n",
|
||||
"```\n",
|
||||
"dbfs cp ./train-db-dbfs.jar dbfs:/train-db-dbfs.jar\n",
|
||||
"```"
|
||||
"```\n",
|
||||
"\n",
|
||||
"Note: `pipeline_param` will add two values in the python_script_params, a name followed by value. the name will be in this format `--MY_PIPELINE_PARAM`. For example, in the given case, python_script_params will be `[\"arg1\", \"--MY_PIPELINE_PARAM\", \"pipeline_param1\", \"arg2\"]`"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -635,7 +650,7 @@
|
||||
" inputs=[step_1_input],\n",
|
||||
" num_workers=1,\n",
|
||||
" main_class_name=main_jar_class_name,\n",
|
||||
" jar_params={'arg1', 'arg2'},\n",
|
||||
" jar_params={'arg1', pipeline_param, 'arg2'},\n",
|
||||
" run_name='DB_JAR_demo',\n",
|
||||
" jar_libraries=[JarLibrary(jar_library_dbfs_path)],\n",
|
||||
" compute_target=databricks_compute,\n",
|
||||
@@ -684,7 +699,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next: ADLA as a Compute Target\n",
|
||||
"To use ADLA as a compute target from Azure Machine Learning Pipeline, a AdlaStep is used. This [notebook](./aml-pipelines-use-adla-as-compute-target.ipynb) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
|
||||
"To use ADLA as a compute target from Azure Machine Learning Pipeline, a AdlaStep is used. This [notebook](https://aka.ms/pl-adla) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -28,24 +28,19 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Introduction\n",
|
||||
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML via AML Pipeline. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
|
||||
"\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.\n",
|
||||
"\n",
|
||||
"In this notebook you would see\n",
|
||||
"In this notebook you will learn how to:\n",
|
||||
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||
"2. Create or Attach existing AmlCompute to a workspace.\n",
|
||||
"3. Configure AutoML using `AutoMLConfig`.\n",
|
||||
"4. Use AutoMLStep\n",
|
||||
"5. Train the model using AmlCompute\n",
|
||||
"6. Explore the results.\n",
|
||||
"7. Test the best fitted model.\n",
|
||||
"\n",
|
||||
"In addition this notebook showcases the following features\n",
|
||||
"- **Parallel** executions for iterations\n",
|
||||
"- **Asynchronous** tracking of progress\n",
|
||||
"- Retrieving models for any iteration or logged metric\n",
|
||||
"- Specifying AutoML settings as `**kwargs`"
|
||||
"3. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
||||
"4. Configure AutoML using `AutoMLConfig`.\n",
|
||||
"5. Use AutoMLStep\n",
|
||||
"6. Train the model using AmlCompute\n",
|
||||
"7. Explore the results.\n",
|
||||
"8. Test the best fitted model."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -69,6 +64,8 @@
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from sklearn import datasets\n",
|
||||
"import pkg_resources\n",
|
||||
"import azureml.dataprep as dprep\n",
|
||||
"\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
@@ -108,7 +105,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create an Azure ML experiment\n",
|
||||
"Let's create an experiment named \"automl-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n"
|
||||
"Let's create an experiment named \"automl-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n",
|
||||
"\n",
|
||||
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -129,7 +128,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create or Attach existing AmlCompute\n",
|
||||
"### Create or Attach an AmlCompute cluster\n",
|
||||
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource."
|
||||
]
|
||||
},
|
||||
@@ -166,45 +165,6 @@
|
||||
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prepare and Point to Data\n",
|
||||
"For remote executions, you need to make the data accessible from the remote compute.\n",
|
||||
"This can be done by uploading the data to DataStore.\n",
|
||||
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data_train = datasets.load_digits()\n",
|
||||
"\n",
|
||||
"if not os.path.isdir('data'):\n",
|
||||
" os.mkdir('data')\n",
|
||||
" \n",
|
||||
"if not os.path.exists(project_folder):\n",
|
||||
" os.makedirs(project_folder)\n",
|
||||
" \n",
|
||||
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
|
||||
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
|
||||
"\n",
|
||||
"ds = ws.get_default_datastore()\n",
|
||||
"ds.upload(src_dir='./data', target_path='bai_data', overwrite=True, show_progress=True)\n",
|
||||
"\n",
|
||||
"from azureml.data.data_reference import DataReference \n",
|
||||
"input_data = DataReference(datastore=ds, \n",
|
||||
" data_reference_name=\"input_data_reference\",\n",
|
||||
" path_on_datastore='bai_data',\n",
|
||||
" mode='download',\n",
|
||||
" path_on_compute='/tmp/azureml_runs',\n",
|
||||
" overwrite=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -214,54 +174,77 @@
|
||||
"# create a new RunConfig object\n",
|
||||
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||
"\n",
|
||||
"# Set compute target to AmlCompute\n",
|
||||
"#conda_run_config.target = compute_target\n",
|
||||
"\n",
|
||||
"conda_run_config.environment.docker.enabled = True\n",
|
||||
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||
"\n",
|
||||
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], \n",
|
||||
" conda_packages=['numpy', 'py-xgboost'], \n",
|
||||
" pin_sdk_version=False)\n",
|
||||
" conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
|
||||
"conda_run_config.environment.python.conda_dependencies = cd\n",
|
||||
"\n",
|
||||
"print('run config is ready')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%writefile $project_folder/get_data.py\n",
|
||||
"\n",
|
||||
"import pandas as pd\n",
|
||||
"\n",
|
||||
"def get_data():\n",
|
||||
" X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
|
||||
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
|
||||
"\n",
|
||||
" return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }\n"
|
||||
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
|
||||
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
|
||||
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
|
||||
"# and convert column types manually.\n",
|
||||
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
|
||||
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
|
||||
"dflow.get_profile()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
|
||||
"dflow = dflow.drop_nulls('Primary Type')\n",
|
||||
"dflow.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set up AutoMLConfig for Training\n",
|
||||
"### Review the Data Preparation Result\n",
|
||||
"\n",
|
||||
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n",
|
||||
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
|
||||
"\n",
|
||||
"**Note:** When using AmlCompute, you can't pass Numpy arrays directly to the fit method.\n",
|
||||
"\n",
|
||||
"|Property|Description|\n",
|
||||
"|-|-|\n",
|
||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|"
|
||||
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
|
||||
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)\n",
|
||||
"print('X and y are ready!')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train\n",
|
||||
"This creates a general AutoML settings object."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -271,20 +254,19 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"automl_settings = {\n",
|
||||
" \"iteration_timeout_minutes\": 5,\n",
|
||||
" \"iterations\": 20,\n",
|
||||
" \"n_cross_validations\": 5,\n",
|
||||
" \"primary_metric\": 'AUC_weighted',\n",
|
||||
" \"preprocess\": False,\n",
|
||||
" \"max_concurrent_iterations\": 3,\n",
|
||||
" \"verbosity\": logging.INFO\n",
|
||||
" \"iteration_timeout_minutes\" : 5,\n",
|
||||
" \"iterations\" : 2,\n",
|
||||
" \"primary_metric\" : 'AUC_weighted',\n",
|
||||
" \"preprocess\" : True,\n",
|
||||
" \"verbosity\" : logging.INFO\n",
|
||||
"}\n",
|
||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||
" debug_log = 'automl_errors.log',\n",
|
||||
" path = project_folder,\n",
|
||||
" compute_target=compute_target,\n",
|
||||
" run_configuration=conda_run_config,\n",
|
||||
" data_script = project_folder + \"/get_data.py\",\n",
|
||||
" X = X,\n",
|
||||
" y = y,\n",
|
||||
" **automl_settings\n",
|
||||
" )"
|
||||
]
|
||||
@@ -293,15 +275,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
|
||||
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define AutoMLStep"
|
||||
"You can define outputs for the AutoMLStep using TrainingOutput."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -312,6 +286,7 @@
|
||||
"source": [
|
||||
"from azureml.pipeline.core import PipelineData, TrainingOutput\n",
|
||||
"\n",
|
||||
"ds = ws.get_default_datastore()\n",
|
||||
"metrics_output_name = 'metrics_output'\n",
|
||||
"best_model_output_name = 'best_model_output'\n",
|
||||
"\n",
|
||||
@@ -325,6 +300,13 @@
|
||||
" training_output=TrainingOutput(type='Model'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Create an AutoMLStep."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -333,9 +315,7 @@
|
||||
"source": [
|
||||
"automl_step = AutoMLStep(\n",
|
||||
" name='automl_module',\n",
|
||||
" experiment=experiment,\n",
|
||||
" automl_config=automl_config,\n",
|
||||
" inputs=[input_data],\n",
|
||||
" outputs=[metirics_data, model_data],\n",
|
||||
" allow_reuse=True)"
|
||||
]
|
||||
@@ -409,7 +389,7 @@
|
||||
"source": [
|
||||
"import json\n",
|
||||
"with open(metrics_output._path_on_datastore) as f: \n",
|
||||
" metrics_output_result = f.read()\n",
|
||||
" metrics_output_result = f.read()\n",
|
||||
" \n",
|
||||
"deserialized_metrics_output = json.loads(metrics_output_result)\n",
|
||||
"df = pd.DataFrame(deserialized_metrics_output)\n",
|
||||
@@ -439,11 +419,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
" import pickle\n",
|
||||
"import pickle\n",
|
||||
"\n",
|
||||
" with open(best_model_output._path_on_datastore, \"rb\" ) as f:\n",
|
||||
" best_model = pickle.load(f)\n",
|
||||
" best_model"
|
||||
"with open(best_model_output._path_on_datastore, \"rb\" ) as f:\n",
|
||||
" best_model = pickle.load(f)\n",
|
||||
"best_model"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -451,7 +431,8 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Test the Model\n",
|
||||
"#### Load Test Data"
|
||||
"#### Load Test Data\n",
|
||||
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -460,17 +441,17 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"digits = datasets.load_digits()\n",
|
||||
"X_test = digits.data[:10, :]\n",
|
||||
"y_test = digits.target[:10]\n",
|
||||
"images = digits.images[:10]"
|
||||
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
||||
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Testing Best Model"
|
||||
"#### Testing Our Best Fitted Model\n",
|
||||
"\n",
|
||||
"We will use confusion matrix to see how our model works."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -479,17 +460,19 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Randomly select digits and test.\n",
|
||||
"for index in np.random.choice(len(y_test), 3, replace = False):\n",
|
||||
" print(index)\n",
|
||||
" predicted = best_model.predict(X_test[index:index + 1])[0]\n",
|
||||
" label = y_test[index]\n",
|
||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||
" ax1.set_title(title)\n",
|
||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||
" plt.show()"
|
||||
"from pandas_ml import ConfusionMatrix\n",
|
||||
"\n",
|
||||
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
|
||||
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"ypred = best_model.predict(X_test)\n",
|
||||
"\n",
|
||||
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
|
||||
"\n",
|
||||
"print(cm)\n",
|
||||
"\n",
|
||||
"cm.plot()"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -28,7 +28,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites and Azure Machine Learning Basics\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||
"\n",
|
||||
"### Azure Machine Learning and Pipeline SDK-specific Imports"
|
||||
]
|
||||
@@ -76,14 +76,20 @@
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
|
||||
"\n",
|
||||
"# Default datastore (Azure file storage)\n",
|
||||
"def_file_store = ws.get_default_datastore() \n",
|
||||
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
|
||||
"\n",
|
||||
"# Default datastore (Azure blob storage)\n",
|
||||
"# def_blob_store = ws.get_default_datastore()\n",
|
||||
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
||||
"print(\"Blobstore's name: {}\".format(def_blob_store.name))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Source Directory\n",
|
||||
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -91,7 +97,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# source directory\n",
|
||||
"source_directory = '.'\n",
|
||||
"source_directory = 'data_dependency_run_train'\n",
|
||||
" \n",
|
||||
"print('Sample scripts will be created in {} directory.'.format(source_directory))"
|
||||
]
|
||||
@@ -340,6 +346,7 @@
|
||||
"# step5 to use the intermediate data produced by step4\n",
|
||||
"# This step also produces an output processed_data2\n",
|
||||
"processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
|
||||
"source_directory = \"data_dependency_run_extract\"\n",
|
||||
"\n",
|
||||
"extractStep = PythonScriptStep(\n",
|
||||
" script_name=\"extract.py\",\n",
|
||||
@@ -386,6 +393,7 @@
|
||||
"source": [
|
||||
"# Now define the compare step which takes two inputs and produces an output\n",
|
||||
"processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
|
||||
"source_directory = \"data_dependency_run_compare\"\n",
|
||||
"\n",
|
||||
"compareStep = PythonScriptStep(\n",
|
||||
" script_name=\"compare.py\",\n",
|
||||
@@ -509,7 +517,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Next: Publishing the Pipeline and calling it from the REST endpoint\n",
|
||||
"See this [notebook](./aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) to understand how the pipeline is published and you can call the REST endpoint to run the pipeline."
|
||||
"See this [notebook](https://aka.ms/pl-pub-rep) to understand how the pipeline is published and you can call the REST endpoint to run the pipeline."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -0,0 +1,24 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("In compare.py")
|
||||
print("As a data scientist, this is where I use my compare code.")
|
||||
parser = argparse.ArgumentParser("compare")
|
||||
parser.add_argument("--compare_data1", type=str, help="compare_data1 data")
|
||||
parser.add_argument("--compare_data2", type=str, help="compare_data2 data")
|
||||
parser.add_argument("--output_compare", type=str, help="output_compare directory")
|
||||
parser.add_argument("--pipeline_param", type=int, help="pipeline parameter")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.compare_data1)
|
||||
print("Argument 2: %s" % args.compare_data2)
|
||||
print("Argument 3: %s" % args.output_compare)
|
||||
print("Argument 4: %s" % args.pipeline_param)
|
||||
|
||||
if not (args.output_compare is None):
|
||||
os.makedirs(args.output_compare, exist_ok=True)
|
||||
print("%s created" % args.output_compare)
|
||||
@@ -0,0 +1,21 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("In extract.py")
|
||||
print("As a data scientist, this is where I use my extract code.")
|
||||
|
||||
parser = argparse.ArgumentParser("extract")
|
||||
parser.add_argument("--input_extract", type=str, help="input_extract data")
|
||||
parser.add_argument("--output_extract", type=str, help="output_extract directory")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.input_extract)
|
||||
print("Argument 2: %s" % args.output_extract)
|
||||
|
||||
if not (args.output_extract is None):
|
||||
os.makedirs(args.output_extract, exist_ok=True)
|
||||
print("%s created" % args.output_extract)
|
||||
@@ -0,0 +1,22 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("In train.py")
|
||||
print("As a data scientist, this is where I use my training code.")
|
||||
|
||||
parser = argparse.ArgumentParser("train")
|
||||
|
||||
parser.add_argument("--input_data", type=str, help="input data")
|
||||
parser.add_argument("--output_train", type=str, help="output_train directory")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.input_data)
|
||||
print("Argument 2: %s" % args.output_train)
|
||||
|
||||
if not (args.output_train is None):
|
||||
os.makedirs(args.output_train, exist_ok=True)
|
||||
print("%s created" % args.output_train)
|
||||
@@ -0,0 +1,5 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
print("In train.py")
|
||||
print("As a data scientist, this is where I use my training code.")
|
||||
@@ -0,0 +1,30 @@
|
||||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("*********************************************************")
|
||||
print("Hello Azure ML!")
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('--datadir', type=str, help="data directory")
|
||||
parser.add_argument('--output', type=str, help="output")
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.datadir)
|
||||
print("Argument 2: %s" % args.output)
|
||||
|
||||
if not (args.output is None):
|
||||
os.makedirs(args.output, exist_ok=True)
|
||||
print("%s created" % args.output)
|
||||
|
||||
try:
|
||||
from azureml.core import Run
|
||||
run = Run.get_context()
|
||||
print("Log Fibonacci numbers.")
|
||||
run.log_list('Fibonacci numbers', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
|
||||
run.complete()
|
||||
except:
|
||||
print("Warning: you need to install Azure ML SDK in order to log metrics.")
|
||||
|
||||
print("*********************************************************")
|
||||
@@ -0,0 +1,24 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("In compare.py")
|
||||
print("As a data scientist, this is where I use my compare code.")
|
||||
parser = argparse.ArgumentParser("compare")
|
||||
parser.add_argument("--compare_data1", type=str, help="compare_data1 data")
|
||||
parser.add_argument("--compare_data2", type=str, help="compare_data2 data")
|
||||
parser.add_argument("--output_compare", type=str, help="output_compare directory")
|
||||
parser.add_argument("--pipeline_param", type=int, help="pipeline parameter")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.compare_data1)
|
||||
print("Argument 2: %s" % args.compare_data2)
|
||||
print("Argument 3: %s" % args.output_compare)
|
||||
print("Argument 4: %s" % args.pipeline_param)
|
||||
|
||||
if not (args.output_compare is None):
|
||||
os.makedirs(args.output_compare, exist_ok=True)
|
||||
print("%s created" % args.output_compare)
|
||||
@@ -0,0 +1,21 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("In extract.py")
|
||||
print("As a data scientist, this is where I use my extract code.")
|
||||
|
||||
parser = argparse.ArgumentParser("extract")
|
||||
parser.add_argument("--input_extract", type=str, help="input_extract data")
|
||||
parser.add_argument("--output_extract", type=str, help="output_extract directory")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.input_extract)
|
||||
print("Argument 2: %s" % args.output_extract)
|
||||
|
||||
if not (args.output_extract is None):
|
||||
os.makedirs(args.output_extract, exist_ok=True)
|
||||
print("%s created" % args.output_extract)
|
||||
@@ -0,0 +1,22 @@
|
||||
# Copyright (c) Microsoft. All rights reserved.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
print("In train.py")
|
||||
print("As a data scientist, this is where I use my training code.")
|
||||
|
||||
parser = argparse.ArgumentParser("train")
|
||||
|
||||
parser.add_argument("--input_data", type=str, help="input data")
|
||||
parser.add_argument("--output_train", type=str, help="output_train directory")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
print("Argument 1: %s" % args.input_data)
|
||||
print("Argument 2: %s" % args.output_train)
|
||||
|
||||
if not (args.output_train is None):
|
||||
os.makedirs(args.output_train, exist_ok=True)
|
||||
print("%s created" % args.output_train)
|
||||
@@ -645,7 +645,26 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.7"
|
||||
}
|
||||
},
|
||||
"friendly_name": "Pipeline test",
|
||||
"exclude_from_index": false,
|
||||
"order_index": 1,
|
||||
"category": "training",
|
||||
"tags": [
|
||||
],
|
||||
"task": "Regression",
|
||||
"datasets": [
|
||||
"NYC Taxi"
|
||||
],
|
||||
"compute": [
|
||||
"local"
|
||||
],
|
||||
"deployment": [
|
||||
"None"
|
||||
],
|
||||
"framework": [
|
||||
"Azure ML AutoML"
|
||||
]
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
||||
@@ -1,260 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||
"\n",
|
||||
"Licensed under the MIT License.\n",
|
||||
"\n",
|
||||
"## Authentication in Azure Machine Learning\n",
|
||||
"\n",
|
||||
"This notebook shows you how to authenticate to your Azure ML Workspace using\n",
|
||||
"\n",
|
||||
" 1. Interactive Login Authentication\n",
|
||||
" 2. Azure CLI Authentication\n",
|
||||
" 3. Service Principal Authentication\n",
|
||||
" \n",
|
||||
"The interactive authentication is suitable for local experimentation on your own computer. Azure CLI authentication is suitable if you are already using Azure CLI for managing Azure resources, and want to sign in only once. The Service Principal authentication is suitable for automated workflows, for example as part of Azure Devops build."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import Workspace"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Interactive Authentication\n",
|
||||
"\n",
|
||||
"Interactive authentication is the default mode when using Azure ML SDK.\n",
|
||||
"\n",
|
||||
"When you connect to your workspace using workspace.from_config, you will get an interactive login dialog."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Also, if you explicitly specify the subscription ID, resource group and resource group, you will get the dialog."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
|
||||
" resource_group=\"my-ml-rg\",\n",
|
||||
" workspace_name=\"my-ml-workspace\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the user you're authenticated as must have access to the subscription and resource group. If you receive an error\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"AuthenticationException: You don't have access to xxxxxx-xxxx-xxx-xxx-xxxxxxxxxx subscription. All the subscriptions that you have access to = ...\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"check that the you used correct login and entered the correct subscription ID."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In some cases, you may see a version of the error message containing text: ```All the subscriptions that you have access to = []```\n",
|
||||
"\n",
|
||||
"In such a case, you may have to specify the tenant ID of the Azure Active Directory you're using. An example would be accessing a subscription as a guest to a tenant that is not your default. You specify the tenant by explicitly instantiating _InteractiveLoginAuthentication_ with tenant ID as argument ([see instructions how to obtain tenant Id](#get-tenant-id))."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
|
||||
"\n",
|
||||
"interactive_auth = InteractiveLoginAuthentication(tenant_id=\"my-tenant-id\")\n",
|
||||
"\n",
|
||||
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
|
||||
" resource_group=\"my-ml-rg\",\n",
|
||||
" workspace_name=\"my-ml-workspace\",\n",
|
||||
" auth=interactive_auth)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Azure CLI Authentication\n",
|
||||
"\n",
|
||||
"If you have installed azure-cli package, and used ```az login``` command to log in to your Azure Subscription, you can use _AzureCliAuthentication_ class.\n",
|
||||
"\n",
|
||||
"Note that interactive authentication described above won't use existing Azure CLI auth tokens. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.authentication import AzureCliAuthentication\n",
|
||||
"\n",
|
||||
"cli_auth = AzureCliAuthentication()\n",
|
||||
"\n",
|
||||
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
|
||||
" resource_group=\"my-ml-rg\",\n",
|
||||
" workspace_name=\"my-ml-workspace\",\n",
|
||||
" auth=cli_auth)\n",
|
||||
"\n",
|
||||
"print(\"Found workspace {} at location {}\".format(ws.name, ws.location))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Service Principal Authentication\n",
|
||||
"\n",
|
||||
"When setting up a machine learning workflow as an automated process, we recommend using Service Principal Authentication. This approach decouples the authentication from any specific user login, and allows managed access control.\n",
|
||||
"\n",
|
||||
"Note that you must have administrator privileges over the Azure subscription to complete these steps.\n",
|
||||
"\n",
|
||||
"The first step is to create a service principal. First, go to [Azure Portal](https://portal.azure.com), select **Azure Active Directory** and **App Registrations**. Then select **+New application registration**, give your service principal a name, for example _my-svc-principal_. You can leave application type as is, and specify a dummy value for Sign-on URL, such as _https://invalid_.\n",
|
||||
"\n",
|
||||
"Then click **Create**.\n",
|
||||
"\n",
|
||||
"![service principal creation]<img src=\"images/svc-pr-1.PNG\">"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The next step is to obtain the _Application ID_ (also called username) and create _password_ for the service principal.\n",
|
||||
"\n",
|
||||
"From the page for your newly created service principal, copy the _Application ID_. Then select **Settings** and **Keys**, write a description for your key, and select duration. Then click **Save**, and copy the _password_ to a secure location.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a id =\"get-tenant-id\"></a>\n",
|
||||
"\n",
|
||||
"Also, you need to obtain the tenant ID of your Azure subscription. Go back to **Azure Active Directory**, select **Properties** and copy _Directory ID_.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally, you need to give the service principal permissions to access your workspace. Navigate to **Resource Groups**, to the resource group for your Machine Learning Workspace. \n",
|
||||
"\n",
|
||||
"Then select **Access Control (IAM)** and **Add a role assignment**. For _Role_, specify which level of access you need to grant, for example _Contributor_. Start entering your service principal name and once it is found, select it, and click **Save**.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you are ready to use the service principal authentication. For example, to connect to your Workspace, see code below and enter your own values for tenant ID, application ID, subscription ID, resource group and workspace.\n",
|
||||
"\n",
|
||||
"**We strongly recommended that you do not insert the secret password to code**. Instead, you can use environment variables to pass it to your code, for example through Azure Key Vault, or through secret build variables in Azure DevOps. For local testing, you can for example use following PowerShell command to set the environment variable.\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"$env:AZUREML_PASSWORD = \"my-password\"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
|
||||
"\n",
|
||||
"svc_pr_password = os.environ.get(\"AZUREML_PASSWORD\")\n",
|
||||
"\n",
|
||||
"svc_pr = ServicePrincipalAuthentication(\n",
|
||||
" tenant_id=\"my-tenant-id\",\n",
|
||||
" service_principal_id=\"my-application-id\",\n",
|
||||
" service_principal_password=svc_pr_password)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"ws = Workspace(\n",
|
||||
" subscription_id=\"my-subscription-id\",\n",
|
||||
" resource_group=\"my-ml-rg\",\n",
|
||||
" workspace_name=\"my-ml-workspace\",\n",
|
||||
" auth=svc_pr\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"print(\"Found workspace {} at location {}\".format(ws.name, ws.location))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "roastala"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -17,7 +17,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -33,10 +33,9 @@
|
||||
"source": [
|
||||
"## Install the DataDrift package\n",
|
||||
"\n",
|
||||
"Install the azureml-contrib-datadrift, azureml-contrib-opendatasets and lightgbm packages before running this notebook.\n",
|
||||
"Install the azureml-contrib-datadrift, azureml-opendatasets and lightgbm packages before running this notebook.\n",
|
||||
"```\n",
|
||||
"pip install azureml-contrib-datadrift\n",
|
||||
"pip install azureml-contrib-datasets\n",
|
||||
"pip install lightgbm\n",
|
||||
"```"
|
||||
]
|
||||
@@ -63,7 +62,7 @@
|
||||
"import pandas as pd\n",
|
||||
"import requests\n",
|
||||
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
|
||||
"from azureml.contrib.opendatasets import NoaaIsdWeather\n",
|
||||
"from azureml.opendatasets import NoaaIsdWeather\n",
|
||||
"from azureml.core import Dataset, Workspace, Run\n",
|
||||
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
@@ -82,7 +81,7 @@
|
||||
"source": [
|
||||
"## Set up Configuraton and Create Azure ML Workspace\n",
|
||||
"\n",
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
|
||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -140,26 +139,26 @@
|
||||
"\n",
|
||||
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def enrich_weather_noaa_data(noaa_df):\n",
|
||||
" hours_in_day = 23\n",
|
||||
" week_in_year = 52\n",
|
||||
" \n",
|
||||
" noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
|
||||
" noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
|
||||
" \n",
|
||||
" noaa_df[\"sine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.sin((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||
" noaa_df[\"cosine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.cos((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||
"\n",
|
||||
" noaa_df = noaa_df.assign(hour=noaa_df[\"datetime\"].dt.hour,\n",
|
||||
" weekofyear=noaa_df[\"datetime\"].dt.week,\n",
|
||||
" sine_weekofyear=noaa_df['datetime'].transform(lambda x: np.sin((2*np.pi*x.dt.week-1)/week_in_year)),\n",
|
||||
" cosine_weekofyear=noaa_df['datetime'].transform(lambda x: np.cos((2*np.pi*x.dt.week-1)/week_in_year)),\n",
|
||||
" sine_hourofday=noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day)),\n",
|
||||
" cosine_hourofday=noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||
" )\n",
|
||||
" noaa_df[\"sine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||
" noaa_df[\"cosine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||
" \n",
|
||||
" return noaa_df\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def add_window_col(input_df):\n",
|
||||
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
|
||||
" df_shifted = input_df.copy()\n",
|
||||
" df_shifted.loc[:,'datetime'] = df_shifted['datetime'] - shift_interval\n",
|
||||
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
|
||||
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
|
||||
"\n",
|
||||
" # merge, keeping only observations where -1 lag is present\n",
|
||||
@@ -259,8 +258,7 @@
|
||||
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
|
||||
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
|
||||
"\n",
|
||||
"trainingDataSnapshot = trainingDataset.create_snapshot(snapshot_name=snapshot_name, compute_target=None, create_data_snapshot=True)\n",
|
||||
"datasets = [(Dataset.Scenario.TRAINING, trainingDataSnapshot)]\n",
|
||||
"datasets = [(Dataset.Scenario.TRAINING, trainingDataset)]\n",
|
||||
"print(\"dataset registration done.\\n\")\n",
|
||||
"datasets"
|
||||
]
|
||||
@@ -574,6 +572,22 @@
|
||||
" time.sleep(3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We need to wait up to 10 minutes for the Model Data Collector to dump the model input and inference data to storage in the Workspace, where it's used by the DataDriftDetector job."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"time.sleep(600)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -702,7 +716,8 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
}
|
||||
},
|
||||
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
@@ -0,0 +1,8 @@
|
||||
name: azure-ml-datadrift
|
||||
dependencies:
|
||||
- pip:
|
||||
- azureml-sdk
|
||||
- azureml-contrib-datadrift
|
||||
- azureml-opendatasets
|
||||
- lightgbm
|
||||
- azureml-widgets
|
||||
@@ -14,6 +14,7 @@ These examples show you:
|
||||
10. [Distributed training using Chainer](distributed-chainer)
|
||||
11. [Export run history records to Tensorboard](export-run-history-to-tensorboard)
|
||||
12. [Use TensorBoard to monitor training execution](tensorboard)
|
||||
13. [Resuming training from previous run](train-tensorflow-resume-training)
|
||||
|
||||
Learn more about how to use `Estimator` class to [train deep neural networks with Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models).
|
||||
|
||||
|
||||
@@ -153,7 +153,11 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"tensorboard-export-sample"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Export Run History to Tensorboard logs\n",
|
||||
|
||||
@@ -227,7 +227,11 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"tensorboard-sample"
|
||||
]
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.tensorboard import Tensorboard\n",
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
@@ -131,6 +132,8 @@ def main():
|
||||
|
||||
run.log("Accuracy", np.float(val_accuracy))
|
||||
|
||||
serializers.save_npz(os.path.join(args.output_dir, 'model.npz'), model)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
|
||||
@@ -0,0 +1,45 @@
|
||||
import numpy as np
|
||||
import os
|
||||
import json
|
||||
|
||||
from chainer import serializers, using_config, Variable, datasets
|
||||
import chainer.functions as F
|
||||
import chainer.links as L
|
||||
from chainer import Chain
|
||||
|
||||
from azureml.core.model import Model
|
||||
|
||||
|
||||
class MyNetwork(Chain):
|
||||
|
||||
def __init__(self, n_mid_units=100, n_out=10):
|
||||
super(MyNetwork, self).__init__()
|
||||
with self.init_scope():
|
||||
self.l1 = L.Linear(None, n_mid_units)
|
||||
self.l2 = L.Linear(n_mid_units, n_mid_units)
|
||||
self.l3 = L.Linear(n_mid_units, n_out)
|
||||
|
||||
def forward(self, x):
|
||||
h = F.relu(self.l1(x))
|
||||
h = F.relu(self.l2(h))
|
||||
return self.l3(h)
|
||||
|
||||
|
||||
def init():
|
||||
global model
|
||||
|
||||
model_root = Model.get_model_path('chainer-dnn-mnist')
|
||||
|
||||
# Load our saved artifacts
|
||||
model = MyNetwork()
|
||||
serializers.load_npz(model_root, model)
|
||||
|
||||
|
||||
def run(input_data):
|
||||
i = np.array(json.loads(input_data)['data'])
|
||||
|
||||
_, test = datasets.get_mnist()
|
||||
x = Variable(np.asarray([test[i][0]]))
|
||||
y = model(x)
|
||||
|
||||
return np.ndarray.tolist(y.data.argmax(axis=1))
|
||||
@@ -45,6 +45,16 @@
|
||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!jupyter nbextension install --py --user azureml.widgets\n",
|
||||
"!jupyter nbextension enable --py --user azureml.widgets"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -121,6 +131,7 @@
|
||||
"except ComputeTargetException:\n",
|
||||
" print('Creating a new compute target...')\n",
|
||||
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
|
||||
" min_nodes=2,\n",
|
||||
" max_nodes=4)\n",
|
||||
"\n",
|
||||
" # create the cluster\n",
|
||||
@@ -206,7 +217,8 @@
|
||||
"source": [
|
||||
"import shutil\n",
|
||||
"\n",
|
||||
"shutil.copy('chainer_mnist.py', project_folder)"
|
||||
"shutil.copy('chainer_mnist.py', project_folder)\n",
|
||||
"shutil.copy('chainer_score.py', project_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -353,6 +365,7 @@
|
||||
"hyperdrive_config = HyperDriveConfig(estimator=estimator,\n",
|
||||
" hyperparameter_sampling=param_sampling, \n",
|
||||
" primary_metric_name='Accuracy',\n",
|
||||
" policy=BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=3),\n",
|
||||
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
|
||||
" max_total_runs=8,\n",
|
||||
" max_concurrent_runs=4)\n"
|
||||
@@ -398,14 +411,344 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run.wait_for_completion(show_output=True)"
|
||||
"hyperdrive_run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Find and register best model\n",
|
||||
"When all jobs finish, we can find out the one that has the highest accuracy."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"best_run = hyperdrive_run.get_best_run_by_primary_metric()\n",
|
||||
"print(best_run.get_details()['runDefinition']['arguments'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's list the model files uploaded during the run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(best_run.get_file_names())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can then register the folder (and all files in it) as a model named `chainer-dnn-mnist` under the workspace for deployment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = best_run.register_model(model_name='chainer-dnn-mnist', model_path='outputs/model.npz')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy the model in ACI\n",
|
||||
"Now, we are ready to deploy the model as a web service running in Azure Container Instance, [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n",
|
||||
"\n",
|
||||
"### Create scoring script\n",
|
||||
"First, we will create a scoring script that will be invoked by the web service call.\n",
|
||||
"+ Now that the scoring script must have two required functions, `init()` and `run(input_data)`.\n",
|
||||
" + In `init()`, you typically load the model into a global object. This function is executed only once when the Docker contianer is started.\n",
|
||||
" + In `run(input_data)`, the model is used to predict a value based on the input data. The input and output to `run` uses NPZ as the serialization and de-serialization format because it is the preferred format for Chainer, but you are not limited to it.\n",
|
||||
" \n",
|
||||
"Refer to the scoring script `chainer_score.py` for this tutorial. Our web service will use this file to predict. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"shutil.copy('chainer_score.py', project_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create myenv.yml\n",
|
||||
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `numpy` and `chainer`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.runconfig import CondaDependencies\n",
|
||||
"\n",
|
||||
"cd = CondaDependencies.create()\n",
|
||||
"cd.add_conda_package('numpy')\n",
|
||||
"cd.add_conda_package('chainer')\n",
|
||||
"cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
|
||||
"\n",
|
||||
"print(cd.serialize_to_string())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deploy to ACI\n",
|
||||
"We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,\n",
|
||||
" auth_enabled=True, # this flag generates API keys to secure access\n",
|
||||
" memory_gb=1,\n",
|
||||
" tags={'name': 'mnist', 'framework': 'Chainer'},\n",
|
||||
" description='Chainer DNN with MNIST')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Deployment Process**\n",
|
||||
"\n",
|
||||
"Now we can deploy. **This cell will run for about 7-8 minutes.** Behind the scenes, it will do the following:\n",
|
||||
"\n",
|
||||
"1. **Build Docker image**\n",
|
||||
"Build a Docker image using the scoring file (chainer_score.py), the environment file (myenv.yml), and the model object.\n",
|
||||
"2. **Register image**\n",
|
||||
"Register that image under the workspace.\n",
|
||||
"3. **Ship to ACI**\n",
|
||||
"And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.image import ContainerImage\n",
|
||||
"\n",
|
||||
"imgconfig = ContainerImage.image_configuration(execution_script=\"chainer_score.py\", \n",
|
||||
" runtime=\"python\", \n",
|
||||
" conda_file=\"myenv.yml\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"from azureml.core.webservice import Webservice\n",
|
||||
"\n",
|
||||
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||
" name='chainer-mnist-1',\n",
|
||||
" deployment_config=aciconfig,\n",
|
||||
" models=[model],\n",
|
||||
" image_config=imgconfig)\n",
|
||||
"\n",
|
||||
"service.wait_for_deployment(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(service.get_logs())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(service.scoring_uri)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:** `print(service.get_logs())`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This is the scoring web service endpoint: `print(service.scoring_uri)`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Test the deployed model\n",
|
||||
"Let's test the deployed model. Pick a random sample from the test set, and send it to the web service hosted in ACI for a prediction. Note, here we are using the an HTTP request to invoke the service.\n",
|
||||
"\n",
|
||||
"We can retrieve the API keys used for accessing the HTTP endpoint and construct a raw HTTP request to send to the service. Don't forget to add key to the HTTP header."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# retreive the API keys. two keys were generated.\n",
|
||||
"key1, Key2 = service.get_keys()\n",
|
||||
"print(key1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib inline\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import urllib\n",
|
||||
"import gzip\n",
|
||||
"import numpy as np\n",
|
||||
"import struct\n",
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# load compressed MNIST gz files and return numpy arrays\n",
|
||||
"def load_data(filename, label=False):\n",
|
||||
" with gzip.open(filename) as gz:\n",
|
||||
" struct.unpack('I', gz.read(4))\n",
|
||||
" n_items = struct.unpack('>I', gz.read(4))\n",
|
||||
" if not label:\n",
|
||||
" n_rows = struct.unpack('>I', gz.read(4))[0]\n",
|
||||
" n_cols = struct.unpack('>I', gz.read(4))[0]\n",
|
||||
" res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)\n",
|
||||
" res = res.reshape(n_items[0], n_rows * n_cols)\n",
|
||||
" else:\n",
|
||||
" res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)\n",
|
||||
" res = res.reshape(n_items[0], 1)\n",
|
||||
" return res\n",
|
||||
"\n",
|
||||
"os.makedirs('./data/mnist', exist_ok=True)\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n",
|
||||
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')\n",
|
||||
"\n",
|
||||
"X_test = load_data('./data/mnist/test-images.gz', False)\n",
|
||||
"y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# send a random row from the test set to score\n",
|
||||
"random_index = np.random.randint(0, len(X_test)-1)\n",
|
||||
"input_data = \"{\\\"data\\\": [\" + str(random_index) + \"]}\"\n",
|
||||
"\n",
|
||||
"headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
||||
"\n",
|
||||
"# send sample to service for scoring\n",
|
||||
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
||||
"\n",
|
||||
"print(\"label:\", y_test[random_index])\n",
|
||||
"print(\"prediction:\", resp.text[1])\n",
|
||||
"\n",
|
||||
"plt.imshow(X_test[random_index].reshape((28,28)), cmap='gray')\n",
|
||||
"plt.axis('off')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's look at the workspace after the web service was deployed. You should see\n",
|
||||
"\n",
|
||||
" + a registered model named 'chainer-dnn-mnist' and with the id 'chainer-dnn-mnist:1'\n",
|
||||
" + an image called 'chainer-mnist-svc' and with a docker image location pointing to your workspace's Azure Container Registry (ACR)\n",
|
||||
" + a webservice called 'chainer-mnist-svc' with some scoring URL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"models = ws.models\n",
|
||||
"for name, model in models.items():\n",
|
||||
" print(\"Model: {}, ID: {}\".format(name, model.id))\n",
|
||||
" \n",
|
||||
"images = ws.images\n",
|
||||
"for name, image in images.items():\n",
|
||||
" print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
|
||||
" \n",
|
||||
"webservices = ws.webservices\n",
|
||||
"for name, webservice in webservices.items():\n",
|
||||
" print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Clean up"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can delete the ACI deployment with a simple delete API call."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"service.delete()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "ninhu"
|
||||
"name": "dipeck"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
@@ -424,7 +767,8 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
}
|
||||
},
|
||||
"msauthor": "dipeck"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
||||
@@ -4,4 +4,9 @@ dependencies:
|
||||
- azureml-sdk
|
||||
- azureml-widgets
|
||||
- numpy
|
||||
- pytest
|
||||
- matplotlib
|
||||
- json
|
||||
- urllib
|
||||
- gzip
|
||||
- struct
|
||||
- requests
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user