Compare commits

..

22 Commits

Author SHA1 Message Date
Shané Winner
cf0490ab92 Update auto-ml-classification-bank-marketing.ipynb 2019-10-31 12:24:08 -07:00
Shané Winner
9f0e817c70 Update auto-ml-classification-with-deployment.ipynb 2019-10-09 15:28:00 -07:00
Shané Winner
a4d713d19b Update auto-ml-classification-with-deployment.ipynb 2019-10-09 15:25:03 -07:00
Shané Winner
91a20a0ff9 Update auto-ml-classification-with-deployment.ipynb 2019-10-09 15:24:01 -07:00
Shané Winner
a0c510bf42 Update auto-ml-classification-with-deployment.ipynb 2019-10-09 15:23:17 -07:00
Shané Winner
116d57c012 Update auto-ml-classification-with-deployment.ipynb 2019-10-09 15:19:51 -07:00
Shané Winner
660708db63 Update auto-ml-classification-with-deployment.ipynb 2019-10-09 15:10:30 -07:00
Shané Winner
206df82f9b Update auto-ml-classification-with-deployment.ipynb 2019-10-08 08:34:28 -07:00
Shané Winner
7cfb2da5b8 Update configuration.ipynb 2019-10-01 17:37:28 -07:00
Shané Winner
e5adb4af3a Update configuration.ipynb 2019-10-01 17:35:15 -07:00
Shané Winner
b849267220 Update configuration.ipynb 2019-10-01 16:08:07 -07:00
Shané Winner
9891080b70 Update configuration.ipynb 2019-10-01 16:06:35 -07:00
Shané Winner
2974e86aa0 Update configuration.ipynb 2019-10-01 15:59:06 -07:00
Shané Winner
0a18161193 Create index2.md 2019-09-12 11:32:41 -07:00
Shane' Winner
c676cc9969 index changes 2019-08-20 11:12:45 -07:00
Shané Winner
50f4bc9643 Delete index.md 2019-08-19 14:42:27 -07:00
Shané Winner
f3c7072735 Add files via upload 2019-08-19 14:41:56 -07:00
Shané Winner
44295d9e16 Delete build_nb_index.py 2019-08-19 14:39:02 -07:00
Shane' Winner
710fc0bb4b added more metadata 2019-08-19 14:34:36 -07:00
Shane' Winner
c44dba427f fixed error 2019-08-19 14:22:08 -07:00
Shane' Winner
8066a9263c added a datadrift folder 2019-08-19 13:53:50 -07:00
Shane' Winner
054aadffed index test 2019-08-18 17:27:37 -07:00
280 changed files with 10234 additions and 16745 deletions

View File

@@ -1,43 +1,30 @@
---
name: Notebook issue
about: Describe your notebook issue
title: "[Notebook] DESCRIPTIVE TITLE"
labels: notebook
about: Create a report to help us improve
title: "[Notebook issue]"
labels: ''
assignees: ''
---
### DESCRIPTION: Describe clearly + concisely
**Describe the bug**
A clear and concise description of what the bug is.
Provide the following if applicable:
+ Your Python & SDK version
+ Python Scripts or the full notebook name
+ Pipeline definition
+ Environment definition
+ Example data
+ Any log files.
+ Run and Workspace Id
.
### REPRODUCIBLE: Steps
**To Reproduce**
Steps to reproduce the behavior:
1.
**Expected behavior**
A clear and concise description of what you expected to happen.
.
### EXPECTATION: Clear description
.
### CONFIG/ENVIRONMENT:
```Provide where applicable
## Your Python & SDK version:
## Environment definition:
## Notebook name or Python scripts:
## Run and Workspace Id:
## Pipeline definition:
## Example data:
## Any log files:
```
**Additional context**
Add any other context about the problem here.

View File

@@ -2,8 +2,7 @@
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/concept-azure-machine-learning-architecture/workflow.png)
![Azure ML workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/overview-what-is-azure-ml/aml.png)
## Quick installation
```sh
@@ -50,13 +49,10 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
---
## Community Repository
Visit this [community repository](https://github.com/microsoft/MLOps/tree/master/examples) to find useful end-to-end sample notebooks. Also, please follow these [contribution guidelines](https://github.com/microsoft/MLOps/blob/master/contributing.md) when contributing to this repository.
## Projects using Azure Machine Learning
Visit following repos to see projects contributed by Azure ML users:
- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)

215
build_nb_index.py Normal file
View File

@@ -0,0 +1,215 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
### USAGE
#
# 1. Add following metadata elements to the notebook
#
# "friendly_name": "string", friendly name for notebook
# "exclude_from_index": true/false, setting true excludes the notebook from index
# "order_index": integer, smaller value moves notebook closer to beginning
# "category": "starter", "tutorial", "training", "deployment" or "other"
# "tags": [ "featured" ], optional, only supported tag to highlight notebook with :star: symbol
# "task": "string", description of notebook task
# "datasets": [ "dataset 1", "dataset 2"], list of datasets, can be ["None"]
# "compute": [ "compute 1", "compute 2" ], list of computes, can be ["None"]
# "deployment": ["deployment 1", "deployment 2"], list of deployment targets, can be ["None"]
# "framework": ["fw 1", "fw2"], list of ml framework, can be ["None"]
#
# 2. Then run
#
# build_nb_index.py <root folder of notebooks>
#
# 3. The script should produce index.md file with tables of notebook indices
### Example metadata section
'''
"metadata": {
"authors": [
{
"name": "cforbe"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"msauthor": "trbye",
"friendly_name": "Prepare data for regression modeling",
"exclude_from_index": false,
"order_index": 1,
"category": "tutorial",
"tags": [
"featured"
],
"task": "Regression",
"datasets": [
"NYC Taxi"
],
"compute": [
"local"
],
"deployment": [
"None"
],
"framework": [
"Azure ML AutoML"
]
}
'''
import os, json, sys
from shutil import copyfile, copytree, rmtree
# Index building walk over notebook folder
def post_process(notebooks_dir):
indexer = NotebookIndex()
n_dest = len(notebooks_dir)
for r, d, f in os.walk(notebooks_dir):
for file in f:
# Handle only notebooks
if file.endswith(".ipynb") and not file.endswith('checkpoint.ipynb'):
try:
file_path = os.path.join(r, file)
with open(file_path, 'r') as fin:
content = json.load(fin)
print(file)
indexer.add_to_index(os.path.join(r[n_dest:],file), content["metadata"])
except Exception as e:
print("Problem: ",str(e))
indexer.write_index("./index.md")
### Customize these make index look different
index_template = '''
# Index
Azure Machine Learning is a cloud service that you use to train, deploy, automate, and manage machine learning models. This index should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content.
## Getting Started
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
GETTING_STARTED_NBS
## Tutorials
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
TUTORIAL_NBS
## Training
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
TRAINING_NBS
## Deployment
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
DEPLOYMENT_NBS
## Other Notebooks
|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework |
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|
OTHER_NBS
'''
index_row = '''| NB_SYMBOL[NB_NAME](NB_PATH) | NB_TASK | NB_DATASET | NB_COMPUTE | NB_DEPLOYMENT | NB_FRAMEWORK |'''
index_file = "index.md"
nb_types = ["starter", "tutorial", "training", "deployment", "other"]
replace_strings = ["GETTING_STARTED_NBS", "TUTORIAL_NBS", "TRAINING_NBS", "DEPLOYMENT_NBS", "OTHER_NBS"]
class NotebookIndex:
def __init__(self):
self.index = index_template
self.nb_rows = {}
for elem in nb_types:
self.nb_rows[elem] = []
def add_to_index(self, path_to_notebook, metadata):
repo_url = "https://github.com/Azure/MachineLearningNotebooks/blob/master/"
if "exclude_from_index" in metadata:
if metadata["exclude_from_index"]:
return
if "friendly_name" in metadata:
this_row = index_row.replace("NB_NAME",metadata["friendly_name"])
else:
this_name = os.path.basename(path_to_notebook)
this_row = index_row.replace("NB_NAME", this_name[:-6])
path_to_notebook = path_to_notebook.replace("\\","/")
this_row = this_row.replace("NB_PATH", repo_url + path_to_notebook)
if "task" in metadata:
this_row = this_row.replace("NB_TASK", metadata["task"])
if "datasets" in metadata:
this_row = this_row.replace("NB_DATASET", ", ".join(metadata["datasets"]))
if "compute" in metadata:
this_row = this_row.replace("NB_COMPUTE", ", ".join(metadata["compute"]))
if "deployment" in metadata:
this_row = this_row.replace("NB_DEPLOYMENT", ", ".join(metadata["deployment"]))
if "framework" in metadata:
this_row = this_row.replace("NB_FRAMEWORK", ", ".join(metadata["framework"]))
## Fall back
this_row = this_row.replace("NB_TASK","")
this_row = this_row.replace("NB_DATASET","")
this_row = this_row.replace("NB_COMPUTE","")
this_row = this_row.replace("NB_DEPLOYMENT","")
this_row = this_row.replace("NB_FRAMEWORK","")
if "tags" in metadata:
if "featured" in metadata["tags"]:
this_row = this_row.replace("NB_SYMBOL",":star:")
## Fall back
this_row =this_row.replace("NB_SYMBOL","")
index_order = 9999999
if "index_order" in metadata:
index_order = metadata["index_order"]
if "category" in metadata:
self.nb_rows[metadata["category"]].append((index_order, this_row))
else:
self.nb_rows["other"].append((index_order, this_row))
def sort_and_stringify(self,section):
sorted_index = sorted(self.nb_rows[section], key = lambda x: x[0])
sorted_index = [x[1] for x in sorted_index]
## TODO: Make this portable
return "\n".join(sorted_index)
def write_index(self, index_file):
for nb_type, replace_string in zip(nb_types, replace_strings):
nb_string = self.sort_and_stringify(nb_type)
self.index = self.index.replace(replace_string, nb_string)
with open(index_file,"w") as fin:
fin.write(self.index)
try:
dest_repo = sys.argv[1]
except:
dest_repo = "./MachineLearningNotebooks"
post_process(dest_repo)

View File

@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.0.62 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.0.55 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -214,7 +214,8 @@
"* You do not have permission to create a resource group if it's non-existing.\n",
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
"\n",
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.\n",
"To learn more about the Enterprise SKU, please visit the Pricing and SKU details page."
]
},
{
@@ -230,11 +231,14 @@
"from azureml.core import Workspace\n",
"\n",
"# Create the workspace using the specified parameters\n",
"# To create an Enterprise workspace, please specify the sku = enterprise\n",
"ws = Workspace.create(name = workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = resource_group, \n",
" location = workspace_region,\n",
" create_resource_group = True,\n",
" create_resource_group = True,\n",
" sku = basic,\n",
" exist_ok = True)\n",
"ws.get_details()\n",
"\n",
@@ -380,4 +384,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}

View File

@@ -0,0 +1,723 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Track Data Drift between Training and Inference Data in Production \n",
"\n",
"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features cause the concept drift to occur.\n",
"\n",
"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
"\n",
"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/datadrift/azureml-datadrift.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prerequisites and Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install the DataDrift package\n",
"\n",
"Install the azureml-contrib-datadrift, azureml-opendatasets and lightgbm packages before running this notebook.\n",
"```\n",
"pip install azureml-contrib-datadrift\n",
"pip install lightgbm\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"import time\n",
"from datetime import datetime, timedelta\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import requests\n",
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
"from azureml.opendatasets import NoaaIsdWeather\n",
"from azureml.core import Dataset, Workspace, Run\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.image import ContainerImage\n",
"from azureml.core.model import Model\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.widgets import RunDetails\n",
"from sklearn.externals import joblib\n",
"from sklearn.model_selection import train_test_split\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Configuraton and Create Azure ML Workspace\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
"prefix = \"dd\"\n",
"\n",
"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
"model_name = \"driftmodel\"\n",
"image_name = \"{}driftimage\".format(prefix)\n",
"service_name = \"{}driftservice\".format(prefix)\n",
"\n",
"# optionally, set email address to receive an email alert for DataDrift\n",
"email_address = \"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Train/Testing Data\n",
"\n",
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
" '725066']\n",
"\n",
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
"\n",
"\n",
"def enrich_weather_noaa_data(noaa_df):\n",
" hours_in_day = 23\n",
" week_in_year = 52\n",
" \n",
" noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
" noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
" \n",
" noaa_df[\"sine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.sin((2*np.pi*x.dt.week-1)/week_in_year))\n",
" noaa_df[\"cosine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.cos((2*np.pi*x.dt.week-1)/week_in_year))\n",
"\n",
" noaa_df[\"sine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day))\n",
" noaa_df[\"cosine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
" \n",
" return noaa_df\n",
"\n",
"def add_window_col(input_df):\n",
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
" df_shifted = input_df.copy()\n",
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
"\n",
" # merge, keeping only observations where -1 lag is present\n",
" df2 = pd.merge(input_df,\n",
" df_shifted,\n",
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
" how='inner', # use 'left' to keep observations without lags\n",
" suffixes=['', '-7'])\n",
" return df2\n",
"\n",
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
" # Read into Pandas data frame.\n",
" noaa_df = isd.to_pandas_dataframe()\n",
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
" \n",
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)]\n",
" df_filtered.reset_index(drop=True)\n",
" \n",
" # Enrich with time features\n",
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
" \n",
" return df_enriched\n",
"\n",
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
" noaa_df = pd.concat([df_1, df_2])\n",
" \n",
" print(\"Adding window feature\")\n",
" df_window = add_window_col(noaa_df)\n",
" \n",
" cat_columns = df_window.dtypes == object\n",
" cat_columns = cat_columns[cat_columns == True]\n",
" \n",
" print(\"Encoding categorical columns\")\n",
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
" \n",
" print(\"Dropping unnecessary columns\")\n",
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
" \n",
" return df_featurized"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train model on Jan 1 - 14, 2009 data\n",
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"label = \"temperature\"\n",
"x_df = df.drop(label, axis=1)\n",
"y_df = df[[label]]\n",
"x_train, x_test, y_train, y_test = train_test_split(df, y_df, test_size=0.2, random_state=223)\n",
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
"\n",
"training_dir = 'outputs/training'\n",
"training_file = \"training.csv\"\n",
"\n",
"# Generate training dataframe to register as Training Dataset\n",
"os.makedirs(training_dir, exist_ok=True)\n",
"training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
"training_df.to_csv(training_dir + \"/\" + training_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create/Register Training Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_name = \"dataset\"\n",
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
"\n",
"dstore = ws.get_default_datastore()\n",
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
"dpath = dstore.path(\"data/training/training.csv\")\n",
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
"\n",
"datasets = [(Dataset.Scenario.TRAINING, trainingDataset)]\n",
"print(\"dataset registration done.\\n\")\n",
"datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train and Save Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import lightgbm as lgb\n",
"\n",
"train = lgb.Dataset(data=x_train, \n",
" label=y_train)\n",
"\n",
"test = lgb.Dataset(data=x_test, \n",
" label=y_test,\n",
" reference=train)\n",
"\n",
"params = {'learning_rate' : 0.1,\n",
" 'boosting' : 'gbdt',\n",
" 'metric' : 'rmse',\n",
" 'feature_fraction' : 1,\n",
" 'bagging_fraction' : 1,\n",
" 'max_depth': 6,\n",
" 'num_leaves' : 31,\n",
" 'objective' : 'regression',\n",
" 'bagging_freq' : 1,\n",
" \"verbose\": -1,\n",
" 'min_data_per_leaf': 100}\n",
"\n",
"model = lgb.train(params, \n",
" num_boost_round=500,\n",
" train_set=train,\n",
" valid_sets=[train, test],\n",
" verbose_eval=50,\n",
" early_stopping_rounds=25)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
"\n",
"os.makedirs('outputs', exist_ok=True)\n",
"joblib.dump(model, model_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = Model.register(model_path=model_file,\n",
" model_name=model_name,\n",
" workspace=ws,\n",
" datasets=datasets)\n",
"\n",
"print(model_name, image_name, service_name, model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy Model To AKS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Image creation may take up to 15 minutes.\n",
"\n",
"image_name = image_name + str(model.version)\n",
"\n",
"if not image_name in ws.images:\n",
" # Use the score.py defined in this directory as the execution script\n",
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
" runtime=\"python\",\n",
" conda_file=\"myenv.yml\",\n",
" description=\"Image with weather dataset model\")\n",
" image = ContainerImage.create(name=image_name,\n",
" models=[model],\n",
" image_config=image_config,\n",
" workspace=ws)\n",
"\n",
" image.wait_for_creation(show_output=True)\n",
"else:\n",
" image = ws.images[image_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Compute Target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_name = 'dd-demo-e2e'\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"if not aks_name in ws.compute_targets:\n",
" aks_target = ComputeTarget.create(workspace=ws,\n",
" name=aks_name,\n",
" provisioning_configuration=prov_config)\n",
"\n",
" aks_target.wait_for_completion(show_output=True)\n",
" print(aks_target.provisioning_state)\n",
" print(aks_target.provisioning_errors)\n",
"else:\n",
" aks_target=ws.compute_targets[aks_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy Service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service_name = service_name\n",
"\n",
"if not aks_service_name in ws.webservices:\n",
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
" name=aks_service_name,\n",
" image=image,\n",
" deployment_config=aks_config,\n",
" deployment_target=aks_target)\n",
" aks_service.wait_for_deployment(show_output=True)\n",
" print(aks_service.state)\n",
"else:\n",
" aks_service = ws.webservices[aks_service_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run DataDrift Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Send Scoring Data to Service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Scoring Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score Model on March 15, 2016 data\n",
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
"# Add the window feature column\n",
"scoring_df = add_window_col(scoring_df)\n",
"\n",
"# Drop features not used by the model\n",
"print(\"Dropping unnecessary columns\")\n",
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
"scoring_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
"extra_cols = ('Path', 'Column1')\n",
"for k in extra_cols:\n",
" columns_dict.pop(k, None)\n",
"training_columns = list(columns_dict.keys())\n",
"\n",
"categorical_columns = scoring_df.dtypes == object\n",
"categorical_columns = categorical_columns[categorical_columns == True]\n",
"\n",
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
"encoded_df = scoring_df.join(test_df)\n",
"\n",
"# Populate missing OHE columns with 0 values to match traning dataset schema\n",
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
"for col in difference:\n",
" encoded_df[col] = 0\n",
"encoded_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Serialize dataframe to list of row dictionaries\n",
"encoded_dict = encoded_df.to_dict('records')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit Scoring Data to Service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"# retreive the API keys. AML generates two keys.\n",
"key1, key2 = aks_service.get_keys()\n",
"\n",
"total_count = len(scoring_df)\n",
"i = 0\n",
"load = []\n",
"for row in encoded_dict:\n",
" load.append(row)\n",
" i = i + 1\n",
" if i % 100 == 0:\n",
" payload = json.dumps({\"data\": load})\n",
" \n",
" # construct raw HTTP request and send to the service\n",
" payload_binary = bytes(payload,encoding = 'utf8')\n",
" headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
" resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
" \n",
" print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count)) \n",
"\n",
" load = []\n",
" time.sleep(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to wait up to 10 minutes for the Model Data Collector to dump the model input and inference data to storage in the Workspace, where it's used by the DataDriftDetector job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"time.sleep(600)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure DataDrift"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"services = [service_name]\n",
"start = datetime.now() - timedelta(days=2)\n",
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
"\n",
"# there will be an exception indicating using get() method if DataDrift object already exist\n",
"try:\n",
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
"except KeyError:\n",
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
" \n",
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run an Adhoc DataDriftDetector Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"target_date = datetime.today()\n",
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, datadrift._id)\n",
"dd_run = Run(experiment=exp, run_id=run)\n",
"RunDetails(dd_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Drift Analysis Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(dd_run.get_children())\n",
"for child in children:\n",
" child.wait_for_completion()\n",
"\n",
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
"drift_metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show all drift figures, one per serivice.\n",
"# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
"\n",
"drift_figures = datadrift.show(with_details=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enable DataDrift Schedule"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datadrift.enable_schedule()"
]
}
],
"metadata": {
"authors": [
{
"name": "rafarmah"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,7 +1,8 @@
name: regression-automated-ml
name: azure-ml-datadrift
dependencies:
- pip:
- azureml-sdk
- azureml-train-automl
- azureml-widgets
- azureml-contrib-datadrift
- azureml-opendatasets
- lightgbm
- azureml-widgets

View File

@@ -0,0 +1,58 @@
import pickle
import json
import numpy
import azureml.train.automl
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.monitoring import ModelDataCollector
import time
import pandas as pd
def init():
global model, inputs_dc, prediction_dc, feature_names, categorical_features
print("Model is initialized" + time.strftime("%H:%M:%S"))
model_path = Model.get_model_path(model_name="driftmodel")
model = joblib.load(model_path)
feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
"sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
"temperature-7"]
categorical_features = ["usaf", "wban", "p_k", "station_name"]
inputs_dc = ModelDataCollector(model_name="driftmodel",
identifier="inputs",
feature_names=feature_names)
prediction_dc = ModelDataCollector("driftmodel",
identifier="predictions",
feature_names=["temperature"])
def run(raw_data):
global inputs_dc, prediction_dc
try:
data = json.loads(raw_data)["data"]
data = pd.DataFrame(data)
# Remove the categorical features as the model expects OHE values
input_data = data.drop(categorical_features, axis=1)
result = model.predict(input_data)
# Collect the non-OHE dataframe
collected_df = data[feature_names]
inputs_dc.collect(collected_df.values)
prediction_dc.collect(result)
return result.tolist()
except Exception as e:
error = str(e)
print(error + time.strftime("%H:%M:%S"))
return error

View File

@@ -8,7 +8,7 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
* [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed run configuration.
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
* [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
* [logging-api](./training/logging-api): Learn about the details of logging metrics to run history.
* [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management.
* [production-deploy-to-aks](./deployment/production-deploy-to-aks) Deploy a model to production at scale on Azure Kubernetes Service.
* [enable-data-collection-for-models-in-aks](./deployment/enable-data-collection-for-models-in-aks) Learn about data collection APIs for deployed model.

View File

@@ -155,11 +155,11 @@ jupyter notebook
- [auto-ml-subsampling-local.ipynb](subsampling/auto-ml-subsampling-local.ipynb)
- How to enable subsampling
- [auto-ml-dataset.ipynb](dataprep/auto-ml-dataset.ipynb)
- Using Dataset for reading data
- [auto-ml-dataprep.ipynb](dataprep/auto-ml-dataprep.ipynb)
- Using DataPrep for reading data
- [auto-ml-dataset-remote-execution.ipynb](dataprep-remote-execution/auto-ml-dataset-remote-execution.ipynb)
- Using Dataset for reading data with remote execution
- [auto-ml-dataprep-remote-execution.ipynb](dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb)
- Using DataPrep for reading data with remote execution
- [auto-ml-classification-with-whitelisting.ipynb](classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb)
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
@@ -229,7 +229,7 @@ The main code of the file must be indented so that it is under this condition.
2. Check that you have conda 64-bit installed rather than 32-bit. You can check this with the command `conda info`. The `platform` should be `win-64` for Windows or `osx-64` for Mac.
3. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
4. On Linux, if the error is `gcc: error trying to exec 'cc1plus': execvp: No such file or directory`, install build essentials using the command `sudo apt-get install build-essential`.
5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
## automl_setup_linux.sh fails
If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execute 'gcc': No such file or directory`
@@ -264,13 +264,13 @@ Some Windows environments see an error loading numpy with the latest Python vers
Check the tensorflow version in the automated ml conda environment. Supported versions are < 1.13. Uninstall tensorflow from the environment if version is >= 1.13
You may check the version of tensorflow and uninstall as follows
1) start a command shell, activate conda environment where automated ml packages are installed
2) enter `pip freeze` and look for `tensorflow` , if found, the version listed should be < 1.13
3) If the listed version is a not a supported version, `pip uninstall tensorflow` in the command shell and enter y for confirmation.
2) enter `pip freeze` and look for `tensorflow` , if found, the version listed should be < 1.13
3) If the listed version is a not a supported version, `pip uninstall tensorflow` in the command shell and enter y for confirmation.
## Remote run: DsvmCompute.create fails
## Remote run: DsvmCompute.create fails
There are several reasons why the DsvmCompute.create can fail. The reason is usually in the error message but you have to look at the end of the error message for the detailed reason. Some common reasons are:
1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.` Note that underscore is not allowed in the name.
2) `The requested VM size xxxxx is not available in the current region.` You can select a different region or vm_size.
2) `The requested VM size xxxxx is not available in the current region.` You can select a different region or vm_size.
## Remote run: Unable to establish SSH connection
Automated ML uses the SSH protocol to communicate with remote DSVMs. This defaults to port 22. Possible causes for this error are:
@@ -296,4 +296,4 @@ To resolve this issue, allocate a DSVM with more memory or reduce the value spec
## Remote run: Iterations show as "Not Responding" in the RunDetails widget.
This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core when it is running. Some iterations can use multiple cores. So, the max_concurrent_iterations setting should always be less than the number of cores of the DSVM.
To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.

View File

@@ -13,14 +13,10 @@ dependencies:
- scikit-learn>=0.19.0,<=0.20.3
- pandas>=0.22.0,<=0.23.4
- py-xgboost<=0.80
- pyarrow>=0.11.0
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-defaults
- azureml-train-automl
- azureml-sdk[automl,explain]
- azureml-widgets
- azureml-explain-model
- azureml-contrib-explain-model
- pandas_ml

View File

@@ -14,14 +14,10 @@ dependencies:
- scikit-learn>=0.19.0,<=0.20.3
- pandas>=0.22.0,<0.23.0
- py-xgboost<=0.80
- pyarrow>=0.11.0
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-defaults
- azureml-train-automl
- azureml-sdk[automl,explain]
- azureml-widgets
- azureml-explain-model
- azureml-contrib-explain-model
- pandas_ml

View File

@@ -1,15 +1,6 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
{
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -20,38 +11,40 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Classification with Deployment using a Bank Marketing Dataset**_\n",
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Deploy](#Deploy)\n",
"1. [Test](#Test)\n",
"1. [Acknowledgements](#Acknowledgements)"
"Licensed under the MIT License."
]
},
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"# Unique Descriptive Title\n",
"_**Unique Subtitle**_\n",
"\n",
"In this example we use the UCI Bank Marketing dataset to showcase how you can use AutoML for a classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if the client will subscribe to a term deposit with the bank.\n",
"Introduction that describes in a customer friendly language, what they will do and accomplish.\n".
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Prerequisites](#Prerequisites)\n",
"1. [Configuration and Setup](#Setup)\n",
"1. [Working with Data](#Working with Data)\n",
"1. [Training](#Training)\n",
"1. [Productionizing](#Productionizing)\n",
"1. [Model Monitoring](#Model Monitoring)\n",
"1. [Clean up resources](#Clean up resources)\n",
"1. [Next Steps](#Next Steps)\n",
"1. [Acknowledgements](#Acknowledgements)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Register the model.\n",
"6. Create a container image.\n",
"7. Create an Azure Container Instance (ACI) service.\n",
"8. Test the ACI service."
"If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"Please note that a Basic edition workspace is created by default in the configuration.ipynb file.\n",
]
},
{
@@ -60,8 +53,7 @@
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
"As part of the setup you have already created an Azure ML `Workspace` object....\n",
},
{
"cell_type": "code",
@@ -69,17 +61,7 @@
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import pandas as pd\n",
"import os\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
@@ -88,23 +70,21 @@
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"tenant_id = os.environ['TENANT_ID]\n",
"client_id = os.environ['CLIENT_ID]\n",
"run = Run.get_context()\n",
"secret_name = “{0}-secret”.format(client_id)\n",
"secret = run.get_secret(name=secret_name)\n",
"sp_auth = ServicePrincipalAuthentication(tenant_id, client_id, secret)\n",
"ws = Workspace.from_config(auth=sp_auth)\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-classification-bmarketing'\n",
"# choose a unique name for experiment\n",
"experiment_name = 'unique-name'\n",
"# project folder\n",
"project_folder = './sample_projects/test'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
@@ -112,560 +92,77 @@
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"You will need to create a compute target for your run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"# Working with Data\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlcl\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
"Here you would learn how to perform Data labeling and use Open Datasets etc..\n",
"To do this first load....\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data\n",
"## Training\n",
"\n",
"Create a run configuration for the remote run."
"Here you would learn how to train a DNN using...\n",
]
},
{
"cell_type": "code",
"execution_count": null,
{
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"# Productionizing\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"Here you would learn how to deploy your model to ACI to perform...\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Monitoring\n",
"\n",
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"Here you would learn how to detect datadrift etc...\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clean up resources\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
"Now, let's clean up the resources we created...\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Next Steps\n",
"\n",
"In this notebook, youve done x, y, z. You can learn more with these resources:\n",
"+ [SDK reference documentation for `MyClass`]()\n",
"+ [About this feature](https://docs.microsoft.com/azure/machine-learning/service/thisfeature)\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Data\n",
"\n",
"Load the bank marketing dataset into X_train and y_train. X_train contains the training features, which are inputs to the model. y_train contains the training labels, which are the expected output of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X_train = dataset.drop_columns(columns=['y'])\n",
"y_train = dataset.keep_columns(columns=['y'], validate=True)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 5,\n",
" \"iterations\": 10,\n",
" \"n_cross_validations\": 2,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": True,\n",
" \"max_concurrent_iterations\": 5,\n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" run_configuration=conda_run_config,\n",
" X = X_train,\n",
" y = y_train,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy\n",
"\n",
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register the Fitted Model for Deployment\n",
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model trained on bank marketing data to predict if a client will subscribe to a term deposit'\n",
"tags = None\n",
"model = remote_run.register_model(description = description, tags = tags)\n",
"\n",
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Scoring Script\n",
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy\n",
"import azureml.train.automl\n",
"from sklearn.externals import joblib\n",
"from azureml.core.model import Model\n",
"\n",
"\n",
"def init():\n",
" global model\n",
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
"\n",
"def run(rawdata):\n",
" try:\n",
" data = json.loads(rawdata)['data']\n",
" data = np.array(data)\n",
" result = model.predict(data)\n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"error\": result})\n",
" return json.dumps({\"result\":result.tolist()})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a YAML File for the Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
" pip_packages=['azureml-train-automl'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Substitute the actual version number in the environment file.\n",
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
"\n",
"with open(conda_env_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",
"script_file_name = 'score.py'\n",
"\n",
"with open(script_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(script_file_name, 'w') as cefw:\n",
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Container Image\n",
"\n",
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
"or when testing a model that is under development."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import Image, ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
" execution_script = script_file_name,\n",
" conda_file = conda_env_file_name,\n",
" tags = {'area': \"bmData\", 'type': \"automl_classification\"},\n",
" description = \"Image for automl classification sample\")\n",
"\n",
"image = Image.create(name = \"automlsampleimage\",\n",
" # this is the model object \n",
" models = [model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)\n",
"\n",
"if image.creation_state == 'Failed':\n",
" print(\"Image build log at: \" + image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the Image as a Web Service on Azure Container Instance\n",
"\n",
"Deploy an image that contains the model and other assets needed by the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n",
" description = 'sample service for Automl Classification')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"\n",
"aci_service_name = 'automl-sample-bankmarketing'\n",
"print(aci_service_name)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Web Service\n",
"\n",
"Deletes the specified web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Logs from a Deployed Web Service\n",
"\n",
"Gets logs from a deployed web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"Now that the model is trained split our data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the bank marketing datasets.\n",
"from numpy import array"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X_test = dataset.drop_columns(columns=['y'])\n",
"y_test = dataset.keep_columns(columns=['y'], validate=True)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test = X_test.to_pandas_dataframe()\n",
"y_test = y_test.to_pandas_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred = fitted_model.predict(X_test)\n",
"actual = array(y_test)\n",
"actual = actual[:,0]\n",
"print(y_pred.shape, \" \", actual.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate metrics for the prediction\n",
"\n",
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
"from the trained model that was returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib notebook\n",
"test_pred = plt.scatter(actual, y_pred, color='b')\n",
"test_test = plt.scatter(actual, actual, color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This Bank Marketing dataset is made available under the Creative Commons (CCO: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/ and is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
"This dataset is made available under the Creative Commons (CCO: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/ and is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
"\n",
"_**Acknowledgements**_\n",
"This data set is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
"This dataset is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
"\n",
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
]
@@ -674,9 +171,32 @@
"metadata": {
"authors": [
{
"name": "v-rasav"
"name": "YOUR ALIAS"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"MNIST"
],
"deployment": [
"AKS"
],
"exclude_from_index": false,
"framework": [
"PyTorch"
],
"friendly_name": "How to use ModuleStep with AML Pipelines",
},
"order_index": 14,
"star_tag": [],
"tags": [
"Pipeline Builder"
],
"task": "Demonstrates the use of ModuleStep"
},
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -697,4 +217,8 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}

View File

@@ -2,8 +2,6 @@ name: auto-ml-classification-bank-marketing
dependencies:
- pip:
- azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -74,12 +74,14 @@
"from matplotlib import pyplot as plt\n",
"import pandas as pd\n",
"import os\n",
"from sklearn.model_selection import train_test_split\n",
"import azureml.dataprep as dprep\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
@@ -92,6 +94,8 @@
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-classification-ccard'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-classification-creditcard'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
@@ -101,6 +105,7 @@
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -147,12 +152,11 @@
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
@@ -161,7 +165,20 @@
"source": [
"# Data\n",
"\n",
"Create a run configuration for the remote run."
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
@@ -180,8 +197,11 @@
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
@@ -191,7 +211,7 @@
"source": [
"### Load Data\n",
"\n",
"Load the credit card dataset into X and y. X contains the features, which are inputs to the model. y contains the labels, which are the expected output of the model. Next split the data using random_split and return X_train and y_train for training the model."
"Here create the script to be run in azure compute for loading the data, load the credit card dataset into cards and store the Class column (y) in the y variable and store the remaining data in the x variable. Next split the data using train_test_split and return X_train and y_train for training the model."
]
},
{
@@ -201,9 +221,10 @@
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X = dataset.drop_columns(columns=['Class'])\n",
"y = dataset.keep_columns(columns=['Class'], validate=True)\n",
"dflow = dprep.read_csv(data, infer_column_types=True)\n",
"dflow.get_profile()\n",
"X = dflow.drop_columns(columns=['Class'])\n",
"y = dflow.keep_columns(columns=['Class'], validate_column_exists=True)\n",
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
"y_train, y_test = y.random_split(percentage=0.8, seed=223)"
]
@@ -225,6 +246,7 @@
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
@@ -253,7 +275,8 @@
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" debug_log = 'automl_errors_20190417.log',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" X = X_train,\n",
" y = y_train,\n",
@@ -424,7 +447,7 @@
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
@@ -435,7 +458,7 @@
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
" pip_packages=['azureml-train-automl'])\n",
" pip_packages=['azureml-sdk[automl]'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
@@ -455,7 +478,7 @@
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",

View File

@@ -2,8 +2,6 @@ name: auto-ml-classification-credit-card-fraud
dependencies:
- pip:
- azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -41,6 +41,8 @@
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace#upgrade).\n",
"In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
@@ -61,58 +63,13 @@
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-classification-deployment'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"The following steps require an Enterprise workspace to gain access to these features.To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace#upgrade).\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
@@ -123,7 +80,8 @@
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|"
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
@@ -144,7 +102,8 @@
" iterations = 10,\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train)"
" y = y_train,\n",
" path = project_folder)"
]
},
{
@@ -292,7 +251,7 @@
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
@@ -305,7 +264,7 @@
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
" pip_packages=['azureml-train-automl'])\n",
" pip_packages=['azureml-sdk[automl]'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
@@ -325,7 +284,7 @@
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",
@@ -479,7 +438,7 @@
"metadata": {
"authors": [
{
"name": "savitam"
"name": "shwinne"
}
],
"kernelspec": {
@@ -502,4 +461,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}

View File

@@ -89,8 +89,9 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-classification-onnx'\n",
"project_folder = './sample_projects/automl-classification-onnx'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -100,6 +101,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -125,7 +127,9 @@
"X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
" iris.target, \n",
" test_size=0.2, \n",
" random_state=0)"
" random_state=0)\n",
"\n",
"\n"
]
},
{
@@ -166,7 +170,8 @@
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|"
"|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
@@ -191,7 +196,8 @@
" X = X_train, \n",
" y = y_train,\n",
" preprocess=True,\n",
" enable_onnx_compatible_models=True)"
" enable_onnx_compatible_models=True,\n",
" path = project_folder)"
]
},
{

View File

@@ -100,8 +100,9 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-whitelist'\n",
"project_folder = './sample_projects/automl-local-whitelist'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -111,6 +112,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -156,6 +158,7 @@
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"|**whitelist_models**|List of models that AutoML should use. The possible values are listed [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings).|"
]
},
@@ -174,7 +177,8 @@
" X = X_train, \n",
" y = y_train,\n",
" enable_tf=True,\n",
" whitelist_models=whitelist_models)"
" whitelist_models=whitelist_models,\n",
" path = project_folder)"
]
},
{

View File

@@ -113,8 +113,9 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-classification'\n",
"project_folder = './sample_projects/automl-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -124,6 +125,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -477,7 +479,27 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"friendly_name": "Testing index",
"exclude_from_index": false,
"order_index": 1,
"category": "tutorial",
"tags": [
"featured"
],
"task": "Regression",
"datasets": [
"NYC Taxi"
],
"compute": [
"local"
],
"deployment": [
"None"
],
"framework": [
"Azure ML AutoML"
],
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -21,7 +21,7 @@
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Load Data using `TabularDataset` for Remote Execution (AmlCompute)**_\n",
"_**Prepare Data using `azureml.dataprep` for Remote Execution (AmlCompute)**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
@@ -37,20 +37,23 @@
"metadata": {},
"source": [
"## Introduction\n",
"In this example we showcase how you can use AzureML Dataset to load data for AutoML.\n",
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create a `TabularDataset` pointing to the training data.\n",
"2. Pass the `TabularDataset` to AutoML for a remote run."
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
"2. Pass the `Dataflow` to AutoML for a local run.\n",
"3. Pass the `Dataflow` to AutoML for a remote run."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
"## Setup\n",
"\n",
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
]
},
{
@@ -67,13 +70,15 @@
"outputs": [],
"source": [
"import logging\n",
"import time\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.compute import DsvmCompute\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"import azureml.dataprep as dprep\n",
"from azureml.train.automl import AutoMLConfig"
]
},
@@ -84,9 +89,11 @@
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
" \n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataset-remote-bai'\n",
"experiment_name = 'automl-dataprep-remote-dsvm'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataprep-remote-dsvm'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
@@ -96,6 +103,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -115,21 +123,13 @@
"metadata": {},
"outputs": [],
"source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dataset = Dataset.Tabular.from_delimited_files(example_data)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the data\n",
"\n",
"You can peek the result of a `TabularDataset` at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records, which makes it fast even against large datasets.\n",
"\n",
"`TabularDataset` objects are immutable and are composed of a list of subsetting transformations (optional)."
"dflow = dprep.read_csv(example_data, infer_column_types=True)\n",
"dflow.get_profile()"
]
},
{
@@ -138,8 +138,30 @@
"metadata": {},
"outputs": [],
"source": [
"X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dataset.keep_columns(columns=['Primary Type'], validate=True)"
"# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
]
},
{
@@ -183,7 +205,7 @@
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlc2\"\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"\n",
@@ -204,12 +226,11 @@
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
@@ -228,8 +249,11 @@
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
@@ -237,9 +261,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `TabularDataset` Objects\n",
"### Pass Data with `Dataflow` Objects\n",
"\n",
"The `TabularDataset` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `TabularDataset` object and send it to the remote compute target. The `TabularDataset` will not be evaluated locally."
"The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally."
]
},
{
@@ -250,6 +274,7 @@
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" X = X,\n",
" y = y,\n",
@@ -441,13 +466,8 @@
"metadata": {},
"outputs": [],
"source": [
"dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n",
"\n",
"df_test = dataset_test.to_pandas_dataframe()\n",
"df_test = df_test[pd.notnull(df_test['Primary Type'])]\n",
"\n",
"y_test = df_test[['Primary Type']]\n",
"X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)"
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
"dflow_test = dflow_test.drop_nulls('Primary Type')"
]
},
{
@@ -466,6 +486,10 @@
"source": [
"from pandas_ml import ConfusionMatrix\n",
"\n",
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
"\n",
"\n",
"ypred = fitted_model.predict(X_test)\n",
"\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",

View File

@@ -1,8 +1,8 @@
name: aml-pipelines-parameter-tuning-with-hyperdrive
name: auto-ml-dataprep-remote-execution
dependencies:
- pip:
- azureml-sdk
- azureml-train-automl
- azureml-widgets
- matplotlib
- numpy
- pandas_ml

View File

@@ -21,7 +21,7 @@
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Load Data using `TabularDataset` for Local Execution**_\n",
"_**Prepare Data using `azureml.dataprep` for Local Execution**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
@@ -37,20 +37,23 @@
"metadata": {},
"source": [
"## Introduction\n",
"In this example we showcase how you can use AzureML Dataset to load data for AutoML.\n",
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create a `TabularDataset` pointing to the training data.\n",
"2. Pass the `TabularDataset` to AutoML for a local run."
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
"2. Pass the `Dataflow` to AutoML for a local run.\n",
"3. Pass the `Dataflow` to AutoML for a remote run."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
"## Setup\n",
"\n",
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
]
},
{
@@ -73,7 +76,7 @@
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"import azureml.dataprep as dprep\n",
"from azureml.train.automl import AutoMLConfig"
]
},
@@ -86,7 +89,9 @@
"ws = Workspace.from_config()\n",
" \n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataset-local'\n",
"experiment_name = 'automl-dataprep-local'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataprep-local'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
@@ -96,6 +101,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -115,21 +121,13 @@
"metadata": {},
"outputs": [],
"source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dataset = Dataset.Tabular.from_delimited_files(example_data)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the data\n",
"\n",
"You can peek the result of a `TabularDataset` at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records, which makes it fast even against large datasets.\n",
"\n",
"`TabularDataset` objects are immutable and are composed of a list of subsetting transformations (optional)."
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
"dflow.get_profile()"
]
},
{
@@ -138,8 +136,30 @@
"metadata": {},
"outputs": [],
"source": [
"X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dataset.keep_columns(columns=['Primary Type'], validate=True)"
"# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
]
},
{
@@ -170,9 +190,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `TabularDataset` Objects\n",
"### Pass Data with `Dataflow` Objects\n",
"\n",
"The `TabularDataset` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `TabularDataset` for model training."
"The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
]
},
{
@@ -335,13 +355,8 @@
"metadata": {},
"outputs": [],
"source": [
"dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n",
"\n",
"df_test = dataset_test.to_pandas_dataframe()\n",
"df_test = df_test[pd.notnull(df_test['Primary Type'])]\n",
"\n",
"y_test = df_test[['Primary Type']]\n",
"X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)"
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
"dflow_test = dflow_test.drop_nulls('Primary Type')"
]
},
{
@@ -360,6 +375,9 @@
"source": [
"from pandas_ml import ConfusionMatrix\n",
"\n",
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
"\n",
"ypred = fitted_model.predict(X_test)\n",
"\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",

View File

@@ -1,4 +1,4 @@
name: auto-ml-dataset
name: auto-ml-dataprep
dependencies:
- pip:
- azureml-sdk

View File

@@ -97,6 +97,8 @@
"\n",
"# choose a name for the run history container in the workspace\n",
"experiment_name = 'automl-bikeshareforecasting'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-bikeshareforecasting'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -106,6 +108,7 @@
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Run History Name'] = experiment_name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -222,8 +225,7 @@
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (i.e. 'US', 'GB').|\n",
"\n",
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the iteration_timeout_minutes parameter value to get results."
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
]
},
{
@@ -244,12 +246,12 @@
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n",
" blacklist_models = ['ExtremeRandomTrees'],\n",
" iterations=10,\n",
" iteration_timeout_minutes=5,\n",
" X=X_train,\n",
" y=y_train,\n",
" n_cross_validations=3,\n",
" n_cross_validations=3, \n",
" path=project_folder,\n",
" verbosity=logging.INFO,\n",
" **automl_settings)"
]

View File

@@ -93,6 +93,8 @@
"\n",
"# choose a name for the run history container in the workspace\n",
"experiment_name = 'automl-energydemandforecasting'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-energydemandforecasting'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -102,6 +104,7 @@
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Run History Name'] = experiment_name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -210,7 +213,8 @@
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|"
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
]
},
{
@@ -227,12 +231,12 @@
"automl_config = AutoMLConfig(task='forecasting',\n",
" debug_log='automl_nyc_energy_errors.log',\n",
" primary_metric='normalized_root_mean_squared_error',\n",
" blacklist_models = ['ExtremeRandomTrees'],\n",
" iterations=10,\n",
" iteration_timeout_minutes=5,\n",
" X=X_train,\n",
" y=y_train,\n",
" n_cross_validations=3,\n",
" path=project_folder,\n",
" verbosity = logging.INFO,\n",
" **time_series_settings)"
]
@@ -458,9 +462,7 @@
"source": [
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation.\n",
"\n",
"Now that we configured target lags, that is the previous values of the target variables, and the prediction is no longer horizon-less. We therefore must still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features.\n",
"\n",
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the iteration_timeout_minutes parameter value to get results."
"Now that we configured target lags, that is the previous values of the target variables, and the prediction is no longer horizon-less. We therefore must still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features."
]
},
{
@@ -479,12 +481,13 @@
"automl_config_lags = AutoMLConfig(task='forecasting',\n",
" debug_log='automl_nyc_energy_errors.log',\n",
" primary_metric='normalized_root_mean_squared_error',\n",
" blacklist_models=['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor'],\n",
" blacklist_models=['ElasticNet'],\n",
" iterations=10,\n",
" iteration_timeout_minutes=10,\n",
" X=X_train,\n",
" y=y_train,\n",
" n_cross_validations=3,\n",
" path=project_folder,\n",
" verbosity=logging.INFO,\n",
" **time_series_settings_with_lags)"
]
@@ -552,97 +555,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### What features matter for the forecast?\n",
"The following steps will allow you to compute and visualize engineered feature importance based on your test data for forecasting. "
"### What features matter for the forecast?"
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#### Setup the model explanations for AutoML models\n",
"The *fitted_model* can generate the following which will be used for getting the engineered and raw feature explanations using *automl_setup_model_explanations*:-\n",
"1. Featurized data from train samples/test samples \n",
"2. Gather engineered and raw feature name lists\n",
"3. Find the classes in your labeled column in classification scenarios\n",
"from azureml.train.automl.automlexplainer import explain_model\n",
"\n",
"The *automl_explainer_setup_obj* contains all the structures from above list. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n",
"automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train.copy(), \n",
" X_test=X_test.copy(), y=y_train, \n",
" task='forecasting')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize the Mimic Explainer for feature importance\n",
"For explaining the AutoML models, use the *MimicWrapper* from *azureml.explain.model* package. The *MimicWrapper* can be initialized with fields in *automl_explainer_setup_obj*, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (*fitted_model* here). The *MimicWrapper* also takes the *best_run* object where the raw and engineered explanations will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
"from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,\n",
" features=automl_explainer_setup_obj.engineered_feature_names, \n",
" feature_maps=[automl_explainer_setup_obj.feature_map])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing engineered feature importance\n",
"The *explain()* method in *MimicWrapper* can be called with the transformed test samples to get the feature importance for the generated engineered features. You can also use *ExplanationDashboard* to view the dash board visualization of the feature importance values of the generated engineered features by AutoML featurizers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"from azureml.contrib.explain.model.visualize import ExplanationDashboard\n",
"ExplanationDashboard(engineered_explanations, automl_explainer_setup_obj.automl_estimator, automl_explainer_setup_obj.X_test_transform)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing raw feature importance\n",
"The *explain()* method in *MimicWrapper* can be again called with the transformed test samples and setting *get_raw* to *True* to get the feature importance for the raw features. You can also use *ExplanationDashboard* to view the dash board visualization of the feature importance values of the raw features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw_explanations = explainer.explain(['local', 'global'], get_raw=True, \n",
" raw_feature_names=automl_explainer_setup_obj.raw_feature_names,\n",
" eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(raw_explanations.get_feature_importance_dict())\n",
"from azureml.contrib.explain.model.visualize import ExplanationDashboard\n",
"ExplanationDashboard(raw_explanations, automl_explainer_setup_obj.automl_pipeline, automl_explainer_setup_obj.X_test_raw)"
"# feature names are everything in the transformed data except the target\n",
"features = X_trans_lags.columns[:-1]\n",
"expl = explain_model(fitted_model_lags, X_train.copy(), X_test.copy(), features=features, best_run=best_run_lags, y_train=y_train)\n",
"# unpack the tuple\n",
"shap_values, expected_values, feat_overall_imp, feat_names, per_class_summary, per_class_imp = expl\n",
"best_run_lags"
]
},
{

View File

@@ -8,4 +8,3 @@ dependencies:
- pandas_ml
- statsmodels
- azureml-explain-model
- azureml-contrib-explain-model

View File

@@ -89,6 +89,8 @@
"\n",
"# choose a name for the run history container in the workspace\n",
"experiment_name = 'automl-ojforecasting'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-ojforecasting'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -98,6 +100,7 @@
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Run History Name'] = experiment_name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -241,9 +244,9 @@
"|**X**|Training matrix of features as a pandas DataFrame, shape = [n_training_samples, n_features]|\n",
"|**y**|Target values as a numpy.ndarray, shape = [n_training_samples, ]|\n",
"|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
"|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models\n",
"|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models\n",
"|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n",
"|**debug_log**|Log file path for writing debugging information\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"|**time_column_name**|Name of the datetime column in the input data|\n",
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
@@ -270,8 +273,8 @@
" X=X_train,\n",
" y=y_train,\n",
" n_cross_validations=3,\n",
" enable_voting_ensemble=False,\n",
" enable_stack_ensemble=False,\n",
" enable_ensembling=False,\n",
" path=project_folder,\n",
" verbosity=logging.INFO,\n",
" **time_series_settings)"
]
@@ -660,10 +663,10 @@
"conda_env_file_name = 'fcast_env.yml'\n",
"\n",
"dependencies = ml_run.get_run_sdk_dependencies(iteration = best_iteration)\n",
"for p in ['azureml-train-automl', 'azureml-core']:\n",
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))\n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-train-automl'])\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
"\n",
"myenv.save_to_file('.', conda_env_file_name)"
]
@@ -685,7 +688,7 @@
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",

View File

@@ -93,6 +93,7 @@
"\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-local-missing-data'\n",
"project_folder = './sample_projects/automl-local-missing-data'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -102,6 +103,7 @@
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -164,7 +166,8 @@
"|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
"|**blacklist_models**|*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i>|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|"
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
@@ -183,7 +186,8 @@
" blacklist_models = ['KNN','LinearSVM'],\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train)"
" y = y_train,\n",
" path = project_folder)"
]
},
{

View File

@@ -69,8 +69,7 @@
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.core.dataset import Dataset"
"from azureml.train.automl import AutoMLConfig"
]
},
{
@@ -108,42 +107,29 @@
"## Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
"train_dataset = Dataset.Tabular.from_delimited_files(train_data)\n",
"X_train = train_dataset.drop_columns(columns=['y']).to_pandas_dataframe()\n",
"y_train = train_dataset.keep_columns(columns=['y'], validate=True).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
"test_dataset = Dataset.Tabular.from_delimited_files(test_data)\n",
"X_test = test_dataset.drop_columns(columns=['y']).to_pandas_dataframe()\n",
"y_test = test_dataset.keep_columns(columns=['y'], validate=True).to_pandas_dataframe()"
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"y = iris.target\n",
"X = iris.data\n",
"\n",
"features = iris.feature_names\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X,\n",
" y,\n",
" test_size=0.1,\n",
" random_state=100,\n",
" stratify=y)\n",
"\n",
"X_train = pd.DataFrame(X_train, columns=features)\n",
"X_test = pd.DataFrame(X_test, columns=features)"
]
},
{
@@ -162,6 +148,8 @@
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
]
@@ -178,10 +166,10 @@
" iteration_timeout_minutes = 200,\n",
" iterations = 10,\n",
" verbosity = logging.INFO,\n",
" preprocess = True,\n",
" X = X_train, \n",
" y = y_train,\n",
" n_cross_validations = 5,\n",
" X_valid = X_test,\n",
" y_valid = y_test,\n",
" model_explainability=True,\n",
" path=project_folder)"
]
@@ -209,7 +197,7 @@
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()"
"local_run"
]
},
{
@@ -314,21 +302,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing model explanations and visualizing the explanations using azureml-explain-model package\n",
"Beside retrieve the existed model explanation information, explain the model with different train/test data. The following steps will allow you to compute and visualize engineered feature importance and raw feature importance based on your test data. "
"Beside retrieve the existed model explanation information, explain the model with different train/test data"
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#### Setup the model explanations for AutoML models\n",
"The *fitted_model* can generate the following which will be used for getting the engineered and raw feature explanations using *automl_setup_model_explanations*:-\n",
"1. Featurized data from train samples/test samples \n",
"2. Gather engineered and raw feature name lists\n",
"3. Find the classes in your labeled column in classification scenarios\n",
"from azureml.train.automl.automlexplainer import explain_model\n",
"\n",
"The *automl_explainer_setup_obj* contains all the structures from above list. "
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
" explain_model(fitted_model, X_train, X_test, features=features)"
]
},
{
@@ -337,116 +323,8 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n",
"\n",
"automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, \n",
" X_test=X_test, y=y_train, \n",
" task='classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize the Mimic Explainer for feature importance\n",
"For explaining the AutoML models, use the *MimicWrapper* from *azureml.explain.model* package. The *MimicWrapper* can be initialized with fields in *automl_explainer_setup_obj*, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (*fitted_model* here). The *MimicWrapper* also takes the *best_run* object where the raw and engineered explanations will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
"from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,\n",
" features=automl_explainer_setup_obj.engineered_feature_names, \n",
" feature_maps=[automl_explainer_setup_obj.feature_map],\n",
" classes=automl_explainer_setup_obj.classes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing engineered feature importance\n",
"The *explain()* method in *MimicWrapper* can be called with the transformed test samples to get the feature importance for the generated engineered features. You can also use *ExplanationDashboard* to view the dash board visualization of the feature importance values of the generated engineered features by AutoML featurizers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"from azureml.contrib.explain.model.visualize import ExplanationDashboard\n",
"ExplanationDashboard(engineered_explanations, automl_explainer_setup_obj.automl_estimator, automl_explainer_setup_obj.X_test_transform)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing raw feature importance\n",
"The *explain()* method in *MimicWrapper* can be again called with the transformed test samples and setting *get_raw* to *True* to get the feature importance for the raw features. You can also use *ExplanationDashboard* to view the dash board visualization of the feature importance values of the raw features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw_explanations = explainer.explain(['local', 'global'], get_raw=True, \n",
" raw_feature_names=automl_explainer_setup_obj.raw_feature_names,\n",
" eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(raw_explanations.get_feature_importance_dict())\n",
"from azureml.contrib.explain.model.visualize import ExplanationDashboard\n",
"ExplanationDashboard(raw_explanations, automl_explainer_setup_obj.automl_pipeline, automl_explainer_setup_obj.X_test_raw)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download engineered feature importance from artifact store\n",
"You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *best_run*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
"client = ExplanationClient.from_run(best_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n",
"print(engineered_explanations.get_feature_importance_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download raw feature importance from artifact store\n",
"You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *best_run*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
"client = ExplanationClient.from_run(best_run)\n",
"raw_explanations = client.download_model_explanation(raw=True)\n",
"print(raw_explanations.get_feature_importance_dict())"
"print(overall_summary)\n",
"print(overall_imp)"
]
}
],

View File

@@ -7,4 +7,3 @@ dependencies:
- matplotlib
- pandas_ml
- azureml-explain-model
- azureml-contrib-explain-model

View File

@@ -70,12 +70,13 @@
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"from sklearn.model_selection import train_test_split\n",
"import azureml.dataprep as dprep\n",
" \n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
@@ -87,8 +88,9 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-regression-concrete'\n",
"project_folder = './sample_projects/automl-regression-concrete'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -98,6 +100,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -144,12 +147,11 @@
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
@@ -158,7 +160,20 @@
"source": [
"# Data\n",
"\n",
"Create a run configuration for the remote run."
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
@@ -177,8 +192,11 @@
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
@@ -188,7 +206,7 @@
"source": [
"### Load Data\n",
"\n",
"Load the concrete strength dataset into X and y. X contains the training features, which are inputs to the model. y contains the training labels, which are the expected output of the model."
"Here create the script to be run in azure compute for loading the data, load the concrete strength dataset into the X and y variables. Next, split the data using train_test_split and return X_train and y_train for training the model. Finally, return X_train and y_train for training the model."
]
},
{
@@ -198,12 +216,13 @@
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X = dataset.drop_columns(columns=['CONCRETE'])\n",
"y = dataset.keep_columns(columns=['CONCRETE'], validate=True)\n",
"dflow = dprep.read_csv(data, infer_column_types=True)\n",
"dflow.get_profile()\n",
"X = dflow.drop_columns(columns=['CONCRETE'])\n",
"y = dflow.keep_columns(columns=['CONCRETE'], validate_column_exists=True)\n",
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
"dataset.take(5).to_pandas_dataframe()"
"dflow.head()"
]
},
{
@@ -223,6 +242,7 @@
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
@@ -252,6 +272,7 @@
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n",
" debug_log = 'automl.log',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" X = X_train,\n",
" y = y_train,\n",
@@ -463,7 +484,7 @@
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
@@ -473,7 +494,9 @@
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-train-automl'])\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
@@ -493,7 +516,7 @@
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",

View File

@@ -2,8 +2,6 @@ name: auto-ml-regression-concrete-strength
dependencies:
- pip:
- azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -70,12 +70,13 @@
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"from sklearn.model_selection import train_test_split\n",
"import azureml.dataprep as dprep\n",
" \n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
@@ -87,8 +88,9 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-regression-hardware'\n",
"project_folder = './sample_projects/automl-remote-regression'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -98,6 +100,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -144,12 +147,11 @@
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
@@ -158,7 +160,20 @@
"source": [
"# Data\n",
"\n",
"Create a run configuration for the remote run."
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
@@ -177,8 +192,11 @@
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
@@ -188,7 +206,7 @@
"source": [
"### Load Data\n",
"\n",
"Load the hardware performance dataset into X and y. X contains the training features, which are inputs to the model. y contains the training labels, which are the expected output of the model."
"Here create the script to be run in azure compute for loading the data, load the hardware dataset into the X and y variables. Next split the data using train_test_split and return X_train and y_train for training the model."
]
},
{
@@ -198,12 +216,13 @@
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X = dataset.drop_columns(columns=['ERP'])\n",
"y = dataset.keep_columns(columns=['ERP'], validate=True)\n",
"dflow = dprep.read_csv(data, infer_column_types=True)\n",
"dflow.get_profile()\n",
"X = dflow.drop_columns(columns=['ERP'])\n",
"y = dflow.keep_columns(columns=['ERP'], validate_column_exists=True)\n",
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
"y_train, y_test = y.random_split(percentage=0.8, seed=223)\n",
"dataset.take(5).to_pandas_dataframe()"
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
"dflow.head()"
]
},
{
@@ -224,6 +243,7 @@
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
@@ -252,7 +272,8 @@
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n",
" debug_log = 'automl_errors.log',\n",
" debug_log = 'automl_errors_20190417.log',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" X = X_train,\n",
" y = y_train,\n",
@@ -481,7 +502,7 @@
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
@@ -491,7 +512,7 @@
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-train-automl'])\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
@@ -511,7 +532,7 @@
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",

View File

@@ -2,8 +2,6 @@ name: auto-ml-regression-hardware-performance
dependencies:
- pip:
- azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -84,8 +84,9 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-regression'\n",
"project_folder = './sample_projects/automl-local-regression'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -95,6 +96,7 @@
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
@@ -142,7 +144,8 @@
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|"
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
@@ -159,7 +162,8 @@
" debug_log = 'automl.log',\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train)"
" y = y_train,\n",
" path = project_folder)"
]
},
{

View File

@@ -73,7 +73,10 @@
"source": [
"import logging\n",
"import os\n",
"import csv\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n",
@@ -81,8 +84,8 @@
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
"from azureml.train.automl import AutoMLConfig\n",
"import azureml.dataprep as dprep"
]
},
{
@@ -134,7 +137,7 @@
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlc2\"\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
@@ -153,12 +156,11 @@
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
@@ -234,8 +236,11 @@
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
@@ -243,9 +248,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a TabularDataset\n",
"### Dprep reference\n",
"\n",
"Defined X and y as `TabularDataset`s, which are passed to automated machine learning in the AutoMLConfig."
"Defined X and y as dprep references, which are passed to automated machine learning in the AutoMLConfig."
]
},
{
@@ -254,8 +259,8 @@
"metadata": {},
"outputs": [],
"source": [
"X = Dataset.Tabular.from_delimited_files(path=ds.path('irisdata/X_train.csv'))\n",
"y = Dataset.Tabular.from_delimited_files(path=ds.path('irisdata/y_train.csv'))"
"X = dprep.read_csv(path=ds.path('irisdata/X_train.csv'), infer_column_types=True)\n",
"y = dprep.read_csv(path=ds.path('irisdata/y_train.csv'), infer_column_types=True)"
]
},
{
@@ -493,7 +498,8 @@
" res_path = 'onnx_resource.json'\n",
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
" with open(res_path) as f:\n",
" return json.load(f)\n",
" onnx_res = json.load(f)\n",
" return onnx_res\n",
"\n",
"if onnxrt_present and python_version_compatible: \n",
" mdl_bytes = onnx_mdl.SerializeToString()\n",

View File

@@ -2,8 +2,6 @@ name: auto-ml-remote-amlcompute-with-onnx
dependencies:
- pip:
- azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -74,6 +74,7 @@
"source": [
"import logging\n",
"import os\n",
"import csv\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
@@ -83,8 +84,8 @@
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
"from azureml.train.automl import AutoMLConfig\n",
"import azureml.dataprep as dprep"
]
},
{
@@ -136,7 +137,7 @@
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlc2\"\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
@@ -155,12 +156,11 @@
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
@@ -210,8 +210,11 @@
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
@@ -219,9 +222,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating TabularDataset\n",
"### Dprep reference\n",
"\n",
"Defined X and y as `TabularDataset`s, which are passed to Automated ML in the AutoMLConfig. `from_delimited_files` by default sets the `infer_column_types` to true, which will infer the columns type automatically. If you do wish to manually set the column types, you can set the `set_column_types` argument to manually set the type of each columns."
"Defined X and y as dprep references, which are passed to automated machine learning in the AutoMLConfig."
]
},
{
@@ -230,8 +233,8 @@
"metadata": {},
"outputs": [],
"source": [
"X = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/X_train.csv'))\n",
"y = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/y_train.csv'))"
"X = dprep.read_csv(path=ds.path('digitsdata/X_train.csv'), infer_column_types=True)\n",
"y = dprep.read_csv(path=ds.path('digitsdata/y_train.csv'), infer_column_types=True)"
]
},
{

View File

@@ -2,8 +2,6 @@ name: auto-ml-remote-amlcompute
dependencies:
- pip:
- azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -342,6 +342,7 @@
" n_cross_validations = n_cross_validations, \r\n",
" preprocess = preprocess,\r\n",
" verbosity = logging.INFO, \r\n",
" enable_ensembling = False,\r\n",
" X = X_train, \r\n",
" y = y_train, \r\n",
" path = project_folder,\r\n",

View File

@@ -1,73 +1,33 @@
Azure Databricks is a managed Spark offering on Azure and customers already use it for advanced analytics. It provides a collaborative Notebook based environment with CPU or GPU based compute cluster.
Azure Databricks is a managed Spark offering on Azure and customers already use it for advanced analytics. It provides a collaborative Notebook based environment with CPU or GPU based compute cluster.
In this section, you will find sample notebooks on how to use Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use Automated ML capability (**public preview**) of Azure ML SDK with Azure Databricks.
In this section, you will find sample notebooks on how to use Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use Automated ML capability (**public preview**) of Azure ML SDK with Azure Databricks.
- Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
- You can keep the data within the same cluster.
- You can leverage the local worker nodes with autoscale and auto termination capabilities.
- You can use multiple cores of your Azure Databricks cluster to perform simultenous training.
- You can further tune the model generated by automated machine learning if you chose to.
- Every run (including the best run) is available as a pipeline, which you can tune further if needed.
- Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
- You can keep the data within the same cluster.
- You can leverage the local worker nodes with autoscale and auto termination capabilities.
- You can use multiple cores of your Azure Databricks cluster to perform simultenous training.
- You can further tune the model generated by automated machine learning if you chose to.
- Every run (including the best run) is available as a pipeline, which you can tune further if needed.
- The model trained using Azure Databricks can be registered in Azure ML SDK workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine learning SDK.
Please follow our [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#azure-databricks) to install the sdk in your Azure Databricks cluster before trying any of the sample notebooks.
**Single file** -
**Single file** -
The following archive contains all the sample notebooks. You can the run notebooks after importing [DBC](Databricks_AMLSDK_1-4_6.dbc) in your Databricks workspace instead of downloading individually.
Notebooks 1-4 have to be run sequentially & are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks.
Notebooks 1-4 have to be run sequentially & are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks.
Notebook 6 is an Automated ML sample notebook for Classification.
Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.
**Databricks as a Compute Target from Azure ML Pipelines**
You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb).
# Linked Azure Databricks and Azure Machine Learning Workspaces (Preview)
Customers can now link Azure Databricks and AzureML Workspaces to better enable cross-Azure ML scenarios by [managing their tracking data in a single place when using the MLflow client](https://mlflow.org/docs/latest/tracking.html#mlflow-tracking) - the Azure ML workspace.
## Linking the Workspaces (Admin operation)
1. The Azure Databricks Azure portal blade now includes a new button to link an Azure ML workspace.
![New ADB Portal Link button](./img/adb-link-button.png)
2. Both a new or existing Azure ML Workspace can be linked in the resulting prompt. Follow any instructions to set up the Azure ML Workspace.
![Link Prompt](./img/link-prompt.png)
3. After a successful link operation, you should see the Azure Databricks overview reflect the linked status
![Linked Successfully](./img/adb-successful-link.png)
## Configure MLflow to send data to Azure ML (All roles)
1. Add azureml-mlflow as a library to any notebook or cluster that should send data to Azure ML. You can do this via:
1. [DBUtils](https://docs.azuredatabricks.net/user-guide/dev-tools/dbutils.html#dbutils-library)
```
dbutils.library.installPyPI("azureml-mlflow")
dbutils.library.restartPython() # Removes Python state
```
2. [Cluster Libraries](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster)
![Cluster Library](./img/cluster-library.png)
2. [Set the MLflow tracking URI](https://mlflow.org/docs/latest/tracking.html#where-runs-are-recorded) to the following scheme:
```
adbazureml://${azuremlRegion}.experiments.azureml.net/history/v1.0/subscriptions/${azuremlSubscriptionId}/resourceGroups/${azuremlResourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${azuremlWorkspaceName}
```
1. You can automatically configure this on your clusters for all subsequent notebook sessions using this helper script instead of manually setting the tracking URI in the notebook:
* [AzureML Tracking Cluster Init Script](./linking/README.md)
3. If configured correctly, you'll now be able to see your MLflow tracking data in both Azure ML (via the REST API and all clients) and Azure Databricks (in the MLflow UI and using the MLflow client)
## Known Preview Limitations
While we roll this experience out to customers for feedback, there are some known limitations we'd love comments on in addition to any other issues seen in your workflow.
### 1-to-1 Workspace linking
Currently, an Azure ML Workspace can only be linked to one Azure Databricks Workspace at a time.
### Data synchronization
At the moment, data is only generated in the Azure Machine Learning workspace for tracking. Editing tags via the Azure Databricks MLflow UI won't be reflected in the Azure ML UI.
### Java and R support
The experience currently is only available from the Python MLflow client.
**Databricks as a Compute Target from AML Pipelines**
You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb).
For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).
**Please let us know your feedback.**
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/README.png)
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/README.png)

View File

@@ -314,18 +314,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Training Data Using Dataset"
"## Load Training Data Using DataPrep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Automated ML takes a `TabularDataset` as input.\n",
"Automated ML takes a Dataflow as input.\n",
"\n",
"You are free to use the data preparation libraries/tools of your choice to do the require preparation and once you are done, you can write it to a datastore and create a TabularDataset from it.\n",
"If you are familiar with Pandas and have done your data preparation work in Pandas already, you can use the `read_pandas_dataframe` method in dprep to convert the DataFrame to a Dataflow.\n",
"```python\n",
"df = pd.read_csv(...)\n",
"# apply some transforms\n",
"dprep.read_pandas_dataframe(df, temp_folder='/path/accessible/by/both/driver/and/worker')\n",
"```\n",
"\n",
"You will get the datastore you registered previously and pass it to Dataset for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
"If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n",
"\n",
"You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
]
},
{
@@ -334,21 +341,21 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.dataset import Dataset\n",
"import azureml.dataprep as dprep\n",
"from azureml.data.datapath import DataPath\n",
"\n",
"datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n",
"\n",
"X_train = Dataset.Tabular.from_delimited_files(datastore.path('X.csv'))\n",
"y_train = Dataset.Tabular.from_delimited_files(datastore.path('y.csv'))"
"X_train = dprep.read_csv(datastore.path('X.csv'))\n",
"y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Review the TabularDataset\n",
"You can peek the result of a TabularDataset at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only j records for all the steps in the TabularDataset, which makes it fast even against large datasets."
"## Review the Data Preparation Result\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets."
]
},
{
@@ -357,7 +364,7 @@
"metadata": {},
"outputs": [],
"source": [
"X_train.take(5).to_pandas_dataframe()"
"X_train.get_profile()"
]
},
{
@@ -366,7 +373,7 @@
"metadata": {},
"outputs": [],
"source": [
"y_train.take(5).to_pandas_dataframe()"
"y_train.get_profile()"
]
},
{

View File

@@ -331,18 +331,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Training Data Using Dataset"
"## Load Training Data Using DataPrep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Automated ML takes a `TabularDataset` as input.\n",
"Automated ML takes a Dataflow as input.\n",
"\n",
"You are free to use the data preparation libraries/tools of your choice to do the require preparation and once you are done, you can write it to a datastore and create a TabularDataset from it.\n",
"If you are familiar with Pandas and have done your data preparation work in Pandas already, you can use the `read_pandas_dataframe` method in dprep to convert the DataFrame to a Dataflow.\n",
"```python\n",
"df = pd.read_csv(...)\n",
"# apply some transforms\n",
"dprep.read_pandas_dataframe(df, temp_folder='/path/accessible/by/both/driver/and/worker')\n",
"```\n",
"\n",
"You will get the datastore you registered previously and pass it to Dataset for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
"If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n",
"\n",
"You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
]
},
{
@@ -351,21 +358,21 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.dataset import Dataset\n",
"import azureml.dataprep as dprep\n",
"from azureml.data.datapath import DataPath\n",
"\n",
"datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n",
"\n",
"X_train = Dataset.Tabular.from_delimited_files(datastore.path('X.csv'))\n",
"y_train = Dataset.Tabular.from_delimited_files(datastore.path('y.csv'))"
"X_train = dprep.read_csv(datastore.path('X.csv'))\n",
"y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Review the TabularDataset\n",
"You can peek the result of a TabularDataset at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only j records for all the steps in the TabularDataset, which makes it fast even against large datasets."
"## Review the Data Preparation Result\n",
"You can peek the result of a Dataflow at any range using skip(i) and head(j). Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets."
]
},
{
@@ -374,7 +381,7 @@
"metadata": {},
"outputs": [],
"source": [
"X_train.take(5).to_pandas_dataframe()"
"X_train.get_profile()"
]
},
{
@@ -383,7 +390,7 @@
"metadata": {},
"outputs": [],
"source": [
"y_train.take(5).to_pandas_dataframe()"
"y_train.get_profile()"
]
},
{

Binary file not shown.

Before

Width:  |  Height:  |  Size: 173 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 187 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 84 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 111 KiB

View File

@@ -1,56 +0,0 @@
# Adding an init script to an Azure Databricks cluster
The [azureml-cluster-init.sh](./azureml-cluster-init.sh) script configures the environment to
1. Use the configured AzureML Workspace with Workspace.from_config()
2. Set the default MLflow Tracking Server to be the AzureML managed one
Modify azureml-cluster-init.sh by providing the values for region, subscriptionId, resourceGroupName, and workspaceName of your target Azure ML workspace in the highlighted section at the top of the script.
To create the Azure Databricks cluster-scoped init script
1. Create the base directory you want to store the init script in if it does not exist.
```
dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
```
2. Create the script by copying the contents of azureml-cluster-init.sh
```
dbutils.fs.put("/databricks/<directory>/azureml-cluster-init.sh","""
<configured_contents_of_azureml-cluster-init.sh>
""", True)
3. Check that the script exists.
```
display(dbutils.fs.ls("dbfs:/databricks/<directory>/azureml-cluster-init.sh"))
```
1. Configure the cluster to run the script.
* Using the cluster configuration page
1. On the cluster configuration page, click the Advanced Options toggle.
1. At the bottom of the page, click the Init Scripts tab.
1. In the Destination drop-down, select a destination type. Example: 'DBFS'
1. Specify a path to the init script.
```
dbfs:/databricks/<directory>/azureml-cluster-init.sh
```
1. Click Add
* Using the API.
```
curl -n -X POST -H 'Content-Type: application/json' -d '{
"cluster_id": "<cluster_id>",
"num_workers": <num_workers>,
"spark_version": "<spark_version>",
"node_type_id": "<node_type_id>",
"cluster_log_conf": {
"dbfs" : {
"destination": "dbfs:/cluster-logs"
}
},
"init_scripts": [ {
"dbfs": {
"destination": "dbfs:/databricks/<directory>/azureml-cluster-init.sh"
}
} ]
}' https://<databricks-instance>/api/2.0/clusters/edit
```

View File

@@ -1,24 +0,0 @@
#!/bin/bash
# This script configures the environment to
# 1. Use the configured AzureML Workspace with azureml.core.Workspace.from_config()
# 2. Set the default MLflow Tracking Server to be the AzureML managed one
############## START CONFIGURATION #################
# Provide the required *AzureML* workspace information
region="" # example: westus2
subscriptionId="" # example: bcb65f42-f234-4bff-91cf-9ef816cd9936
resourceGroupName="" # example: dev-rg
workspaceName="" # example: myazuremlws
# Optional config directory
configLocation="/databricks/config.json"
############### END CONFIGURATION #################
# Drop the workspace configuration on the cluster
sudo touch $configLocation
sudo echo {\\"subscription_id\\": \\"${subscriptionId}\\", \\"resource_group\\": \\"${resourceGroupName}\\", \\"workspace_name\\": \\"${workspaceName}\\"} > $configLocation
# Set the MLflow Tracking URI
trackingUri="adbazureml://${region}.experiments.azureml.net/history/v1.0/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${workspaceName}"
sudo echo export MLFLOW_TRACKING_URI=${trackingUri} >> /databricks/spark/conf/spark-env.sh

View File

@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.png)"
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/deploy-to-cloud/model-register-and-deploy.png)"
]
},
{
@@ -115,36 +115,6 @@
" workspace=ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Only Environments that were created using azureml-defaults version 1.0.48 or later will work with this new handling however.\n",
"\n",
"More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"env = Environment.from_conda_specification(name='deploytocloudenv', file_path='myenv.yml')\n",
"\n",
"# This is optional at this point\n",
"# env.register(workspace=ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -183,7 +153,10 @@
"source": [
"from azureml.core.model import InferenceConfig\n",
"\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)"
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\", \n",
" extra_docker_file_steps=\"helloworld.txt\")"
]
},
{
@@ -198,11 +171,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"azuremlexception-remarks-sample"
]
},
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice, Webservice\n",

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

View File

@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.png)"
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-local/register-model-deploy-local-advanced.png)"
]
},
{

View File

@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.png)"
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-local/register-model-deploy-local.png)"
]
},
{

View File

@@ -1,217 +0,0 @@
NOTICES AND INFORMATION
Do Not Translate or Localize
This Azure Machine Learning service example notebooks repository includes material from the projects listed below.
1. SSD-Tensorflow (https://github.com/balancap/ssd-tensorflow)
%% SSD-Tensorflow NOTICES AND INFORMATION BEGIN HERE
=========================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
=========================================
END OF SSD-Tensorflow NOTICES AND INFORMATION

View File

@@ -336,7 +336,7 @@
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service-3'\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",

View File

@@ -404,7 +404,7 @@
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service-1'\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
@@ -543,7 +543,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.5.6"
}
},
"nbformat": 4,

View File

@@ -694,7 +694,7 @@
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service-2'\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",

View File

@@ -22,7 +22,7 @@
"If you want to log custom traces, you will follow the standard deplyment process for AKS and you will:\n",
"1. Update scoring file.\n",
"2. Update aks configuration.\n",
"3. Deploy the model with this new configuration. "
"3. Build new image and deploy it. "
]
},
{
@@ -178,7 +178,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Create Inference Configuration"
"## 6. Create your new Image"
]
},
{
@@ -187,11 +187,22 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\")"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"Image with ridge regression model\",\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
@@ -209,7 +220,7 @@
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aci_deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n",
" description = 'Predict diabetes using regression model',\n",
@@ -225,7 +236,11 @@
"from azureml.core.webservice import Webservice\n",
"\n",
"aci_service_name = 'my-aci-service-4'\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_deployment_config)\n",
"print(aci_service_name)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
@@ -346,7 +361,7 @@
"outputs": [],
"source": [
"#Set the web service configuration\n",
"aks_deployment_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
]
},
{
@@ -364,12 +379,12 @@
"source": [
"if aks_target.provisioning_state== \"Succeeded\": \n",
" aks_service_name ='aks-w-dc5'\n",
" aks_service = Model.deploy(ws,\n",
" aks_service_name, \n",
" [model], \n",
" inference_config, \n",
" aks_deployment_config, \n",
" deployment_target = aks_target) \n",
" aks_service = Webservice.deploy_from_image(workspace = ws, \n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target\n",
" )\n",
" aks_service.wait_for_deployment(show_output = True)\n",
" print(aks_service.state)\n",
"else:\n",
@@ -449,6 +464,7 @@
"%%time\n",
"aks_service.delete()\n",
"aci_service.delete()\n",
"image.delete()\n",
"model.delete()"
]
}

View File

@@ -243,7 +243,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up inference configuration\n",
"### Create container image\n",
"First we create a YAML file that specifies which dependencies we would like to see in our container."
]
},
@@ -265,7 +265,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we create the inference configuration."
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
@@ -274,19 +274,48 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\",\n",
" extra_docker_file_steps = \"Dockerfile\")"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n",
" description = \"TinyYOLO ONNX Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxyolo\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model"
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
@@ -307,7 +336,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will take a few minutes to run as the model gets packaged up and deployed to ACI."
"The following cell will likely take a few minutes to run as well."
]
},
{
@@ -319,9 +348,14 @@
"from azureml.core.webservice import Webservice\n",
"from random import randint\n",
"\n",
"aci_service_name = 'my-aci-service-15ad'\n",
"aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]

View File

@@ -54,7 +54,7 @@
"\n",
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
"\n",
"In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
"In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
]
},
{
@@ -176,7 +176,7 @@
"source": [
"### ONNX FER+ Model Methodology\n",
"\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) in the ONNX model zoo.\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n",
"\n",
"The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 croud sourced reviewers, creating a more accurate basis for ground truth. \n",
"\n",
@@ -341,7 +341,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup inference configuration"
"### Create the Container Image\n",
"\n",
"This step will likely take a few minutes."
]
},
{
@@ -350,19 +352,48 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\",\n",
" extra_docker_file_steps = \"Dockerfile\")"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n",
" description = \"Emotion ONNX Runtime container\",\n",
" tags = {\"demo\": \"onnx\"})\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnximage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model"
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
"\n",
"### Deploy the container image"
]
},
{
@@ -379,13 +410,6 @@
" description = 'ONNX for emotion recognition model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -396,11 +420,23 @@
"\n",
"aci_service_name = 'onnx-demo-emotion'\n",
"print(\"Service\", aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -434,7 +470,7 @@
"\n",
"### Useful Helper Functions\n",
"\n",
"We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus)."
"We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)."
]
},
{

View File

@@ -54,7 +54,7 @@
"\n",
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
"\n",
"In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/vision/classification/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/)."
"In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/)."
]
},
{
@@ -187,7 +187,7 @@
"source": [
"### ONNX MNIST Model Methodology\n",
"\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/vision/classification/mnist) in the ONNX model zoo.\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/mnist) in the ONNX model zoo.\n",
"\n",
"***Input: Handwritten Images from MNIST Dataset***\n",
"\n",
@@ -325,7 +325,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Inference Configuration"
"### Create the Container Image\n",
"This step will likely take a few minutes."
]
},
{
@@ -334,19 +335,48 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" extra_docker_file_steps = \"Dockerfile\",\n",
" conda_file=\"myenv.yml\")"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n",
" description = \"MNIST ONNX Runtime container\",\n",
" tags = {\"demo\": \"onnx\"}) \n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnximage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model"
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
"\n",
"### Deploy the container image"
]
},
{
@@ -367,7 +397,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run."
"The following cell will likely take a few minutes to run as well."
]
},
{
@@ -380,7 +410,12 @@
"\n",
"aci_service_name = 'onnx-demo-mnist'\n",
"print(\"Service\", aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]

View File

@@ -28,7 +28,7 @@
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n",
"## ResNet50 Details\n",
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. For more information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/vision/classification/resnet). "
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. For more information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/models/image_classification/resnet). "
]
},
{
@@ -221,7 +221,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create inference configuration"
"### Create container image"
]
},
{
@@ -249,7 +249,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Create the inference configuration object"
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
@@ -258,19 +258,48 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\",\n",
" extra_docker_file_steps = \"Dockerfile\")"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n",
" description = \"ONNX ResNet50 Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxresnet50v2\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model"
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
@@ -305,7 +334,12 @@
"\n",
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]

View File

@@ -28,7 +28,7 @@
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n",
"## MNIST Details\n",
"The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing numbers from 0 to 9. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/). For more information about the MNIST model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/vision/classification/mnist). "
"The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing numbers from 0 to 9. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/). For more information about the MNIST model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/mnist). "
]
},
{
@@ -401,7 +401,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create inference configuration\n",
"### Create container image\n",
"First we create a YAML file that specifies which dependencies we would like to see in our container."
]
},
@@ -423,7 +423,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we setup the inference configuration "
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
@@ -432,19 +432,48 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\",\n",
" extra_docker_file_steps = \"Dockerfile\")"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n",
" description = \"MNIST ONNX Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxmnistdemo\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model"
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
@@ -475,12 +504,16 @@
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import Model\n",
"from random import randint\n",
"\n",
"aci_service_name = 'onnx-demo-mnist'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]

View File

@@ -34,6 +34,7 @@
"from azureml.core import Workspace\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import Image\n",
"from azureml.core.model import Model"
]
},
@@ -96,51 +97,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create the Environment\n",
"Create an environment that the model will be deployed with"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"conda_deps = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults'])\n",
"myenv = Environment(name='myenv')\n",
"myenv.python.conda_dependencies = conda_deps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use a custom Docker image\n",
"\n",
"You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n",
"\n",
"Only supported with `python` runtime.\n",
"```python\n",
"# use an image available in public Container Registry without authentication\n",
"myenv.docker.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n",
"\n",
"# or, use an image available in a private Container Registry\n",
"myenv.docker.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n",
"myenv.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"myenv.docker.base_image_registry.username = \"username\"\n",
"myenv.docker.base_image_registry.password = \"password\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Write the Entry Script\n",
"Write the script that will be used to predict on your model"
"# Create an image\n",
"Create an image using the registered model the script that will load and run the model."
]
},
{
@@ -179,11 +137,17 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create the InferenceConfig\n",
"Create the inference config that will be used when deploying the model"
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
@@ -192,9 +156,47 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"inf_config = InferenceConfig(entry_script='score.py', environment=myenv)"
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"Image with ridge regression model\",\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use a custom Docker image\n",
"\n",
"You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n",
"\n",
"Only Supported for `ContainerImage`(from azureml.core.image) with `python` runtime.\n",
"```python\n",
"# use an image available in public Container Registry without authentication\n",
"image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n",
"\n",
"# or, use an image available in a private Container Registry\n",
"image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n",
"image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"image_config.base_image_registry.username = \"username\"\n",
"image_config.base_image_registry.password = \"password\"\n",
"\n",
"# or, use an image built during training.\n",
"image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n",
"```\n",
"You can get the address of training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n"
]
},
{
@@ -235,21 +237,23 @@
"metadata": {},
"outputs": [],
"source": [
"# from azureml.core.compute import ComputeTarget, AksCompute\n",
"'''\n",
"from azureml.core.compute import ComputeTarget, AksCompute\n",
"\n",
"# # Create the compute configuration and set virtual network information\n",
"# config = AksCompute.provisioning_configuration(location=\"eastus2\")\n",
"# config.vnet_resourcegroup_name = \"mygroup\"\n",
"# config.vnet_name = \"mynetwork\"\n",
"# config.subnet_name = \"default\"\n",
"# config.service_cidr = \"10.0.0.0/16\"\n",
"# config.dns_service_ip = \"10.0.0.10\"\n",
"# config.docker_bridge_cidr = \"172.17.0.1/16\"\n",
"# Create the compute configuration and set virtual network information\n",
"config = AksCompute.provisioning_configuration(location=\"eastus2\")\n",
"config.vnet_resourcegroup_name = \"mygroup\"\n",
"config.vnet_name = \"mynetwork\"\n",
"config.subnet_name = \"default\"\n",
"config.service_cidr = \"10.0.0.0/16\"\n",
"config.dns_service_ip = \"10.0.0.10\"\n",
"config.docker_bridge_cidr = \"172.17.0.1/16\"\n",
"\n",
"# # Create the compute target\n",
"# aks_target = ComputeTarget.create(workspace = ws,\n",
"# name = \"myaks\",\n",
"# provisioning_configuration = config)"
"# Create the compute target\n",
"aks_target = ComputeTarget.create(workspace = ws,\n",
" name = \"myaks\",\n",
" provisioning_configuration = config)\n",
"'''"
]
},
{
@@ -296,15 +300,17 @@
"metadata": {},
"outputs": [],
"source": [
"# # Use the default configuration (can also provide parameters to customize)\n",
"# resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
"'''\n",
"# Use the default configuration (can also provide parameters to customize)\n",
"resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
"\n",
"# create_name='my-existing-aks' \n",
"# # Create the cluster\n",
"# attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"# aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n",
"# # Wait for the operation to complete\n",
"# aks_target.wait_for_completion(True)"
"create_name='my-existing-aks' \n",
"# Create the cluster\n",
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n",
"# Wait for the operation to complete\n",
"aks_target.wait_for_completion(True)\n",
"'''"
]
},
{
@@ -320,11 +326,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Set the web service configuration (using default here)\n",
"aks_config = AksWebservice.deploy_configuration()\n",
"\n",
"# # Enable token auth and disable (key) auth on the webservice\n",
"# aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True, auth_enabled=False)\n"
"#Set the web service configuration (using default here)\n",
"aks_config = AksWebservice.deploy_configuration()"
]
},
{
@@ -336,13 +339,11 @@
"%%time\n",
"aks_service_name ='aks-service-1'\n",
"\n",
"aks_service = Model.deploy(workspace=ws,\n",
" name=aks_service_name,\n",
" models=[model],\n",
" inference_config=inf_config,\n",
" deployment_config=aks_config,\n",
" deployment_target=aks_target)\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws, \n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
@@ -389,12 +390,11 @@
"metadata": {},
"outputs": [],
"source": [
"# # if (key) auth is enabled, retrieve the API keys. AML generates two keys.\n",
"# key1, Key2 = aks_service.get_keys()\n",
"# print(key1)\n",
"\n",
"# # if token auth is enabled, retrieve the token.\n",
"# access_token, refresh_after = aks_service.get_token()"
"# retreive the API keys. AML generates two keys.\n",
"'''\n",
"key1, Key2 = aks_service.get_keys()\n",
"print(key1)\n",
"'''"
]
},
{
@@ -404,28 +404,27 @@
"outputs": [],
"source": [
"# construct raw HTTP request and send to the service\n",
"# %%time\n",
"'''\n",
"%%time\n",
"\n",
"# import requests\n",
"import requests\n",
"\n",
"# import json\n",
"import json\n",
"\n",
"# test_sample = json.dumps({'data': [\n",
"# [1,2,3,4,5,6,7,8,9,10], \n",
"# [10,9,8,7,6,5,4,3,2,1]\n",
"# ]})\n",
"# test_sample = bytes(test_sample,encoding = 'utf8')\n",
"test_sample = json.dumps({'data': [\n",
" [1,2,3,4,5,6,7,8,9,10], \n",
" [10,9,8,7,6,5,4,3,2,1]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
"\n",
"# # If (key) auth is enabled, don't forget to add key to the HTTP header.\n",
"# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
"# Don't forget to add key to the HTTP header.\n",
"headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
"\n",
"# # If token auth is enabled, don't forget to add token to the HTTP header.\n",
"# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + access_token}\n",
"\n",
"# resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n",
"resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n",
"\n",
"\n",
"# print(\"prediction:\", resp.text)"
"print(\"prediction:\", resp.text)\n",
"'''"
]
},
{
@@ -444,6 +443,7 @@
"source": [
"%%time\n",
"aks_service.delete()\n",
"image.delete()\n",
"model.delete()"
]
}
@@ -470,7 +470,27 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"friendly_name": "Prepare data for regression modeling",
"exclude_from_index": false,
"order_index": 1,
"category": "deployment",
"tags": [
"featured"
],
"task": "Regression",
"datasets": [
"test"
],
"compute": [
"localtest"
],
"deployment": [
"AKS"
],
"framework": [
"test1"
]
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -1,736 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and explain models remotely via Azure Machine Learning Compute\n",
"\n",
"\n",
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to train and explain a regression model remotely on an Azure Machine Leanrning Compute Target (AMLCompute).**_\n",
"\n",
"\n",
"\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
" 1. Initialize a Workspace\n",
" 1. Create an Experiment\n",
" 1. Introduction to AmlCompute\n",
" 1. Submit an AmlCompute run in a few different ways\n",
" 1. Option 1: Provision as a run based compute target \n",
" 1. Option 2: Provision as a persistent compute target (Basic)\n",
" 1. Option 3: Provision as a persistent compute target (Advanced)\n",
"1. Additional operations to perform on AmlCompute\n",
"1. [Download model explanations from Azure Machine Learning Run History](#Download)\n",
"1. [Visualize explanations](#Visualize)\n",
"1. [Next steps](#Next)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"This notebook showcases how to train and explain a regression model remotely via Azure Machine Learning Compute (AMLCompute), and download the calculated explanations locally for visualization.\n",
"It demonstrates the API calls that you need to make to submit a run for training and explaining a model to AMLCompute, download the compute explanations remotely, and visualizing the global and local explanations via a visualization dashboard that provides an interactive way of discovering patterns in model predictions and downloaded explanations.\n",
"\n",
"We will showcase one of the tabular data explainers: TabularExplainer (SHAP).\n",
"\n",
"Problem: Boston Housing Price Prediction with scikit-learn (train a model and run an explainer remotely via AMLCompute, and download and visualize the remotely-calculated explanations.)\n",
"\n",
"| ![explanations-run-history](./img/explanations-run-history.PNG) |\n",
"|:--:|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't.\n",
"\n",
"\n",
"If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize a Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create An Experiment\n",
"\n",
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"experiment_name = 'explainer-remote-run-on-amlcompute'\n",
"experiment = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to AmlCompute\n",
"\n",
"Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user. \n",
"\n",
"Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. \n",
"\n",
"For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)\n",
"\n",
"If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement)\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n",
"\n",
"\n",
"The training script `train_explain.py` is already created for you. Let's have a look."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit an AmlCompute run in a few different ways\n",
"\n",
"First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since AmlCompute is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n",
"\n",
"You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"\n",
"AmlCompute.supported_vmsizes(workspace=ws)\n",
"# AmlCompute.supported_vmsizes(workspace=ws, location='southcentralus')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create project directory\n",
"\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import shutil\n",
"\n",
"project_folder = './explainer-remote-run-on-amlcompute'\n",
"os.makedirs(project_folder, exist_ok=True)\n",
"shutil.copy('train_explain.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 1: Provision as a run based compute target\n",
"\n",
"You can provision AmlCompute as a compute target at run-time. In this case, the compute is auto-created for your run, scales up to max_nodes that you specify, and then **deleted automatically** after the run completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"# create a new runconfig object\n",
"run_config = RunConfiguration()\n",
"\n",
"# signal that you want to use AmlCompute to execute script.\n",
"run_config.target = \"amlcompute\"\n",
"\n",
"# AmlCompute will be created in the same region as workspace\n",
"# Set vm size for AmlCompute\n",
"run_config.amlcompute.vm_size = 'STANDARD_D2_V2'\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"# set Docker base image to the default CPU-based image\n",
"run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n",
"\n",
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'sklearn-pandas', 'azureml-dataprep'\n",
"]\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
" pip_packages=azureml_pip_packages)\n",
"\n",
"# Now submit a run on AmlCompute\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory=project_folder,\n",
" script='train_explain.py',\n",
" run_config=run_config)\n",
"\n",
"run = experiment.submit(script_run_config)\n",
"\n",
"# Show run details\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 2: Provision as a persistent compute target (Basic)\n",
"\n",
"You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
"\n",
"* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() above\n",
"* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=4)\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute target created in previous step\n",
"run_config.target = cpu_cluster.name\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'azureml-dataprep'\n",
"]\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
" pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, \n",
" script='train_explain.py', \n",
" run_config=run_config) \n",
"run = experiment.submit(config=src)\n",
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 3: Provision as a persistent compute target (Advanced)\n",
"\n",
"You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. This is useful when you want a dedicated cluster of 4 nodes (for example you can set the min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n",
"\n",
"In addition to `vm_size` and `max_nodes`, you can specify:\n",
"* `min_nodes`: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n",
"* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
"* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
"* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n",
"* `vnet_name`: Name of VNet\n",
"* `subnet_name`: Name of SubNet within the VNet"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" vm_priority='lowpriority',\n",
" min_nodes=2,\n",
" max_nodes=4,\n",
" idle_seconds_before_scaledown='300',\n",
" vnet_resourcegroup_name='<my-resource-group>',\n",
" vnet_name='<my-vnet-name>',\n",
" subnet_name='<my-subnet-name>')\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute target created in previous step\n",
"run_config.target = cpu_cluster.name\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'azureml-dataprep'\n",
"]\n",
"\n",
"\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
" pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, \n",
" script='train_explain.py', \n",
" run_config=run_config) \n",
"run = experiment.submit(config=src)\n",
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
"\n",
"client = ExplanationClient.from_run(run)\n",
"# Get the top k (e.g., 4) most important features with their importance values\n",
"explanation = client.download_model_explanation(top_k=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional operations to perform on AmlCompute\n",
"\n",
"You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get_status () gets the latest status of the AmlCompute target\n",
"cpu_cluster.get_status().serialize()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Update () takes in the min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target\n",
"# cpu_cluster.update(min_nodes=1)\n",
"# cpu_cluster.update(max_nodes=10)\n",
"cpu_cluster.update(idle_seconds_before_scaledown=300)\n",
"# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
"# 'cpu-cluster' in this case but use a different VM family for instance.\n",
"\n",
"# cpu_cluster.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download \n",
"1. Download model explanation data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
"\n",
"# Get model explanation data\n",
"client = ExplanationClient.from_run(run)\n",
"global_explanation = client.download_model_explanation()\n",
"local_importance_values = global_explanation.local_importance_values\n",
"expected_values = global_explanation.expected_values\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Or you can use the saved run.id to retrive the feature importance values\n",
"client = ExplanationClient.from_run_id(ws, experiment_name, run.id)\n",
"global_explanation = client.download_model_explanation()\n",
"local_importance_values = global_explanation.local_importance_values\n",
"expected_values = global_explanation.expected_values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the top k (e.g., 4) most important features with their importance values\n",
"global_explanation_topk = client.download_model_explanation(top_k=4)\n",
"global_importance_values = global_explanation_topk.get_ranked_global_values()\n",
"global_importance_names = global_explanation_topk.get_ranked_global_names()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('global importance values: {}'.format(global_importance_values))\n",
"print('global importance names: {}'.format(global_importance_names))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Download model file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve model for visualization and deployment\n",
"from azureml.core.model import Model\n",
"from sklearn.externals import joblib\n",
"original_model = Model(ws, 'model_explain_model_on_amlcomp')\n",
"model_path = original_model.download(exist_ok=True)\n",
"original_model = joblib.load(model_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Download test dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve x_test for visualization\n",
"from sklearn.externals import joblib\n",
"x_test_path = './x_test_boston_housing.pkl'\n",
"run.download_file('x_test_boston_housing.pkl', output_file_path=x_test_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_test = joblib.load('x_test_boston_housing.pkl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize\n",
"Load the visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, original_model, x_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"Learn about other use cases of the explain package on a:\n",
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n",
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. Inferencing time: deploy a classification model and explainer:\n",
" 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
" 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,8 +0,0 @@
name: explain-model-on-amlcompute
dependencies:
- pip:
- azureml-sdk
- azureml-explain-model
- azureml-contrib-explain-model
- sklearn-pandas
- azureml-dataprep

View File

@@ -1,64 +0,0 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from sklearn import datasets
from sklearn.linear_model import Ridge
from azureml.explain.model.tabular_explainer import TabularExplainer
from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np
OUTPUT_DIR = './outputs/'
os.makedirs(OUTPUT_DIR, exist_ok=True)
boston_data = datasets.load_boston()
run = Run.get_context()
client = ExplanationClient.from_run(run)
X_train, X_test, y_train, y_test = train_test_split(boston_data.data,
boston_data.target,
test_size=0.2,
random_state=0)
# write x_test out as a pickle file for later visualization
x_test_pkl = 'x_test.pkl'
with open(x_test_pkl, 'wb') as file:
joblib.dump(value=X_test, filename=os.path.join(OUTPUT_DIR, x_test_pkl))
run.upload_file('x_test_boston_housing.pkl', os.path.join(OUTPUT_DIR, x_test_pkl))
alpha = 0.5
# Use Ridge algorithm to create a regression model
reg = Ridge(alpha)
model = reg.fit(X_train, y_train)
preds = reg.predict(X_test)
run.log('alpha', alpha)
model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
# save model in the outputs folder so it automatically get uploaded
with open(model_file_name, 'wb') as file:
joblib.dump(value=reg, filename=os.path.join(OUTPUT_DIR,
model_file_name))
# register the model
run.upload_file('original_model.pkl', os.path.join('./outputs/', model_file_name))
original_model = run.register_model(model_name='model_explain_model_on_amlcomp',
model_path='original_model.pkl')
# Explain predictions on your local machine
tabular_explainer = TabularExplainer(model, X_train, features=boston_data.feature_names)
# Explain overall model predictions (global explanation)
# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data
# x_train can be passed as well, but with more examples explanations it will
# take longer although they may be more accurate
global_explanation = tabular_explainer.explain_global(X_test)
# Uploading model explanation data for storage or visualization in webUX
# The explanation can then be downloaded on any compute
comment = 'Global explanation on regression model trained on boston dataset'
client.upload_model_explanation(global_explanation, comment=comment)

View File

@@ -60,11 +60,25 @@
"2. Run 'explain_model' with AML Run History, which leverages run history service to store and manage the explanation data\n",
"---\n",
"\n",
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"## Setup\n",
"\n",
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"```\n"
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```"
]
},
{

View File

@@ -1,33 +0,0 @@
import json
import numpy as np
import pandas as pd
import os
import pickle
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core.model import Model
def init():
global original_model
global scoring_explainer
# Retrieve the path to the model file using the model name
# Assume original model is named original_prediction_model
original_model_path = Model.get_model_path('local_deploy_model')
scoring_explainer_path = Model.get_model_path('IBM_attrition_explainer')
original_model = joblib.load(original_model_path)
scoring_explainer = joblib.load(scoring_explainer_path)
def run(raw_data):
# Get predictions and explanations for each data point
data = pd.read_json(raw_data)
# Make prediction
predictions = original_model.predict(data)
# Retrieve model explanations
local_importance_values = scoring_explainer.explain(data)
# You can return any data type as long as it is JSON-serializable
return {'predictions': predictions.tolist(), 'local_importance_values': local_importance_values}

View File

@@ -1,33 +0,0 @@
import json
import numpy as np
import pandas as pd
import os
import pickle
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core.model import Model
def init():
global original_model
global scoring_explainer
# Retrieve the path to the model file using the model name
# Assume original model is named original_prediction_model
original_model_path = Model.get_model_path('amlcompute_deploy_model')
scoring_explainer_path = Model.get_model_path('IBM_attrition_explainer')
original_model = joblib.load(original_model_path)
scoring_explainer = joblib.load(scoring_explainer_path)
def run(raw_data):
# Get predictions and explanations for each data point
data = pd.read_json(raw_data)
# Make prediction
predictions = original_model.predict(data)
# Retrieve model explanations
local_importance_values = scoring_explainer.explain(data)
# You can return any data type as long as it is JSON-serializable
return {'predictions': predictions.tolist(), 'local_importance_values': local_importance_values}

View File

@@ -268,8 +268,7 @@
"\n",
"# Register original model\n",
"run.upload_file('original_model.pkl', os.path.join('./outputs/', model_file_name))\n",
"original_model = run.register_model(model_name='local_deploy_model', \n",
" model_path='original_model.pkl')\n",
"original_model = run.register_model(model_name='original_model', model_path='original_model.pkl')\n",
"\n",
"# Register scoring explainer\n",
"run.upload_file('IBM_attrition_explainer.pkl', 'scoring_explainer.pkl')\n",
@@ -384,7 +383,7 @@
"from azureml.core.image import ContainerImage\n",
"\n",
"# Use the custom scoring, docker, and conda files we created above\n",
"image_config = ContainerImage.image_configuration(execution_script=\"score_local_explain.py\",\n",
"image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
" docker_file=\"dockerfile\", \n",
" runtime=\"python\", \n",
" conda_file=\"myenv.yml\")\n",

View File

@@ -241,7 +241,7 @@
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'azureml-dataprep'\n",
" 'azureml-explain-model'\n",
"]\n",
" \n",
"\n",
@@ -309,7 +309,7 @@
"# retrieve model for visualization and deployment\n",
"from azureml.core.model import Model\n",
"from sklearn.externals import joblib\n",
"original_model = Model(ws, 'amlcompute_deploy_model')\n",
"original_model = Model(ws, 'original_model')\n",
"model_path = original_model.download(exist_ok=True)\n",
"original_svm_model = joblib.load(model_path)"
]
@@ -447,7 +447,7 @@
"from azureml.core.image import ContainerImage\n",
"\n",
"# Use the custom scoring, docker, and conda files we created above\n",
"image_config = ContainerImage.image_configuration(execution_script=\"score_remote_explain.py\",\n",
"image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
" docker_file=\"dockerfile\", \n",
" runtime=\"python\", \n",
" conda_file=\"myenv.yml\")\n",

View File

@@ -4,5 +4,5 @@ dependencies:
- azureml-sdk
- azureml-explain-model
- azureml-contrib-explain-model
- sklearn-pandas
- azureml-dataprep
- sklearn-pandas

View File

@@ -99,8 +99,7 @@ with open(model_file_name, 'wb') as file:
# register the model with the model management service for later use
run.upload_file('original_model.pkl', os.path.join(OUTPUT_DIR, model_file_name))
original_model = run.register_model(model_name='amlcompute_deploy_model',
model_path='original_model.pkl')
original_model = run.register_model(model_name='original_model', model_path='original_model.pkl')
# create an explainer to validate or debug the model
tabular_explainer = TabularExplainer(model,

View File

@@ -62,10 +62,24 @@
"4. Visualize the global and local explanations with the visualization dashboard.\n",
"---\n",
"\n",
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"## Setup\n",
"\n",
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```\n"
]
},

View File

@@ -59,10 +59,24 @@
"3. Visualize the global and local explanations with the visualization dashboard.\n",
"---\n",
"\n",
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"## Setup\n",
"\n",
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```\n"
]
},

View File

@@ -60,10 +60,24 @@
"3. Visualize the global and local explanations with the visualization dashboard.\n",
"---\n",
"\n",
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"## Setup\n",
"\n",
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```\n"
]
},

View File

@@ -59,10 +59,24 @@
"3. Visualize the global and local explanations with the visualization dashboard.\n",
"---\n",
"\n",
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"## Setup\n",
"\n",
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```\n"
]
},

Binary file not shown.

Before

Width:  |  Height:  |  Size: 116 KiB

View File

@@ -61,10 +61,24 @@
"4. Visualize the global and local explanations with the visualization dashboard.\n",
"---\n",
"\n",
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"## Setup\n",
"\n",
"You will need to have extensions enabled prior to jupyter kernel starting to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```\n"
]
},

View File

@@ -36,12 +36,13 @@ Azure Machine Learning Pipelines optimize for simplicity, speed, and efficiency.
In this directory, there are two types of notebooks:
* The first type of notebooks will introduce you to core Azure Machine Learning Pipelines features. Notebooks in this category are designed to go in sequence; they're all located in the [intro-to-pipelines](./intro-to-pipelines/) folder.
* The first type of notebooks will introduce you to core Azure Machine Learning Pipelines features. These notebooks below belong in this category, and are designed to go in sequence; they're all located in the "intro-to-pipelines" folder:
Take a look at [intro-to-pipelines](./intro-to-pipelines/) for the list of notebooks that introduce Azure Machine Learning concepts for you.
* The second type of notebooks illustrate more sophisticated scenarios, and are independent of each other. These notebooks include:
1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines.
2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute. This sample also showcases how to use conda dependencies using runconfig when using Pipelines.
2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute.
3. [nyc-taxi-data-regression-model-building.ipynb](https://aka.ms/pl-nyctaxi-tutorial): This notebook is an AzureML Pipelines version of the previously published two part sample.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/README.png)

View File

@@ -15,7 +15,6 @@ These notebooks below are designed to go in sequence.
10. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule): Once you publish a Pipeline, you can schedule it to trigger based on an interval or on data change in a defined datastore.
11. [aml-pipelines-with-automated-machine-learning-step.ipynb](https://aka.ms/pl-automl): AutoMLStep in Pipelines shows how you can do automated machine learning using Pipelines.
12. [aml-pipelines-setup-versioned-pipeline-endpoints.ipynb](https://aka.ms/pl-ver-endpoint): This notebook shows how you can setup PipelineEndpoint and submit a Pipeline using the PipelineEndpoint.
13. [aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb](https://aka.ms/pl-datapath): This notebook showcases how to use DataPath and PipelineParameter in AML Pipeline.
14. [aml-pipelines-how-to-use-pipeline-drafts.ipynb](http://aka.ms/pl-pl-draft): This notebook shows how to use Pipeline Drafts. Pipeline Drafts are mutable pipelines which can be used to submit runs and create Published Pipelines.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.png)

View File

@@ -460,7 +460,7 @@
"source": [
"# Submit syntax\n",
"# submit(experiment_name, \n",
"# pipeline_parameters=None, \n",
"# pipeline_params=None, \n",
"# continue_on_step_failure=False, \n",
"# regenerate_outputs=False)\n",
"\n",

View File

@@ -1,266 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to Use Pipeline Drafts\n",
"In this notebook, we will show you how you can use Pipeline Drafts. Pipeline Drafts are mutable pipelines which can be used to submit runs and create Published Pipelines."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites and AML Basics\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n",
"\n",
"### Initialization Steps"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace\n",
"from azureml.core import Run, Experiment, Datastore\n",
"from azureml.widgets import RunDetails\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compute Target\n",
"Retrieve an already attached Azure Machine Learning Compute to use in the Pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
"except:\n",
" print(\"Creating new compute target: {}\".format(aml_compute_target))\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build a Pipeline\n",
"Build a simple pipeline to use to create a PipelineDraft."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import Pipeline\n",
"from azureml.pipeline.steps import PythonScriptStep\n",
"\n",
"source_directory = \"publish_run_train\"\n",
"\n",
"train_step = PythonScriptStep(\n",
" name=\"Training_Step\",\n",
" script_name=\"train.py\", \n",
" compute_target=aml_compute_target, \n",
" source_directory=source_directory)\n",
"print(\"train step created\")\n",
"\n",
"pipeline = Pipeline(workspace=ws, steps=[train_step])\n",
"print (\"Pipeline is built\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Pipeline Draft\n",
"Create a PipelineDraft by specifying a name, description, experiment_name and Pipeline. You can also specify tags, properties and pipeline_parameter values.\n",
"\n",
"In this example we use the previously created Pipeline object to create the Pipeline Draft. You can also create a Pipeline Draft from an existing Pipeline Run, Published Pipeline, or other Pipeline Draft."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import PipelineDraft\n",
"\n",
"pipeline_draft = PipelineDraft.create(ws, name=\"TestPipelineDraft\",\n",
" description=\"draft description\",\n",
" experiment_name=\"helloworld\",\n",
" pipeline=pipeline,\n",
" continue_on_step_failure=True,\n",
" tags={'dev': 'true'},\n",
" properties={'train': 'value'})\n",
"\n",
"created_pipeline_draft_id = pipeline_draft.id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List Pipeline Drafts in a Workspace\n",
"Use the PipelineDraft.list() function to list all PipelineDrafts in a Workspace. You can use the optional tags parameter to filter on specified tag values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_drafts = PipelineDraft.list(ws, tags={'dev': 'true'})\n",
"\n",
"for pipeline_draft in pipeline_drafts:\n",
" print(pipeline_draft)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get a Pipeline Draft by Id"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_draft = PipelineDraft.get(ws, id=created_pipeline_draft_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Update a Pipeline Draft\n",
"The update() function of a pipeline draft can be used to update the name, description, experiment name, pipeline parameter assignments, continue on step failure setting and Pipeline associated with the PipelineDraft. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_train_step = PythonScriptStep(\n",
" name=\"New_Training_Step\",\n",
" script_name=\"train.py\", \n",
" compute_target=aml_compute_target, \n",
" source_directory=source_directory)\n",
"\n",
"new_pipeline = Pipeline(workspace=ws, steps=[new_train_step])\n",
"\n",
"pipeline_draft.update(name=\"UpdatedPipelineDraft\", description=\"has updated train step\", pipeline=new_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit a Pipeline Run from a Pipeline Draft\n",
"Use the pipeline_draft.submit() function to submit a PipelineRun. After the run is submitted, the PipelineDraft can still be edited and used to submit new runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run = pipeline_draft.submit_run()\n",
"pipeline_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Published Pipeline from a Pipeline Draft\n",
"Use the pipeline_draft.publish() function to create a Published Pipeline from the Pipeline Draft. After creating a Published Pipeline, the Pipeline Draft can still be edited and used to create other Published Pipelines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"published_pipeline = pipeline_draft.publish()\n",
"published_pipeline"
]
}
],
"metadata": {
"authors": [
{
"name": "elihop"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,5 +0,0 @@
name: aml-pipelines-how-to-use-pipeline-drafts
dependencies:
- pip:
- azureml-sdk
- azureml-widgets

View File

@@ -321,11 +321,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hyperdriveconfig-remarks-sample"
]
},
"metadata": {},
"outputs": [],
"source": [
"hd_config = HyperDriveConfig(estimator=est, \n",
@@ -333,7 +329,7 @@
" policy=early_termination_policy,\n",
" primary_metric_name='validation_acc', \n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
" max_total_runs=4,\n",
" max_total_runs=10,\n",
" max_concurrent_runs=4)"
]
},
@@ -441,7 +437,8 @@
"metadata": {},
"outputs": [],
"source": [
"pipeline_run.wait_for_completion()"
"# PUBLISHONLY\n",
"# pipeline_run.wait_for_completion()"
]
},
{
@@ -458,8 +455,9 @@
"metadata": {},
"outputs": [],
"source": [
"metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)\n",
"num_file_downloaded = metrics_output.download('.', show_progress=True)"
"# PUBLISHONLY\n",
"# metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)\n",
"# num_file_downloaded = metrics_output.download('.', show_progress=True)"
]
},
{
@@ -468,14 +466,15 @@
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import json\n",
"with open(metrics_output._path_on_datastore) as f: \n",
" metrics_output_result = f.read()\n",
"# PUBLISHONLY\n",
"# import pandas as pd\n",
"# import json\n",
"# with open(metrics_output._path_on_datastore) as f: \n",
"# metrics_output_result = f.read()\n",
" \n",
"deserialized_metrics_output = json.loads(metrics_output_result)\n",
"df = pd.DataFrame(deserialized_metrics_output)\n",
"df"
"# deserialized_metrics_output = json.loads(metrics_output_result)\n",
"# df = pd.DataFrame(deserialized_metrics_output)\n",
"# df"
]
},
{
@@ -492,9 +491,10 @@
"metadata": {},
"outputs": [],
"source": [
"hd_step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run(hd_step_name)[0])\n",
"best_run = hd_step_run.get_best_run_by_primary_metric()\n",
"best_run"
"# PUBLISHONLY\n",
"# hd_step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run(hd_step_name)[0])\n",
"# best_run = hd_step_run.get_best_run_by_primary_metric()\n",
"# best_run"
]
},
{
@@ -510,7 +510,8 @@
"metadata": {},
"outputs": [],
"source": [
"print(best_run.get_file_names())"
"# PUBLISHONLY\n",
"# print(best_run.get_file_names())"
]
},
{
@@ -526,7 +527,8 @@
"metadata": {},
"outputs": [],
"source": [
"model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')"
"# PUBLISHONLY\n",
"# model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')"
]
},
{
@@ -590,14 +592,15 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import CondaDependencies\n",
"# PUBLISHONLY\n",
"# from azureml.core.runconfig import CondaDependencies\n",
"\n",
"cd = CondaDependencies.create()\n",
"cd.add_conda_package('numpy')\n",
"cd.add_tensorflow_conda_package()\n",
"cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
"# cd = CondaDependencies.create()\n",
"# cd.add_conda_package('numpy')\n",
"# cd.add_tensorflow_conda_package()\n",
"# cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
"\n",
"print(cd.serialize_to_string())"
"# print(cd.serialize_to_string())"
]
},
{
@@ -614,12 +617,13 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"# PUBLISHONLY\n",
"# from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
" tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n",
" description='Tensorflow DNN on MNIST')"
"# aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
"# memory_gb=1, \n",
"# tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n",
"# description='Tensorflow DNN on MNIST')"
]
},
{
@@ -644,11 +648,12 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"# PUBLISHONLY\n",
"# from azureml.core.image import ContainerImage\n",
"\n",
"imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
" runtime=\"python\", \n",
" conda_file=\"myenv.yml\")"
"# imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
"# runtime=\"python\", \n",
"# conda_file=\"myenv.yml\")"
]
},
{
@@ -657,16 +662,17 @@
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.webservice import Webservice\n",
"# PUBLISHONLY\n",
"# %%time\n",
"# from azureml.core.webservice import Webservice\n",
"\n",
"service = Webservice.deploy_from_model(workspace=ws,\n",
" name='tf-mnist-svc',\n",
" deployment_config=aciconfig,\n",
" models=[model],\n",
" image_config=imgconfig)\n",
"# service = Webservice.deploy_from_model(workspace=ws,\n",
"# name='tf-mnist-svc',\n",
"# deployment_config=aciconfig,\n",
"# models=[model],\n",
"# image_config=imgconfig)\n",
"\n",
"service.wait_for_deployment(show_output=True)"
"# service.wait_for_deployment(show_output=True)"
]
},
{
@@ -682,7 +688,8 @@
"metadata": {},
"outputs": [],
"source": [
"print(service.get_logs())"
"# PUBLISHONLY\n",
"# print(service.get_logs())"
]
},
{
@@ -698,7 +705,8 @@
"metadata": {},
"outputs": [],
"source": [
"print(service.scoring_uri)"
"# PUBLISHONLY\n",
"# print(service.scoring_uri)"
]
},
{
@@ -717,36 +725,37 @@
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"# PUBLISHONLY\n",
"# import json\n",
"\n",
"# find 30 random samples from test set\n",
"n = 30\n",
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
"# # find 30 random samples from test set\n",
"# n = 30\n",
"# sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
"\n",
"test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
"test_samples = bytes(test_samples, encoding='utf8')\n",
"# test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
"# test_samples = bytes(test_samples, encoding='utf8')\n",
"\n",
"# predict using the deployed model\n",
"result = service.run(input_data=test_samples)\n",
"# # predict using the deployed model\n",
"# result = service.run(input_data=test_samples)\n",
"\n",
"# compare actual value vs. the predicted values:\n",
"i = 0\n",
"plt.figure(figsize = (20, 1))\n",
"# # compare actual value vs. the predicted values:\n",
"# i = 0\n",
"# plt.figure(figsize = (20, 1))\n",
"\n",
"for s in sample_indices:\n",
" plt.subplot(1, n, i + 1)\n",
" plt.axhline('')\n",
" plt.axvline('')\n",
"# for s in sample_indices:\n",
"# plt.subplot(1, n, i + 1)\n",
"# plt.axhline('')\n",
"# plt.axvline('')\n",
" \n",
" # use different color for misclassified sample\n",
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
"# # use different color for misclassified sample\n",
"# font_color = 'red' if y_test[s] != result[i] else 'black'\n",
"# clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
" \n",
" plt.text(x=10, y=-10, s=y_hat[s], fontsize=18, color=font_color)\n",
" plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
"# plt.text(x=10, y=-10, s=y_hat[s], fontsize=18, color=font_color)\n",
"# plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
" \n",
" i = i + 1\n",
"plt.show()"
"# i = i + 1\n",
"# plt.show()"
]
},
{
@@ -762,20 +771,21 @@
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"# PUBLISHONLY\n",
"# import requests\n",
"\n",
"# send a random row from the test set to score\n",
"random_index = np.random.randint(0, len(X_test)-1)\n",
"input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
"# # send a random row from the test set to score\n",
"# random_index = np.random.randint(0, len(X_test)-1)\n",
"# input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
"\n",
"headers = {'Content-Type':'application/json'}\n",
"# headers = {'Content-Type':'application/json'}\n",
"\n",
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"# resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"\n",
"print(\"POST to url\", service.scoring_uri)\n",
"print(\"input data:\", input_data)\n",
"print(\"label:\", y_test[random_index])\n",
"print(\"prediction:\", resp.text)"
"# print(\"POST to url\", service.scoring_uri)\n",
"# print(\"input data:\", input_data)\n",
"# print(\"label:\", y_test[random_index])\n",
"# print(\"prediction:\", resp.text)"
]
},
{
@@ -794,17 +804,18 @@
"metadata": {},
"outputs": [],
"source": [
"models = ws.models\n",
"for name, model in models.items():\n",
" print(\"Model: {}, ID: {}\".format(name, model.id))\n",
"# PUBLISHONLY\n",
"# models = ws.models\n",
"# for name, model in models.items():\n",
"# print(\"Model: {}, ID: {}\".format(name, model.id))\n",
" \n",
"images = ws.images\n",
"for name, image in images.items():\n",
" print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
"# images = ws.images\n",
"# for name, image in images.items():\n",
"# print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
" \n",
"webservices = ws.webservices\n",
"for name, webservice in webservices.items():\n",
" print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
"# webservices = ws.webservices\n",
"# for name, webservice in webservices.items():\n",
"# print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
]
},
{
@@ -821,14 +832,15 @@
"metadata": {},
"outputs": [],
"source": [
"service.delete()"
"# PUBLISHONLY\n",
"# service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "sanpil"
"name": "sonnyp"
}
],
"kernelspec": {

Some files were not shown because too many files have changed in this diff Show More