Compare commits

125 Commits

Author SHA1 Message Date
Roope Astala
7bb906b53c Merge pull request #87 from rastala/master
Update to version 0.1.80
2018-11-20 11:02:28 -05:00
rastala
5726fe3ddb Version 0.1.80 2018-11-20 11:00:48 -05:00
rastala
d10b1fa796 Revert "Updated notebook folders"
This reverts commit 06728004b6.
2018-11-20 10:39:48 -05:00
rastala
d7127de03c Revert "Update tutorials/README.md"
This reverts commit 50787f4ccc.
2018-11-20 10:39:34 -05:00
Roope Astala
50787f4ccc Update tutorials/README.md 2018-11-19 13:35:11 -05:00
Roope Astala
06728004b6 Updated notebook folders 2018-11-19 13:28:49 -05:00
Roope Astala
f5bcc55fe3 Merge pull request #74 from yueguoguo/master
Typo in README
2018-11-09 09:51:01 -05:00
Roope Astala
f23fb58200 Merge pull request #77 from rastala/master
Fix autoscale
2018-11-09 09:47:46 -05:00
Roope Astala
dbce7b8db2 Fix autoscase 2018-11-09 09:47:01 -05:00
Roope Astala
303090adf6 Merge pull request #76 from rastala/master
Update 00.configuration.ipynb
2018-11-09 09:33:07 -05:00
Roope Astala
b091d1f5f1 Update 00.configuration.ipynb
Create computes in 00.configuration, and link to tutorial
2018-11-09 09:31:25 -05:00
Hai Ning
803d69c539 Update 03.train-hyperparameter-tune-deploy-with-tensorflow.ipynb 2018-11-07 13:54:11 -05:00
Zhang Le
37848e9686 Merge pull request #1 from yueguoguo/yueguoguo-patch-1
Typo in README
2018-11-07 13:18:31 +08:00
Zhang Le
7d9227441e Typo in README
Typo of `psutil`.
2018-11-07 13:17:53 +08:00
Roope Astala
21c454b0f2 Merge pull request #72 from rastala/master
Add logging API notebook
2018-11-06 12:46:39 -05:00
Roope Astala
c7b0960ae4 Add logging API notebook 2018-11-06 12:46:05 -05:00
Roope Astala
14e11fefd6 Delete .gitignore 2018-11-06 12:31:53 -05:00
Roope Astala
4deaeb04cf Delete 05.train-in-spark-checkpoint.ipynb 2018-11-06 12:31:32 -05:00
Roope Astala
ee78323df2 Delete 03.train-on-aci-checkpoint.ipynb 2018-11-06 12:31:18 -05:00
Roope Astala
89c2622938 Delete 02.train-on-local-checkpoint.ipynb 2018-11-06 12:31:03 -05:00
Roope Astala
96b352e3be Delete 04.train-on-remote-vm-checkpoint.ipynb 2018-11-06 12:30:43 -05:00
Roope Astala
5280201f93 Merge pull request #70 from wchill/fix_macos_sigsegv
Fix segfault under certain conditions when running AutoML pipelines on MacOS
2018-11-05 19:04:14 -05:00
Eric Ahn
3825fd2c10 Fix segfault under certain conditions on MacOS 2018-11-05 15:06:38 -08:00
Roope Astala
b936dd3505 Merge pull request #69 from rastala/master
New SDK version 0.1.74
2018-11-05 15:28:40 -05:00
Roope Astala
7339c95ea0 New SDK version 2018-11-05 15:27:36 -05:00
Hai Ning
32102e2aac Update pipeline-batch-scoring.ipynb 2018-11-02 14:18:38 -04:00
Hai Ning
a043769197 Update pr.md 2018-10-29 22:23:49 -04:00
Hai Ning
a0f3727cf4 Update pr.md 2018-10-29 22:23:39 -04:00
Roope Astala
0e8b42f8c7 Delete snowleopardgaze.jpg 2018-10-26 16:53:47 -04:00
hning86
2daafdbca1 logging api sample 2018-10-26 14:02:05 -04:00
Roope Astala
fec2e97310 Merge pull request #62 from rastala/master
Fix link in 01 getting started
2018-10-26 10:27:42 -04:00
Roope Astala
1a79e53935 Fix link in 01 getting started 2018-10-26 10:26:38 -04:00
Hai Ning
900cc7a76b remove json.loads 2018-10-25 13:03:10 -04:00
Roope Astala
3148e52258 Merge pull request #60 from rastala/master
fix json output
2018-10-25 12:48:28 -04:00
Roope Astala
dda402db83 fix json output 2018-10-25 12:47:38 -04:00
Roope Astala
603f4a6434 Merge pull request #58 from rastala/master
Tutorial fixes
2018-10-24 13:47:05 -04:00
Roope Astala
114449dd9b Tutorial fixes 2018-10-24 13:45:15 -04:00
Roope Astala
de20b6c40e Merge pull request #55 from Azure/sdgilley-patch-1
Update 03.auto-train-models.ipynb
2018-10-22 12:43:20 -04:00
Hai Ning
886ece1089 Update pr.md 2018-10-22 11:23:49 -04:00
Sheri Gilley
0dfe00d05a Update 03.auto-train-models.ipynb
fix link
2018-10-22 10:04:46 -05:00
hning86
7a6fb8067f auto updated from HaiGPU 2018-10-22 01:50:11 -04:00
hning86
bb439ab2fd removed empty folder 2018-10-22 01:41:05 -04:00
hning86
ea3abdde4f auto updated from HaiGPU 2018-10-22 01:39:38 -04:00
Hai Ning
2e4eb8785c Update pr.md 2018-10-18 15:29:26 -04:00
Hai Ning
bfccb07dae Update pr.md 2018-10-18 15:27:36 -04:00
Hai Ning
94cd37e9fb Update README.md 2018-10-18 14:49:28 -04:00
Hai Ning
cdeb4dddab Update README.md 2018-10-18 14:47:44 -04:00
Hai Ning
e12637098a Update README.md 2018-10-18 14:47:19 -04:00
Hai Ning
d5f8811f4f YT cover 2018-10-18 14:46:08 -04:00
Hai Ning
92d36a2db4 Delete ytimg_png.PNG 2018-10-18 14:45:53 -04:00
Hai Ning
c5c76e8187 Update pr.md 2018-10-18 14:45:12 -04:00
Hai Ning
833d1d0f4e Update pr.md 2018-10-18 14:44:59 -04:00
Hai Ning
dd0c0264a2 Update README.md 2018-10-18 14:43:15 -04:00
Hai Ning
52368bad81 Update README.md 2018-10-18 14:42:48 -04:00
Hai Ning
604f6c18be Update README.md 2018-10-18 14:42:23 -04:00
Hai Ning
829bc297f2 Update README.md 2018-10-18 14:41:45 -04:00
Hai Ning
9e5101ea8c Update README.md 2018-10-18 14:41:34 -04:00
Hai Ning
37e96f2ad6 youtube cover 2018-10-18 14:40:17 -04:00
Roope Astala
d0c9bb330a Merge pull request #39 from cforbe/master
Adding dataprep notebook
2018-10-18 12:39:01 -04:00
Colleen Forbes
b4c7932640 Update README.md 2018-10-17 15:44:30 -07:00
Roope Astala
8fed628390 Merge pull request #53 from rastala/master
Update automl setup
2018-10-17 17:38:28 -04:00
rastala
d940aca06d Update automl setup 2018-10-17 17:37:01 -04:00
Hai Ning
beb97b1d9f Update README.md 2018-10-17 12:00:37 -04:00
Roope Astala
d58d57ca44 Merge pull request #48 from rastala/master
Update notebooks with new version
2018-10-12 14:44:10 -04:00
Roope Astala
b3cc1b61a2 more updates 2018-10-12 14:43:18 -04:00
Roope Astala
a4792d95ac Update notebooks 2018-10-12 14:39:33 -04:00
Hai Ning
216aa8b6a1 Update pr.md 2018-10-12 11:37:33 -04:00
Hai Ning
9814955b37 Update pr.md 2018-10-12 11:34:51 -04:00
Hai Ning
c96e9fdd5a Update pr.md 2018-10-12 11:33:25 -04:00
Hai Ning
47bd530c6b Update pr.md 2018-10-12 11:32:24 -04:00
Hai Ning
7e53333af6 Update pr.md 2018-10-12 11:02:06 -04:00
Hai Ning
0888050389 Update pr.md 2018-10-12 10:04:52 -04:00
Hai Ning
fb567152a4 pr 2018-10-11 23:56:26 -04:00
Josée Martens
6d50401af4 Update README.md 2018-10-11 12:03:27 -05:00
Josée Martens
b1bde7328b Update README.md 2018-10-11 10:58:49 -05:00
Josée Martens
7fc6b29de8 Update README.md 2018-10-11 10:58:02 -05:00
Roope Astala
cff9606bf9 Merge pull request #47 from rastala/master
Update project-brainwave/project-brainwave-quickstart.ipynb
2018-10-09 17:00:34 -04:00
Roope Astala
532799a22c Update project-brainwave/project-brainwave-quickstart.ipynb 2018-10-09 16:58:22 -04:00
Roope Astala
90454d5a32 Merge pull request #42 from mx-iao/master
Add readme for training/ notebooks
2018-10-09 15:24:20 -04:00
mx-iao
076b206515 Update readme.md 2018-10-09 12:22:11 -07:00
Roope Astala
b8b660e5a8 Merge pull request #46 from rastala/master
Update automl examples
2018-10-09 14:38:33 -04:00
Roope Astala
6005c0987d Update automl examples 2018-10-09 14:35:45 -04:00
mx-iao
34eec6abc2 Create readme.md 2018-10-08 12:08:47 -07:00
Hai Ning
208c36b903 Update README.md 2018-10-06 10:49:50 -04:00
Hai Ning
80e8a5e323 Update 04.train-on-remote-vm.ipynb 2018-10-04 11:47:48 -04:00
Colleen
e7e9923cfb updating README.md 2018-10-03 16:46:51 -07:00
Roope Astala
989511c581 Merge pull request #40 from rastala/master
Update automl readme
2018-10-03 14:24:20 -04:00
Roope Astala
d5c247b005 Update automl readme 2018-10-03 14:23:49 -04:00
Colleen
b5482fcd4b Adding dataprep notebook 2018-10-03 09:58:55 -07:00
Roope Astala
2bdd131b0c Merge pull request #37 from rastala/master
update pipeline notebook
2018-10-02 16:13:00 -04:00
Roope Astala
2c391a4486 update pipeline notebook 2018-10-02 16:12:22 -04:00
Roope Astala
87b6114156 Merge pull request #36 from rastala/master
Update to automl notebooks
2018-10-02 14:34:22 -04:00
Roope Astala
9b701ebaeb updates to automl tutorial 2018-10-02 14:33:28 -04:00
Roope Astala
758b0ee808 updates to automl notebooks 2018-10-02 14:32:18 -04:00
Roope Astala
eeb4d92d7c Merge pull request #34 from rastala/master
update notebooks for new version
2018-10-01 13:48:51 -04:00
Roope Astala
b4df74c72e adding nb 13 for app insights 2018-10-01 13:47:58 -04:00
Roope Astala
231c1062a8 update notebooks for new version 2018-10-01 13:45:50 -04:00
Sheri Gilley
92be6bfd19 Merge pull request #30 from Azure/sdgilley-fix-links
fix links
2018-09-28 17:25:04 -05:00
Sheri Gilley
b0b0756aed fix links 2018-09-28 17:24:37 -05:00
Roope Astala
ff19151d0a Merge pull request #29 from rastala/master
onnx update
2018-09-28 16:42:09 -04:00
rastala
933c1ffc4e onnx update 2018-09-28 16:41:21 -04:00
Roope Astala
f75faaa31e Merge pull request #28 from rastala/master
mitigation to image creation issue
2018-09-27 12:52:47 -04:00
Roope Astala
ae8874ad32 mitigation to image creation issue 2018-09-27 12:50:38 -04:00
Hai Ning
6c3abe2d03 Update train.py 2018-09-27 11:30:22 -04:00
Hai Ning
4627080ff4 Update 04.train-on-remote-vm.ipynb 2018-09-27 11:30:01 -04:00
Sheri Gilley
69af6e36fe Merge pull request #23 from Azure/sdg-update
update readme
2018-09-26 17:57:57 -05:00
Hai Ning
e27ab9a58e Update 04.train-on-remote-vm.ipynb 2018-09-26 14:02:27 -04:00
Hai Ning
c85e7e52af Update 04.train-on-remote-vm.ipynb 2018-09-26 14:01:39 -04:00
Hai Ning
5598e07729 Update 05.train-in-spark.ipynb 2018-09-26 14:00:38 -04:00
Roope Astala
d9b62ad651 Merge pull request #26 from rastala/master
Updating Azure Databricks examples
2018-09-26 09:34:40 -04:00
Roope Astala
8aa287dadf Updating Azure Databricks examples 2018-09-26 09:32:24 -04:00
Roope Astala
9ab092a4d0 Merge pull request #21 from sitomani/master
Fixed an error on Data Exploration chapter
2018-09-25 19:14:30 -04:00
Sheri Gilley
1a1a81621f remove file 2018-09-25 16:24:00 -05:00
Sheri Gilley
d93daa3f38 create link for nb 2018-09-25 16:21:14 -05:00
Sheri Gilley
2fb910b0e0 fix link 2018-09-25 16:17:34 -05:00
Sheri Gilley
2879e00884 update 2018-09-25 16:09:01 -05:00
Sheri Gilley
b574bfd3cf updates 2018-09-25 16:03:57 -05:00
Sheri Gilley
6a3b814394 Update README.md 2018-09-25 15:07:24 -05:00
Sheri Gilley
1009ffab36 Update README.md 2018-09-25 15:06:20 -05:00
Sheri Gilley
995fb1ac8c Update README.md 2018-09-25 14:46:25 -05:00
Sheri Gilley
e418e4fbb2 Update README.md 2018-09-25 14:46:01 -05:00
Sheri Gilley
cdbfa203e1 Update README.md 2018-09-25 14:36:26 -05:00
Roope Astala
a9a9635e72 Merge pull request #22 from rastala/master
Update 00, 01 and 10 notebooks
2018-09-25 15:02:28 -04:00
Roope Astala
b568dc364f Update 00, 01 and 10 notebooks 2018-09-25 15:01:08 -04:00
Aleksi Sitomaniemi
59bdd5a858 Fixed an error on Data Exploration chapter
The sample code comment discusses using a portion of the full dataset to make training faster, suggesting that the following code takes 100 first samples from the dataset. However, the code in repository actually leaves the first 100 items out, and picks the rest of the full set into the evaluated X_digits, Y_digits matrices.
2018-09-25 12:28:51 +03:00
86 changed files with 11251 additions and 3929 deletions

View File

@@ -14,32 +14,39 @@
"metadata": {},
"source": [
"# 00. Installation and configuration\n",
"This notebook configures your library of notebooks to connect to an Azure Machine Learning Workspace. In this case, a library contains all of the notebooks in the current folder and any nested folders. You can configure this notebook to use an existing workspace or create a new workspace.\n",
"\n",
"## Prerequisites:\n",
"## What is an Azure ML Workspace and why do I need one?\n",
"\n",
"### 1. Install Azure ML SDK\n",
"Follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment).\n",
"An AML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Access Azure Subscription\n",
"\n",
"### 2. Install some additional packages\n",
"This Notebook requires some additional libraries. In the conda environment, run below commands: \n",
"```shell\n",
"In order to create an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com).\n",
"\n",
"### 2. If you're running on your own local environment, install Azure ML SDK and other libraries\n",
"\n",
"If you are running in your own environment, follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment). If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed.\n",
"\n",
"Also install the following libraries in your environment. Many of the example notebooks depend on them.\n",
"\n",
"```\n",
"(myenv) $ conda install -y matplotlib tqdm scikit-learn\n",
"```\n",
"\n",
"### 3. Make sure your subscription is registered to use ACI.\n",
"This Notebook makes use of Azure Container Instance (ACI). You need to ensure your subscription has been registered to use ACI in order to be able to deploy a dev/test web service.\n",
"```shell\n",
"# check to see if ACI is already registered\n",
"(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n",
"\n",
"# if ACI is not registered, run this command.\n",
"# note you need to be the subscription owner in order to execute this command successfully.\n",
"(myenv) $ az provider register -n Microsoft.ContainerInstance\n",
"```\n",
"\n",
"In this example you will optionally create an Azure Machine Learning Workspace and initialize your notebook directory to easily use this workspace. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n",
"\n",
"This notebook also contains optional cells to install and update the required Azure Machine Learning libraries."
"Once installation is complete, check the Azure ML SDK version:"
]
},
{
@@ -52,7 +59,6 @@
},
"outputs": [],
"source": [
"# Check core SDK version number for debugging purposes\n",
"import azureml.core\n",
"\n",
"print(\"SDK Version:\", azureml.core.VERSION)"
@@ -62,35 +68,31 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize an Azure ML Workspace\n",
"### What is an Azure ML Workspace and why do I need one?\n",
"### 3. Make sure your subscription is registered to use ACI\n",
"Azure Machine Learning makes use of Azure Container Instance (ACI). You need to ensure your subscription has been registered to use ACI in order to be able to deploy a dev/test web service. If you have run through the quickstart experience, you have already performed this step. Otherwise, you will need to use the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and execute the following commands.\n",
"\n",
"An AML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
"```shell\n",
"# check to see if ACI is already registered\n",
"(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n",
"\n",
"### What do I need\n",
"\n",
"In order to use an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com). Inside your subscription, you will need access to a _resource group_, which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com)\n",
"\n",
"You can also easily create a new resource group using azure-cli.\n",
"\n",
"```sh\n",
"(myenv) $ az group create -n my_resource_group -l eastus2\n",
"```\n",
"\n",
"To create or access an Azure ML Workspace, you will need to import the AML library and the following information:\n",
"* A name for your workspace\n",
"* Your subscription id\n",
"* The resource group name\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (for eg. BatchAI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
"# if ACI is not registered, run this command.\n",
"# note you need to be the subscription owner in order to execute this command successfully.\n",
"(myenv) $ az provider register -n Microsoft.ContainerInstance\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Supported Azure Regions\n",
"Please specify the Azure subscription Id, resource group name, workspace name, and the region in which you want to create the workspace, for example \"eastus2\". "
"## Set up your Azure Machine Learning workspace\n",
"\n",
"### Option 1: You already have a workspace\n",
"If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view the *config.json* file, and copy-paste the values for subscription ID, resource group, and workspace name below.\n",
"\n",
"If you have a workspace created another way, [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#create-workspace-configuration-file) describe how to get your subscription and workspace information.\n",
"\n",
"If this cell succeeds, you're done configuring this library! Otherwise continue to follow the instructions in the rest of the notebook."
]
},
{
@@ -102,8 +104,65 @@
"import os\n",
"\n",
"subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<my-subscription-id>\")\n",
"resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<my-rg>\")\n",
"workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"<my-workspace>\")\n",
"resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<my-resource-group>\")\n",
"workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"<my-workspace-name>\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"try:\n",
" ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n",
" ws.write_config()\n",
" print('Workspace configuration succeeded. You are all set!')\n",
"except:\n",
" print('Workspace not found. Run the cells below.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 2: You don't have a workspace yet\n",
"\n",
"\n",
"#### Requirements\n",
"\n",
"Inside your Azure subscription, you will need access to a _resource group_, which organizes Azure resources and provides a default region for the resources in a group. You can see which resource groups you have access to, or create a new one, in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
"\n",
"To create or access an Azure ML Workspace, you will need to import the AML library and the following information:\n",
"* A name for your workspace\n",
"* Your subscription id\n",
"* The resource group name\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Supported Azure Regions\n",
"Specify a region where your workspace will be located from the list of [Azure Machine Learning regions](https://linktoregions)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<my-subscription-id>\")\n",
"resource_group = os.environ.get(\"RESOURCE_GROUP\", \"my-aml-resource-group\")\n",
"workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"my-first-workspace\")\n",
"\n",
"workspace_region = os.environ.get(\"WORKSPACE_REGION\", \"eastus2\")"
]
},
@@ -111,11 +170,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a workspace\n",
"If you already have access to an AML Workspace you want to use, you can skip this cell. Otherwise, this cell will create an AML workspace for you in a subscription provided you have the correct permissions.\n",
"#### Create the workspace\n",
"This cell will create an AML workspace for you in a subscription provided you have the correct permissions.\n",
"\n",
"This will fail when:\n",
"1. You do not have permission to create a workspace in the resource group\n",
"2. You do not have permission to create a resource group if it's non-existing.\n",
"2. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
"\n",
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
@@ -138,33 +198,9 @@
" subscription_id = subscription_id,\n",
" resource_group = resource_group, \n",
" location = workspace_region,\n",
" create_resource_group = True,\n",
" exist_ok = True)\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuring your local environment\n",
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"ws = Workspace(workspace_name = workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = resource_group)\n",
"\n",
"# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"ws.get_details()\n",
"ws.write_config()"
]
},
@@ -172,22 +208,64 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then load the workspace from this config file from any notebook in the current directory."
"## Create compute resources for your training experiments\n",
"\n",
"Many of the subsequent examples use Azure Machine Learning managed compute (AmlCompute) to train models at scale. To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"metadata": {},
"outputs": [],
"source": [
"# load workspace configuration from ./aml_config/config.json file.\n",
"my_workspace = Workspace.from_config()\n",
"my_workspace.get_details()"
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpucluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=4)\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your GPU cluster\n",
"gpu_cluster_name = \"gpucluster\"\n",
"\n",
"# Check if cluster exists already\n",
"try:\n",
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"\n",
"gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
@@ -195,11 +273,23 @@
"metadata": {},
"source": [
"## Success!\n",
"Great, you are ready to move on to the rest of the sample notebooks."
"Great, you are ready to move on to the rest of the sample notebooks. A good place to start is the [01.train-model tutorial](./tutorials/01.train-model.ipynb) to learn how to train and then deploy an image classification model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -215,7 +305,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
"version": "3.6.2"
}
},
"nbformat": 4,
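
The Option 1 cell in the notebook diff above reads the subscription ID, resource group, and workspace name from environment variables and falls back to angle-bracket placeholders such as `<my-subscription-id>`. The following is a minimal sketch, using only the standard library, of how those values could be checked for unfilled placeholders before they are passed to `Workspace(...)`. The helper name `read_workspace_settings` is hypothetical and is not part of the Azure ML SDK.

```python
import os

# Defaults mirror the placeholder values used in the notebook cell.
PLACEHOLDERS = {
    "SUBSCRIPTION_ID": "<my-subscription-id>",
    "RESOURCE_GROUP": "<my-resource-group>",
    "WORKSPACE_NAME": "<my-workspace-name>",
}


def read_workspace_settings(environ=None):
    """Hypothetical helper: return (settings, names still set to a placeholder)."""
    environ = os.environ if environ is None else environ
    settings = {
        key: environ.get(key, default) for key, default in PLACEHOLDERS.items()
    }
    # A value that still looks like "<...>" was never filled in.
    missing = [
        key
        for key, value in settings.items()
        if value.startswith("<") and value.endswith(">")
    ]
    return settings, missing


# Example: only the subscription ID is provided.
settings, missing = read_workspace_settings({"SUBSCRIPTION_ID": "0000-example"})
print("Still unset:", missing)
```

Checking up front gives a more specific message than the notebook's generic "Workspace not found" branch, which also fires when the values were simply never filled in.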

View File

@@ -27,7 +27,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"1. Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't. \n",
"1. Make sure you go through the [00. Installation and Configuration](../../00.configuration.ipynb) Notebook first if you haven't. \n",
"\n",
"2. Install the following prerequisite libraries in your conda environment and restart the notebook.\n",
"```shell\n",
@@ -457,7 +457,8 @@
},
"outputs": [],
"source": [
"models = ws.models(name='best_model')\n",
"from azureml.core.model import Model\n",
"models = Model.list(workspace=ws, name='best_model')\n",
"for m in models:\n",
" print(m.name, m.version)"
]
@@ -524,8 +525,7 @@
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies()\n",
"myenv.add_conda_package(\"scikit-learn\")\n",
"myenv = CondaDependencies.create(conda_packages=[\"scikit-learn\"])\n",
"print(myenv.serialize_to_string())\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
@@ -679,7 +679,7 @@
"# score the entire test set.\n",
"test_samples = json.dumps({'data': X_test.tolist()})\n",
"\n",
"result = json.loads(service.run(input_data = test_samples))['result']\n",
"result = service.run(input_data = test_samples)\n",
"residual = result - y_test"
]
},
@@ -777,16 +777,14 @@
"%%time\n",
"service.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -802,7 +800,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
"version": "3.6.6"
}
},
"nbformat": 4,

View File

@@ -21,7 +21,9 @@ def run(raw_data):
data = json.loads(raw_data)['data']
data = np.array(data)
result = model.predict(data)
return json.dumps({"result": result.tolist()})
# you can return any data type as long as it is JSON-serializable
return result.tolist()
except Exception as e:
result = str(e)
return json.dumps({"error": result})
return result
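
The scoring-script change above replaces the hand-built JSON string with a plain return value, and the notebook change earlier in this diff drops the matching `json.loads(...)['result']` on the caller side. A minimal sketch of the new contract follows. It assumes, based on the comment in the diff, that the service layer serializes whatever `run` returns. `StubModel` is a stand-in invented for illustration; the real script loads a trained model and converts the input with NumPy.

```python
import json


class StubModel:
    """Stand-in for the trained model (an assumption for illustration)."""

    def predict(self, rows):
        # Sum each row so the output is easy to verify by hand.
        return [sum(row) for row in rows]


model = StubModel()


def run(raw_data):
    try:
        data = json.loads(raw_data)["data"]
        result = model.predict(data)
        # Return a JSON-serializable object directly, not a JSON string.
        return list(result)
    except Exception as e:
        # Errors are returned as a plain string, as in the updated script.
        return str(e)


payload = json.dumps({"data": [[1, 2], [3, 4]]})
predictions = run(payload)
print(predictions)
```

One consequence of this design is that a caller can no longer tell a prediction from an error by looking for an `"error"` key, so it has to check the type of the returned value instead.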

View File

@@ -218,7 +218,7 @@
"run_config_system_managed = RunConfiguration()\n",
"\n",
"run_config_system_managed.environment.python.user_managed_dependencies = False\n",
"run_config_system_managed.prepare_environment = True\n",
"run_config_system_managed.auto_prepare_environment = True\n",
"\n",
"# Specify conda dependencies with scikit-learn\n",
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
@@ -291,19 +291,17 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"run_config_docker = RunConfiguration()\n",
"\n",
"run_config_docker.environment.python.user_managed_dependencies = False\n",
"run_config_docker.prepare_environment = True\n",
"run_config_docker.auto_prepare_environment = True\n",
"run_config_docker.environment.docker.enabled = True\n",
"run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"# Specify conda dependencies with scikit-learn\n",
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
"run_config_docker.environment.python.conda_dependencies = cd"
"run_config_docker.environment.python.conda_dependencies = cd\n",
"\n",
"src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_docker)"
]
},
{
@@ -322,8 +320,17 @@
"metadata": {},
"outputs": [],
"source": [
"src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_docker)\n",
"run = exp.submit(src)"
"import subprocess\n",
"\n",
"# Check if Docker is installed and Linux containers are enabled\n",
"if subprocess.run(\"docker -v\", shell=True).returncode == 0:\n",
"    out = subprocess.check_output(\"docker system info\", shell=True, encoding=\"ascii\").split(\"\\n\")\n",
"    if \"OSType: linux\" not in [line.strip() for line in out]:\n",
"        print(\"Switch Docker engine to use Linux containers.\")\n",
"    else:\n",
"        run = exp.submit(src)\n",
"else:\n",
"    print(\"Docker engine not installed.\")"
]
},
{
@@ -442,6 +449,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
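
The Docker check added in this notebook shells out to `docker -v` and `docker system info` before submitting the run. The same idea can be sketched as two small functions, so that the parsing can be exercised without Docker installed. The function names `parse_docker_os_type` and `docker_status` are hypothetical, and the sketch assumes that `docker system info` prints a line of the form `OSType: <value>`, possibly indented.

```python
import shutil
import subprocess


def parse_docker_os_type(info_text):
    """Return the value of the 'OSType' line in `docker system info` output, or None."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "OSType":
            return value.strip()
    return None


def docker_status():
    """Hypothetical helper: describe whether Docker with Linux containers is usable."""
    if shutil.which("docker") is None:
        return "Docker engine not installed."
    proc = subprocess.run(
        ["docker", "system", "info"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        return "Docker is installed, but `docker system info` failed."
    if parse_docker_os_type(proc.stdout) != "linux":
        return "Switch Docker engine to use Linux containers."
    return "Docker is ready for Linux containers."


# The parser works on captured text, so it can be checked without Docker.
sample = "Server Version: 18.09.0\n OSType: linux\n Architecture: x86_64"
print(parse_docker_os_type(sample))
```

In the notebook, `exp.submit(src)` would then be called only when `docker_status()` reports that Docker is ready.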

View File

@@ -15,7 +15,7 @@ os.makedirs('./outputs', exist_ok=True)
X, y = load_diabetes(return_X_y=True)
run = Run.get_submitted_run()
run = Run.get_context()
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,

View File

@@ -261,6 +261,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -14,7 +14,7 @@ os.makedirs('./outputs', exist_ok=True)
X, y = load_diabetes(return_X_y=True)
run = Run.get_submitted_run()
run = Run.get_context()
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,

View File

@@ -13,12 +13,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 04. Train in a remote VM (MLC managed DSVM)\n",
"# 04. Train in a remote Linux VM\n",
"* Create Workspace\n",
"* Create Project\n",
"* Create `train.py` file\n",
"* Create DSVM as Machine Learning Compute (MLC) resource\n",
"* Configure & execute a run in a conda environment in the default miniconda Docker container on DSVM"
"* Create (or attach) DSVM as compute resource.\n",
"* Upload data files into default datastore\n",
"* Configure & execute a run in a few different ways\n",
" - Use system-built conda\n",
" - Use existing Python environment\n",
" - Use Docker \n",
"* Find the best model in the run"
]
},
{
@@ -80,7 +84,6 @@
"experiment_name = 'train-on-remote-vm'\n",
"\n",
"from azureml.core import Experiment\n",
"\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
@@ -88,9 +91,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## View `train.py`\n",
"\n",
"For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file."
"Let's also create a local folder to hold the training script."
]
},
{
@@ -99,7 +100,87 @@
"metadata": {},
"outputs": [],
"source": [
"with open('./train.py', 'r') as training_script:\n",
"import os\n",
"script_folder = './vm-run'\n",
"os.makedirs(script_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload data files into datastore\n",
"Every workspace comes with a default datastore (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get the default datastore\n",
"ds = ws.get_default_datastore()\n",
"print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load diabetes data from `scikit-learn` and save it as 2 local files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_diabetes\n",
"import numpy as np\n",
"\n",
"training_data = load_diabetes()\n",
"np.save(file='./features.npy', arr=training_data['data'])\n",
"np.save(file='./labels.npy', arr=training_data['target'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's upload the 2 files into the default datastore under a path named `diabetes`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.upload_files(['./features.npy', './labels.npy'], target_path='diabetes', overwrite=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View `train.py`\n",
"\n",
"For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file. Please pay special attention on how we are loading the features and labels from files in the `data_folder` path, which is passed in as an argument of the training script (shown later)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# copy train.py into the script folder\n",
"import shutil\n",
"shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n",
"\n",
"with open(os.path.join(script_folder, './train.py'), 'r') as training_script:\n",
" print(training_script.read())"
]
},
@@ -109,9 +190,19 @@
"source": [
"## Create Linux DSVM as a compute target\n",
"\n",
"**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single to multi-node AmlCompute. The DSVMCompute class will be deprecated in a later release, but the DSVM can be created using the below single line command and then attached(like any VM) using the sample code below. Also note, that we only support Linux VMs and the commands below will spin a Linux VM only.\n",
"\n",
"```shell\n",
"# create a DSVM in your resource group\n",
"# note you need to be at least a contributor to the resource group in order to execute this command successfully.\n",
"(myenv) $ az vm create --resource-group <resource_group_name> --name <some_vm_name> --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username <username> --admin-password <password> --generate-ssh-keys --authentication-type password\n",
"```\n",
"\n",
"**Note**: You can also use [this url](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal\n",
"\n",
"**Note**: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n",
" \n",
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can append the port number to the address like the example below."
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the provisioning configuration object."
]
},
{
@@ -139,7 +230,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Attach an existing Linux DSVM as a compute target\n"
"## Attach an existing Linux DSVM\n",
"You can also attach an existing Linux VM as a compute target. The default port is 22."
]
},
{
@@ -151,15 +243,220 @@
"'''\n",
"from azureml.core.compute import RemoteCompute \n",
"# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n",
" dsvm_compute = RemoteCompute.attach(ws,name=\"attach-from-sdk6\",username=<username>,address=<ipaddress>,ssh_port=22,password=<password>)\n",
"'''"
"attached_dsvm_compute = RemoteCompute.attach(workspace=ws,\n",
" name=\"attached_vm\",\n",
" username='<usename>',\n",
" address='<ip_adress_or_fqdn>',\n",
" ssh_port=22,\n",
" password='<password>')\n",
"attached_dsvm_compute.wait_for_completion(show_output=True)\n",
"'''\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure & Run"
"## Configure & Run\n",
"First let's create a `DataReferenceConfiguration` object to inform the system what data folder to download to the copmute target."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='diabetes', \n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can try a few different ways to run the training script in the VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conda run\n",
"You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute.name\n",
"\n",
"# set the data reference of the run configuration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"# specify CondaDependencies obj\n",
"conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=script_folder, \n",
" script='train.py', \n",
" run_config=conda_run_config, \n",
" # pass the datastore reference as a parameter to the training script\n",
" arguments=['--data-folder', str(ds.as_download())] \n",
" ) \n",
"run = exp.submit(config=src)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show the run object. You can navigate to the Azure portal to see detailed information about the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Native VM run\n",
"You can also configure to use an exiting Python environment in the VM to execute the script without asking the system to create a conda environment for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a new RunConfig object\n",
"vm_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"vm_run_config.target = dsvm_compute.name\n",
"\n",
"# set the data reference of the run coonfiguration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"# Let system know that you will configure the Python environment yourself.\n",
"vm_run_config.environment.python.user_managed_dependencies = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The below run will likely fail because `train.py` needs dependency `azureml`, `scikit-learn` and others, which are not found in that Python environment. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"src = ScriptRunConfig(source_directory=script_folder, \n",
" script='train.py', \n",
" run_config=vm_run_config,\n",
" # pass the datastore reference as a parameter to the training script\n",
" arguments=['--data-folder', str(ds.as_download())])\n",
"run = exp.submit(config=src)\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can choose to SSH into the VM and install Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we simply are going to create another script `train2.py` that doesn't have azureml dependencies, and submit it instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $script_folder/train2.py\n",
"\n",
"print('####################################')\n",
"print('Hello World (without Azure ML SDK)!')\n",
"print('####################################')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's try again. And this time it should work fine."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"src = ScriptRunConfig(source_directory=script_folder, \n",
" script='train2.py', \n",
" run_config=vm_run_config)\n",
"run = exp.submit(config=src)\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note even in this case you get a run record with some basic statistics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
@@ -167,7 +464,7 @@
"metadata": {},
"source": [
"### Configure a Docker run with new conda environment on the VM\n",
"You can execute in a Docker container in the VM. If you choose this route, you don't need to install anything on the VM yourself. Azure ML execution service will take care of it for you."
"You can execute in a Docker container in the VM. If you choose this option, the system will pull down a base Docker image, build a new conda environment in it if you ask for (you can also skip this if you are using a customer Docker image when a preconfigured Python environment), start a container, and run your script in there. This image is also uploaded into your ACR (Azure Container Registry) assoicated with your workspace, an reused if your dependencies don't change in the subsequent runs."
]
},
{
@@ -181,26 +478,23 @@
"\n",
"\n",
"# Load the \"cpu-dsvm.runconfig\" file (created by the above attach operation) in memory\n",
"run_config = RunConfiguration(framework = \"python\")\n",
"docker_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"run_config.target = compute_target_name\n",
"docker_run_config.target = dsvm_compute.name\n",
"\n",
"# Use Docker in the remote VM\n",
"run_config.environment.docker.enabled = True\n",
"docker_run_config.environment.docker.enabled = True\n",
"\n",
"# Use CPU base image from DockerHub\n",
"run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"print('Base Docker image is:', run_config.environment.docker.base_image)\n",
"docker_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"print('Base Docker image is:', docker_run_config.environment.docker.base_image)\n",
"\n",
"# Ask system to provision a new one based on the conda_dependencies.yml file\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"# Prepare the Docker and conda environment automatically when executingfor the first time.\n",
"run_config.prepare_environment = True\n",
"# set the data reference of the run coonfiguration\n",
"docker_run_config.data_references = {ds.name: dr}\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
"docker_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]
},
{
@@ -217,11 +511,21 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory = '.', script = 'train.py', run_config = run_config)\n",
"run = exp.submit(src)"
"src = ScriptRunConfig(source_directory=script_folder, \n",
" script='train.py', \n",
" run_config=docker_run_config,\n",
" # pass the datastore reference as a parameter to the training script\n",
" arguments=['--data-folder', str(ds.as_download())])\n",
"run = exp.submit(config=src)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
@@ -241,19 +545,17 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output = True)"
"### Find the best model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find the best run"
"Now we have tried various execution modes, we can find the best model from the last run."
]
},
{
@@ -273,10 +575,13 @@
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"# find the index where MSE is the smallest\n",
"indices = list(range(0, len(metrics['mse'])))\n",
"min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n",
"\n",
"print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n",
" min(metrics['mse']), \n",
" metrics['alpha'][np.argmin(metrics['mse'])]\n",
" metrics['mse'][min_mse_index], \n",
" metrics['alpha'][min_mse_index]\n",
"))"
]
},
@@ -298,6 +603,11 @@
}
],
"metadata": {
"authors": [
{
"name": "haining"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -313,7 +623,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.2"
}
},
"nbformat": 4,

View File

@@ -2,7 +2,8 @@
# Licensed under the MIT license.
import os
from sklearn.datasets import load_diabetes
import argparse
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
@@ -12,10 +13,18 @@ from sklearn.externals import joblib
import numpy as np
os.makedirs('./outputs', exist_ok=True)
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str,
dest='data_folder', help='data folder')
args = parser.parse_args()
X, y = load_diabetes(return_X_y=True)
print('Data folder is at:', args.data_folder)
print('List all files: ', os.listdir(args.data_folder))
run = Run.get_submitted_run()
X = np.load(os.path.join(args.data_folder, 'features.npy'))
y = np.load(os.path.join(args.data_folder, 'labels.npy'))
run = Run.get_context()
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=0)
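
Review note on the `train.py` hunk above: the script now receives its input location through `--data-folder`, which `argparse` exposes as `args.data_folder` because of `dest='data_folder'`. The sketch below exercises only that argument handling. As an assumption for illustration, it passes a temporary directory holding two empty placeholder files instead of a real datastore download.

```python
import argparse
import os
import tempfile

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str,
                    dest='data_folder', help='data folder')

with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files standing in for the uploaded .npy files (assumption)
    for name in ('features.npy', 'labels.npy'):
        open(os.path.join(tmp, name), 'wb').close()

    # Equivalent to launching: python train.py --data-folder <tmp>
    args = parser.parse_args(['--data-folder', tmp])
    files = sorted(os.listdir(args.data_folder))
    print('List all files: ', files)
```

In the notebook, the value for this argument comes from the `arguments=['--data-folder', ...]` list given to `ScriptRunConfig`.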

View File

@@ -77,7 +77,6 @@
"experiment_name = 'train-on-spark'\n",
"\n",
"from azureml.core import Experiment\n",
"\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
@@ -107,13 +106,95 @@
"## Configure & Run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure an ACI run\n",
"Before you try running on an actual Spark cluster, you can use a Docker image with Spark already baked in, and run it in ACI(Azure Container Registry)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# use pyspark framework\n",
"aci_run_config = RunConfiguration(framework=\"pyspark\")\n",
"\n",
"# use ACI to run the Spark job\n",
"aci_run_config.target = 'containerinstance'\n",
"aci_run_config.container_instance.region = 'eastus2'\n",
"aci_run_config.container_instance.cpu_cores = 1\n",
"aci_run_config.container_instance.memory_gb = 2\n",
"\n",
"# specify base Docker image to use\n",
"aci_run_config.environment.docker.enabled = True\n",
"aci_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_MMLSPARK_CPU_IMAGE\n",
"\n",
"# specify CondaDependencies\n",
"cd = CondaDependencies()\n",
"cd.add_conda_package('numpy')\n",
"aci_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit script to ACI to run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory = '.',\n",
" script= 'train-spark.py',\n",
" run_config = aci_run_config)\n",
"run = exp.submit(script_run_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note** you can also create a new VM, or attach an existing VM, and use Docker-based execution to run the Spark job. Please see the `04.train-in-vm` for example on how to configure and run in Docker mode in a VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attach an HDI cluster\n",
"To use HDI commpute target:\n",
" 1. Create an Spark for HDI cluster in Azure. Here is some [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor, NOT CentOS.\n",
"Now we can use a real Spark cluster, HDInsight for Spark, to run this job. To use HDI commpute target:\n",
" 1. Create a Spark for HDI cluster in Azure. Here are some [quick instructions](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-spark-sql). Make sure you use the Ubuntu flavor, NOT CentOS.\n",
" 2. Enter the IP address, username and password below"
]
},
@@ -124,22 +205,22 @@
"outputs": [],
"source": [
"from azureml.core.compute import HDInsightCompute\n",
"from azureml.exceptions import ComputeTargetException\n",
"\n",
"try:\n",
" # if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase\n",
" hdi_compute_new = HDInsightCompute.attach(ws, \n",
" name=\"hdi-attach\", \n",
" address=\"hdi-ignite-demo-ssh.azurehdinsight.net\", \n",
" hdi_compute = HDInsightCompute.attach(workspace=ws, \n",
" name=\"myhdi\", \n",
" address=\"<myhdi-ssh>.azurehdinsight.net\", \n",
" ssh_port=22, \n",
" username='<username>', \n",
" password='<password>')\n",
" username='<ssh-username>', \n",
" password='<ssh-pwd>')\n",
"\n",
"except UserErrorException as e:\n",
"except ComputeTargetException as e:\n",
" print(\"Caught = {}\".format(e.message))\n",
" print(\"Compute config already attached.\")\n",
" \n",
" \n",
"hdi_compute_new.wait_for_completion(show_output=True)"
"hdi_compute.wait_for_completion(show_output=True)"
]
},
{
@@ -159,28 +240,16 @@
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"\n",
"# Load the \"cpu-dsvm.runconfig\" file (created by the above attach operation) in memory\n",
"run_config = RunConfiguration(framework = \"python\")\n",
"# use pyspark framework\n",
"hdi_run_config = RunConfiguration(framework=\"pyspark\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"run_config.target = hdi_compute.name\n",
"# Set compute target to the HDI cluster\n",
"hdi_run_config.target = hdi_compute.name\n",
"\n",
"# Use Docker in the remote VM\n",
"# run_config.environment.docker.enabled = True\n",
"\n",
"# Use CPU base image from DockerHub\n",
"# run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"# print('Base Docker image is:', run_config.environment.docker.base_image)\n",
"\n",
"# Ask system to provision a new one based on the conda_dependencies.yml file\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"# Prepare the Docker and conda environment automatically when executingfor the first time.\n",
"# run_config.prepare_environment = True\n",
"\n",
"# specify CondaDependencies obj\n",
"# run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
"# load the runconfig object from the \"myhdi.runconfig\" file generated by the attach operaton above."
"# specify CondaDependencies object to ask system installing numpy\n",
"cd = CondaDependencies()\n",
"cd.add_conda_package('numpy')\n",
"hdi_run_config.environment.python.conda_dependencies = cd"
]
},
{
@@ -196,10 +265,12 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory = '.',\n",
" script= 'train-spark.py',\n",
" run_config = run_config)\n",
"run = experiment.submit(script_run_config)"
" run_config = hdi_run_config)\n",
"run = exp.submit(config=script_run_config)"
]
},
{
@@ -218,7 +289,9 @@
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output = True)"
"# get all metris logged in the run\n",
"metrics = run.get_metrics()\n",
"print(metrics)"
]
},
{
@@ -226,14 +299,15 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get all metris logged in the run\n",
"metrics = run.get_metrics()\n",
"print(metrics)"
]
"source": []
}
],
"metadata": {
"authors": [
{
"name": "aashishb"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -249,7 +323,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.6"
}
},
"nbformat": 4,

View File

@@ -18,7 +18,7 @@ from pyspark.sql.types import DoubleType, IntegerType, StringType
from azureml.core.run import Run
# initialize logger
run = Run.get_submitted_run()
run = Run.get_context()
# start Spark session
spark = pyspark.sql.SparkSession.builder.appName('Iris').getOrCreate()

View File

@@ -0,0 +1,328 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 06. Logging APIs\n",
"This notebook showcase various ways to use the Azure Machine Learning service run logging APIs, and view the results in the Azure portal."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure you go through the [00. Installation and Configuration](../../00.configuration.ipynb) Notebook first if you haven't. Also make sure you have tqdm and matplotlib installed in the current kernel.\n",
"\n",
"```\n",
"(myenv) $ conda install -y tqdm matplotlib\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validate Azure ML SDK installation and get version number for debugging purposes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"install"
]
},
"outputs": [],
"source": [
"from azureml.core import Experiment, Run, Workspace\n",
"import azureml.core\n",
"import numpy as np\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set experiment\n",
"Create a new experiment (or get the one with such name)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name='logging-api-test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Log metrics\n",
"We will start a run, and use the various logging APIs to record different types of metrics during the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from tqdm import tqdm\n",
"\n",
"# start logging for the run\n",
"run = exp.start_logging()\n",
"\n",
"# log a string value\n",
"run.log(name='Name', value='Logging API run')\n",
"\n",
"# log a numerical value\n",
"run.log(name='Magic Number', value=42)\n",
"\n",
"# Log a list of values. Note this will generate a single-variable line chart.\n",
"run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])\n",
"\n",
"# create a dictionary to hold a table of values\n",
"sines = {}\n",
"sines['angle'] = []\n",
"sines['sine'] = []\n",
"\n",
"for i in tqdm(range(-10, 10)):\n",
" # log a metric value repeatedly, this will generate a single-variable line chart.\n",
" run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))\n",
" angle = i / 2.0\n",
" \n",
" # log a 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.\n",
" run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))\n",
" \n",
" sines['angle'].append(angle)\n",
" sines['sine'].append(np.sin(angle))\n",
"\n",
"# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns\n",
"run.log_table(name='Sine Wave', value=sines)\n",
"\n",
"run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Even after the run is marked completed, you can still log things."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Log an image\n",
"This is how to log a _matplotlib_ pyplot object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"angle = np.linspace(-3, 3, 50)\n",
"plt.plot(angle, np.tanh(angle), label='tanh')\n",
"plt.legend(fontsize=12)\n",
"plt.title('Hyperbolic Tangent', fontsize=16)\n",
"plt.grid(True)\n",
"\n",
"run.log_image(name='Hyperbolic Tangent', plot=plt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload a file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also upload an abitrary file. First, let's create a dummy file locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile myfile.txt\n",
"\n",
"This is a dummy file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's upload this file into the run record as a run artifact, and display the properties after the upload."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"props = run.upload_file(name='myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n",
"props.serialize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Examine the run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take a look at the run detail page in Azure portal. Make sure you checkout the various charts and plots generated/uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get all the metrics in that run back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also see the files uploaded for this run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_file_names()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also download all the files locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.makedirs('files', exist_ok=True)\n",
"\n",
"for f in run.get_file_names():\n",
" dest = os.path.join('files', f.split('/')[-1])\n",
" print('Downloading file {} to {}...'.format(f, dest))\n",
" run.download_file(f, dest) "
]
}
],
"metadata": {
"authors": [
{
"name": "haining"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
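
Review note on the "Log metrics" cell in the notebook above: the value handed to `run.log_table` is a column-oriented dictionary, with one list per column and all lists the same length. The sketch below builds that same structure with only the standard library (`math.sin` replaces `np.sin` as a simplification) and makes no Azure ML calls.

```python
import math

# create a dictionary to hold a table of values, one list per column
sines = {'angle': [], 'sine': []}

for i in range(-10, 10):
    angle = i / 2.0
    sines['angle'].append(angle)
    sines['sine'].append(math.sin(angle))

# Both columns must line up row by row
print(len(sines['angle']), len(sines['sine']))   # 20 20
print(sines['angle'][0], sines['angle'][-1])     # -5.0 4.5
```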

View File

@@ -129,9 +129,9 @@
},
"outputs": [],
"source": [
"regression_models = ws.models(tags=['area'])\n",
"for name, m in regression_models.items():\n",
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
"regression_models = Model.list(workspace=ws, tags=['area'])\n",
"for m in regression_models:\n",
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
@@ -192,9 +192,11 @@
" data = json.loads(raw_data)['data']\n",
" data = numpy.array(data)\n",
" result = model.predict(data)\n",
" # you can return any datatype as long as it is JSON-serializable\n",
" return result.tolist()\n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"result\": result.tolist()})"
" error = str(e)\n",
" return error"
]
},
{
@@ -387,16 +389,14 @@
"source": [
"aci_service.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "raymondl"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -412,7 +412,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.6"
}
},
"nbformat": 4,

View File

@@ -122,9 +122,11 @@
" data = json.loads(raw_data)['data']\n",
" data = numpy.array(data)\n",
" result = model.predict(data)\n",
" # you can return any data type as long as it is JSON-serializable\n",
" return result.tolist()\n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"result\": result.tolist()})"
" error = str(e)\n",
" return error"
]
},
{
@@ -312,6 +314,11 @@
}
],
"metadata": {
"authors": [
{
"name": "raymondl"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -327,7 +334,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.6"
}
},
"nbformat": 4,

View File

@@ -102,7 +102,7 @@
"### b. In your init function add:\n",
"```python \n",
"global inputs_dc, prediction_dc\n",
"inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\". \"feat4\", \"feat5\", \"Feat6\"])\n",
"inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"Feat6\"])\n",
"prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n",
" \n",
"* Identifier: Identifier is later used for building the folder structure in your Blob, it can be used to divide \"raw\" data versus \"processed\".\n",
@@ -156,11 +156,12 @@
" inputs_dc.collect(data) #this call is saving our input data into our blob\n",
" prediction_dc.collect(result)#this call is saving our prediction data into our blob\n",
" print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
" return json.dumps({\"result\": result.tolist()})\n",
" # you can return any data type as long as it is JSON-serializable\n",
" return result.tolist()\n",
" except Exception as e:\n",
" result = str(e)\n",
" print (result + time.strftime(\"%H:%M:%S\"))\n",
" return json.dumps({\"error\": result})"
" error = str(e)\n",
" print (error + time.strftime(\"%H:%M:%S\"))\n",
" return error"
]
},
{
@@ -286,7 +287,7 @@
" create_name= 'myaks4'\n",
" aks_target = AksCompute.attach(workspace = ws, \n",
" name = create_name, \n",
" #esource_id=resource_id)\n",
" resource_id=resource_id)\n",
" ## Wait for the operation to complete\n",
" aks_target.wait_for_provisioning(True)```"
]
@@ -424,6 +425,11 @@
}
],
"metadata": {
"authors": [
{
"name": "marthalc"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -0,0 +1,414 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Enabling App Insights for Services in Production\n",
"With this notebook, you can learn how to enable App Insights for standard service monitoring. We also provide examples of custom logging within the scoring file of a model. \n",
"\n",
"\n",
"## What does Application Insights monitor?\n",
"It monitors request rates, response times, failure rates, etc. For more information visit [App Insights docs.](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-overview)\n",
"\n",
"\n",
"## What is different compared to standard production deployment process?\n",
"If you want to enable generic App Insights for a service run:\n",
"```python\n",
"aks_service= Webservice(ws, \"aks-w-dc2\")\n",
"aks_service.update(enable_app_insights=True)```\n",
"Where \"aks-w-dc2\" is your service name. You can also do this from the Azure Portal under your Workspace--> deployments--> Select deployment--> Edit--> Advanced Settings--> Select \"Enable AppInsights diagnostics\"\n",
"\n",
"If you want to log custom traces, follow the standard deployment process for AKS and:\n",
"1. Update the scoring file.\n",
"2. Update the AKS configuration.\n",
"3. Build a new image and deploy it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Import your dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Run\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import Image\n",
"from azureml.core.model import Model\n",
"\n",
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Set up your configuration and create a workspace\n",
"Follow Notebook 00 instructions to do this.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Register Model\n",
"Register an existing trained model, and add a description and tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
" description = \"Ridge regression model to predict diabetes\",\n",
" workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. *Update your scoring file with custom print statements*\n",
"Here is an example:\n",
"### a. In your init function add:\n",
"```python\n",
"print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))```\n",
"\n",
"### b. In your run function add:\n",
"```python\n",
"print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
"print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy \n",
"from sklearn.externals import joblib\n",
"from sklearn.linear_model import Ridge\n",
"from azureml.core.model import Model\n",
"from azureml.monitoring import ModelDataCollector\n",
"import time\n",
"\n",
"def init():\n",
" global model\n",
" #Print statement for appinsights custom traces:\n",
" print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
" \n",
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
" # this call should return the path to the model.pkl file on the local disk.\n",
" model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
" \n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
" \n",
" global inputs_dc, prediction_dc\n",
" \n",
" # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n",
" inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n",
" \n",
" # this setup will help us save our predictions under the \"predictions\" path in our Azure Blob\n",
" prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n",
" \n",
"# note you can pass in multiple rows for scoring\n",
"def run(raw_data):\n",
" global inputs_dc, prediction_dc\n",
" try:\n",
" data = json.loads(raw_data)['data']\n",
" data = numpy.array(data)\n",
" result = model.predict(data)\n",
" \n",
" #Print statement for appinsights custom traces:\n",
" print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
" \n",
" #this call is saving our input data into our blob\n",
" inputs_dc.collect(data) \n",
" #this call is saving our prediction data into our blob\n",
" prediction_dc.collect(result)\n",
" \n",
" #Print statement for appinsights custom traces:\n",
" print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
" # you can return any data type as long as it is JSON-serializable\n",
" return result.tolist()\n",
" except Exception as e:\n",
" error = str(e)\n",
" print (error + time.strftime(\"%H:%M:%S\"))\n",
" return error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. *Create myenv.yml file*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Create your new Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"Image with ridge regression model\",\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Deploy to AKS service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AKS compute if you haven't done so (Notebook 11)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'my-aks-test1' \n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you already have a cluster you can attach the service to it:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python \n",
"%%time\n",
"resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
"create_name= 'myaks4'\n",
"aks_target = AksCompute.attach(workspace = ws, \n",
" name = create_name, \n",
" resource_id=resource_id)\n",
"## Wait for the operation to complete\n",
"aks_target.wait_for_provisioning(True)```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a. *Activate App Insights through updating AKS Webservice configuration*\n",
"In order to enable App Insights in your service you will need to update your AKS configuration file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Set the web service configuration\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b. Deploy your service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service_name ='aks-w-dc3'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws, \n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target\n",
" )\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Test your service "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import json\n",
"\n",
"test_sample = json.dumps({'data': [\n",
" [1,28,13,45,54,6,57,8,8,10], \n",
" [101,9,8,37,6,45,4,3,2,41]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding='utf8')\n",
"\n",
"prediction = aks_service.run(input_data=test_sample)\n",
"print(prediction)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. See your service telemetry in App Insights\n",
"1. Go to the [Azure Portal](https://portal.azure.com/)\n",
"2. All resources--> Select the subscription/resource group where you created your Workspace--> Select the App Insights type\n",
"3. Click on the AppInsights resource. You'll see a high-level dashboard with information on requests, server response time, and availability.\n",
"4. Click on the top banner \"Analytics\"\n",
"5. In the \"Schema\" section select \"traces\" and run your query.\n",
"6. Voila! All your custom traces should be there."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Disable App Insights"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.update(enable_app_insights=False)"
]
}
],
"metadata": {
"authors": [
{
"name": "marthalc"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
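
Note on the custom traces in this notebook: the scoring script emits them with `print`, concatenating the message and a timestamp directly, so the two run together in the output. A small helper can keep them separated and consistent. This is only a sketch: `format_trace` is a hypothetical helper name, not part of the Azure ML SDK, and it relies on the notebook's own description that printed statements show up as custom traces.

```python
import time


def format_trace(message, when=None):
    """Return 'HH:MM:SS message' for use in a print-based custom trace.

    `when` is an optional time.struct_time; the current local time is
    used when it is omitted.
    """
    if when is None:
        when = time.localtime()
    return time.strftime("%H:%M:%S", when) + " " + message


# Example: the kind of line init() or run() could print.
print(format_trace("model initialized"))
```

Putting the timestamp first also makes the lines easier to sort and scan when they are queried later.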

View File

@@ -1,31 +1,39 @@
# Sample notebooks for Azure Machine Learning service
For full documentation for Azure Machine Learning service, visit **https://aka.ms/aml-docs**.
# Sample Notebooks for Azure Machine Learning service
To run the notebooks in this repository use one of these methods:
## Use Azure Notebooks - Jupyter based notebooks in the Azure cloud
1. [![Azure Notebooks](https://notebooks.azure.com/launch.png)](https://aka.ms/aml-clone-azure-notebooks)
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks if they are not already there.
1. Create a workspace and its configuration file (**config.json**) using [these instructions](https://aka.ms/aml-how-to-configure-environment).
1. Select `+New` in the Azure Notebook toolbar to add your **config.json** file to the imported folder.
![upload config file to notebook folder](images/additems.png)
1. Open the notebook.
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
1. Follow the instructions in the [00.configuration](00.configuration.ipynb) notebook to create and connect to a workspace.
1. Open one of the sample notebooks.
**Make sure the Azure Notebook kernal is set to `Python 3.6`** when you open a notebook.
**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook.
![set kernal to Python 3.6](images/python36.png)
![set kernel to Python 3.6](images/python36.png)
## **Use your own notebook server**
1. Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to:
* Create a workspace and its configuration file (**config.json**).
* Configure your notebook server.
Video walkthrough:
[![get started video](images/yt_cover.png)](https://youtu.be/VIsXeTuW3FU)
1. Set up a Jupyter Notebook server and [install the Azure Machine Learning SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python).
1. Clone [this repository](https://aka.ms/aml-notebooks).
1. Add your **config.json** file to the cloned folder
1. You may need to install other packages for specific notebooks
1. You may need to install other packages for specific notebooks.
- For example, to run the Azure Machine Learning Data Prep notebooks, install the extra dataprep SDK:
```
pip install --upgrade azureml-dataprep
```
1. Start your notebook server.
1. Open the notebook you want to run.
1. Follow the instructions in the [00.configuration](00.configuration.ipynb) notebook to create and connect to a workspace.
1. Open one of the sample notebooks.
> Note: **Looking for automated machine learning samples?**
> For your convenience, you can use an installation script instead of the steps below for the automated ML notebooks. Go to the [automl folder README](automl/README.md) and follow the instructions. The script installs all packages needed for notebooks in that folder.

View File

@@ -13,53 +13,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 00. configuration\n",
"# AutoML 00. Configuration\n",
"\n",
"In this example you will create an Azure Machine Learning Workspace and initialize your notebook directory to easily use this workspace. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n",
"In this example you will create an Azure Machine Learning `Workspace` object and initialize your notebook directory to easily reload this object from a configuration file. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n",
"\n",
"\n",
"## Prerequisites:\n",
"\n",
"Before running this notebook, run the automl_setup script described in README.md.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to your Azure Subscription\n",
"\n",
"In order to use an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com).\n",
"\n",
"First login to azure and follow prompts to authenticate. Then check that your subscription is correct"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!az login"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!az account show"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have multiple subscriptions and need to change the active one, you can use a command\n",
"```shell\n",
"az account set -s <subscription-id>\n",
"```"
"Before running this notebook, run the `automl_setup` script described in README.md.\n"
]
},
{
@@ -68,27 +29,20 @@
"source": [
"### Register Machine Learning Services Resource Provider\n",
"\n",
"This step is required to use the Azure ML services backing the SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# register the new RP\n",
"!az provider register -n Microsoft.MachineLearningServices\n",
"\n",
"# check the registration status\n",
"!az provider show -n Microsoft.MachineLearningServices"
"Microsoft.MachineLearningServices only needs to be registered once in the subscription.\n",
"To register it:\n",
"1. Start the Azure portal.\n",
"2. Select `All services` and then `Subscription`.\n",
"3. Select the subscription that you want to use.\n",
"4. Click on `Resource providers`.\n",
"5. Click the `Register` link next to Microsoft.MachineLearningServices."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check core SDK version number for validate your installation and for debugging purposes"
"### Check the Azure ML Core SDK Version to Validate Your Installation"
]
},
{
@@ -107,17 +61,17 @@
"metadata": {},
"source": [
"## Initialize an Azure ML Workspace\n",
"### What is an Azure ML Workspace and why do I need one?\n",
"### What is an Azure ML Workspace and Why Do I Need One?\n",
"\n",
"An AML Workspace is an Azure resource that organaizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
"An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
"\n",
"\n",
"### What do I need\n",
"### What do I Need?\n",
"\n",
"To create or access an Azure ML Workspace, you will need to import the AML library and specify following information:\n",
"To create or access an Azure ML workspace, you will need to import the Azure ML library and specify the following information:\n",
"* A name for your workspace. You can choose one.\n",
"* Your subscription id. Use *id* value from *az account show* output above. \n",
"* The resource group name. Resource group organizes Azure resources and provides default region for the resources in the group. You can either specify a new one, in which case it gets created for your Workspace, or use an existing one or create a new one from [Azure portal](https://portal.azure.com)\n",
"* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
"* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n",
"* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
]
},
@@ -137,17 +91,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a workspace\n",
"If you already have access to an AML Workspace you want to use, you can skip this cell. Otherwise, this cell will create an AML workspace for you in a subscription provided you have the correct permissions for the given `subscription_id`.\n",
"## Creating a Workspace\n",
"If you already have access to an Azure ML workspace you want to use, you can skip this cell. Otherwise, this cell will create an Azure ML workspace for you in the specified subscription, provided you have the correct permissions for the given `subscription_id`.\n",
"\n",
"This will fail when:\n",
"1. The workspace already exists\n",
"2. You do not have permission to create a workspace in the resource group\n",
"3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
"1. The workspace already exists.\n",
"2. You do not have permission to create a workspace in the resource group.\n",
"3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.\n",
"\n",
"If workspace creation fails for any reason other than already existing, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.\n",
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
"\n",
"**Note** The workspace creation can take several minutes."
"**Note:** Creation of a new workspace can take several minutes."
]
},
{
@@ -156,7 +110,7 @@
"metadata": {},
"outputs": [],
"source": [
"# import the Workspace class and check the azureml SDK version\n",
"# Import the Workspace class and check the Azure ML SDK version.\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.create(name = workspace_name,\n",
@@ -170,7 +124,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuring your local environment\n",
"## Configuring Your Local Environment\n",
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
]
},
@@ -186,7 +140,7 @@
" subscription_id = subscription_id,\n",
" resource_group = resource_group)\n",
"\n",
"# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"ws.write_config()"
]
},
@@ -203,7 +157,7 @@
"metadata": {},
"outputs": [],
"source": [
"# load workspace configuratio from ./aml_config/config.json file.\n",
"# Load workspace configuration from ./aml_config/config.json file.\n",
"my_workspace = Workspace.from_config()\n",
"my_workspace.get_details()"
]
@@ -212,8 +166,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a folder to host all sample projects\n",
"Lastly, create a folder where all the sample projects will be hosted."
"## Create a Folder to Host All Sample Projects\n",
"Finally, create a folder where all the sample projects will be hosted."
]
},
{
@@ -242,6 +196,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
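
Note on the configuration file in this notebook: `ws.write_config()` persists the subscription id, resource group name, and workspace name to `aml_config/config.json`. Before other notebooks call `Workspace.from_config()`, the file can be sanity-checked with the standard library alone. This is a sketch under assumptions: the key names `subscription_id`, `resource_group`, and `workspace_name` are assumed from the fields the notebook says are persisted, and `missing_config_keys` is a hypothetical helper, not an SDK function.

```python
import json
import os

# Assumed key names; adjust if your config file uses different ones.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path):
    """Return the required keys that are absent or empty in the JSON file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


config_path = os.path.join("aml_config", "config.json")
if os.path.exists(config_path):
    print("Missing keys:", missing_config_keys(config_path))
else:
    print("No configuration file found at", config_path)
```

An empty list means all three values are present; it does not confirm that they point at a workspace you can access.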

View File

@@ -13,27 +13,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 01: Classification with local compute\n",
"# AutoML 01: Classification with Local Compute\n",
"\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
"In this example we use scikit-learn's [digits dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment in an existing Workspace\n",
"2. Instantiating AutoMLConfig\n",
"3. Training the Model using local compute\n",
"4. Exploring the results\n",
"5. Testing the fitted model\n"
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -67,9 +67,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-classification'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -92,7 +91,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -109,7 +108,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Digits Dataset"
"## Load Training Data\n",
"\n",
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
@@ -123,25 +124,25 @@
"digits = datasets.load_digits()\n",
"\n",
"# Exclude the first 100 rows from training so that they can be used for test.\n",
"X_digits = digits.data[100:,:]\n",
"y_digits = digits.target[100:]"
"X_train = digits.data[100:,:]\n",
"y_train = digits.target[100:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate Auto ML Config\n",
"## Configure AutoML\n",
"\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data |\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
@@ -156,12 +157,12 @@
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec = 3600,\n",
" iterations = 50,\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 25,\n",
" n_cross_validations = 3,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" X = X_train, \n",
" y = y_train,\n",
" path = project_folder)"
]
},
@@ -169,10 +170,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model\n",
"## Train the Models\n",
"\n",
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
@@ -184,11 +185,32 @@
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can continue an interrupted local run by calling continue_experiment without the <b>iterations</b> parameter, or run more iterations to a completed run by specifying the <b>iterations</b> parameter:"
"Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = local_run.continue_experiment(X = X_train, \n",
" y = y_train, \n",
" show_output = True,\n",
" iterations = 5)"
]
},
{
@@ -201,33 +223,21 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"local_run = local_run.continue_experiment(X = X_digits, \n",
" y = y_digits, \n",
" show_output = True,\n",
" iterations = 5)"
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
@@ -236,7 +246,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
@@ -246,7 +256,7 @@
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
@@ -272,7 +282,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -290,8 +300,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric\n",
"Give me the run and the model that has the smallest `log_loss`:"
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
@@ -310,8 +320,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration\n",
"Give me the run and the model from the 3rd iteration:"
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
@@ -330,7 +340,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model \n",
"### Test the Best Fitted Model\n",
"\n",
"#### Load Test Data"
]
@@ -342,8 +352,8 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:10, :]\n",
"y_digits = digits.target[:10]\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
@@ -351,7 +361,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing our best pipeline\n",
"#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works."
]
},
@@ -361,11 +371,11 @@
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"for index in np.random.choice(len(y_digits), 2):\n",
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize = (3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
@@ -376,6 +386,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
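
The switch to `np.random.choice(len(y_test), 2, replace = False)` in the test cell matters because the default, `replace = True`, can return the same index twice, in which case the same digit would be shown two times. A minimal sketch of the difference follows; `n_samples` is a stand-in for `len(y_test)`, and the repeat count is an arbitrary choice for illustration.

```python
import numpy as np

n_samples = 10  # stand-in for len(y_test); the notebook keeps the first 10 digits

# Without replacement: the two sampled indices are always distinct.
all_distinct = all(
    len(set(np.random.choice(n_samples, 2, replace=False).tolist())) == 2
    for _ in range(1000)
)

# With replacement (NumPy's default): a draw may repeat an index.
duplicate_draws = sum(
    len(set(np.random.choice(n_samples, 2).tolist())) == 1
    for _ in range(1000)
)

print("always distinct without replacement:", all_distinct)
print("draws with a repeated index (with replacement):", duplicate_draws)
```

With ten candidates, roughly one draw in ten repeats an index when sampling with replacement, which is why the updated cell passes `replace = False`.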


@@ -13,27 +13,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 02: Regression with local compute\n",
"# AutoML 02: Regression with Local Compute\n",
"\n",
"In this example we use the scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple regression problem.\n",
"In this example we use scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset) to showcase how you can use AutoML for a simple regression problem.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment using an existing Workspace\n",
"2. Instantiating AutoMLConfig\n",
"3. Training the Model using local compute\n",
"4. Exploring the results\n",
"5. Testing the fitted model"
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -67,9 +67,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the experiment\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-regression'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-regression'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -92,7 +91,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -109,7 +108,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read Data"
"### Load Training Data\n",
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
]
},
{
@@ -118,7 +118,7 @@
"metadata": {},
"outputs": [],
"source": [
"# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n",
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
"from sklearn.datasets import load_diabetes\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
@@ -135,17 +135,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate Auto ML Config\n",
"## Configure AutoML\n",
"\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
@@ -158,7 +158,7 @@
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'regression',\n",
" max_time_sec = 600,\n",
" iteration_timeout_minutes = 10,\n",
" iterations = 10,\n",
" primary_metric = 'spearman_correlation',\n",
" n_cross_validations = 5,\n",
@@ -173,10 +173,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model\n",
"## Train the Models\n",
"\n",
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
@@ -201,18 +201,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the results"
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
@@ -221,7 +221,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
@@ -231,7 +231,7 @@
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
@@ -257,7 +257,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -275,8 +275,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric\n",
"Show the run and model that has the smallest `root_mean_squared_error` (which turned out to be the same as the one with largest `spearman_correlation` value):"
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
]
},
{
@@ -295,9 +295,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration\n",
"\n",
"Simply show the run and model from the 3rd iteration:"
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
@@ -316,7 +315,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model"
"### Test the Best Fitted Model"
]
},
{
@@ -351,13 +350,13 @@
"from sklearn import datasets\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# set up a multi-plot chart\n",
"# Set up a multi-plot chart.\n",
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
"f.set_figheight(6)\n",
"f.set_figwidth(16)\n",
"\n",
"# plot residual values of training set\n",
"# Plot residual values of training set.\n",
"a0.axis([0, 360, -200, 200])\n",
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
@@ -365,11 +364,12 @@
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
"a0.set_xlabel('Training samples', fontsize = 12)\n",
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
"# plot histogram\n",
"\n",
"# Plot a histogram.\n",
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
"\n",
"# plot residual values of test set\n",
"# Plot residual values of test set.\n",
"a1.axis([0, 90, -200, 200])\n",
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
@@ -377,15 +377,21 @@
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
"a1.set_xlabel('Test samples', fontsize = 12)\n",
"a1.set_yticklabels([])\n",
"# plot histogram\n",
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
"\n",
"# Plot a histogram.\n",
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
"\n",
"plt.show()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
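
The plotting cell above annotates each panel with `r2_score` and plots residual values. As a reminder of what those numbers represent, here is a minimal pure-Python sketch. It assumes residuals are computed as actual minus predicted (the cell that defines `y_residual_train` is outside this diff), and the helper names and sample values are hypothetical.

```python
def residuals(y_true, y_pred):
    """Residual per sample, assuming the convention actual - predicted."""
    return [t - p for t, p in zip(y_true, y_pred)]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values and predictions, for illustration only.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

res = residuals(y_true, y_pred)
score = r2(y_true, y_pred)
print(res)
print("R2 score = {0:.2f}".format(score))
```

A residual plot that scatters evenly around zero, as the red horizontal line in the chart is meant to show, together with an R2 score close to 1, suggests the model fits that data split well.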


@@ -15,33 +15,33 @@
"source": [
"# AutoML 03: Remote Execution using DSVM (Ubuntu)\n",
"\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment using an existing Workspace\n",
"2. Attaching an existing DSVM to a workspace\n",
"3. Instantiating AutoMLConfig \n",
"4. Training the Model using the DSVM\n",
"5. Exploring the results\n",
"6. Testing the fitted model\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Attach an existing DSVM to a workspace.\n",
"3. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model using the DSVM.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Parallel** Executions for iterations\n",
"- Asyncronous tracking of progress\n",
"- **Cancelling** individual iterations or the entire run\n",
"In addition, this notebook showcases the following features:\n",
"- **Parallel** executions for iterations\n",
"- **Asynchronous** tracking of progress\n",
"- **Cancellation** of individual iterations or the entire run\n",
"- Retrieving models for any iteration or logged metric\n",
"- specify automl settings as **kwargs**\n"
"- Specifying AutoML settings as `**kwargs`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a workspace. For AutoML you would need to create a <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -75,9 +75,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the run history container in the workspace\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-dsvm4'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-remote-dsvm4'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -100,7 +99,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -118,9 +117,7 @@
"metadata": {},
"source": [
"## Create a Remote Linux DSVM\n",
"Note: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n",
"\n",
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this."
"**Note:** If creation fails with a message about Marketplace purchase eligibility, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work.\n"
]
},
{
@@ -131,12 +128,12 @@
"source": [
"from azureml.core.compute import DsvmCompute\n",
"\n",
"dsvm_name = 'mydsvm'\n",
"dsvm_name = 'mydsvma'\n",
"try:\n",
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
" print('found existing dsvm.')\n",
" print('Found an existing DSVM.')\n",
"except:\n",
" print('creating new dsvm.')\n",
" print('Creating a new DSVM.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
" dsvm_compute.wait_for_completion(show_output = True)"
@@ -147,7 +144,8 @@
"metadata": {},
"source": [
"## Create Get Data File\n",
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file."
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from blob storage or from local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
@@ -175,29 +173,29 @@
"def get_data():\n",
" \n",
" digits = datasets.load_digits()\n",
" X_digits = digits.data[100:,:]\n",
" y_digits = digits.target[100:]\n",
" X_train = digits.data[100:,:]\n",
" y_train = digits.target[100:]\n",
"\n",
" return { \"X\" : X_digits, \"y\" : y_digits }"
" return { \"X\" : X_train, \"y\" : y_train }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
"## Configure AutoML <a class=\"anchor\" id=\"Instantiate-AutoML-Remote-DSVM\"></a>\n",
"\n",
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
"\n",
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
"**Note:** When using Remote DSVM, you can't pass NumPy arrays directly to the fit method.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM."
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be less than the number of cores on the DSVM.|"
]
},
{
@@ -207,12 +205,12 @@
"outputs": [],
"source": [
"automl_settings = {\n",
" \"max_time_sec\": 600,\n",
" \"iteration_timeout_minutes\": 10,\n",
" \"iterations\": 20,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": False,\n",
" \"concurrent_iterations\": 2,\n",
" \"max_concurrent_iterations\": 2,\n",
" \"verbosity\": logging.INFO\n",
"}\n",
"\n",
@@ -229,7 +227,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>Note</b> that the first run on a new DSVM may take a several minutes to preparing the environment."
"**Note:** The first run on a new DSVM may take several minutes to prepare the environment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Models\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
"\n",
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
]
},
{
@@ -245,10 +254,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the Results\n",
"## Explore the Results\n",
"\n",
"#### Loading executed runs\n",
"In case you need to load a previously executed run given a run id please enable the below cell"
"#### Loading Executed Runs\n",
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
]
},
{
@@ -262,13 +271,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
@@ -277,7 +286,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
@@ -287,7 +296,7 @@
"metadata": {},
"outputs": [],
"source": [
"# wait till the run finishes\n",
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
@@ -297,7 +306,7 @@
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
@@ -321,9 +330,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Canceling runs\n",
"## Cancelling Runs\n",
"\n",
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
@@ -332,10 +341,10 @@
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations\n",
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2\n",
"# Cancel iteration 1 and move on to iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
@@ -345,7 +354,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -363,8 +372,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric\n",
"Show the run/model which has the smallest `log_loss` value."
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
@@ -383,8 +392,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration\n",
"Show the run and model from the 3rd iteration."
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
@@ -403,7 +412,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n",
"### Test the Best Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n",
"\n",
"#### Load Test Data"
]
@@ -415,8 +424,8 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:10, :]\n",
"y_digits = digits.target[:10]\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
@@ -424,7 +433,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing our best pipeline"
"#### Test Our Best Fitted Model"
]
},
{
@@ -433,11 +442,11 @@
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"for index in np.random.choice(len(y_digits), 2):\n",
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
@@ -448,6 +457,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
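
This diff renames two AutoML settings: `max_time_sec` (seconds) becomes `iteration_timeout_minutes` (minutes), and `concurrent_iterations` becomes `max_concurrent_iterations`. For settings dictionaries written against the older names, a migration could be sketched as follows. The helper `migrate_automl_settings` is hypothetical and not part of the SDK, and rounding up to whole minutes is an assumption.

```python
import math


def migrate_automl_settings(old_settings):
    """Return a copy of old_settings using the renamed keys from this diff.

    Hypothetical helper for illustration; it is not an SDK function.
    """
    new_settings = dict(old_settings)

    # max_time_sec (seconds) -> iteration_timeout_minutes (minutes).
    # Rounding up is an assumption, so that a short limit never becomes 0.
    if "max_time_sec" in new_settings:
        seconds = new_settings.pop("max_time_sec")
        new_settings["iteration_timeout_minutes"] = math.ceil(seconds / 60)

    # concurrent_iterations -> max_concurrent_iterations (value unchanged).
    if "concurrent_iterations" in new_settings:
        new_settings["max_concurrent_iterations"] = new_settings.pop(
            "concurrent_iterations"
        )

    return new_settings


old = {"max_time_sec": 600, "iterations": 20, "concurrent_iterations": 2}
new = migrate_automl_settings(old)
print(new)
```

The values match the change shown in the diff, where `max_time_sec = 600` is replaced by `iteration_timeout_minutes = 10`. The input dictionary is copied rather than modified, so the original settings remain available for comparison.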


@@ -15,33 +15,33 @@
"source": [
"# AutoML 03: Remote Execution using Batch AI\n",
"\n",
"In this example we use the scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple classification problem.\n",
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [setup](setup.ipynb) before running this notebook.\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Creating an Experiment using an existing Workspace\n",
"2. Attaching an existing Batch AI compute to a workspace\n",
"3. Instantiating AutoMLConfig \n",
"4. Training the Model using the Batch AI\n",
"5. Exploring the results\n",
"6. Testing the fitted model\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Attach an existing Batch AI compute to a workspace.\n",
"3. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model using Batch AI.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Parallel** Executions for iterations\n",
"- Asyncronous tracking of progress\n",
"- **Cancelling** individual iterations or the entire run\n",
"- **Parallel** executions for iterations\n",
"- **Asynchronous** tracking of progress\n",
"- **Cancellation** of individual iterations or the entire run\n",
"- Retrieving models for any iteration or logged metric\n",
"- specify automl settings as **kwargs**\n"
"- Specifying AutoML settings as `**kwargs`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a workspace. For AutoML you would need to create a <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -75,9 +75,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the run history container in the workspace\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-batchai'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-remote-batchai'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -100,7 +99,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -120,9 +119,9 @@
"## Create Batch AI Cluster\n",
"The cluster is created as Machine Learning Compute and will appear under your workspace.\n",
"\n",
"<b>Note</b>: The cluster creation can take over 10 minutes, please be patient.\n",
"**Note:** The creation of the Batch AI cluster can take over 10 minutes, please be patient.\n",
"\n",
"As with other Azure services, there are limits on certain resources (for eg. BatchAI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
"As with other Azure services, there are limits on certain resources (e.g. Batch AI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
@@ -131,38 +130,34 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import BatchAiCompute\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# choose a name for your cluster\n",
"batchai_cluster_name = ws.name + \"cpu\"\n",
"# Choose a name for your cluster.\n",
"batchai_cluster_name = \"cpucluster\"\n",
"\n",
"found = False\n",
"# see if this compute target already exists in the workspace\n",
"for ct in ws.compute_targets():\n",
" print(ct.name, ct.type)\n",
" if (ct.name == batchai_cluster_name and ct.type == 'BatchAI'):\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if batchai_cluster_name in cts and cts[batchai_cluster_name].type == 'BatchAI':\n",
" found = True\n",
" print('found compute target. just use it.')\n",
" compute_target = ct\n",
" break\n",
" print('Found existing compute target.')\n",
" compute_target = cts[batchai_cluster_name]\n",
" \n",
"if not found:\n",
" print('creating a new compute target...')\n",
" provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" autoscale_enabled = True,\n",
" cluster_min_nodes = 1, \n",
" cluster_max_nodes = 4)\n",
" max_nodes = 6)\n",
"\n",
" # create the cluster\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
" # For a more detailed view of current BatchAI cluster status, use the 'status' property "
" # For a more detailed view of current Batch AI cluster status, use the 'status' property."
]
},
{
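The lookup-then-create logic in the cell above can be sketched without the SDK. This is a minimal illustration only, assuming a plain dict stands in for `ws.compute_targets`; `FakeTarget` and `get_or_create_target` are hypothetical names and are not part of azureml.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FakeTarget:
    """Hypothetical stand-in for a compute target; not an azureml class."""
    name: str
    type: str


def get_or_create_target(targets: Dict[str, FakeTarget], name: str,
                         expected_type: str,
                         create: Callable[[str], FakeTarget]) -> FakeTarget:
    """Reuse a target with the right name and type, otherwise create one."""
    existing = targets.get(name)
    if existing is not None and existing.type == expected_type:
        print('Found existing compute target.')
        return existing
    print('Creating a new compute target...')
    target = create(name)
    targets[name] = target
    return target


# A dict keyed by name, mirroring how the new cell indexes ws.compute_targets.
targets = {"cpucluster": FakeTarget("cpucluster", "BatchAI")}
reused = get_or_create_target(targets, "cpucluster", "BatchAI",
                              lambda n: FakeTarget(n, "BatchAI"))
created = get_or_create_target(targets, "gpucluster", "BatchAI",
                               lambda n: FakeTarget(n, "BatchAI"))
```

Returning early from the lookup makes the separate `found` flag unnecessary, which may be worth considering for the notebook cell as well.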
@@ -170,7 +165,8 @@
"metadata": {},
"source": [
"## Create Get Data File\n",
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file."
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
@@ -198,10 +194,10 @@
"def get_data():\n",
" \n",
" digits = datasets.load_digits()\n",
" X_digits = digits.data\n",
" y_digits = digits.target\n",
" X_train = digits.data\n",
" y_train = digits.target\n",
"\n",
" return { \"X\" : X_digits, \"y\" : y_digits }"
" return { \"X\" : X_train, \"y\" : y_train }"
]
},
{
@@ -210,17 +206,17 @@
"source": [
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
"\n",
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n",
"\n",
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
"**Note:** When using Batch AI, you can't pass Numpy arrays directly to the fit method.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM."
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|"
]
},
{
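This change renames `max_time_sec` to `iteration_timeout_minutes` and `concurrent_iterations` to `max_concurrent_iterations`. For settings dictionaries written against the old names, a conversion could look like the sketch below. `migrate_automl_settings` is a hypothetical helper, not an SDK function, and rounding partial minutes up is an assumption made here so that a time limit is never shortened.

```python
def migrate_automl_settings(old: dict) -> dict:
    """Return a copy of `old` using the renamed setting keys."""
    new = dict(old)
    if "max_time_sec" in new:
        seconds = new.pop("max_time_sec")
        # Ceiling division: 120 s becomes 2 min, 121 s becomes 3 min.
        new["iteration_timeout_minutes"] = -(-seconds // 60)
    if "concurrent_iterations" in new:
        new["max_concurrent_iterations"] = new.pop("concurrent_iterations")
    return new


old_settings = {
    "max_time_sec": 120,
    "iterations": 20,
    "n_cross_validations": 5,
    "concurrent_iterations": 5,
}
new_settings = migrate_automl_settings(old_settings)
print(new_settings)
```

The input dictionary is copied rather than modified, so the old settings remain available for comparison.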
@@ -230,12 +226,12 @@
"outputs": [],
"source": [
"automl_settings = {\n",
" \"max_time_sec\": 120,\n",
" \"iteration_timeout_minutes\": 2,\n",
" \"iterations\": 20,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": False,\n",
" \"concurrent_iterations\": 5,\n",
" \"max_concurrent_iterations\": 5,\n",
" \"verbosity\": logging.INFO\n",
"}\n",
"\n",
@@ -248,6 +244,16 @@
" )\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Models\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -261,10 +267,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the Results\n",
"## Explore the Results\n",
"\n",
"#### Loading executed runs\n",
"In case you need to load a previously executed run given a run id please enable the below cell"
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
]
},
{
@@ -278,13 +284,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
@@ -302,7 +308,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
@@ -312,7 +318,7 @@
"metadata": {},
"outputs": [],
"source": [
"# wait till the run finishes\n",
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
@@ -322,7 +328,7 @@
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
@@ -346,9 +352,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Canceling runs\n",
"## Cancelling Runs\n",
"\n",
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
@@ -357,10 +363,10 @@
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations\n",
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2\n",
"# Cancel iteration 1 and move onto iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
@@ -370,7 +376,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -388,8 +394,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric\n",
"Show the run/model which has the smallest `log_loss` value."
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model which has the smallest `log_loss` value:"
]
},
{
@@ -408,8 +414,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration\n",
"Show the run and model from the 3rd iteration."
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
@@ -424,25 +430,6 @@
"print(third_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register fitted model for deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"remote_run.register_model(description=description, tags=tags)\n",
"remote_run.model_id # Use this id to deploy the model as a web service in Azure"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -459,8 +446,8 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:10, :]\n",
"y_digits = digits.target[:10]\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
@@ -468,7 +455,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing our best pipeline"
"#### Testing Our Best Fitted Model"
]
},
{
@@ -477,11 +464,11 @@
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"for index in np.random.choice(len(y_digits), 2):\n",
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
@@ -489,16 +476,14 @@
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",


@@ -13,36 +13,36 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Auto ML : Remote Execution with Text data from Blobstorage\n",
"# Auto ML 04: Remote Execution with Text Data from Azure Blob Storage\n",
"\n",
"In this example we use the [Burning Man 2016 dataset](https://innovate.burningman.org/datasets-page/) to showcase how you can use AutoML to handle text data from a Azure blobstorage.\n",
"In this example we use the [Burning Man 2016 dataset](https://innovate.burningman.org/datasets-page/) to showcase how you can use AutoML to handle text data from Azure Blob Storage.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment using an existing Workspace\n",
"2. Attaching an existing DSVM to a workspace\n",
"3. Instantiating AutoMLConfig \n",
"4. Training the Model using the DSVM\n",
"5. Exploring the results\n",
"6. Testing the fitted model\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Attach an existing DSVM to a workspace.\n",
"3. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model using the DSVM.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Parallel** Executions for iterations\n",
"- Asyncronous tracking of progress\n",
"- **Cancelling** individual iterations or the entire run\n",
"- **Parallel** executions for iterations\n",
"- **Asynchronous** tracking of progress\n",
"- **Cancellation** of individual iterations or the entire run\n",
"- Retrieving models for any iteration or logged metric\n",
"- specify automl settings as **kwargs**\n",
"- handling **text** data with **preprocess** flag\n"
"- Specifying AutoML settings as `**kwargs`\n",
"- Handling **text** data using the `preprocess` flag\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -76,9 +76,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the run history container in the workspace\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-dsvm-blobstore'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-remote-dsvm-blobstore'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -101,7 +100,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -119,11 +118,11 @@
"metadata": {},
"source": [
"## Attach a Remote Linux DSVM\n",
"To use remote docker commpute target:\n",
"1. Create a Linux DSVM in Azure. Here is some [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor, NOT CentOS. Make sure that disk space is available under /tmp because AutoML creates files under /tmp/azureml_runs. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4Gb per core.\n",
"2. Enter the IP address, username and password below\n",
"To use a remote Docker compute target:\n",
"1. Create a Linux DSVM in Azure, following these [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor (not CentOS). Make sure that disk space is available under `/tmp` because AutoML creates files under `/tmp/azureml_run`s. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4GB per core.\n",
"2. Enter the IP address, user name and password below.\n",
"\n",
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this."
"**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordinglyaddress. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on changing SSH ports for security reasons."
]
},
{
@@ -133,14 +132,32 @@
"outputs": [],
"source": [
"from azureml.core.compute import RemoteCompute\n",
"import time\n",
"\n",
"# Add your VM information below\n",
"dsvm_name = 'mydsvm1'\n",
"# If a compute with the specified compute_name already exists, it will be used and the dsvm_ip_addr, dsvm_ssh_port, \n",
"# dsvm_username and dsvm_password will be ignored.\n",
"compute_name = 'mydsvmb'\n",
"dsvm_ip_addr = '<<ip_addr>>'\n",
"dsvm_ssh_port = 22\n",
"dsvm_username = '<<username>>'\n",
"dsvm_password = '<<password>>'\n",
"\n",
"dsvm_compute = RemoteCompute.attach(workspace=ws, name=dsvm_name, address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=22)"
"if compute_name in ws.compute_targets:\n",
" print('Using existing compute.')\n",
" dsvm_compute = ws.compute_targets[compute_name]\n",
"else:\n",
" RemoteCompute.attach(workspace=ws, name=compute_name, address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=dsvm_ssh_port)\n",
"\n",
" while ws.compute_targets[compute_name].provisioning_state == 'Creating':\n",
" time.sleep(1)\n",
"\n",
" dsvm_compute = ws.compute_targets[compute_name]\n",
" \n",
" if dsvm_compute.provisioning_state == 'Failed':\n",
" print('Attached failed.')\n",
" print(dsvm_compute.provisioning_errors)\n",
" dsvm_compute.delete()"
]
},
{
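The `while ... == 'Creating'` loop in the attach cell above polls once per second with no upper bound. A bounded variant could look like the sketch below. It assumes the state is read through a zero-argument callable; `wait_while` and its parameters are hypothetical names, and the state sequence is simulated rather than read from a real compute target.

```python
import time
from typing import Callable


def wait_while(get_state: Callable[[], str], busy_state: str = "Creating",
               timeout_seconds: float = 600.0, poll_seconds: float = 1.0) -> str:
    """Poll get_state() until it stops returning busy_state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    state = get_state()
    while state == busy_state:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {busy_state!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
        state = get_state()
    return state


# Simulated sequence standing in for repeated reads of provisioning_state.
states = iter(["Creating", "Creating", "Succeeded"])
final_state = wait_while(lambda: next(states), poll_seconds=0.01)
print(final_state)
```

In the notebook, the callable would wrap the `provisioning_state` lookup, and a `'Failed'` result would still need the error handling shown in the cell.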
@@ -148,9 +165,8 @@
"metadata": {},
"source": [
"## Create Get Data File\n",
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"\n",
"The *get_data()* function returns a [dictionary](README.md#getdata)."
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns a [dictionary](README.md#getdata)."
]
},
{
@@ -176,18 +192,18 @@
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"def get_data():\n",
" # Burning man 2016 data\n",
" # Load Burning Man 2016 data.\n",
" df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n",
" delimiter=\"\\t\", quotechar='\"')\n",
" # get integer labels\n",
" # Get integer labels.\n",
" le = LabelEncoder()\n",
" le.fit(df[\"Label\"].values)\n",
" y = le.transform(df[\"Label\"].values)\n",
" df = df.drop([\"Label\"], axis=1)\n",
" X = df.drop([\"Label\"], axis=1)\n",
"\n",
" df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n",
" X_train, _, y_train, _ = train_test_split(X, y, test_size = 0.1, random_state = 42)\n",
"\n",
" return { \"X\" : df, \"y\" : y }"
" return { \"X\" : X_train, \"y\" : y_train }"
]
},
{
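`get_data()` now returns only the training split, and the later test cell repeats the split with the same `test_size` and `random_state` to recover the held-out rows. The sketch below illustrates why a fixed seed makes the two calls complementary. It uses the standard library as a stand-in for `train_test_split`; `split_indices` is a hypothetical helper and does not reproduce scikit-learn's actual shuffling.

```python
import random


def split_indices(n: int, test_size: float, seed: int):
    """Shuffle range(n) with a fixed seed and cut it into train/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_size))
    return indices[n_test:], indices[:n_test]


# Two independent calls, as in get_data() and the later test cell.
train_idx, _ = split_indices(100, test_size=0.1, seed=42)
_, test_idx = split_indices(100, test_size=0.1, seed=42)

# With the same seed, the two parts never overlap and together cover all rows.
print(len(train_idx), len(test_idx), set(train_idx).isdisjoint(test_idx))
```

If the seed or `test_size` differed between the two cells, some test rows could have been seen during training.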
@@ -196,7 +212,7 @@
"source": [
"### View data\n",
"\n",
"You can execute the *get_data()* function locally to view the *train* data"
"You can execute the `get_data()` function locally to view the training data."
]
},
{
@@ -218,21 +234,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
"## Configure AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
"\n",
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n",
"\n",
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
"**Note:** When using Remote DSVM, you can't pass Numpy arrays directly to the fit method.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM\n",
"|**preprocess**| *True/False* <br>Setting this to *True* enables AutoML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*|\n",
"|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.<br> Default is *1*, you can set it to *-1* to use all cores|"
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n",
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
"|**max_cores_per_iteration**|Indicates how many cores on the compute target would be used to train a single pipeline.<br>Default is *1*; you can set it to *-1* to use all cores.|"
]
},
{
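Passing settings as `**kwargs` means a dictionary is expanded into individual keyword arguments at the call site. The sketch below shows the mechanism with a hypothetical `make_config` function standing in for a constructor that accepts arbitrary keyword settings; it is not the `AutoMLConfig` signature.

```python
def make_config(task, **settings):
    """Hypothetical stand-in for a constructor that accepts **kwargs."""
    config = {"task": task}
    config.update(settings)
    return config


automl_settings = {
    "iteration_timeout_minutes": 60,
    "iterations": 4,
    "n_cross_validations": 5,
    "primary_metric": "AUC_weighted",
    "preprocess": True,
}

# The ** operator expands the dict into individual keyword arguments.
unpacked = make_config(task="classification", **automl_settings)

# Equivalent call with every setting written out explicitly.
explicit = make_config(task="classification",
                       iteration_timeout_minutes=60, iterations=4,
                       n_cross_validations=5, primary_metric="AUC_weighted",
                       preprocess=True)
print(unpacked == explicit)
```

Keeping the settings in one dictionary makes it easy to reuse or log the same configuration across runs.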
@@ -242,8 +258,8 @@
"outputs": [],
"source": [
"automl_settings = {\n",
" \"max_time_sec\": 3600,\n",
" \"iterations\": 10,\n",
" \"iteration_timeout_minutes\": 60,\n",
" \"iterations\": 4,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": True,\n",
@@ -262,9 +278,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
"## Train the Models <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
"\n",
"For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retreive the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run."
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run."
]
},
{
@@ -281,13 +297,13 @@
"metadata": {},
"source": [
"## Exploring the Results <a class=\"anchor\" id=\"Exploring-the-Results-Remote-DSVM\"></a>\n",
"#### Widget for monitoring runs\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
@@ -296,17 +312,27 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log. "
]
},
{
@@ -330,8 +356,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Canceling runs\n",
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
"## Cancelling Runs\n",
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
@@ -340,10 +366,10 @@
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations\n",
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2\n",
"# Cancel iteration 1 and move onto iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
@@ -353,7 +379,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -371,7 +397,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric"
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model which has the smallest `accuracy` value:"
]
},
{
@@ -388,7 +415,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration"
"#### Model from a Specific Iteration"
]
},
{
@@ -401,25 +428,6 @@
"zero_run, zero_model = remote_run.get_output(iteration = iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register fitted model for deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"remote_run.register_model(description=description, tags=tags)\n",
"remote_run.model_id # Use this id to deploy the model as a web service in Azure"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -445,12 +453,12 @@
"le = LabelEncoder()\n",
"le.fit(df[\"Label\"].values)\n",
"y = le.transform(df[\"Label\"].values)\n",
"df = df.drop([\"Label\"], axis=1)\n",
"X = df.drop([\"Label\"], axis=1)\n",
"\n",
"_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n",
"_, X_test, _, y_test = train_test_split(X, y, test_size=0.1, random_state=42)\n",
"\n",
"\n",
"ypred = fitted_model.predict(df_test.values)\n",
"ypred = fitted_model.predict(X_test.values)\n",
"\n",
"\n",
"ypred_strings = le.inverse_transform(ypred)\n",
@@ -462,16 +470,14 @@
"\n",
"cm.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

@@ -13,33 +13,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 05 : Blacklisting models, Early termination and handling missing data\n",
"# AutoML 05: Blacklisting Models, Early Termination, and Handling Missing Data\n",
"\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for handling missing values in data. We also provide a stopping metric indicating a target for the primary metric so that AutoML can terminate the run without necessarly going through all the iterations. Finally, if you want to avoid a certain pipeline, we allow you to specify a black list of algos that AutoML will ignore for this run.\n",
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for handling missing values in data. We also provide a stopping metric indicating a target for the primary metrics so that AutoML can terminate the run without necessarly going through all the iterations. Finally, if you want to avoid a certain pipeline, we allow you to specify a blacklist of algorithms that AutoML will ignore for this run.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment using an existing Workspace\n",
"2. Instantiating AutoMLConfig\n",
"4. Training the Model\n",
"5. Exploring the results\n",
"6. Testing the fitted model\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Blacklist** certain pipelines\n",
"- Specify a **target metrics** to indicate stopping criteria\n",
"- Handling **Missing Data** in the input\n"
"- **Blacklisting** certain pipelines\n",
"- Specifying **target metrics** to indicate stopping criteria\n",
"- Handling **missing data** in the input\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"## Create Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -73,9 +72,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the experiment\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-local-missing-data'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-missing-data'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -98,7 +96,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -115,7 +113,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating Missing Data"
"### Creating missing data"
]
},
{
@@ -127,17 +125,17 @@
"from scipy import sparse\n",
"\n",
"digits = datasets.load_digits()\n",
"X_digits = digits.data[10:,:]\n",
"y_digits = digits.target[10:]\n",
"X_train = digits.data[10:,:]\n",
"y_train = digits.target[10:]\n",
"\n",
"# Add missing values in 75% of the lines\n",
"# Add missing values in 75% of the lines.\n",
"missing_rate = 0.75\n",
"n_missing_samples = int(np.floor(X_digits.shape[0] * missing_rate))\n",
"missing_samples = np.hstack((np.zeros(X_digits.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n",
"n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))\n",
"missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n",
"rng = np.random.RandomState(0)\n",
"rng.shuffle(missing_samples)\n",
"missing_features = rng.randint(0, X_digits.shape[1], n_missing_samples)\n",
"X_digits[np.where(missing_samples)[0], missing_features] = np.nan"
"missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
"X_train[np.where(missing_samples)[0], missing_features] = np.nan"
]
},
{
@@ -146,8 +144,8 @@
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(data=X_digits)\n",
"df['Label'] = pd.Series(y_digits, index=df.index)\n",
"df = pd.DataFrame(data = X_train)\n",
"df['Label'] = pd.Series(y_train, index=df.index)\n",
"df.head()"
]
},
@@ -155,21 +153,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate Auto ML Config\n",
"## Configure AutoML\n",
"\n",
"\n",
"This defines the settings and data used to run the experiment.\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment. This includes setting `experiment_exit_score`, which should cause the run to complete before the `iterations` count is reached.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**preprocess**| *True/False* <br>Setting this to *True* enables Auto ML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*|\n",
"|**exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|\n",
"|**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for Auto ML.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGDClassifierWrapper</i><br><i>NBWrapper</i><br><i>BernoulliNB</i><br><i>SVCWrapper</i><br><i>LinearSVMWrapper</i><br><i>KNeighborsClassifier</i><br><i>DecisionTreeClassifier</i><br><i>RandomForestClassifier</i><br><i>ExtraTreesClassifier</i><br><i>LightGBMClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet<i><br><i>GradientBoostingRegressor<i><br><i>DecisionTreeRegressor<i><br><i>KNeighborsRegressor<i><br><i>LassoLars<i><br><i>SGDRegressor<i><br><i>RandomForestRegressor<i><br><i>ExtraTreesRegressor<i>|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
"|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
"|**blacklist_models**|*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i>|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
@@ -184,15 +181,15 @@
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec = 3600,\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 20,\n",
" n_cross_validations = 5,\n",
" preprocess = True,\n",
" exit_score = 0.994,\n",
" blacklist_algos = ['KNeighborsClassifier','LinearSVMWrapper'],\n",
" experiment_exit_score = 0.9984,\n",
" blacklist_models = ['KNN','LinearSVM'],\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" X = X_train, \n",
" y = y_train,\n",
" path = project_folder)"
]
},
@@ -200,10 +197,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model\n",
"## Train the Models\n",
"\n",
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
@@ -219,18 +216,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the results"
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"NOTE: The widget will display a link at the bottom. This will not currently work, but will eventually link to a web-ui to explore the individual run details."
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
@@ -239,7 +236,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
@@ -249,7 +246,7 @@
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
@@ -275,7 +272,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. Each pipeline is a tuple of three elements. The first element is the score for the pipeline the second element is the string description of the pipeline and the last element are the pipeline objects used for each fold in the cross-validation."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -291,7 +288,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric"
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model which has the smallest `accuracy` value:"
]
},
{
@@ -308,7 +306,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration"
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
@@ -325,26 +324,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register fitted model for deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"local_run.register_model(description=description, tags=tags)\n",
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model "
"### Testing the best Fitted Model"
]
},
{
@@ -354,15 +334,15 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:10, :]\n",
"y_digits = digits.target[:10]\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]\n",
"\n",
"#Randomly select digits and test\n",
"for index in np.random.choice(len(y_digits), 2):\n",
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
@@ -373,6 +353,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

@@ -0,0 +1,384 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 06: Train Test Split and Handling Sparse Data\n",
"\n",
"In this example we use the scikit-learn's [20newsgroup](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html) to showcase how you can use AutoML for handling sparse data and how to specify custom cross validations splits.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- Explicit train test splits \n",
"- Handling **sparse data** in the input"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import random\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the experiment\n",
"experiment_name = 'automl-local-missing-data'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-missing-data'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"pd.DataFrame(data=output, index=['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Sparse Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_20newsgroups\n",
"from sklearn.feature_extraction.text import HashingVectorizer\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
"]\n",
"data_train = fetch_20newsgroups(subset = 'train', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_train, X_valid, y_train, y_valid = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
"\n",
"\n",
"vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
" n_features = 2**16)\n",
"X_train = vectorizer.transform(X_train)\n",
"X_valid = vectorizer.transform(X_valid)\n",
"\n",
"summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n",
"summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
"summary_df['Validation Set'] = [X_valid.shape[0], X_valid.shape[1]]\n",
"summary_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure AutoML\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.<br>**Note:** If input data is sparse, you cannot use *True*.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom validation set.|\n",
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification for the custom validation set.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 5,\n",
" preprocess = False,\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train,\n",
" X_valid = X_valid, \n",
" y_valid = y_valid, \n",
" path = project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Models\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model which has the smallest `accuracy` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lookup_metric = \"accuracy\"\n",
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# iteration = 3\n",
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Best Fitted Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load test data.\n",
"from pandas_ml import ConfusionMatrix\n",
"\n",
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_test = vectorizer.transform(data_test.data)\n",
"y_test = data_test.target\n",
"\n",
"# Test our best pipeline.\n",
"\n",
"y_pred = fitted_model.predict(X_test)\n",
"y_pred_strings = [data_test.target_names[i] for i in y_pred]\n",
"y_test_strings = [data_test.target_names[i] for i in y_test]\n",
"\n",
"cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n",
"print(cm)\n",
"cm.plot()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
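The notebook above turns raw text into a sparse matrix with `HashingVectorizer(..., n_features = 2**16)` before handing it to AutoML. The idea behind it, the hashing trick, can be sketched with the standard library alone: each token is hashed to a column index, so no vocabulary has to be stored and every document has the same fixed width. This is a simplified illustration, not scikit-learn's implementation: it uses MD5 instead of scikit-learn's MurmurHash3, a simplified tokenizer, no stop-word removal, and no normalization. The `hash_features` helper is hypothetical.

```python
import hashlib
import re


def hash_features(text, n_features=2**16):
    """Map a document to a sparse {column: count} dict with the hashing trick."""
    row = {}
    # Simplified tokenizer: lowercase runs of two or more word characters.
    for token in re.findall(r"\w\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        column = int.from_bytes(digest[:4], "little") % n_features
        row[column] = row.get(column, 0) + 1
    return row


doc = "The space shuttle reached space"
row = hash_features(doc)
print(len(row), "non-zero columns out of", 2**16)
```

Only the non-zero columns are stored, which is why the resulting data is sparse: a short document touches a handful of the 65,536 columns.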

@@ -13,17 +13,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 07: Exploring previous runs\n",
"# AutoML 07: Exploring Previous Runs\n",
"\n",
"In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. List all Experiments for the workspace\n",
"2. List AutoML runs for an Experiment\n",
"3. Get details for a AutoML Run. (Automl settings, run widget & all metrics)\n",
"4. Download fitted pipeline for any iteration\n"
"In this notebook you will learn how to:\n",
"1. List all experiments in a workspace.\n",
"2. List all AutoML runs in an experiment.\n",
"3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
"4. Download a fitted pipeline for any iteration.\n"
]
},
{
@@ -87,7 +87,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -104,8 +104,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# List AutoML runs for an Experiment\n",
"You can set <i>Experiment</i> name with any experiment name from the result of the Experiment.list cell to load the AutoML runs."
"# List AutoML runs for an experiment\n",
"Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
]
},
{
@@ -114,12 +114,13 @@
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell\n",
"experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell.\n",
"\n",
"proj = ws.experiments()[experiment_name]\n",
"proj = ws.experiments[experiment_name]\n",
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n",
"pattern = re.compile('^AutoML_[^_]*$')\n",
"all_runs = list(proj.get_runs(properties={'azureml.runsource': 'automl'}))\n",
"automl_runs_project = []\n",
"for run in all_runs:\n",
" if(pattern.match(run.id)):\n",
" properties = run.get_properties()\n",
@@ -130,6 +131,8 @@
" else:\n",
" iterations = properties['num_iterations']\n",
" summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]\n",
" if run.get_details()['status'] == 'Completed':\n",
" automl_runs_project.append(run.id)\n",
" \n",
"from IPython.display import HTML\n",
"projname_html = HTML(\"<h3>{}</h3>\".format(proj.name))\n",
@@ -143,7 +146,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get Details for a Auto ML Run\n",
"# Get details for an AutoML run\n",
"\n",
"Copy the project name and run id from the previous cell output to find more details on a particular run."
]
@@ -154,9 +157,10 @@
"metadata": {},
"outputs": [],
"source": [
"run_id = '' # Filling your own run_id\n",
"run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
"assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
"\n",
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)\n",
@@ -210,7 +214,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download best model for any given metric"
"## Download the Best Model for Any Given Metric"
]
},
{
@@ -219,7 +223,7 @@
"metadata": {},
"outputs": [],
"source": [
"metric = 'AUC_weighted' # Replace with a metric name\n",
"metric = 'AUC_weighted' # Replace with a metric name.\n",
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
"fitted_model"
]
@@ -228,7 +232,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download model for any given iteration"
"## Download the Model for Any Given Iteration"
]
},
{
@@ -237,7 +241,7 @@
"metadata": {},
"outputs": [],
"source": [
"iteration = 4 # Replace with an interation number\n",
"iteration = 1 # Replace with an iteration number.\n",
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
"fitted_model"
]
@@ -246,7 +250,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register fitted model for deployment"
"# Register fitted model for deployment\n",
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
]
},
{
@@ -258,14 +263,14 @@
"description = 'AutoML Model'\n",
"tags = None\n",
"ml_run.register_model(description = description, tags = tags)\n",
"ml_run.model_id # Use this id to deploy the model as a web service in Azure"
"ml_run.model_id # Use this id to deploy the model as a web service in Azure."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register best model for any given metric"
"## Register the Best Model for Any Given Metric"
]
},
{
@@ -274,18 +279,18 @@
"metadata": {},
"outputs": [],
"source": [
"metric = 'AUC_weighted' # Replace with a metric name\n",
"metric = 'AUC_weighted' # Replace with a metric name.\n",
"description = 'AutoML Model'\n",
"tags = None\n",
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
"ml_run.model_id # Use this id to deploy the model as a web service in Azure"
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register model for any given iteration"
"## Register the Model for Any Given Iteration"
]
},
{
@@ -294,15 +299,20 @@
"metadata": {},
"outputs": [],
"source": [
"iteration = 4 # Replace with an interation number\n",
"iteration = 1 # Replace with an iteration number.\n",
"description = 'AutoML Model'\n",
"tags = None\n",
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
"ml_run.model_id # Use this id to deploy the model as a web service in Azure"
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",


@@ -13,15 +13,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 08: Remote Execution with Text file\n",
"# AutoML 08: Remote Execution with DataStore\n",
"\n",
"In this sample accesses a data file on a remote DSVM. This is more efficient than reading the file from Blob storage in the get_data method.\n",
"This sample accesses a data file on a remote DSVM through DataStore. Advantages of using data store are:\n",
"1. DataStore secures the access details.\n",
"2. DataStore supports read, write to blob and file store\n",
"3. AutoML natively supports copying data from DataStore to DSVM\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Configuring the DSVM to allow files to be access directly by the get_data method.\n",
"2. get_data returning data from a local file.\n",
"1. Storing data in DataStore.\n",
"2. get_data returning data from DataStore.\n",
"\n"
]
},
@@ -43,6 +46,7 @@
"import logging\n",
"import os\n",
"import random\n",
"import time\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
@@ -51,6 +55,7 @@
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.compute import DsvmCompute\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
@@ -66,7 +71,7 @@
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-remote-dsvm-file'\n",
"experiment_name = 'automl-remote-datastore-file'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-remote-dsvm-file'\n",
"\n",
@@ -119,16 +124,17 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import DsvmCompute\n",
"compute_target_name = 'mydsvmc'\n",
"\n",
"dsvm_name = 'mydsvm'\n",
"try:\n",
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
" print('found existing dsvm.')\n",
" while ws.compute_targets[compute_target_name].provisioning_state == 'Creating':\n",
" time.sleep(1)\n",
" \n",
" dsvm_compute = DsvmCompute(workspace=ws, name=compute_target_name)\n",
" print('found existing:', dsvm_compute.name)\n",
"except:\n",
" print('creating new dsvm.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
" dsvm_compute = DsvmCompute.create(ws, name=compute_target_name, provisioning_configuration=dsvm_config)\n",
" dsvm_compute.wait_for_completion(show_output=True)"
]
},
@@ -136,9 +142,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Copy data file to the DSVM\n",
"Download the data file.\n",
"Copy the data file to the DSVM under the folder: /tmp/data"
"## Copy data file to local\n",
"\n",
"Download the data file.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mkdir data"
]
},
{
@@ -149,9 +164,90 @@
"source": [
"df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n",
" delimiter=\"\\t\", quotechar='\"')\n",
"df.to_csv(\"data.tsv\", sep=\"\\t\", quotechar='\"', index=False)\n",
"df.to_csv(\"data/data.tsv\", sep=\"\\t\", quotechar='\"', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload data to the cloud"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. It is backed by Azure blob storage account.\n",
"\n",
"# Now copy the file data.tsv to the folder /tmp/data on the DSVM"
"The data.tsv files are uploaded into a directory named data at the root of the datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Datastore\n",
"#blob_datastore = Datastore(ws, blob_datastore_name)\n",
"ds = ws.get_default_datastore()\n",
"print(ds.datastore_type, ds.account_name, ds.container_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ds.upload_files(\"data.tsv\")\n",
"ds.upload(src_dir='./data', target_path='data', overwrite=True, show_progress=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure & Run\n",
"\n",
"First let's create a DataReferenceConfigruation object to inform the system what data folder to download to the compute target.\n",
"The path_on_compute should be an absolute path to ensure that the data files are downloaded only once. The get_data method should use this same path to access the data files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute\n",
"# set the data reference of the run coonfiguration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
@@ -161,7 +257,9 @@
"## Create Get Data File\n",
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"\n",
"The *get_data()* function returns a [dictionary](README.md#getdata)."
"The *get_data()* function returns a [dictionary](README.md#getdata).\n",
"\n",
"The read_csv uses the path_on_compute value specified in the DataReferenceConfiguration call plus the path_on_datastore folder and then the actual file name."
]
},
{
@@ -186,20 +284,20 @@
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import LabelEncoder\n",
"import os\n",
"from os.path import expanduser, join, dirname\n",
"\n",
"def get_data():\n",
" # Burning man 2016 data\n",
" df = pd.read_csv('/tmp/data/data.tsv',\n",
" delimiter=\"\\t\", quotechar='\"')\n",
" df = pd.read_csv(\"/tmp/azureml_runs/data/data.tsv\", delimiter=\"\\t\", quotechar='\"')\n",
" # get integer labels\n",
" le = LabelEncoder()\n",
" le.fit(df[\"Label\"].values)\n",
" y = le.transform(df[\"Label\"].values)\n",
" df = df.drop([\"Label\"], axis=1)\n",
" X = df.drop([\"Label\"], axis=1)\n",
"\n",
" df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n",
" X_train, _, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=42)\n",
"\n",
" return { \"X\" : df.values, \"y\" : y }"
" return { \"X\" : X_train.values, \"y\" : y_train }"
]
},
{
@@ -210,15 +308,15 @@
"\n",
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
"\n",
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to AutoMLConfig.</i>\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM\n",
"|**max_concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM\n",
"|**preprocess**| *True/False* <br>Setting this to *True* enables Auto ML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*|\n",
"|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.<br> Default is *1*, you can set it to *-1* to use all cores|"
]
@@ -230,18 +328,19 @@
"outputs": [],
"source": [
"automl_settings = {\n",
" \"max_time_sec\": 3600,\n",
" \"iterations\": 10,\n",
" \"iteration_timeout_minutes\": 60,\n",
" \"iterations\": 4,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": True,\n",
" \"max_cores_per_iteration\": 2,\n",
" \"max_cores_per_iteration\": 1,\n",
" \"verbosity\": logging.INFO\n",
"}\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path=project_folder,\n",
" compute_target = dsvm_compute,\n",
" run_configuration=conda_run_config,\n",
" #compute_target = dsvm_compute,\n",
" data_script = project_folder + \"/get_data.py\",\n",
" **automl_settings\n",
" )"
@@ -251,7 +350,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
"## Training the Models <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
"\n",
"For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retreive the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run."
]
@@ -285,10 +384,20 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -319,7 +428,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Canceling runs\n",
"## Canceling Runs\n",
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
]
},
@@ -342,7 +451,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The *get_output* method returns the best run and the fitted model. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
]
},
{
@@ -392,26 +501,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register fitted model for deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"remote_run.register_model(description=description, tags=tags)\n",
"remote_run.model_id # Use this id to deploy the model as a web service in Azure"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n"
"### Testing the Best Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n"
]
},
{
@@ -432,11 +522,11 @@
"le = LabelEncoder()\n",
"le.fit(df[\"Label\"].values)\n",
"y = le.transform(df[\"Label\"].values)\n",
"df = df.drop([\"Label\"], axis=1)\n",
"X = df.drop([\"Label\"], axis=1)\n",
"\n",
"_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n",
"_, X_test, _, y_test = train_test_split(X, y, test_size=0.1, random_state=42)\n",
"\n",
"ypred = fitted_model.predict(df_test.values)\n",
"ypred = fitted_model.predict(X_test.values)\n",
"\n",
"ypred_strings = le.inverse_transform(ypred)\n",
"ytest_strings = le.inverse_transform(y_test)\n",
@@ -447,16 +537,14 @@
"\n",
"cm.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

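Note on the data path used by get_data in the notebook above: the text says read_csv takes the path_on_compute value, then the path_on_datastore folder, then the file name. The following is a minimal sketch of that composition using only the Python standard library. The helper name `compose_data_path` is hypothetical (it is not part of the notebook or the SDK); the three values are the ones that appear in the diff.

```python
import posixpath


def compose_data_path(path_on_compute, path_on_datastore, file_name):
    # Hypothetical helper. The compute target in the notebook is a Linux DSVM,
    # so POSIX path joining is used regardless of the local operating system.
    return posixpath.join(path_on_compute, path_on_datastore, file_name)


# Values taken from the DataReferenceConfiguration cell and the upload cell.
data_path = compose_data_path('/tmp/azureml_runs', 'data', 'data.tsv')
print(data_path)
```

If path_on_compute or path_on_datastore changes in the DataReferenceConfiguration call, the string passed to read_csv in get_data.py has to change with it, since get_data.py hard-codes the joined path.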

@@ -13,29 +13,30 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 09: Classification with deployment\n",
"# AutoML 09: Classification with Deployment\n",
"\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem and deploy it to an Azure Container Instance (ACI).\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment using an existing Workspace\n",
"2. Instantiating AutoMLConfig\n",
"3. Training the Model using local compute\n",
"4. Exploring the results\n",
"5. Registering the model\n",
"6. Creating Image and creating aci service\n",
"7. Testing the aci service\n"
"In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Register the model.\n",
"6. Create a container image.\n",
"7. Create an Azure Container Instance (ACI) service.\n",
"8. Test the ACI service.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -95,7 +96,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -112,17 +113,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate Auto ML Config\n",
"## Configure AutoML\n",
"\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**n_cross_validations**|Number of cross validation splits|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
@@ -135,19 +136,19 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[10:,:]\n",
"y_digits = digits.target[10:]\n",
"X_train = digits.data[10:,:]\n",
"y_train = digits.target[10:]\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" name = experiment_name,\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec=1200,\n",
" iteration_timeout_minutes = 20,\n",
" iterations = 10,\n",
" n_cross_validations = 2,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" X = X_train, \n",
" y = y_train,\n",
" path = project_folder)"
]
},
@@ -155,10 +156,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model\n",
"## Train the Models\n",
"\n",
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
@@ -176,7 +177,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -192,7 +193,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register fitted model for deployment"
"### Register the Fitted Model for Deployment\n",
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
]
},
{
@@ -203,7 +205,7 @@
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"model = local_run.register_model(description=description, tags=tags, iteration=8)\n",
"model = local_run.register_model(description = description, tags = tags)\n",
"local_run.model_id # This will be written to the script file later in the notebook."
]
},
@@ -211,7 +213,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Scoring script ###"
"### Create Scoring Script"
]
},
{
@@ -224,6 +226,7 @@
"import pickle\n",
"import json\n",
"import numpy\n",
"import azureml.train.automl\n",
"from sklearn.externals import joblib\n",
"from azureml.core.model import Model\n",
"\n",
@@ -249,14 +252,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create yml file for env"
"### Create a YAML File for the Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To ensure the consistence the fit results with the training results, the sdk dependence versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook 12.auto-ml-retrieve-the-training-sdk-versions.ipynb."
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
]
},
{
@@ -296,15 +299,12 @@
"metadata": {},
"outputs": [],
"source": [
"%%writefile myenv.yml\n",
"name: myenv\n",
"channels:\n",
" - defaults\n",
"dependencies:\n",
" - pip:\n",
" - numpy==1.14.2\n",
" - scikit-learn==0.19.2\n",
" - azureml-sdk[notebooks,automl]==<<azureml-version>> "
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
]
},
{
@@ -314,14 +314,14 @@
"outputs": [],
"source": [
"# Substitute the actual version number in the environment file.\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
"\n",
"with open(conda_env_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace('<<azureml-version>>', dependencies['azureml-sdk']))\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",
@@ -338,7 +338,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Image ###"
"### Create a Container Image"
]
},
{
@@ -361,14 +361,17 @@
" image_config = image_config, \n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
"image.wait_for_creation(show_output = True)\n",
"\n",
"if image.creation_state == 'Failed':\n",
" print(\"Image build log at: \" + image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy Image as web service on Azure Container Instance ###"
"### Deploy the Image as a Web Service on Azure Container Instance"
]
},
{
@@ -407,7 +410,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### To delete a service ##"
"### Delete a Web Service"
]
},
{
@@ -423,7 +426,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### To get logs from deployed service ###"
"### Get Logs from a Deployed Web Service"
]
},
{
@@ -439,7 +442,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test Web Service ###"
"### Test a Web Service"
]
},
{
@@ -450,15 +453,15 @@
"source": [
"#Randomly select digits and test\n",
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:10, :]\n",
"y_digits = digits.target[:10]\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]\n",
"\n",
"for index in np.random.choice(len(y_digits), 3):\n",
"for index in np.random.choice(len(y_test), 3, replace = False):\n",
" print(index)\n",
" test_sample = json.dumps({'data':X_digits[index:index + 1].tolist()})\n",
" test_sample = json.dumps({'data':X_test[index:index + 1].tolist()})\n",
" predicted = aci_service.run(input_data = test_sample)\n",
" label = y_digits[index]\n",
" label = y_test[index]\n",
" predictedDict = json.loads(predicted)\n",
" title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0])\n",
" fig = plt.figure(1, figsize = (3,3))\n",
@@ -467,16 +470,14 @@
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

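Note on the web service test cell in the notebook above: the change adds `replace = False` to the index selection so that the three test digits are distinct, and each request body is a JSON object whose `data` key holds a single-row 2-D list. A minimal standard-library sketch of both points follows. It swaps in `random.sample` for `np.random.choice(..., replace = False)`; the function name `build_payloads` and the stand-in feature values are made up for illustration, and no web service is called.

```python
import json
import random

# Stand-in for X_test: ten rows of two features each (made-up values).
X_test = [[float(i), float(i) + 0.5] for i in range(10)]


def build_payloads(rows, sample_count, seed=0):
    # Hypothetical helper, not part of the notebook or the SDK.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no index can repeat.
    indices = rng.sample(range(len(rows)), sample_count)
    # rows[i:i + 1] keeps the 2-D shape: a list containing one row.
    return [(i, json.dumps({'data': rows[i:i + 1]})) for i in indices]


payloads = build_payloads(X_test, 3)
for index, payload in payloads:
    print(index, payload)
```

Without `replace = False`, the same index can be drawn more than once, so the loop may display the same digit repeatedly.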

@@ -13,14 +13,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 10: Multi output Example for AutoML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows an example to use AutoML to train the multi output problems by leveraging the correlation between the outputs using indicator vectors."
"# AutoML 10: Multi-output\n",
"\n",
"This notebook shows how to use AutoML to train multi-output problems by leveraging the correlation between the outputs using indicator vectors.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook."
]
},
{
@@ -52,7 +49,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -69,18 +66,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transformer functions\n",
"The transformation of the input are happening for input X and Y as following, e.g. Y = {y_1, y_2}, then X becomes\n",
"## Transformer Functions\n",
"The transformations of inputs `X` and `y` are happening as follows, e.g. `y = {y_1, y_2}`, then `X` becomes\n",
" \n",
"X 1 0\n",
"`X 1 0`\n",
" \n",
"X 0 1\n",
"`X 0 1`\n",
"\n",
"and Y becomes,\n",
"and `y` becomes,\n",
"\n",
"y_1\n",
"`y_1`\n",
"\n",
"y_2"
"`y_2`"
]
},
{
@@ -93,24 +90,24 @@
"from scipy import linalg\n",
"\n",
"#Transformer functions\n",
"def multi_output_transform_x_y(X, Y):\n",
" X_new = multi_output_transformer_x(X, Y.shape[1])\n",
" y_new = multi_output_transform_y(Y)\n",
"def multi_output_transform_x_y(X, y):\n",
" X_new = multi_output_transformer_x(X, y.shape[1])\n",
" y_new = multi_output_transform_y(y)\n",
" return X_new, y_new\n",
"\n",
"def multi_output_transformer_x(X, number_of_columns_Y):\n",
" indicator_vecs = linalg.block_diag(*([np.ones((X.shape[0], 1))] * number_of_columns_Y))\n",
"def multi_output_transformer_x(X, number_of_columns_y):\n",
" indicator_vecs = linalg.block_diag(*([np.ones((X.shape[0], 1))] * number_of_columns_y))\n",
" if sparse.issparse(X):\n",
" X_new = sparse.vstack(np.tile(X, number_of_columns_Y))\n",
" X_new = sparse.vstack(np.tile(X, number_of_columns_y))\n",
" indicator_vecs = sparse.coo_matrix(indicator_vecs)\n",
" X_new = sparse.hstack((X_new, indicator_vecs))\n",
" else:\n",
" X_new = np.tile(X, (number_of_columns_Y, 1))\n",
" X_new = np.tile(X, (number_of_columns_y, 1))\n",
" X_new = np.hstack((X_new, indicator_vecs))\n",
" return X_new\n",
"\n",
"def multi_output_transform_y(Y):\n",
" return Y.reshape(-1, order=\"F\")\n",
"def multi_output_transform_y(y):\n",
" return y.reshape(-1, order=\"F\")\n",
"\n",
"def multi_output_inverse_transform_y(y, number_of_columns_y):\n",
" return y.reshape((-1, number_of_columns_y), order = \"F\")"
@@ -120,7 +117,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## AutoML experiment set up"
"## AutoML Experiment Setup"
]
},
{
@@ -131,9 +128,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-multi-output'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-multi-output'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -154,7 +150,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a random dataset for the test purpose "
"## Create a Random Dataset for Test Purposes"
]
},
{
@@ -165,15 +161,15 @@
"source": [
"rng = np.random.RandomState(1)\n",
"X_train = np.sort(200 * rng.rand(600, 1) - 100, axis = 0)\n",
"Y_train = np.array([np.pi * np.sin(X_train).ravel(), np.pi * np.cos(X_train).ravel()]).T\n",
"Y_train += (0.5 - rng.rand(*Y_train.shape))"
"y_train = np.array([np.pi * np.sin(X_train).ravel(), np.pi * np.cos(X_train).ravel()]).T\n",
"y_train += (0.5 - rng.rand(*y_train.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform X and Y transformation using transformer function"
"Perform X and y transformation using the transformer function."
]
},
{
@@ -182,7 +178,14 @@
"metadata": {},
"outputs": [],
"source": [
"X_train_transformed, y_train_transformed = multi_output_transform_x_y(X_train, Y_train)"
"X_train_transformed, y_train_transformed = multi_output_transform_x_y(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Configure AutoML using the transformed results."
]
},
{
@@ -206,7 +209,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fit the transformed data "
"## Fit the Transformed Data"
]
},
{
@@ -224,7 +227,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Get the best fit model\n",
"# Get the best fit model.\n",
"best_run, fitted_model = local_run.get_output()"
]
},
@@ -234,8 +237,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Generate random data set for predicting\n",
"X_predict = np.sort(200 * rng.rand(200, 1) - 100, axis=0)"
"# Generate random data set for predicting.\n",
"X_test = np.sort(200 * rng.rand(200, 1) - 100, axis = 0)"
]
},
{
@@ -244,11 +247,12 @@
"metadata": {},
"outputs": [],
"source": [
"# Transform predict data\n",
"X_predict_transformed = multi_output_transformer_x(X_predict, Y_train.shape[1])\n",
"# Predict and inverse transform the prediction\n",
"y_predict = fitted_model.predict(X_predict_transformed)\n",
"Y_predict = multi_output_inverse_transform_y(y_predict, Y_train.shape[1])"
"# Transform predict data.\n",
"X_test_transformed = multi_output_transformer_x(X_test, y_train.shape[1])\n",
"\n",
"# Predict and inverse transform the prediction.\n",
"y_predict = fitted_model.predict(X_test_transformed)\n",
"y_predict = multi_output_inverse_transform_y(y_predict, y_train.shape[1])"
]
},
{
@@ -257,18 +261,16 @@
"metadata": {},
"outputs": [],
"source": [
"print(Y_predict)"
"print(y_predict)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",


@@ -13,26 +13,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 11: Sample weight\n",
"# AutoML 11: Sample Weight\n",
"\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use sample weight with the AutoML Classifier.\n",
"Sample weight is used where some sample values are more important than others.\n",
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use sample weight with AutoML. Sample weight is used where some sample values are more important than others.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. How to specifying sample_weight\n",
"2. The difference that it makes to test results\n",
"\n"
"In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -66,11 +62,10 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"# Choose names for the regular and the sample weight experiments.\n",
"experiment_name = 'non_sample_weight_experiment'\n",
"sample_weight_experiment_name = 'sample_weight_experiment'\n",
"\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -94,7 +89,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -111,9 +106,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate Auto ML Config\n",
"## Configure AutoML\n",
"\n",
"Instantiate two AutoMLConfig Objects. One will be used with sample_weight and one without."
"Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
]
},
{
@@ -123,33 +118,33 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[100:,:]\n",
"y_digits = digits.target[100:]\n",
"X_train = digits.data[100:,:]\n",
"y_train = digits.target[100:]\n",
"\n",
"# The example makes the sample weight 0.9 for the digit 4 and 0.01 for all other digits.\n",
"# This makes the model more likely to classify as 4 if the image is not clear.\n",
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_digits])\n",
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])\n",
"\n",
"automl_classifier = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec = 3600,\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 10,\n",
" n_cross_validations = 2,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" X = X_train, \n",
" y = y_train,\n",
" path = project_folder)\n",
"\n",
"automl_sample_weight = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec = 3600,\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 10,\n",
" n_cross_validations = 2,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" X = X_train, \n",
" y = y_train,\n",
" sample_weight = sample_weight,\n",
" path = project_folder)"
]
@@ -158,10 +153,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Models\n",
"## Train the Models\n",
"\n",
"Call the submit method on the experiment and pass the configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
"Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
@@ -181,7 +176,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Models\n",
"### Test the Best Fitted Model\n",
"\n",
"#### Load Test Data"
]
@@ -193,8 +188,8 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:100, :]\n",
"y_digits = digits.target[:100]\n",
"X_test = digits.data[:100, :]\n",
"y_test = digits.target[:100]\n",
"images = digits.images[:100]"
]
},
@@ -202,7 +197,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Compare the pipelines\n",
"#### Compare the Models\n",
"The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
]
},
@@ -212,11 +207,11 @@
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"for index in range(0,len(y_digits)):\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" predicted_sample_weight = fitted_model_sample_weight.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
"# Randomly select digits and test.\n",
"for index in range(0,len(y_test)):\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" predicted_sample_weight = fitted_model_sample_weight.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n",
" title = \"Label value = %d Predicted value = %d Predicted with sample weight = %d\" % (label, predicted, predicted_sample_weight)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
@@ -228,6 +223,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

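A note on the sample weight notebook diff above: the following is a minimal sketch, in plain Python with no Azure ML calls, of how the 0.9 / 0.01 weighting shifts the share of total training weight toward the digit 4. The `labels` list and the `weight_share` helper are hypothetical names used only to illustrate the arithmetic; they are not part of the notebook or the SDK.

```python
# Hypothetical stand-in for y_train: each digit 0-9 appears exactly once.
labels = list(range(10))

# Same weighting rule as the notebook: 0.9 for the digit 4, 0.01 otherwise.
sample_weight = [0.9 if label == 4 else 0.01 for label in labels]


def weight_share(labels, weights, target):
    """Return the fraction of the total weight carried by `target` samples."""
    total = sum(weights)
    target_total = sum(w for label, w in zip(labels, weights) if label == target)
    return target_total / total


share_of_four = weight_share(labels, sample_weight, 4)
print(f"Unweighted share of 4s: {labels.count(4) / len(labels):.2f}")
print(f"Weighted share of 4s:   {share_of_four:.2f}")
```

Under these assumptions the digit 4 makes up a tenth of the samples but carries roughly nine tenths of the total weight, which is consistent with the notebook's remark that the weighted model predicts 4 more readily, including for some images not labelled 4.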

@@ -13,7 +13,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 12: Retrieving Training SDK Versions"
"# AutoML 12: Retrieving Training SDK Versions\n",
"\n",
"This example shows how to find the SDK versions used for an experiment.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook."
]
},
{
@@ -36,8 +40,7 @@
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun\n",
"from azureml.train.automl.utilities import get_sdk_dependencies"
"from azureml.train.automl.run import AutoMLRun\n"
]
},
{
@@ -46,7 +49,7 @@
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
@@ -63,30 +66,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Retrieve the SDK versions in the current env"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To retrieve the SDK versions in the current env, simple running get_sdk_dependencies()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"get_sdk_dependencies()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Training Model Using AutoML"
"# Train models using AutoML"
]
},
{
@@ -97,9 +77,8 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-classification'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
@@ -123,8 +102,8 @@
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[10:,:]\n",
"y_digits = digits.target[10:]\n",
"X_train = digits.data[10:,:]\n",
"y_train = digits.target[10:]\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
@@ -132,8 +111,8 @@
" iterations = 3,\n",
" n_cross_validations = 2,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" X = X_train, \n",
" y = y_train,\n",
" path = project_folder)\n",
"\n",
"local_run = experiment.submit(automl_config, show_output = True)"
@@ -143,14 +122,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Retrieve the SDK versions from RunHistory"
"# Retrieve the SDK versions from RunHistory"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get the SDK versions from RunHistory, first the RunId need to be recorded. This can either be done by copy it from the output message or retieve if after each run."
"To get the SDK versions from RunHistory, first the run id needs to be recorded. This can either be done by copying it from the output message or by retrieving it after each run."
]
},
{
@@ -159,6 +138,10 @@
"metadata": {},
"outputs": [],
"source": [
"# Use a run id copied from an output message.\n",
"#run_id = 'AutoML_c0585b1f-a0e6-490b-84c7-3a099468b28e'\n",
"\n",
"# Retrieve the run id from a run.\n",
"run_id = local_run.id\n",
"print(run_id)"
]
@@ -167,7 +150,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize a new AutoMLRunClass."
"Initialize a new `AutoMLRun` object."
]
},
{
@@ -177,7 +160,6 @@
"outputs": [],
"source": [
"experiment_name = 'automl-local-classification'\n",
"#run_id = 'AutoML_c0585b1f-a0e6-490b-84c7-3a099468b28e'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)"
@@ -217,6 +199,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",


@@ -0,0 +1,446 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 13: Prepare Data using `azureml.dataprep` for Local Execution\n",
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
"\n",
"Make sure you have executed the [setup](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
"2. Pass the `Dataflow` to AutoML for a local run.\n",
"3. Pass the `Dataflow` to AutoML for a remote run."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"import azureml.dataprep as dprep\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
" \n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataprep-local'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataprep-local'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"pd.DataFrame(data = output, index = ['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading Data using DataPrep"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Review the Data Preparation Result\n",
"\n",
"You can peek at the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X.skip(1).head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure AutoML\n",
"\n",
"This creates a general AutoML settings object applicable for both local and remote runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n",
" \"verbosity\" : logging.INFO,\n",
" \"n_cross_validations\": 3\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Local Run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `Dataflow` Objects\n",
"\n",
"The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" X = X,\n",
" y = y,\n",
" **automl_settings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
"import pandas as pd\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the first iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 0\n",
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the Best Fitted Model\n",
"\n",
"#### Load Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import random\n",
"import numpy as np\n",
"\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sklearn.digits.data + target\n",
"digits_complete = dprep.smart_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits_complete.to_pandas_dataframe().shape\n",
"labels_column = 'Column64'\n",
"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
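
A note on the `skip(i)` / `head(j)` cell in the notebook above: the claim that only `j` records are evaluated can be illustrated with a lazy generator pipeline in plain Python. This is an analogy under stated assumptions, not the `azureml.dataprep` implementation; `read_rows`, `to_int`, and `peek` are hypothetical names invented for this sketch.

```python
from itertools import islice

rows_read = 0
rows_converted = 0


def read_rows(n):
    """Yield n raw string records one at a time, counting each read."""
    global rows_read
    for value in range(n):
        rows_read += 1
        yield str(value)


def to_int(rows):
    """Lazily convert each record to int, counting each conversion."""
    global rows_converted
    for row in rows:
        rows_converted += 1
        yield int(row)


def peek(n_rows, i, j):
    """Skip i records, convert the rest lazily, and return the first j."""
    skipped = islice(read_rows(n_rows), i, None)
    return list(islice(to_int(skipped), j))


preview = peek(1_000_000, 1, 5)
print(preview)          # [1, 2, 3, 4, 5]
print(rows_read)        # 6: the skipped record plus the five returned
print(rows_converted)   # 5: only the previewed records are converted
```

In this sketch the source holds a million records, yet the preview touches only six of them, because each step pulls a record only when the next step asks for one.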


@@ -13,41 +13,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 13: Prepare Data using `azureml.dataprep`\n",
"In this example we showcase how you can use `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone - full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
"# AutoML 13: Prepare Data using `azureml.dataprep` for Remote Execution (DSVM)\n",
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
"\n",
"Make sure you have executed the [setup](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Defining data loading and preparation steps in a `Dataflow` using `azureml.dataprep`\n",
"2. Passing the `Dataflow` to AutoML for local run\n",
"3. Passing the `Dataflow` to AutoML for remote run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install `azureml.dataprep` SDK"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please restart your kernel after the below installs.\n",
"\n",
"Tornado must be downgraded to a pre-5 version due to a known Tornado x Jupyter event loop bug."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install azureml-dataprep\n",
"!pip install tornado==4.5.1"
"In this notebook you will learn how to:\n",
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
"2. Pass the `Dataflow` to AutoML for a local run.\n",
"3. Pass the `Dataflow` to AutoML for a remote run."
]
},
{
@@ -73,9 +47,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
@@ -86,14 +60,13 @@
"source": [
"import logging\n",
"import os\n",
"import time\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.compute import DsvmCompute\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.runconfig import CondaDependencies\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.workspace import Workspace\n",
"import azureml.dataprep as dprep\n",
"from azureml.train.automl import AutoMLConfig"
@@ -108,9 +81,9 @@
"ws = Workspace.from_config()\n",
" \n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataprep-classification'\n",
"experiment_name = 'automl-dataprep-remote-dsvm'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataprep-classification'\n",
"project_folder = './sample_projects/automl-dataprep-remote-dsvm'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
@@ -139,12 +112,12 @@
"metadata": {},
"outputs": [],
"source": [
"# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file\n",
"# data pulled from sklearn.datasets.load_digits()\n",
"# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # remove header\n",
"X = dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter).\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
@@ -156,7 +129,7 @@
"source": [
"## Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large dataset."
"You can peek at the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
]
},
{
@@ -172,9 +145,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate AutoML Settings\n",
"## Configure AutoML\n",
"\n",
"This creates a general Auto ML Settings applicable for both Local and Remote runs."
"This creates a general AutoML settings object applicable for both local and remote runs."
]
},
{
@@ -184,7 +157,7 @@
"outputs": [],
"source": [
"automl_settings = {\n",
" \"max_time_sec\": 600,\n",
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n",
@@ -197,46 +170,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Local Run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass data with Dataflows\n",
"\n",
"The `Dataflow` objects captured above can be passed to `submit` method for local run. AutoML will retrieve the results from the `Dataflow` for model training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" X = X,\n",
" y = y,\n",
" **automl_settings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remote Run\n",
"*Note: This feature might not work properly in your workspace region before the October update. You may jump to the \"Exploring the results\" section below to explore other features AutoML and DataPrep has to offer.*"
"## Remote Run"
]
},
{
@@ -252,62 +186,45 @@
"metadata": {},
"outputs": [],
"source": [
"dsvm_name = 'mydsvm'\n",
"dsvm_name = 'mydsvmd'\n",
"\n",
"try:\n",
" while ws.compute_targets[dsvm_name].provisioning_state == 'Creating':\n",
" time.sleep(1)\n",
" \n",
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
" print('found existing dsvm.')\n",
" print('Found existing DSVM.')\n",
"except:\n",
" print('creating new dsvm.')\n",
" print('Creating a new DSVM.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
" dsvm_compute.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Update Conda Dependency file to have AutoML and DataPrep SDK\n",
"\n",
"Currently AutoML and DataPrep SDK is not installed with Azure ML SDK by default. Due to this we update the conda dependency file to add such dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cd = CondaDependencies()\n",
"cd.add_pip_package(pip_package='azureml-dataprep')\n",
"cd.add_pip_package(pip_package='tornado==4.5.1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a RunConfiguration with DSVM name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run_config = RunConfiguration(conda_dependencies=cd)\n",
"run_config.target = dsvm_compute\n",
"run_config.auto_prepare_environment = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass data with Dataflows\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"The `Dataflow` objects captured above can also be passed to `submit` method for remote run. AutoML will serialize the `Dataflow` and send to remote compute target. The `Dataflow` will not be evaluated locally."
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"conda_run_config.target = dsvm_compute\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `Dataflow` Objects\n",
"\n",
"The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally."
]
},
{
@@ -319,31 +236,10 @@
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path = project_folder,\n",
" run_configuration = run_config,\n",
" run_configuration=conda_run_config,\n",
" X = X,\n",
" y = y,\n",
" **automl_settings)\n",
"# Please uncomment the line below to try out remote run with dataprep. \n",
"# This feature might not work properly in your workspace region before the October update.\n",
"# remote_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
" **automl_settings)"
]
},
{
@@ -352,15 +248,42 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"RunDetails(local_run).show() "
"remote_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve all child runs\n",
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
@@ -370,7 +293,7 @@
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
@@ -388,7 +311,7 @@
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -397,7 +320,7 @@
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
@@ -406,8 +329,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric\n",
"Give me the run and the model that has the smallest `log_loss`:"
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
@@ -417,7 +340,7 @@
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
@@ -426,8 +349,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any iteration\n",
"Give me the run and the model from the 1st iteration:"
"#### Model from a Specific Iteration\n",
"Show the run and the model from the first iteration:"
]
},
{
@@ -437,7 +360,7 @@
"outputs": [],
"source": [
"iteration = 0\n",
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
"best_run, fitted_model = remote_run.get_output(iteration = iteration)\n",
"print(best_run)\n",
"print(fitted_model)"
]
@@ -446,7 +369,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model \n",
"### Test the Best Fitted Model\n",
"\n",
"#### Load Test Data"
]
@@ -460,8 +383,8 @@
"from sklearn import datasets\n",
"\n",
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:10, :]\n",
"y_digits = digits.target[:10]\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
@@ -469,7 +392,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing our best pipeline\n",
"#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works."
]
},
@@ -485,10 +408,10 @@
"import random\n",
"import numpy as np\n",
"\n",
"for index in np.random.choice(len(y_digits), 2):\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
@@ -508,9 +431,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the Dataflows to use for AutoML later\n",
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"\n",
"`Dataflow` objects are immutable. Each of them is composed of a list of data preparation steps. A `Dataflow` can be branched at any point for further usage."
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
@@ -544,6 +467,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -13,22 +13,34 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 06: Custom CV splits, handling sparse data\n",
"# AutoML 14: Explain classification model and visualize the explanation\n",
"\n",
"In this example we use the scikit learn's [20newsgroup](In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for handling sparse data and specify custom cross validation splits.\n",
"In this example we use the sklearn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use the AutoML Classifier for a simple classification problem.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment using an existing Workspace\n",
"1. Creating an Experiment in an existing Workspace\n",
"2. Instantiating AutoMLConfig\n",
"4. Training the Model\n",
"5. Exploring the results\n",
"6. Testing the fitted model\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Custom CV** splits \n",
"- Handling **Sparse Data** in the input"
"3. Training the Model using local compute and explain the model\n",
"4. Visualization model's feature importance in widget\n",
"5. Explore best model's explanation\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install AzureML Explainer SDK "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install azureml_sdk[explain]"
]
},
{
@@ -50,12 +62,7 @@
"import os\n",
"import random\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
@@ -71,17 +78,17 @@
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the experiment\n",
"experiment_name = 'automl-local-missing-data'\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-local-classification'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-missing-data'\n",
"project_folder = './sample_projects/automl-local-classification-model-explanation'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
@@ -113,7 +120,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Sparse Data"
"## Load Iris Data Set"
]
},
{
@@ -122,33 +129,23 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_20newsgroups\n",
"from sklearn.feature_extraction.text import HashingVectorizer\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"y = iris.target\n",
"X = iris.data\n",
"\n",
"features = iris.feature_names\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X,\n",
" y,\n",
" test_size=0.1,\n",
" random_state=100,\n",
" stratify=y)\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
"]\n",
"data_train = fetch_20newsgroups(subset='train', categories=categories,\n",
" shuffle=True, random_state=42,\n",
" remove=remove)\n",
"\n",
"X_train, X_validation, y_train, y_validation = train_test_split(data_train.data, data_train.target, test_size=0.33, random_state=42)\n",
"\n",
"\n",
"vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,\n",
" n_features=2**16)\n",
"X_train = vectorizer.transform(X_train)\n",
"X_validation = vectorizer.transform(X_validation)\n",
"\n",
"summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n",
"summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
"summary_df['Validation Set'] = [X_validation.shape[0], X_validation.shape[1]]\n",
"summary_df"
"X_train = pd.DataFrame(X_train, columns=features)\n",
"X_test = pd.DataFrame(X_test, columns=features)"
]
},
{
@@ -157,19 +154,19 @@
"source": [
"## Instantiate Auto ML Config\n",
"\n",
"This defines the settings and data used to run the experiment.\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
"|**preprocess**| *True/False* <br>Setting this to *True* enables Auto ML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*<br>*Note: If input data is Sparse you cannot use preprocess=True*|\n",
"|**max_time_sec**|Time limit in minutes for each iterations|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom Validation set|\n",
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. for the custom Validation set|\n",
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]|\n",
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
]
},
@@ -182,14 +179,14 @@
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec=3600,\n",
" iterations=5,\n",
" preprocess=False,\n",
" max_time_sec = 12000,\n",
" iterations = 10,\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train,\n",
" X_valid = X_validation, \n",
" y_valid = y_validation, \n",
" X_valid = X_test,\n",
" y_valid = y_test,\n",
" model_explainability=True,\n",
" path=project_folder)"
]
},
@@ -223,7 +220,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for monitoring runs\n",
"### Widget for monitoring runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"\n",
@@ -236,43 +233,20 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
"child_run = next(local_run.get_children())\n",
"RunDetails(child_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
@@ -288,14 +262,25 @@
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()"
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric"
"### Best Model 's explanation\n",
"\n",
"Retrieve the explanation from the best_run. And explanation information includes:\n",
"\n",
"1.\tshap_values: The explanation information generated by shap lib\n",
"2.\texpected_values: The expected value of the model applied to set of X_train data.\n",
"3.\toverall_summary: The model level feature importance values sorted in descending order\n",
"4.\toverall_imp: The feature names sorted in the same order as in overall_summary\n",
"5.\tper_class_summary: The class level feature importance values sorted in descending order. Only available for the classification case\n",
"6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case"
]
},
{
@@ -304,15 +289,10 @@
"metadata": {},
"outputs": [],
"source": [
"# lookup_metric = \"accuracy\"\n",
"# best_run, fitted_model = local_run.get_output(metric=lookup_metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration"
"from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
"\n",
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
" retrieve_model_explanation(best_run)"
]
},
{
@@ -321,15 +301,8 @@
"metadata": {},
"outputs": [],
"source": [
"# iteration = 3\n",
"# best_run, fitted_model = local_run.get_output(iteration=iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register fitted model for deployment"
"print(overall_summary)\n",
"print(overall_imp)"
]
},
{
@@ -338,17 +311,15 @@
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"local_run.register_model(description=description, tags=tags)\n",
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
"print(per_class_summary)\n",
"print(per_class_imp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model "
"Beside retrieve the existed model explanation information, explain the model with different train/test data"
]
},
{
@@ -357,44 +328,29 @@
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()### Testing the Fitted Model\n",
"from azureml.train.automl.automlexplainer import explain_model\n",
"\n",
"#### Load Test Data\n",
"import sklearn\n",
"from pandas_ml import ConfusionMatrix\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
"]\n",
"\n",
"\n",
"data_test = fetch_20newsgroups(subset='test', categories=categories,\n",
" shuffle=True, random_state=42,\n",
" remove=remove)\n",
"\n",
"vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,\n",
" n_features=2**16)\n",
"\n",
"X_test = vectorizer.transform(data_test.data)\n",
"y_test = data_test.target\n",
"\n",
"#### Testing our best pipeline\n",
"\n",
"ypred = fitted_model.predict(X_test)\n",
"ypred_strings = [categories[i] for i in ypred]\n",
"ytest_strings = [categories[i] for i in y_test]\n",
"\n",
"cm = ConfusionMatrix(ytest_strings, ypred_strings)\n",
"print(cm)\n",
"cm.plot()"
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
" explain_model(fitted_model, X_train, X_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(overall_summary)\n",
"print(overall_imp)"
]
}
],
"metadata": {
"authors": [
{
"name": "xif"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -0,0 +1,423 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 15a: Classification with ensembling on local compute\n",
"\n",
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig` which enables an extra ensembling iteration.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import random\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-classification'\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"pd.DataFrame(data = output, index = ['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Training Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"digits = datasets.load_digits()\n",
"\n",
"# Exclude the first 50 rows from training so that they can be used for test.\n",
"X_train = digits.data[150:,:]\n",
"y_train = digits.target[150:]\n",
"X_valid = digits.data[50:150]\n",
"y_valid = digits.target[50:150]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure AutoML\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**enable_ensembling**|Flag to enable an ensembling iteration after all the other iterations complete.|\n",
"|**ensemble_iterations**|Number of iterations during which we choose a fitted pipeline to be part of the final ensemble.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'classification.log',\n",
" primary_metric = 'AUC_weighted',\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 10,\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train,\n",
" X_valid = X_valid,\n",
" y_valid = y_valid,\n",
" enable_ensembling = True,\n",
" ensemble_iterations = 5,\n",
" path = project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Model\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = local_run.continue_experiment(X = X_train, \n",
" y = y_train,\n",
" X_valid = X_valid,\n",
" y_valid = y_valid,\n",
" show_output = True,\n",
" iterations = 5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 3\n",
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
"print(third_run)\n",
"print(third_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the Best Fitted Model\n",
"\n",
"#### Load Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Pipeline\n",
"We will try to predict 2 digits and see how our model works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize = (3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
}
],
"metadata": {
"authors": [
{
"name": "ratanase"
}
],
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
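
The "Retrieve All Child Runs" cell in the notebook above keeps only float-valued metrics and keys them by iteration number. The sketch below reproduces that loop without a workspace so it can be tried offline. `FakeRun` is a hypothetical stand-in that mimics only `get_properties()` and `get_metrics()`; it is not the Azure ML run class, and the metric values are made up for illustration.

```python
class FakeRun:
    """Hypothetical stand-in for a child run; not part of the Azure ML SDK."""

    def __init__(self, iteration, metrics):
        # The notebook converts the property with int(), so store it as a string.
        self._properties = {'iteration': str(iteration)}
        self._metrics = metrics

    def get_properties(self):
        return self._properties

    def get_metrics(self):
        return self._metrics


def collect_float_metrics(children):
    """Same loop as the notebook cell: keep float metrics, keyed by iteration."""
    metricslist = {}
    for run in children:
        properties = run.get_properties()
        metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}
        metricslist[int(properties['iteration'])] = metrics
    return metricslist


# Made-up values; the list-valued entry shows what the float filter drops.
children = [
    FakeRun(1, {'AUC_weighted': 0.97, 'log_loss': 0.21, 'per_class': [0.9, 0.8]}),
    FakeRun(0, {'AUC_weighted': 0.95, 'log_loss': 0.30, 'per_class': [0.7, 0.6]}),
]

metricslist = collect_float_metrics(children)

# Picking the iteration with the smallest log_loss by hand, for comparison
# with what get_output(metric = "log_loss") is described as returning.
lowest_loss_iteration = min(metricslist, key=lambda i: metricslist[i]['log_loss'])
print(metricslist)
print(lowest_loss_iteration)
```

In the notebook, the resulting dictionary is passed to `pd.DataFrame`, which turns each iteration into a column and each metric into a row.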

View File

@@ -0,0 +1,449 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 15b: Regression with ensembling on remote compute\n",
"\n",
"In this example we use the scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple regression problem.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`which enables an extra ensembling iteration.\n",
"3. Train the model using remote compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import random\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-regression'\n",
"project_folder = './sample_projects/automl-local-regression'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"pd.DataFrame(data = output, index = ['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Remote Linux DSVM\n",
"**Note:** If creation fails with a message about Marketplace purchase eligibility, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import DsvmCompute\n",
"\n",
"dsvm_name = 'mydsvm'\n",
"try:\n",
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
" print('Found an existing DSVM.')\n",
"except Exception:\n",
" print('Creating a new DSVM.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
" dsvm_compute.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Get Data File\n",
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's `diabetes` dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
"from sklearn.datasets import load_diabetes\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"def get_data():\n",
" X, y = load_diabetes(return_X_y = True)\n",
"\n",
" columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
"\n",
" X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size = 0.2, random_state = 0)\n",
" X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size = 0.5, random_state = 0)\n",
" return { \"X\" : X_train, \"y\" : y_train, \"X_valid\": X_valid, \"y_valid\": y_valid }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure AutoML\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**enable_ensembling**|Flag to enable an ensembling iteration after all the other iterations complete.|\n",
"|**ensemble_iterations**|Number of iterations during which we choose a fitted pipeline to be part of the final ensemble.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'regression',\n",
" iteration_timeout_minutes = 10,\n",
" iterations = 20,\n",
" primary_metric = 'spearman_correlation',\n",
" debug_log = 'regression.log',\n",
" verbosity = logging.INFO,\n",
" compute_target = dsvm_compute,\n",
" data_script = project_folder + \"/get_data.py\",\n",
" enable_ensembling = True,\n",
" ensemble_iterations = 5,\n",
" path = project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Model\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(axis = 1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `root_mean_squared_error` value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"root_mean_squared_error\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the Best Model (Ensemble)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Predict on the training and test sets, and calculate the residual values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_diabetes\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = load_diabetes(return_X_y = True)\n",
"\n",
"X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size = 0.2, random_state = 0)\n",
"X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size = 0.5, random_state = 0)\n",
"\n",
"\n",
"y_pred_train = fitted_model.predict(X_train)\n",
"y_residual_train = y_train - y_pred_train\n",
"\n",
"y_pred_test = fitted_model.predict(X_test)\n",
"y_residual_test = y_test - y_pred_test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from sklearn import datasets\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# Set up a multi-plot chart.\n",
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
"f.set_figheight(6)\n",
"f.set_figwidth(16)\n",
"\n",
"# Plot residual values of training set.\n",
"a0.axis([0, 360, -200, 200])\n",
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
"a0.set_xlabel('Training samples', fontsize = 12)\n",
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
"\n",
"# Plot a histogram.\n",
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
"\n",
"# Plot residual values of test set.\n",
"a1.axis([0, 90, -200, 200])\n",
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
"a1.set_xlabel('Test samples', fontsize = 12)\n",
"a1.set_yticklabels([])\n",
"\n",
"# Plot a histogram.\n",
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
"\n",
"plt.show()"
]
}
],
"metadata": {
"authors": [
{
"name": "ratanase"
}
],
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -20,15 +20,13 @@ If you are an experienced data scientist, AutoML will help increase your product
## Running samples in Azure Notebooks - Jupyter based notebooks in the Azure cloud
1. [![Azure Notebooks](https://notebooks.azure.com/launch.png)](https://aka.ms/aml-clone-azure-notebooks)
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks if they are not already there.
1. Create a workspace and its configuration file (**config.json**) using [these instructions](https://aka.ms/aml-how-to-configure-environment).
1. Select `+New` in the Azure Notebook toolbar to add your **config.json** file to the imported folder.
![upload config file to notebook folder](../images/additems.png)
1. Open the notebook.
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
1. Follow the instructions in the [../00.configuration](00.configuration.ipynb) notebook to create and connect to a workspace.
1. Open one of the sample notebooks.
**Make sure the Azure Notebook kernal is set to `Python 3.6`** when you open a notebook.
**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook.
![set kernal to Python 3.6](../images/python36.png)
![set kernel to Python 3.6](../images/python36.png)
<a name="localconda"></a>
## Running samples in a Local Conda environment
@@ -59,7 +57,7 @@ There's no need to install mini-conda specifically.
### 3. Setup a new conda environment
The **automl/automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a jupyter notebook.
It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. It can take about 30 minutes to execute.
It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. It can take about 10 minutes to execute.
## Windows
Start a conda command window, cd to the **automl** folder where the sample notebooks were extracted and then run:
```
@@ -136,7 +134,7 @@ bash automl_setup_linux.sh
- Specify a target metric to indicate stopping criteria
- Handling Missing Data in the input
- [06.auto-ml-sparse-data-custom-cv-split.ipynb](06.auto-ml-sparse-data-custom-cv-split.ipynb)
- [06.auto-ml-sparse-data-train-test-split.ipynb](06.auto-ml-sparse-data-train-test-split.ipynb)
- Dataset: Scikit learn's [20newsgroup](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html)
- Handle sparse datasets
- Specify custom train and validation set
@@ -145,11 +143,11 @@ bash automl_setup_linux.sh
- List all projects for the workspace
- List all AutoML Runs for a given project
- Get details for an AutoML Run. (AutoML settings, run widget & all metrics)
- Downlaod fitted pipeline for any iteration
- Download fitted pipeline for any iteration
- [08.auto-ml-remote-execution-with-text-file-on-DSVM](08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb)
- [08.auto-ml-remote-execution-with-DataStore.ipynb](08.auto-ml-remote-execution-with-DataStore.ipynb)
- Dataset: scikit learn's [digit dataset](https://innovate.burningman.org/datasets-page/)
- Download the data and store it in the DSVM to improve performance.
- Download the data and store it in DataStore.
- [09.auto-ml-classification-with-deployment.ipynb](09.auto-ml-classification-with-deployment.ipynb)
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
@@ -173,6 +171,21 @@ bash automl_setup_linux.sh
- [13.auto-ml-dataprep.ipynb](13.auto-ml-dataprep.ipynb)
- Using DataPrep for reading data
- [14.auto-ml-model-explanation.ipynb](14.auto-ml-model-explanation.ipynb)
- Dataset: sklearn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)
- Explaining the AutoML classification pipeline
- Visualizing feature importance in widget
- [15a.auto-ml-classification-ensemble.ipynb](15a.auto-ml-classification-ensemble.ipynb)
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
- Enables an extra iteration for generating an Ensemble of models
- Uses local compute for training
- [15b.auto-ml-regression-ensemble.ipynb](15b.auto-ml-regression-ensemble.ipynb)
- Dataset: scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html)
- Enables an extra iteration for generating an Ensemble of models
- Uses remote Linux DSVM for training
<a name="documentation"></a>
# Documentation
## Table of Contents
@@ -187,17 +200,46 @@ bash automl_setup_linux.sh
|Property|Description|Default|
|-|-|-|
|**primary_metric**|This is the metric that you want to optimize.<br><br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i><br><br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>| Classification: accuracy <br><br> Regression: spearman_correlation
|**max_time_sec**|Time limit in seconds for each iteration|None|
|**iteration_timeout_minutes**|Time limit in minutes for each iteration|None|
|**iterations**|Number of iterations. Each iteration trains a specific pipeline on the data. To get the best result, use at least 100. |100|
|**n_cross_validations**|Number of cross validation splits|None|
|**validation_size**|Size of validation set as percentage of all training samples|None|
|**concurrent_iterations**|Max number of iterations that would be executed in parallel|1|
|**max_concurrent_iterations**|Max number of iterations that would be executed in parallel|1|
|**preprocess**|*True/False* <br>Setting this to *True* enables preprocessing <br>on the input to handle missing data, and perform some common feature extraction<br>*Note: If input data is Sparse you cannot use preprocess=True*|False|
|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.<br> You can set it to *-1* to use all cores|1|
|**exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|None|
|**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for Auto ML.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGDClassifierWrapper</i><br><i>NBWrapper</i><br><i>BernoulliNB</i><br><i>SVCWrapper</i><br><i>LinearSVMWrapper</i><br><i>KNeighborsClassifier</i><br><i>DecisionTreeClassifier</i><br><i>RandomForestClassifier</i><br><i>ExtraTreesClassifier</i><br><i>gradient boosting</i><br><i>LightGBMClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoostingRegressor</i><br><i>DecisionTreeRegressor</i><br><i>KNeighborsRegressor</i><br><i>LassoLars</i><br><i>SGDRegressor</i><br><i>RandomForestRegressor</i><br><i>ExtraTreesRegressor</i>|None|
|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|None|
|**blacklist_models**|*Array* of *strings* indicating models to ignore for Auto ML from the list of models.|None|
|**whitelist_models**|*Array* of *strings* indicating the only models Auto ML should use, from the list of models.|None|
<a name="cvsplits"></a>
## List of models for white list/blacklist
**Classification**
<br><i>LogisticRegression</i>
<br><i>SGD</i>
<br><i>MultinomialNaiveBayes</i>
<br><i>BernoulliNaiveBayes</i>
<br><i>SVM</i>
<br><i>LinearSVM</i>
<br><i>KNN</i>
<br><i>DecisionTree</i>
<br><i>RandomForest</i>
<br><i>ExtremeRandomTrees</i>
<br><i>LightGBM</i>
<br><i>GradientBoosting</i>
<br><i>TensorFlowDNN</i>
<br><i>TensorFlowLinearClassifier</i>
<br><br>**Regression**
<br><i>ElasticNet</i>
<br><i>GradientBoosting</i>
<br><i>DecisionTree</i>
<br><i>KNN</i>
<br><i>LassoLars</i>
<br><i>SGD</i>
<br><i>RandomForest</i>
<br><i>ExtremeRandomTrees</i>
<br><i>LightGBM</i>
<br><i>TensorFlowLinearRegressor</i>
<br><i>TensorFlowDNN</i>
## Cross validation split options
### K-Folds Cross Validation
Use the *n_cross_validations* setting to specify the number of cross validations. The training data set will be randomly split into *n_cross_validations* folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for *n_cross_validations* rounds until each fold is used once as the validation set. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
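The fold rotation described above can be sketched in plain Python. This is only an illustration of the K-fold idea, not AutoML's internal implementation; the helper name `k_fold_indices`, the sample count, and the seed are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, n_cross_validations, seed=0):
    """Yield (train, valid) index lists, one pair per cross validation round."""
    indices = list(range(n_samples))
    # Shuffle once so that the folds are a random split of the data.
    random.Random(seed).shuffle(indices)
    # Strided slices give folds whose sizes differ by at most one sample.
    folds = [indices[i::n_cross_validations] for i in range(n_cross_validations)]
    for round_number in range(n_cross_validations):
        valid = folds[round_number]
        train = [i for j, fold in enumerate(folds) if j != round_number for i in fold]
        yield train, valid

# Hypothetical example: 10 samples split into 5 folds.
rounds = list(k_fold_indices(n_samples=10, n_cross_validations=5))
for train, valid in rounds:
    print('train:', sorted(train), 'valid:', sorted(valid))
```

Each sample appears in exactly one validation fold, so every round validates on data the model did not see during training in that round.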
@@ -255,10 +297,10 @@ The main code of the file must be indented so that it is under this condition.
# Troubleshooting
## Iterations fail and the log contains "MemoryError"
This can be caused by insufficient memory on the DSVM. AutoML loads all training data into memory. So, the available memory should be more than the training data size.
If you are using a remote DSVM, memory is needed for each concurrent iteration. The concurrent_iterations setting specifies the maximum concurrent iterations. For example, if the training data size is 8Gb and concurrent_iterations is set to 10, the minimum memory required is at least 80Gb.
To resolve this issue, allocate a DSVM with more memory or reduce the value specified for concurrent_iterations.
If you are using a remote DSVM, memory is needed for each concurrent iteration. The max_concurrent_iterations setting specifies the maximum number of concurrent iterations. For example, if the training data size is 8 GB and max_concurrent_iterations is set to 10, at least 80 GB of memory is required.
To resolve this issue, allocate a DSVM with more memory or reduce the value specified for max_concurrent_iterations.
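The arithmetic behind this guidance can be written out as a small sketch. The helper names below are hypothetical and the result is only a lower bound under the assumption stated above (one in-memory copy of the training data per concurrent iteration); real usage will be higher once models and library overhead are counted.

```python
def min_memory_gb(training_data_gb, max_concurrent_iterations):
    """Lower bound on memory: one copy of the data per concurrent iteration."""
    return training_data_gb * max_concurrent_iterations

def max_iterations_for_memory(available_gb, training_data_gb):
    """Largest concurrency whose lower bound still fits in available memory."""
    return max(1, int(available_gb // training_data_gb))

# The example from the text: 8 GB of data, 10 concurrent iterations.
print(min_memory_gb(8, 10))              # 80
# Hypothetical VM with 56 GB of memory and the same 8 GB data set.
print(max_iterations_for_memory(56, 8))  # 7
```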
## Iterations show as "Not Responding" in the RunDetails widget.
This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core when it is running. Some iterations can use multiple cores. So, the concurrent_iterations setting should always be less than the number of cores of the DSVM.
To resolve this issue, try reducing the value specified for the concurrent_iterations setting.
This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core when it is running. Some iterations can use multiple cores. So, the max_concurrent_iterations setting should always be less than the number of cores of the DSVM.
To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
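As a rough sketch of the rule above (keep max_concurrent_iterations below the core count), the hypothetical helper below derives a value from a given number of cores. Note that `os.cpu_count()` reports the machine running the code, so for a remote DSVM you would substitute the core count of the chosen VM size.

```python
import os

def suggest_max_concurrent_iterations(core_count):
    """Largest value strictly below the core count, but never less than 1."""
    return max(1, core_count - 1)

# Cores on the machine running this snippet (not the remote DSVM).
local_cores = os.cpu_count() or 1
print(suggest_max_concurrent_iterations(local_cores))

# Hypothetical 4-core DSVM.
print(suggest_max_concurrent_iterations(4))  # 3
```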


@@ -4,17 +4,28 @@ dependencies:
# Currently Azure ML only supports 3.5.2 and later.
- python=3.6
- nb_conda
- matplotlib
- numpy>=1.11.0,<1.16.0
- matplotlib==2.1.0
- numpy>=1.11.0,<1.15.0
- cython
- urllib3<1.24
- scipy>=0.19.0,<0.20.0
- scikit-learn>=0.18.0,<=0.19.1
- pandas>=0.19.0,<0.23.0
- pandas>=0.22.0,<0.23.0
# Required for azuremlftk
- dill
- pyodbc
- statsmodels
- numexpr
- keras
- distributed>=1.21.5,<1.24
- pip:
# Required for azuremlftk
- https://azuremlpackages.blob.core.windows.net/forecasting/azuremlftk-0.1.18313.5a1-py3-none-any.whl
# Required packages for AzureML execution, history, and data preparation.
- --extra-index-url https://pypi.python.org/simple
- azureml-sdk[automl]
- azureml-train-widgets
- azure-cli
- azureml-sdk[automl,notebooks]
- pandas_ml

automl/automl_env_mac.yml Normal file

@@ -0,0 +1,31 @@
name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- python=3.6
- nb_conda
- matplotlib==2.1.0
- numpy>=1.15.3
- cython
- urllib3<1.24
- scipy>=0.19.0,<0.20.0
- scikit-learn>=0.18.0,<=0.19.1
- pandas>=0.22.0,<0.23.0
# Required for azuremlftk
- dill
- pyodbc
- statsmodels
- numexpr
- keras
- distributed>=1.21.5,<1.24
- pip:
# Required for azuremlftk
- https://azuremlpackages.blob.core.windows.net/forecasting/azuremlftk-0.1.18313.5a1-py3-none-any.whl
# Required packages for AzureML execution, history, and data preparation.
- azureml-sdk[automl,notebooks]
- pandas_ml


@@ -1,15 +1,21 @@
@echo off
set conda_env_name=%1
set automl_env_file=%2
set PIP_NO_WARN_SCRIPT_LOCATION=0
IF "%conda_env_name%"=="" SET conda_env_name="azure_automl"
IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"
IF NOT EXIST %automl_env_file% GOTO YmlMissing
call conda activate %conda_env_name% 2>nul:
if not errorlevel 1 (
call conda env update --file automl_env.yml -n %conda_env_name%
echo Upgrading azureml-sdk[automl] in existing conda environment %conda_env_name%
call pip install --upgrade azureml-sdk[automl,notebooks]
if errorlevel 1 goto ErrorExit
) else (
call conda env create -f automl_env.yml -n %conda_env_name%
call conda env create -f %automl_env_file% -n %conda_env_name%
)
call conda activate %conda_env_name% 2>nul:
@@ -17,10 +23,12 @@ if errorlevel 1 goto ErrorExit
call pip install psutil
call jupyter nbextension install --py azureml.train.widgets
call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"
call jupyter nbextension install --py azureml.widgets --user
if errorlevel 1 goto ErrorExit
call jupyter nbextension enable --py azureml.train.widgets
call jupyter nbextension enable --py azureml.widgets --user
if errorlevel 1 goto ErrorExit
echo.
@@ -35,6 +43,9 @@ jupyter notebook --log-level=50
goto End
:YmlMissing
echo File %automl_env_file% not found.
:ErrorExit
echo Install failed


@@ -1,20 +1,34 @@
#!/bin/bash
CONDA_ENV_NAME=$1
AUTOML_ENV_FILE=$2
PIP_NO_WARN_SCRIPT_LOCATION=0
if [ "$CONDA_ENV_NAME" == "" ]
then
CONDA_ENV_NAME="azure_automl"
fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then
echo "File $AUTOML_ENV_FILE not found"
exit 1
fi
if source activate $CONDA_ENV_NAME 2> /dev/null
then
conda env update -file automl_env.yml -n $CONDA_ENV_NAME
echo "Upgrading azureml-sdk[automl] in existing conda environment" $CONDA_ENV_NAME
pip install --upgrade azureml-sdk[automl,notebooks]
else
conda env create -f automl_env.yml -n $CONDA_ENV_NAME &&
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
source activate $CONDA_ENV_NAME &&
jupyter nbextension install --py azureml.train.widgets --user &&
jupyter nbextension enable --py azureml.train.widgets --user &&
python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
jupyter nbextension install --py azureml.widgets --user &&
jupyter nbextension enable --py azureml.widgets --user &&
echo "" &&
echo "" &&
echo "***************************************" &&


@@ -1,21 +1,36 @@
#!/bin/bash
CONDA_ENV_NAME=$1
AUTOML_ENV_FILE=$2
PIP_NO_WARN_SCRIPT_LOCATION=0
if [ "$CONDA_ENV_NAME" == "" ]
then
CONDA_ENV_NAME="azure_automl"
fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env_mac.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then
echo "File $AUTOML_ENV_FILE not found"
exit 1
fi
if source activate $CONDA_ENV_NAME 2> /dev/null
then
conda env update -file automl_env.yml -n $CONDA_ENV_NAME
echo "Upgrading azureml-sdk[automl] in existing conda environment" $CONDA_ENV_NAME
pip install --upgrade azureml-sdk[automl,notebooks]
else
conda env create -f automl_env.yml -n $CONDA_ENV_NAME &&
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
source activate $CONDA_ENV_NAME &&
conda install lightgbm -c conda-forge -y &&
jupyter nbextension install --py azureml.train.widgets --user &&
jupyter nbextension enable --py azureml.train.widgets --user &&
python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
jupyter nbextension install --py azureml.widgets --user &&
jupyter nbextension enable --py azureml.widgets --user &&
pip install numpy==1.15.3 &&
echo "" &&
echo "" &&
echo "***************************************" &&
@@ -33,3 +48,4 @@ then
fi


@@ -1,15 +0,0 @@
name: project_environment
dependencies:
- python=3.6.2
- scikit-learn
- numpy
- pip:
- numpy==1.14.2
- pandas
- scipy==1.0.0
- scikit-learn==0.19.1
# Required packages for AzureML execution, history, and data preparation.
- --index-url https://azuremlsdktestpypi.azureedge.net/sdk-release/Preview/E7501C02541B433786111FE8E140CAA1
- --extra-index-url https://pypi.python.org/simple
- azureml-defaults


@@ -1,4 +0,0 @@
# CLI Example Content
This content can be used in conjunction with our [CLI reference guide.](https://docs.microsoft.com/en-us/azure/machine-learning/service/reference-azure-machine-learning-cli)
Example content includes training scripts, conda environment files and scoring files.


@@ -1,24 +0,0 @@
import pickle
import json
import numpy
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model
def init():
global model
# note here "sklearn_regression_model.pkl" is the name of the model registered under
# this is a different behavior than before when the code is run locally, even though the code is the same.
model_path = Model.get_model_path('sklearn_regression_model.pkl')
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
# note you can pass in multiple rows for scoring
def run(raw_data):
try:
data = json.loads(raw_data)['data']
data = numpy.array(data)
result = model.predict(data)
except Exception as e:
result = str(e)
return json.dumps({"result": result.tolist()})

Binary file not shown.


@@ -1,44 +0,0 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np
os.makedirs('./outputs', exist_ok=True)
X, y = load_diabetes(return_X_y=True)
run = Run.get_submitted_run()
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=0)
data = {"train": {"X": X_train, "y": y_train},
"test": {"X": X_test, "y": y_test}}
# list of numbers from 0.0 to 1.0 with a 0.05 interval
alphas = np.arange(0.0, 1.0, 0.05)
for alpha in alphas:
# Use Ridge algorithm to create a regression model
reg = Ridge(alpha=alpha)
reg.fit(data["train"]["X"], data["train"]["y"])
preds = reg.predict(data["test"]["X"])
mse = mean_squared_error(preds, data["test"]["y"])
run.log('alpha', alpha)
run.log('mse', mse)
model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
# save model in the outputs folder so it automatically get uploaded
with open(model_file_name, "wb") as file:
joblib.dump(value=reg, filename=os.path.join('./outputs/',
model_file_name))
print('alpha is {0:.2f}, and mse is {1:0.2f}'.format(alpha, mse))


@@ -1,9 +1,9 @@
# Azure Databricks - Azure ML SDK Sample Notebooks
# Azure Databricks - Azure Machine Learning SDK Sample Notebooks
**NOTE**: With the latest version of our AML SDK, there are some API changes due to which previous version of notebooks will not work.
Kindly use this v4 notebooks (updated Sep 18) if you had installed the AML SDK in your Databricks cluster please update to latest SDK version by installing azureml-sdk[databricks] as a library from GUI.
**NOTE**: The latest version of the Azure Machine Learning SDK includes API changes, so previous versions of the notebooks will not work.
Please remove the previous SDK version and install the latest SDK by installing **azureml-sdk[databricks]** as a PyPI library in your Azure Databricks workspace.
**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown). We are extending it to more runtimes asap.
**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
**NOTE**: Some packages, such as psutil, upgrade libraries in ways that can cause conflicts. Please install such packages with pinned library versions, e.g. "psutil **cryptography==1.5 pyopenssl==16.0.0 ipython=2.2.0**", to avoid install errors. This issue is related to Databricks and not to the AML SDK.
@@ -11,9 +11,9 @@ Kindly use this v4 notebooks (updated Sep 18) if you had installed the AML SD
The iPython Notebooks have to be run sequentially after making changes based on your subscription. The corresponding DBC archive contains all the notebooks and can be imported into your Databricks workspace. You can then run the notebooks after importing the .dbc archive instead of downloading them individually.
This set of notebooks are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. For details on SDK concepts, please refer to [Private preview notebooks](https://github.com/Azure/ViennaDocs/tree/master/PrivatePreview/notebooks)
This set of notebooks covers an income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrates how to prepare data, then train and operationalize a Spark ML model with the Azure ML Python SDK from within Azure Databricks. For details on SDK concepts, please refer to these [notebooks](https://github.com/Azure/MachineLearningNotebooks).
(Recommended) [Azure Databricks AML SDK notebooks](Databricks_AMLSDK_github.dbc) A single DBC package to import all notebooks in your Databricks workspace.
(Recommended) [Azure Databricks AML SDK notebooks](Databricks_AMLSDK_github.dbc) A single DBC package to import all notebooks in your Azure Databricks workspace.
01. [Installation and Configuration](01.Installation_and_Configuration.ipynb): Install the Azure ML Python SDK and Initialize an Azure ML Workspace and save the Workspace configuration file.
02. [Ingest data](02.Ingest_data.ipynb): Download the Adult Census Income dataset and split it into train and test sets.
@@ -23,4 +23,7 @@ This set of notebooks are related to Income prediction experiment based on this
06. [Deploy to AKS](04.Deploy_to_AKS_existingImage.ipynb): Deploy model to Azure Kubernetes Service (AKS) with Azure ML Python SDK from an existing Image with model, conda and score file.
Copyright (c) Microsoft Corporation. All rights reserved.
All notebooks in this folder are licensed under the MIT License.
Apache®, Apache Spark, and Spark® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

File diff suppressed because it is too large.

Binary file not shown (previous size: 19 KiB).

BIN
images/yt_cover.png Normal file

Binary file not shown (new size: 26 KiB).

28
onnx/README.md Normal file

@@ -0,0 +1,28 @@
# ONNX on Azure Machine Learning
These tutorials show how to create and deploy [ONNX](http://onnx.ai) models in Azure Machine Learning environments using [ONNX Runtime](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx) for inference. Once deployed as a web service, you can ping the model with your own set of images to be analyzed!
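
As a rough illustration of what calling a deployed model looks like, the sketch below posts image data to a scoring endpoint using only the standard library. It is not taken from the tutorials: the function names `build_payload` and `score` are hypothetical, and the `"data"` key is assumed from the scoring scripts in the notebooks in this folder, which read their input from that key.

```python
import json
import urllib.request


def build_payload(pixels):
    """Wrap nested lists of pixel values in the {"data": ...} envelope as UTF-8 JSON."""
    return json.dumps({"data": pixels}).encode("utf-8")


def score(scoring_uri, pixels, timeout=30):
    """POST the payload to the web service and return the decoded JSON response."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(pixels),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The `scoring_uri` argument would be the value printed from the deployed service's `scoring_uri` attribute; the shape of `pixels` depends on the model being served.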
## Tutorials
- [Obtain ONNX model from ONNX Model Zoo and deploy with ONNX Runtime inference - Handwritten Digit Classification (MNIST)](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-inference-mnist-deploy.ipynb)
- [Obtain ONNX model from ONNX Model Zoo and deploy with ONNX Runtime inference - Facial Expression Recognition (Emotion FER+)](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-inference-facial-emotion-recognition-deploy.ipynb)
- [Obtain ONNX model from ONNX Model Zoo and deploy with ONNX Runtime inference - Image Recognition (ResNet50)](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb)
- [Convert ONNX model from CoreML and deploy - TinyYOLO](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb)
- [Train ONNX model in PyTorch and deploy - MNIST](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb)
## Documentation
- [ONNX Runtime Python API Documentation](http://aka.ms/onnxruntime-python)
- [Azure Machine Learning API Documentation](http://aka.ms/aml-docs)
## Related Articles
- [Building and Deploying ONNX Runtime Models](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx)
- [Azure AI Making AI Real for Business](https://aka.ms/aml-blog-overview)
- [What's new in Azure Machine Learning](https://aka.ms/aml-blog-whats-new)
## License
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
## Acknowledgements
These tutorials were developed by Vinitra Swamy and Prasanth Pulavarthi of the Microsoft AI Frameworks team and adapted for presentation at Microsoft Ignite 2018.

124
onnx/mnist.py Normal file

@@ -0,0 +1,124 @@
# This is a modified version of https://github.com/pytorch/examples/blob/master/mnist/main.py which is
# licensed under BSD 3-Clause (https://github.com/pytorch/examples/blob/master/LICENSE)

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import os


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(args, model, device, train_loader, optimizer, epoch, output_dir):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, size_average=False, reduce=True).item()  # sum up batch loss
            pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    parser.add_argument('--output-dir', type=str, default='outputs')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    output_dir = args.output_dir
    os.makedirs(output_dir, exist_ok=True)

    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=True, download=True,
                       transform=transforms.Compose([transforms.ToTensor(),
                                                     transforms.Normalize((0.1307,), (0.3081,))])
                       ),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=False,
                       transform=transforms.Compose([transforms.ToTensor(),
                                                     transforms.Normalize((0.1307,), (0.3081,))])
                       ),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch, output_dir)
        test(args, model, device, test_loader)

    # save model
    dummy_input = torch.randn(1, 1, 28, 28, device=device)
    model_path = os.path.join(output_dir, 'mnist.onnx')
    torch.onnx.export(model, dummy_input, model_path)


if __name__ == '__main__':
    main()
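
The `320` passed to `nn.Linear(320, 50)` and `x.view(-1, 320)` in `Net` follows from the layer sizes applied to a 28x28 MNIST image. The sketch below reproduces that arithmetic in plain Python so the number can be checked without PyTorch. The helper names are hypothetical, and the formulas assume stride 1 and no padding for the convolutions and a pooling stride equal to the pooling window, which matches the defaults used in the script.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a square convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_output_size(size, kernel_size):
    """Spatial size after max pooling with stride equal to the window size."""
    return size // kernel_size


def flattened_features(image_size=28, channels=20):
    """Number of values fed to the first fully connected layer."""
    size = conv_output_size(image_size, 5)  # conv1: 28 -> 24
    size = pool_output_size(size, 2)        # max_pool2d: 24 -> 12
    size = conv_output_size(size, 5)        # conv2: 12 -> 8
    size = pool_output_size(size, 2)        # max_pool2d: 8 -> 4
    return channels * size * size           # 20 channels of 4 x 4


print(flattened_features())
```

If the kernel sizes or the input resolution change, the same helpers give the new value to use in `fc1` and in the `view` call.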


@@ -0,0 +1,435 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# YOLO Real-time Object Detection using ONNX on AzureML\n",
"\n",
"This example shows how to convert the TinyYOLO model from CoreML to ONNX and operationalize it as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
"\n",
"## What is ONNX\n",
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n",
"## YOLO Details\n",
"You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. For more information about YOLO, please visit the [YOLO website](https://pjreddie.com/darknet/yolo/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"To make the best use of your time, make sure you have done the following:\n",
"\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [00.configuration.ipynb](../00.configuration.ipynb) notebook to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (config.json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Install necessary packages\n",
"\n",
"You'll need to run the following commands to use this tutorial:\n",
"\n",
"```sh\n",
"pip install onnxmltools\n",
"pip install coremltools # use this on Linux and Mac\n",
"pip install git+https://github.com/apple/coremltools # use this on Windows\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert model to ONNX\n",
"\n",
"First we download the CoreML model. We use the CoreML model listed at https://coreml.store/tinyyolo. This may take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import urllib.request\n",
"\n",
"coreml_model_url = \"https://s3-us-west-2.amazonaws.com/coreml-models/TinyYOLO.mlmodel\"\n",
"urllib.request.urlretrieve(coreml_model_url, filename=\"TinyYOLO.mlmodel\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use ONNXMLTools to convert the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import onnxmltools\n",
"import coremltools\n",
"\n",
"# Load a CoreML model\n",
"coreml_model = coremltools.utils.load_spec('TinyYOLO.mlmodel')\n",
"\n",
"# Convert from CoreML into ONNX\n",
"onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n",
"\n",
"# Save ONNX model\n",
"onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n",
"\n",
"import os\n",
"print(os.path.getsize('tinyyolov2.onnx'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploying as a web service with Azure ML\n",
"\n",
"### Load Azure ML workspace\n",
"\n",
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Registering your model with Azure ML\n",
"\n",
"Now we upload the model and register it in the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model.register(model_path = \"tinyyolov2.onnx\",\n",
" model_name = \"tinyyolov2\",\n",
" tags = {\"onnx\": \"demo\"},\n",
" description = \"TinyYOLO\",\n",
" workspace = ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Displaying your registered models\n",
"\n",
"You can optionally list out all the models that you have registered in this workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"models = ws.models\n",
"for name, m in models.items():\n",
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Write scoring file\n",
"\n",
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import json\n",
"import time\n",
"import sys\n",
"import os\n",
"from azureml.core.model import Model\n",
"import numpy as np # we're going to use numpy to process input and output data\n",
"import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n",
"\n",
"def init():\n",
" global session\n",
" model = Model.get_model_path(model_name = 'tinyyolov2')\n",
" session = onnxruntime.InferenceSession(model)\n",
"\n",
"def preprocess(input_data_json):\n",
" # convert the JSON data into the tensor input\n",
" return np.array(json.loads(input_data_json)['data']).astype('float32')\n",
"\n",
"def postprocess(result):\n",
" return np.array(result).tolist()\n",
"\n",
"def run(input_data_json):\n",
" try:\n",
" start = time.time() # start timer\n",
" input_data = preprocess(input_data_json)\n",
" input_name = session.get_inputs()[0].name # get the id of the first input of the model \n",
" result = session.run([], {input_name: input_data})\n",
" end = time.time() # stop timer\n",
" return {\"result\": postprocess(result),\n",
" \"time\": end - start}\n",
" except Exception as e:\n",
" result = str(e)\n",
" return {\"error\": result}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create container image\n",
"First we create a YAML file that specifies which dependencies we would like to see in our container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"TinyYOLO ONNX Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxyolo\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'demo': 'onnx'}, \n",
" description = 'web service for TinyYOLO ONNX model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from random import randint\n",
"\n",
"aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aci_service.state != 'Healthy':\n",
" # run this command for debugging.\n",
" print(aci_service.get_logs())\n",
" aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Success!\n",
"\n",
"If you've made it this far, you've deployed a working web service that does object detection using an ONNX model. You can get the URL for the webservice with the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(aci_service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are eventually done using the web service, remember to delete it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "onnx"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
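
The deployment cell in this notebook appends `randint(0,100)` to the service name, so a rerun can pick a name that is already in use. The sketch below shows one possible alternative based on a random UUID suffix. It is an illustration only: the helper name `unique_service_name` is hypothetical, and the naming rule it checks (3 to 32 characters, lowercase letters, digits and hyphens, starting with a letter) is an assumption that should be verified against the Azure ML documentation.

```python
import re
import uuid


def unique_service_name(prefix="onnx-tinyyolo", suffix_length=8):
    """Build a service name with a random hexadecimal suffix."""
    suffix = uuid.uuid4().hex[:suffix_length]
    name = "{}-{}".format(prefix, suffix)
    # Assumed naming rule, see the note above.
    if not re.fullmatch(r"[a-z][a-z0-9-]{2,31}", name):
        raise ValueError("invalid service name: " + name)
    return name


print("Service", unique_service_name())
```

The returned string could be passed as the `name` argument where the notebook currently builds `aci_service_name`.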


@@ -12,7 +12,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Facial Expression Recognition using ONNX Runtime on AzureML\n",
"# Facial Expression Recognition (FER+) using ONNX Runtime on Azure ML\n",
"\n",
"This example shows how to deploy an image classification neural network using the Facial Expression Recognition ([FER](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. This tutorial will show you how to deploy a FER+ model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n",
"\n",
@@ -34,32 +34,54 @@
"## Prerequisites\n",
"\n",
"### 1. Install Azure ML SDK and create a new workspace\n",
"Please follow [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook.\n",
"\n",
"Please follow [Azure ML configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) to set up your environment.\n",
"\n",
"### 2. Install additional packages needed for this Notebook\n",
"You need to install the popular plotting library `matplotlib`, the image manipulation library `PIL`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed.\n",
"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed.\n",
"\n",
"```sh\n",
"(myenv) $ pip install matplotlib onnx Pillow\n",
"(myenv) $ pip install matplotlib onnx opencv-python\n",
"```\n",
"\n",
"**Debugging tip**: Make sure to activate your virtual environment (myenv) before you re-launch this notebook using the `jupyter notebook` command. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n",
"\n",
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
"\n",
"[Download the ONNX Emotion FER+ model and corresponding test data](https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz) and place them in the same folder as this tutorial notebook. You can unzip the file through the following line of code.\n",
"In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# urllib is a built-in Python library to download files from URLs\n",
"\n",
"```sh\n",
"(myenv) $ tar xvzf emotion_ferplus.tar.gz\n",
"```\n",
"# Objective: retrieve the latest version of the ONNX Emotion FER+ model files from the\n",
"# ONNX Model Zoo and save it in the same folder as this tutorial\n",
"\n",
"More information can be found about the ONNX FER+ model on [github](https://github.com/onnx/models/tree/master/emotion_ferplus). For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
"import urllib.request\n",
"\n",
"onnx_model_url = \"https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz\"\n",
"\n",
"urllib.request.urlretrieve(onnx_model_url, filename=\"emotion_ferplus.tar.gz\")\n",
"\n",
"# the ! magic command tells our jupyter notebook kernel to run the following line of \n",
"# code from the command line instead of the notebook kernel\n",
"\n",
"# We use tar with the xvzf flags to extract the files we just retrieved from the ONNX model zoo\n",
"\n",
"!tar xvzf emotion_ferplus.tar.gz"
]
},
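
The download cell above shells out to `tar`, which may not be available on every platform. As a hedged alternative, the sketch below extracts the same kind of archive with the standard-library `tarfile` module and rejects members that would land outside the target folder. The function name `extract_archive` is hypothetical and the code is not part of the notebook.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive, refusing members that would escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(dest_root)
        return archive.getnames()
```

Calling `extract_archive("emotion_ferplus.tar.gz")` after the download would unpack the model and test data next to the notebook and return the list of extracted paths.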
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Azure ML workspace\n",
"## Deploy a VM with your ONNX model in the Cloud\n",
"\n",
"### Load Azure ML workspace\n",
"\n",
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
]
@@ -136,9 +158,9 @@
"metadata": {},
"outputs": [],
"source": [
"models = ws.models()\n",
"for m in models:\n",
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
"models = ws.models\n",
"for name, m in models.items():\n",
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
@@ -147,9 +169,9 @@
"source": [
"### ONNX FER+ Model Methodology\n",
"\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n",
"\n",
"The original Facial Emotion Recognition (FER) Dataset was released in 2013, but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 crowdsourced reviewers, creating a better basis for ground truth. \n",
"The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 crowdsourced reviewers, creating a more accurate basis for ground truth. \n",
"\n",
"You can see the difference of label quality in the sample model input below. The FER labels are the first word below each image, and the FER+ labels are the second word below each image.\n",
"\n",
@@ -202,20 +224,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy our model on Azure ML"
"### Specify our Score and Environment Files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file.\n",
"\n",
"You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
"\n",
"### Write Score File\n",
"\n",
"A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime GPU inference session to evaluate the data passed in on our function calls."
"A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls."
]
},
{
@@ -248,10 +268,13 @@
" try:\n",
" # load in our data, convert to readable format\n",
" data = np.array(json.loads(input_data)['data']).astype('float32')\n",
" \n",
" start = time.time()\n",
" r = session.run([output_name], {input_name : data})\n",
" end = time.time()\n",
" \n",
" result = emotion_map(postprocess(r[0]))\n",
" \n",
" result_dict = {\"result\": result,\n",
" \"time_in_sec\": [end - start]}\n",
" except Exception as e:\n",
@@ -260,9 +283,12 @@
" return json.dumps(result_dict)\n",
"\n",
"def emotion_map(classes, N=1):\n",
" \"\"\"Take the most probable labels (output of postprocess) and returns the top N emotional labels that fit the picture.\"\"\"\n",
" \"\"\"Take the most probable labels (output of postprocess) and returns the \n",
" top N emotional labels that fit the picture.\"\"\"\n",
" \n",
" emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n",
" 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n",
" \n",
" emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n",
" emotion_keys = list(emotion_table.keys())\n",
" emotions = []\n",
" for i in range(N):\n",
@@ -276,8 +302,8 @@
" return e_x / e_x.sum(axis=0)\n",
"\n",
"def postprocess(scores):\n",
" \"\"\"This function takes the scores generated by the network and returns the class IDs in decreasing \n",
" order of probability.\"\"\"\n",
" \"\"\"This function takes the scores generated by the network and \n",
" returns the class IDs in decreasing order of probability.\"\"\"\n",
" prob = softmax(scores)\n",
" prob = np.squeeze(prob)\n",
" classes = np.argsort(prob)[::-1]\n",
@@ -299,11 +325,7 @@
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies()\n",
"myenv.add_pip_package(\"numpy\")\n",
"myenv.add_pip_package(\"azureml-core\")\n",
"myenv.add_pip_package(\"onnxruntime\")\n",
"\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
@@ -329,11 +351,11 @@
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"test\",\n",
" description = \"Emotion ONNX Runtime container\",\n",
" tags = {\"demo\": \"onnx\"})\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxtest\",\n",
"image = ContainerImage.create(name = \"onnximage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
@@ -346,8 +368,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Debugging\n",
"\n",
"In case you need to debug your code, the next line of code accesses the log file."
]
},
@@ -364,9 +384,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
"\n",
"## Deploy the container image"
"### Deploy the container image"
]
},
{
@@ -439,23 +459,56 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing and Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Useful Helper Functions\n",
"## Testing and Evaluation\n",
"\n",
"### Useful Helper Functions\n",
"\n",
"We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def emotion_map(classes, N=1):\n",
" \"\"\"Takes the most probable labels (output of postprocess) and returns the \n",
" top N emotional labels that fit the picture.\"\"\"\n",
" \n",
" emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n",
" 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n",
" \n",
" emotion_keys = list(emotion_table.keys())\n",
" emotions = []\n",
" for i in range(N):\n",
" emotions.append(emotion_keys[classes[i]])\n",
" return emotions\n",
"\n",
"def softmax(x):\n",
" \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n",
" x = x.reshape(-1)\n",
" e_x = np.exp(x - np.max(x))\n",
" return e_x / e_x.sum(axis=0)\n",
"\n",
"def postprocess(scores):\n",
" \"\"\"This function takes the scores generated by the network and \n",
" returns the class IDs in decreasing order of probability.\"\"\"\n",
" prob = softmax(scores)\n",
" prob = np.squeeze(prob)\n",
" classes = np.argsort(prob)[::-1]\n",
" return classes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Test Data"
"### Load Test Data\n",
"\n",
"These are already in your directory from your ONNX model download (from the model zoo).\n",
"\n",
"Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays."
]
},
{
@@ -475,8 +528,6 @@
"import json\n",
"import os\n",
"\n",
"from score import emotion_map, softmax, postprocess\n",
"\n",
"test_inputs = []\n",
"test_outputs = []\n",
"\n",
@@ -499,7 +550,7 @@
" tensor.ParseFromString(f.read())\n",
" \n",
" output_data = numpy_helper.to_array(tensor)\n",
" output_processed = emotion_map(postprocess(output_data))[0]\n",
" output_processed = emotion_map(postprocess(output_data[0]))[0]\n",
" test_outputs.append(output_processed)"
]
},
@@ -512,7 +563,7 @@
},
"source": [
"### Show some sample images\n",
"We use `matplotlib` to plot 3 test images from the model zoo with their labels over them."
"We use `matplotlib` to plot 3 test images from the dataset."
]
},
{
@@ -532,7 +583,7 @@
" plt.axhline('')\n",
" plt.axvline('')\n",
" plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n",
" plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.Greys)\n",
" plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.gray)\n",
"plt.show()"
]
},
@@ -571,7 +622,7 @@
" print(r['error'])\n",
" break\n",
" \n",
" result = r['result'][0][0]\n",
" result = r['result'][0]\n",
" time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n",
" \n",
" ground_truth = test_outputs[i]\n",
@@ -583,7 +634,7 @@
"\n",
" # use different color for misclassified sample\n",
" font_color = 'red' if ground_truth != result else 'black'\n",
" clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n",
" clr_map = plt.cm.Greys if ground_truth != result else plt.cm.gray\n",
"\n",
" # ground truth labels are in blue\n",
" plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n",
@@ -611,15 +662,30 @@
"metadata": {},
"outputs": [],
"source": [
"from PIL import Image\n",
"# Preprocessing functions take your image and format it so it can be passed\n",
"# as input into our ONNX model\n",
"\n",
"def preprocess(image_path):\n",
" input_shape = (1, 1, 64, 64)\n",
" img = Image.open(image_path)\n",
" img = img.resize((64, 64), Image.ANTIALIAS)\n",
" img_data = np.array(img)\n",
" img_data = np.resize(img_data, input_shape)\n",
" return img_data"
"import cv2\n",
"\n",
"def rgb2gray(rgb):\n",
" \"\"\"Convert the input image into grayscale\"\"\"\n",
" return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n",
"\n",
"def resize_img(img):\n",
" \"\"\"Resize image to the emotion model's input dimensions (64x64)\"\"\"\n",
" img = cv2.resize(img, dsize=(64, 64), interpolation=cv2.INTER_AREA)\n",
" img.resize((1, 1, 64, 64))\n",
" return img\n",
"\n",
"def preprocess(img):\n",
" \"\"\"Resize input images and convert them to grayscale.\"\"\"\n",
" if img.shape == (64, 64):\n",
" img.resize((1, 1, 64, 64))\n",
" return img\n",
" \n",
" grayscale = rgb2gray(img)\n",
" processed_img = resize_img(grayscale)\n",
" return processed_img"
]
},
{
@@ -634,14 +700,19 @@
"# Any PNG or JPG image file should work\n",
"# Make sure to include the entire path with // instead of /\n",
"\n",
"# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//emotion_test_images//img_1.png\"\n",
"# e.g. your_test_image = \"C:/Users/vinitra.swamy/Pictures/face.png\"\n",
"\n",
"your_test_image = \"<path to file>\"\n",
"\n",
"import matplotlib.image as mpimg\n",
"\n",
"if your_test_image != \"<path to file>\":\n",
" img = preprocess(your_test_image)\n",
" img = mpimg.imread(your_test_image)\n",
" plt.subplot(1,3,1)\n",
" plt.imshow(img.reshape((64,64)), cmap = plt.cm.gray)\n",
" plt.imshow(img, cmap = plt.cm.Greys)\n",
" print(\"Old Dimensions: \", img.shape)\n",
" img = preprocess(img)\n",
" print(\"New Dimensions: \", img.shape)\n",
"else:\n",
" img = None"
]
@@ -659,7 +730,7 @@
"\n",
" try:\n",
" r = json.loads(aci_service.run(input_data))\n",
" result = r['result'][0][0]\n",
" result = r['result'][0]\n",
" time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n",
" except Exception as e:\n",
" print(str(e))\n",
@@ -668,12 +739,13 @@
" plt.subplot(1,8,1)\n",
" plt.axhline('')\n",
" plt.axvline('')\n",
" plt.text(x = -10, y = -35, s = \"Model prediction: \", fontsize = 14)\n",
" plt.text(x = -10, y = -20, s = \"Inference time: \", fontsize = 14)\n",
" plt.text(x = 100, y = -35, s = str(result), fontsize = 14)\n",
" plt.text(x = 100, y = -20, s = str(time_ms) + \" ms\", fontsize = 14)\n",
" plt.text(x = -10, y = -8, s = \"Input image: \", fontsize = 14)\n",
" plt.imshow(img.reshape(64, 64), cmap = plt.cm.gray) "
" plt.text(x = -10, y = -40, s = \"Model prediction: \", fontsize = 14)\n",
" plt.text(x = -10, y = -25, s = \"Inference time: \", fontsize = 14)\n",
" plt.text(x = 100, y = -40, s = str(result), fontsize = 14)\n",
" plt.text(x = 100, y = -25, s = str(time_ms) + \" ms\", fontsize = 14)\n",
" plt.text(x = -10, y = -10, s = \"Model Input image: \", fontsize = 14)\n",
" plt.imshow(img.reshape((64, 64)), cmap = plt.cm.gray) \n",
" "
]
},
{
@@ -708,10 +780,15 @@
}
],
"metadata": {
"authors": [
{
"name": "viswamy"
}
],
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.6",
"language": "python",
"name": "python3"
"name": "python36"
},
"language_info": {
"codemirror_mode": {
@@ -723,7 +800,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.6"
},
"msauthor": "vinitra.swamy"
},
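The helper functions in the emotion notebook diff above (`softmax`, `postprocess`, `emotion_map`) can be sketched without NumPy as follows. This is a minimal, dependency-free illustration rather than the notebook's code: `EMOTION_LABELS`, `top_emotions`, and the example scores are hypothetical names and values introduced here, and the label order is copied from the notebook's `emotion_table`.

```python
import math

# Label order copied from the notebook's emotion_table (index -> label).
EMOTION_LABELS = ["neutral", "happiness", "surprise", "sadness",
                  "anger", "disgust", "fear", "contempt"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_emotions(scores, n=1):
    """Return the n most probable labels, most probable first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [EMOTION_LABELS[i] for i in ranked[:n]]


# Made-up scores for illustration only; index 1 ("happiness") is the largest.
example_scores = [0.1, 3.2, 0.4, -1.0, 0.0, -2.5, 0.3, 1.1]
print(top_emotions(example_scores, n=2))
```

An explicit list is used for the labels here so that the index-to-label mapping does not depend on dictionary ordering.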

View File

@@ -22,9 +22,9 @@
"\n",
"#### Tutorial Objectives:\n",
"\n",
"1. Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n",
"2. Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n",
"3. Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML"
"- Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n",
"- Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n",
"- Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML"
]
},
{
@@ -34,31 +34,61 @@
"## Prerequisites\n",
"\n",
"### 1. Install Azure ML SDK and create a new workspace\n",
"Please follow [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook.\n",
"Please follow [Azure ML configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) to set up your environment.\n",
"\n",
"### 2. Install additional packages needed for this Notebook\n",
"### 2. Install additional packages needed for this tutorial notebook\n",
"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed. \n",
"\n",
"```sh\n",
"(myenv) $ pip install matplotlib onnx opencv-python\n",
"```\n",
"\n",
"**Debugging tip**: Make sure that you run the \"jupyter notebook\" command to launch this notebook after activating your virtual environment. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n",
"\n",
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
"\n",
"[Download the ONNX MNIST model and corresponding test data](https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz) and place them in the same folder as this tutorial notebook. You can unzip the file through the following line of code.\n",
"In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# urllib is a built-in Python library to download files from URLs\n",
"\n",
"```sh\n",
"(myenv) $ tar xvzf mnist.tar.gz\n",
"```\n",
"# Objective: retrieve the latest version of the ONNX MNIST model files from the\n",
"# ONNX Model Zoo and save it in the same folder as this tutorial\n",
"\n",
"More information can be found about the ONNX MNIST model on [github](https://github.com/onnx/models/tree/master/mnist). For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/)."
"import urllib.request\n",
"\n",
"onnx_model_url = \"https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz\"\n",
"\n",
"urllib.request.urlretrieve(onnx_model_url, filename=\"mnist.tar.gz\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# the ! magic command tells our jupyter notebook kernel to run the following line of \n",
"# code from the command line instead of the notebook kernel\n",
"\n",
"# We use tar with the xvzf flags to extract the files we just retrieved from the ONNX model zoo\n",
"\n",
"!tar xvzf mnist.tar.gz"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Azure ML workspace\n",
"## Deploy a VM with your ONNX model in the Cloud\n",
"\n",
"### Load Azure ML workspace\n",
"\n",
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
]
@@ -113,11 +143,11 @@
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model.register(model_path = model_dir + \"//model.onnx\",\n",
"model = Model.register(workspace = ws,\n",
" model_path = model_dir + \"/\" + \"model.onnx\",\n",
" model_name = \"mnist_1\",\n",
" tags = {\"onnx\": \"demo\"},\n",
" description = \"MNIST image classification CNN from ONNX Model Zoo\",\n",
" workspace = ws)"
" description = \"MNIST image classification CNN from ONNX Model Zoo\",)"
]
},
{
@@ -135,9 +165,9 @@
"metadata": {},
"outputs": [],
"source": [
"models = ws.models()\n",
"for m in models:\n",
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
"models = ws.models\n",
"for name, m in models.items():\n",
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
@@ -188,16 +218,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy our model on Azure ML"
"### Specify our Score and Environment Files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file.\n",
"\n",
"You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
"\n",
"### Write Score File\n",
"\n",
@@ -248,7 +276,7 @@
" return json.dumps(result_dict)\n",
"\n",
"def choose_class(result_prob):\n",
" \"\"\"We use argmax to determine the right label to choose from our output, after calling softmax on the 10 numbers we receive\"\"\"\n",
" \"\"\"We use argmax to determine the right label to choose from our output\"\"\"\n",
" return int(np.argmax(result_prob, axis=0))"
]
},
@@ -256,14 +284,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Write Environment File"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This step creates a YAML file that specifies which dependencies we would like to see in our Linux Virtual Machine."
"### Write Environment File\n",
"\n",
"This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine."
]
},
{
@@ -274,11 +297,7 @@
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies()\n",
"myenv.add_pip_package(\"numpy\")\n",
"myenv.add_pip_package(\"azureml-core\")\n",
"myenv.add_pip_package(\"onnxruntime\")\n",
"\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
@@ -289,7 +308,6 @@
"metadata": {},
"source": [
"### Create the Container Image\n",
"\n",
"This step will likely take a few minutes."
]
},
@@ -304,11 +322,11 @@
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"test\",\n",
" tags = {\"demo\": \"onnx\"}) )\n",
" description = \"MNIST ONNX Runtime container\",\n",
" tags = {\"demo\": \"onnx\"}) \n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxtest\",\n",
"image = ContainerImage.create(name = \"onnximage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
@@ -321,8 +339,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Debugging\n",
"\n",
"In case you need to debug your code, the next line of code accesses the log file."
]
},
@@ -339,9 +355,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
"\n",
"## Deploy the container image"
"### Deploy the container image"
]
},
{
@@ -414,16 +430,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing and Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Test Data\n",
"## Testing and Evaluation\n",
"\n",
"These are already in your directory from your ONNX model download (from the model zoo). If you didn't place your model and test data in the same directory as this notebook, edit the \"model_dir\" filename below."
"### Load Test Data\n",
"\n",
"These are already in your directory from your ONNX model download (from the model zoo).\n",
"\n",
"Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays."
]
},
{
@@ -579,7 +592,9 @@
"metadata": {},
"outputs": [],
"source": [
"# Preprocessing functions\n",
"# Preprocessing functions take your image and format it so it can be passed\n",
"# as input into our ONNX model\n",
"\n",
"import cv2\n",
"\n",
"def rgb2gray(rgb):\n",
@@ -587,12 +602,17 @@
" return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n",
"\n",
"def resize_img(img):\n",
" \"\"\"Resize image to MNIST model input dimensions\"\"\"\n",
" img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n",
" img.resize((1, 1, 28, 28))\n",
" return img\n",
"\n",
"def preprocess(img):\n",
" \"\"\"Resize input images and convert them to grayscale.\"\"\"\n",
" if img.shape == (28, 28):\n",
" img.resize((1, 1, 28, 28))\n",
" return img\n",
" \n",
" grayscale = rgb2gray(img)\n",
" processed_img = resize_img(grayscale)\n",
" return processed_img"
@@ -608,12 +628,11 @@
"# Make sure your image is square and the dimensions are equal (i.e. 100 * 100 pixels or 28 * 28 pixels)\n",
"\n",
"# Any PNG or JPG image file should work\n",
"# Make sure to include the entire path with // instead of /\n",
"\n",
"# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//digit.png\"\n",
"\n",
"your_test_image = \"<path to file>\"\n",
"\n",
"# e.g. your_test_image = \"C:/Users/vinitra.swamy/Pictures/handwritten_digit.png\"\n",
"\n",
"import matplotlib.image as mpimg\n",
"\n",
"if your_test_image != \"<path to file>\":\n",
@@ -738,16 +757,21 @@
"- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n",
"\n",
"Next steps:\n",
"- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-emotion-recognition.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine with GPU support.\n",
"- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-emotion-recognition.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine.\n",
"- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)"
]
}
],
"metadata": {
"authors": [
{
"name": "viswamy"
}
],
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.6",
"language": "python",
"name": "python3"
"name": "python36"
},
"language_info": {
"codemirror_mode": {
@@ -759,7 +783,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.6"
},
"msauthor": "vinitra.swamy"
},
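As a portable alternative to the `!tar xvzf mnist.tar.gz` shell command in the MNIST notebook above, the archive could also be unpacked with Python's standard `tarfile` module. The sketch below assumes the archive has already been downloaded and comes from a trusted source; `extract_archive` is a hypothetical helper, not part of the notebook.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive and return its member names."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()         # list contents before extracting
        archive.extractall(path=dest_dir)  # only do this for trusted archives
    return names


# Hypothetical usage, once mnist.tar.gz is in the working directory:
# print(extract_archive("mnist.tar.gz"))
```

Because this runs in the notebook kernel rather than a shell, it should behave the same on systems where the `tar` command is not available.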

View File

@@ -0,0 +1,419 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ResNet50 Image Classification using ONNX and AzureML\n",
"\n",
"This example shows how to deploy the ResNet50 ONNX model as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
"\n",
"## What is ONNX\n",
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n",
"## ResNet50 Details\n",
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. More information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/models/image_classification/resnet). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"To make the best use of your time, make sure you have done the following:\n",
"\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [00.configuration.ipynb](../00.configuration.ipynb) notebook to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (config.json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download pre-trained ONNX model from ONNX Model Zoo.\n",
"\n",
"Download the [ResNet50v2 model and test data](https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz) and extract it in the same folder as this tutorial notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import urllib.request\n",
"\n",
"onnx_model_url = \"https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz\"\n",
"urllib.request.urlretrieve(onnx_model_url, filename=\"resnet50v2.tar.gz\")\n",
"\n",
"!tar xvzf resnet50v2.tar.gz"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploying as a web service with Azure ML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load your Azure ML workspace\n",
"\n",
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register your model with Azure ML\n",
"\n",
"Now we upload the model and register it in the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model.register(model_path = \"resnet50v2/resnet50v2.onnx\",\n",
" model_name = \"resnet50v2\",\n",
" tags = {\"onnx\": \"demo\"},\n",
" description = \"ResNet50v2 from ONNX Model Zoo\",\n",
" workspace = ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Displaying your registered models\n",
"\n",
"You can optionally list out all the models that you have registered in this workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"models = ws.models\n",
"for name, m in models.items():\n",
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Write scoring file\n",
"\n",
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import json\n",
"import time\n",
"import sys\n",
"import os\n",
"from azureml.core.model import Model\n",
"import numpy as np # we're going to use numpy to process input and output data\n",
"import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n",
"\n",
"def softmax(x):\n",
" x = x.reshape(-1)\n",
" e_x = np.exp(x - np.max(x))\n",
" return e_x / e_x.sum(axis=0)\n",
"\n",
"def init():\n",
" global session\n",
" model = Model.get_model_path(model_name = 'resnet50v2')\n",
" session = onnxruntime.InferenceSession(model, None)\n",
"\n",
"def preprocess(input_data_json):\n",
" # convert the JSON data into the tensor input\n",
" img_data = np.array(json.loads(input_data_json)['data']).astype('float32')\n",
" \n",
" #normalize\n",
" mean_vec = np.array([0.485, 0.456, 0.406])\n",
" stddev_vec = np.array([0.229, 0.224, 0.225])\n",
" norm_img_data = np.zeros(img_data.shape).astype('float32')\n",
" for i in range(img_data.shape[0]):\n",
" norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]\n",
"\n",
" return norm_img_data\n",
"\n",
"def postprocess(result):\n",
" return softmax(np.array(result)).tolist()\n",
"\n",
"def run(input_data_json):\n",
" try:\n",
" start = time.time()\n",
" # load in our data which is expected as NCHW 224x224 image\n",
" input_data = preprocess(input_data_json)\n",
" input_name = session.get_inputs()[0].name # get the id of the first input of the model \n",
" result = session.run([], {input_name: input_data})\n",
" end = time.time() # stop timer\n",
" return {\"result\": postprocess(result),\n",
" \"time\": end - start}\n",
" except Exception as e:\n",
" result = str(e)\n",
" return {\"error\": result}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create container image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we create a YAML file that specifies which dependencies we would like to see in our container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"ONNX ResNet50 Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxresnet50v2\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'demo': 'onnx'}, \n",
" description = 'web service for ResNet50 ONNX model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from random import randint\n",
"\n",
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aci_service.state != 'Healthy':\n",
" # run this command for debugging.\n",
" print(aci_service.get_logs())\n",
" aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Success!\n",
"\n",
"If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(aci_service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are eventually done using the web service, remember to delete it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "onnx"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
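The `preprocess` step in the ResNet50 `score.py` above scales each channel to the 0-1 range and then normalizes it with a per-channel mean and standard deviation. The following is a minimal pure-Python sketch of that arithmetic and of the JSON shape the scoring script reads (a `data` key). It is an illustration only: the notebook itself uses NumPy, and `normalize_chw`, `build_payload`, and the tiny example image are hypothetical.

```python
import json

MEAN = [0.485, 0.456, 0.406]    # per-channel means used in score.py
STDDEV = [0.229, 0.224, 0.225]  # per-channel standard deviations used in score.py


def normalize_chw(image):
    """Normalize a channels-first image given as nested lists of 0-255 values."""
    return [
        [[(pixel / 255 - MEAN[c]) / STDDEV[c] for pixel in row] for row in channel]
        for c, channel in enumerate(image)
    ]


def build_payload(image):
    """Wrap raw pixel data in the JSON shape that score.py reads ('data' key)."""
    return json.dumps({"data": image})


# Tiny 3x1x2 stand-in image (hypothetical values); a real input would be 3x224x224.
tiny_image = [[[0, 255]], [[0, 255]], [[0, 255]]]
normalized = normalize_chw(tiny_image)
payload = build_payload(tiny_image)
```

Since the scoring script normalizes on the service side, a client would presumably send the raw 0-255 pixel values in the payload rather than the normalized ones.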

File diff suppressed because one or more lines are too long

View File

@@ -40,8 +40,8 @@
"metadata": {},
"outputs": [],
"source": [
"!jupyter nbextension install --py --user azureml.train.widgets\n",
"!jupyter nbextension enable --py --user azureml.train.widgets"
"!jupyter nbextension install --py --user azureml.widgets\n",
"!jupyter nbextension enable --py --user azureml.widgets"
]
},
{
@@ -53,6 +53,11 @@
}
],
"metadata": {
"authors": [
{
"name": "hichando"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -16,6 +16,14 @@
"This notebook demonstrates how to run batch scoring job. __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and unlabeled images from __[ImageNet](http://image-net.org/)__ dataset will be used. It registers a pretrained inception model in model registry then uses the model to do batch scoring on images in a blob container."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure you go through the [00. Installation and Configuration](./00.configuration.ipynb) Notebook first if you haven't.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -32,11 +40,10 @@
" 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
"\n",
"# Also create a Project and attach to Workspace\n",
"project_folder = \"sample_projects\"\n",
"run_history_name = project_folder\n",
"scripts_folder = \"scripts\"\n",
"\n",
"if not os.path.isdir(project_folder):\n",
" os.mkdir(project_folder)"
"if not os.path.isdir(scripts_folder):\n",
" os.mkdir(scripts_folder)"
]
},
{
@@ -45,7 +52,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import BatchAiCompute, ComputeTarget\n",
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"from azureml.core.datastore import Datastore\n",
"from azureml.data.data_reference import DataReference\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n",
@@ -67,21 +74,35 @@
"metadata": {},
"outputs": [],
"source": [
"# Batch AI compute\n",
"cluster_name = \"gpu_cluster\"\n",
"try:\n",
" cluster = BatchAiCompute(ws, cluster_name)\n",
" print(\"found existing cluster.\")\n",
"except:\n",
" print(\"creating new cluster\")\n",
" provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n",
" autoscale_enabled = True,\n",
" cluster_min_nodes = 0, \n",
" cluster_max_nodes = 1)\n",
"import os\n",
"\n",
"# choose a name for your cluster\n",
"compute_name = os.environ.get(\"BATCHAI_CLUSTER_NAME\", \"gpucluster\")\n",
"compute_min_nodes = os.environ.get(\"BATCHAI_CLUSTER_MIN_NODES\", 0)\n",
"compute_max_nodes = os.environ.get(\"BATCHAI_CLUSTER_MAX_NODES\", 4)\n",
"vm_size = os.environ.get(\"BATCHAI_CLUSTER_SKU\", \"STANDARD_NC6\")\n",
"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n",
" vm_priority = 'lowpriority', # optional\n",
" min_nodes = compute_min_nodes, \n",
" max_nodes = compute_max_nodes)\n",
"\n",
" # create the cluster\n",
" cluster = ComputeTarget.create(ws, cluster_name, provisioning_config)\n",
" cluster.wait_for_completion(show_output=True)"
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
" # For a more detailed view of current BatchAI cluster status, use the 'status' property \n",
" print(compute_target.status.serialize())"
]
},
{
@@ -104,7 +125,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%writefile $project_folder/batchai_score.py\n",
"%%writefile $scripts_folder/batchai_score.py\n",
"import os\n",
"import argparse\n",
"import datetime,time\n",
@@ -225,6 +246,15 @@
"## Prepare Model and Input data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Model\n",
"\n",
"Download and extract model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to `\"models\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -238,27 +268,29 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Download Model\n",
"<font color=red>This manual step is required to register the model to the workspace</font>\n",
"import tarfile\n",
"import urllib.request\n",
"\n",
"Download and extract model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to model_dir"
"url=\"http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz\"\n",
"response = urllib.request.urlretrieve(url, \"model.tar.gz\")\n",
"tar = tarfile.open(\"model.tar.gz\", \"r:gz\")\n",
"tar.extractall(model_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get samples images and upload to Datastore\n",
"<font color=red>This manual step is required to run batchai_score.py</font>\n",
"### Create a datastore that points to blob container containing sample images\n",
"\n",
"Download and extract sample images from ImageNet evaluation set and **upload** to a blob that will be registered as a Datastore in the next step\n",
"We have created a public blob container `sampledata` on an account named `pipelinedata` containing images from ImageNet evaluation set. In the next step, we create a datastore with name `images_datastore` that points to this container. The `overwrite=True` step overwrites any datastore that was created previously with that name. \n",
"\n",
"A copy of sample images from ImageNet evaluation set can be found at __[BatchAI Samples Blob](https://batchaisamples.blob.core.windows.net/samples/imagenet_samples.zip?st=2017-09-29T18%3A29%3A00Z&se=2099-12-31T08%3A00%3A00Z&sp=rl&sv=2016-05-31&sr=c&sig=PmhL%2BYnYAyNTZr1DM2JySvrI12e%2F4wZNIwCtf7TRI%2BM%3D)__ \n",
"\n",
"There are multiple ways to create folders and upload files into Azure Blob Container - you can use __[Azure Portal](https://ms.portal.azure.com/)__, __[Storage Explorer](http://storageexplorer.com/)__, __[Azure CLI2](https://render.githubusercontent.com/azure-cli-extension)__ or Azure SDK for your preferable programming language. "
"This step can be changed to point to your blob container by providing an additional `account_key` parameter with `account_name`. "
]
},
{
@@ -267,8 +299,8 @@
"metadata": {},
"outputs": [],
"source": [
"account_name = \"batchscoringdata\"\n",
"sample_data = Datastore.register_azure_blob_container(ws, \"sampledata\", \"sampledata\", \n",
"account_name = \"pipelinedata\"\n",
"sample_data = Datastore.register_azure_blob_container(ws, datastore_name=\"images_datastore\", container_name=\"sampledata\", \n",
" account_name=account_name, \n",
" overwrite=True)"
]
@@ -293,7 +325,7 @@
"metadata": {},
"outputs": [],
"source": [
"default_ds = \"workspaceblobstore\""
"default_ds = ws.get_default_datastore()"
]
},
{
@@ -338,7 +370,7 @@
" mode=\"download\" \n",
" )\n",
"output_dir = PipelineData(name=\"scores\", \n",
" datastore_name=default_ds, \n",
" datastore=default_ds, \n",
" output_path_on_compute=\"batchscoring/results\")"
]
},
@@ -381,13 +413,15 @@
"metadata": {},
"outputs": [],
"source": [
"cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.4.0\", \"azureml-defaults\"])\n",
"from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n",
"\n",
"cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.10.0\", \"azureml-defaults\"])\n",
"\n",
"# Runconfig\n",
"batchai_run_config = RunConfiguration(conda_dependencies=cd)\n",
"batchai_run_config.environment.docker.enabled = True\n",
"batchai_run_config.environment.docker.gpu_support = True\n",
"batchai_run_config.environment.docker.base_image = \"microsoft/mmlspark:gpu-0.12\"\n",
"batchai_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n",
"batchai_run_config.environment.spark.precache_packages = False"
]
},
@@ -431,11 +465,11 @@
" \"--label_dir\", label_dir, \n",
" \"--output_dir\", output_dir, \n",
" \"--batch_size\", batch_size_param],\n",
" target=cluster,\n",
" compute_target=compute_target,\n",
" inputs=[input_images, label_dir],\n",
" outputs=[output_dir],\n",
" runconfig=batchai_run_config,\n",
" source_directory=project_folder\n",
" source_directory=scripts_folder\n",
")"
]
},
@@ -462,7 +496,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
},
@@ -572,9 +606,12 @@
"source": [
"from azureml.pipeline.core import PublishedPipeline\n",
"\n",
"rest_endpoint = PublishedPipeline.get_endpoint(published_id, ws)\n",
"rest_endpoint = published_pipeline.endpoint\n",
"# specify batch size when running the pipeline\n",
"response = requests.post(rest_endpoint, headers=aad_token, json={\"param_batch_size\": 50})\n",
"response = requests.post(rest_endpoint, \n",
" headers=aad_token, \n",
" json={\"ExperimentName\": \"batch_scoring\",\n",
" \"ParameterAssignments\": {\"param_batch_size\": 50}})\n",
"run_id = response.json()[\"Id\"]"
]
},
@@ -592,13 +629,18 @@
"outputs": [],
"source": [
"from azureml.pipeline.core.run import PipelineRun\n",
"published_pipeline_run = PipelineRun(ws.experiments()[\"batch_scoring\"], run_id)\n",
"published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n",
"\n",
"RunDetails(published_pipeline_run).show()"
]
}
],
"metadata": {
"authors": [
{
"name": "hichando"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -614,7 +656,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.2"
}
},
"nbformat": 4,

51
pr.md Normal file
View File

@@ -0,0 +1,51 @@
# Azure Machine Learning Resources & Links
## Product Documentation
- [Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/)
- [Azure Machine Learning Studio](https://docs.microsoft.com/en-us/azure/machine-learning/studio/)
## Product Team Blogs
- [What's new in Azure Machine Learning service](https://aka.ms/aml-blog-whats-new)
- [Announcing automated ML capability in Azure Machine Learning](https://aka.ms/aml-blog-automl)
- [Experimentation using Azure Machine Learning](https://aka.ms/aml-blog-experimentation)
- [Azure AI - Making AI real for business](https://aka.ms/aml-blog-overview)
## Community Blogs
- [Power Bat - How Spektacom is Powering the Game of Cricket with Microsoft AI](https://blogs.technet.microsoft.com/machinelearning/2018/10/11/power-bat-how-spektacom-is-powering-the-game-of-cricket-with-microsoft-ai/)
## Ignite 2018 Public Preview Launch Sessions
- [AI with Azure Machine Learning services: Simplifying the data science process](https://myignite.techcommunity.microsoft.com/sessions/66248)
- [AI TechTalk: Azure Machine Learning SDK - a walkthrough](https://myignite.techcommunity.microsoft.com/sessions/66265)
- [AI for an intelligent cloud and intelligent edge: Discover, deploy, and manage with Azure ML services](https://myignite.techcommunity.microsoft.com/sessions/65389)
- [Generating high quality models efficiently using Automated ML and Hyperparameter Tuning](https://myignite.techcommunity.microsoft.com/sessions/66245)
- [AI for pros: Deep learning with PyTorch using the Azure Data Science Virtual Machine and scaling training with Azure ML](https://myignite.techcommunity.microsoft.com/sessions/66244)
## Get-started Videos on YouTube
- [Get started with Python SDK](https://youtu.be/VIsXeTuW3FU)
- [Get started from Azure Portal](https://youtu.be/lCkYUHV86Mk)
## Third Party Articles
- [Azure's new machine learning features embrace Python](https://www.infoworld.com/article/3306840/azure/azures-new-machine-learning-features-embrace-python.html) (InfoWorld)
- [How to use Azure ML in Windows 10](https://www.infoworld.com/article/3308381/azure/how-to-use-azure-ml-in-windows-10.html) (InfoWorld)
- [How Azure ML Streamlines Cloud-based Machine Learning](https://thenewstack.io/how-the-azure-ml-streamlines-cloud-based-machine-learning/) (The New Stack)
- [Facebook launches PyTorch 1.0 with integrations for Google Cloud, AWS, and Azure Machine Learning](https://venturebeat.com/2018/10/02/facebook-launches-pytorch-1-0-integrations-for-google-cloud-aws-and-azure-machine-learning/) (VentureBeat)
- [How Microsoft Uses Machine Learning to Help You Build Machine Learning Pipelines](https://towardsdatascience.com/how-microsoft-uses-machine-learning-to-help-you-build-machine-learning-pipelines-be75f710613b) (Towards Data Science)
- [Microsoft's Machine Learning Tools for Developers Get Smarter](https://techcrunch.com/2018/09/24/microsofts-machine-learning-tools-for-developers-get-smarter/) (TechCrunch)
- [Microsoft introduces Azure service to automatically build AI models](https://venturebeat.com/2018/09/24/microsoft-introduces-azure-service-to-automatically-build-ai-models/) (VentureBeat)
## Community Projects
- [Fashion MNIST](https://github.com/amynic/azureml-sdk-fashion)
- Keras on Databricks
- [Samples from CSS](https://github.com/Azure/AMLSamples)
## Azure Machine Learning Studio Resources
- [A-Z Machine Learning using Azure Machine Learning (AzureML)](https://www.udemy.com/machine-learning-using-azureml/)
- [Machine Learning In The Cloud With Azure Machine Learning](https://www.udemy.com/machine-learning-in-the-cloud-with-azure-machine-learning/)
- [How to Become A Data Scientist Using Azure Machine Learning](https://www.udemy.com/azure-machine-learning-introduction/)
- [Learn Azure Machine Learning from scratch](https://www.udemy.com/learn-azure-machine-learning-from-scratch/)
- [Azure Machine Learning Studio PowerShell Module](https://aka.ms/amlps)
## Forum Help
- [Azure Machine Learning service](https://social.msdn.microsoft.com/Forums/en-US/home?forum=AzureMachineLearningService)
- [Azure Machine Learning Studio](https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning)

View File

@@ -434,12 +434,13 @@
"from azureml.core.image import Image\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage\n",
"from azureml.exceptions import WebserviceException\n",
"\n",
"model_name = \"catsanddogs-resnet50-model\"\n",
"image_name = \"catsanddogs-resnet50-image\"\n",
"service_name = \"modelbuild-service\"\n",
"\n",
"registered_model = Model.register(ws, service_def_path, model_name)\n",
"registered_model = Model.register(ws, model_def_path, model_name)\n",
"\n",
"image_config = BrainwaveImage.image_configuration()\n",
"deployment_config = BrainwaveWebservice.deploy_configuration()\n",
@@ -448,8 +449,10 @@
" service = Webservice(ws, service_name)\n",
" service.delete()\n",
" service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n",
" service.wait_for_deployment(True)\n",
"except WebserviceException:\n",
" service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)"
" service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n",
" service.wait_for_deployment(True)"
]
},
{
@@ -594,6 +597,11 @@
}
],
"metadata": {
"authors": [
{
"name": "coverste"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -80,7 +80,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.brainwave.models import QuantizedResnet50, Resnet50\n",
"from azureml.contrib.brainwave.models import QuantizedResnet50\n",
"model_path = os.path.expanduser('~/models')\n",
"model = QuantizedResnet50(model_path, is_frozen = True)\n",
"feature_tensor = model.import_graph_def(image_tensors)\n",
@@ -103,7 +103,7 @@
"metadata": {},
"outputs": [],
"source": [
"classifier_input, classifier_output = Resnet50.get_default_classifier(feature_tensor, model_path)"
"classifier_output = model.get_default_classifier(feature_tensor)"
]
},
{
@@ -131,7 +131,7 @@
"with tf.Session() as sess:\n",
" model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))\n",
" model_def.pipeline.append(BrainWaveStage(sess, model))\n",
" model_def.pipeline.append(TensorflowStage(sess, classifier_input, classifier_output))\n",
" model_def.pipeline.append(TensorflowStage(sess, feature_tensor, classifier_output))\n",
" model_def.save(model_def_path)\n",
" print(model_def_path)"
]
@@ -198,7 +198,7 @@
" image_config = BrainwaveImage.image_configuration()\n",
" deployment_config = BrainwaveWebservice.deploy_configuration()\n",
" service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n",
" service.wait_for_deployment(true)"
" service.wait_for_deployment(True)"
]
},
{
@@ -265,9 +265,7 @@
"metadata": {},
"outputs": [],
"source": [
"service.delete()\n",
" \n",
"registered_model.delete()"
"service.delete()"
]
},
{
@@ -286,6 +284,11 @@
}
],
"metadata": {
"authors": [
{
"name": "coverste"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -404,7 +404,7 @@
" image_config = BrainwaveImage.image_configuration()\n",
" deployment_config = BrainwaveWebservice.deploy_configuration()\n",
" service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)\n",
" service.wait_for_deployment(true)"
" service.wait_for_deployment(True)"
]
},
{
@@ -544,6 +544,11 @@
}
],
"metadata": {
"authors": [
{
"name": "coverste"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

Binary file not shown (before: 61 KiB).

View File

@@ -43,6 +43,28 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -82,7 +104,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
@@ -93,10 +115,8 @@
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=6)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
@@ -244,7 +264,7 @@
"In `pytorch_train.py`, we will log some metrics to our AML run. To do so, we will access the AML run object within the script:\n",
"```Python\n",
"from azureml.core.run import Run\n",
"run = Run.get_submitted_run()\n",
"run = Run.get_context()\n",
"```\n",
"Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n",
"```Python\n",
@@ -311,7 +331,7 @@
"\n",
"script_params = {\n",
" '--data_dir': ds_data,\n",
" '--num_epochs': 25,\n",
" '--num_epochs': 10,\n",
" '--output_dir': './outputs'\n",
"}\n",
"\n",
@@ -365,10 +385,26 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can block until the script has completed training before running more code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -409,7 +445,7 @@
" policy=early_termination_policy,\n",
" primary_metric_name='best_val_acc',\n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
" max_total_runs=20,\n",
" max_total_runs=8,\n",
" max_concurrent_runs=4)"
]
},
@@ -444,11 +480,27 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(hyperdrive_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or block until the HyperDrive sweep has completed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hyperdrive_run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -524,7 +576,7 @@
"metadata": {},
"source": [
"### Create environment file\n",
"Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image by AML. In this case, we need to specify `torch`, `torchvision`, `pillow`, and `azureml-sdk`."
"Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image by AML. In this case, we need to specify `azureml-core`, `torch` and `torchvision`."
]
},
{
@@ -533,16 +585,14 @@
"metadata": {},
"outputs": [],
"source": [
"%%writefile myenv.yml\n",
"name: myenv\n",
"channels:\n",
" - defaults\n",
"dependencies:\n",
" - pip:\n",
" - torch\n",
" - torchvision\n",
" - pillow\n",
" - azureml-core"
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(pip_packages=['azureml-core', 'torch', 'torchvision'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())\n",
" \n",
"print(myenv.serialize_to_string())"
]
},
{
@@ -594,25 +644,7 @@
"metadata": {},
"source": [
"### Deploy the registered model\n",
"Finally, let's deploy a web service from our registered model. First, retrieve the model from your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model(ws, name='pytorch-hymenoptera')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, deploy the web service using the ACI config and image config files created in the previous steps. We pass the `model` object in a list to the `models` parameter. If you would like to deploy more than one registered model, append the additional models to this list."
"Finally, let's deploy a web service from our registered model. Deploy the web service using the ACI config and image config files created in the previous steps. We pass the `model` object in a list to the `models` parameter. If you would like to deploy more than one registered model, append the additional models to this list."
]
},
{
@@ -688,18 +720,10 @@
"metadata": {},
"outputs": [],
"source": [
"import os, json, base64\n",
"from io import BytesIO\n",
"import os, json\n",
"from PIL import Image\n",
"import matplotlib.pyplot as plt\n",
"\n",
"def imgToBase64(img):\n",
" \"\"\"Convert pillow image to base64-encoded image\"\"\"\n",
" imgio = BytesIO()\n",
" img.save(imgio, 'JPEG')\n",
" img_str = base64.b64encode(imgio.getvalue())\n",
" return img_str.decode('utf-8')\n",
"\n",
"test_img = os.path.join('hymenoptera_data', 'val', 'bees', '10870992_eebeeb3a12.jpg') #arbitary image from val dataset\n",
"plt.imshow(Image.open(test_img))"
]
@@ -710,18 +734,42 @@
"metadata": {},
"outputs": [],
"source": [
"base64Img = imgToBase64(Image.open(test_img))\n",
"import torch\n",
"from torchvision import transforms\n",
" \n",
"result = service.run(input_data=json.dumps({'data': base64Img}))\n",
"print(json.loads(result))"
"def preprocess(image_file):\n",
" \"\"\"Preprocess the input image.\"\"\"\n",
" data_transforms = transforms.Compose([\n",
" transforms.Resize(256),\n",
" transforms.CenterCrop(224),\n",
" transforms.ToTensor(),\n",
" transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n",
" ])\n",
"\n",
" image = Image.open(image_file)\n",
" image = data_transforms(image).float()\n",
" image = torch.tensor(image)\n",
" image = image.unsqueeze(0)\n",
" return image.numpy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"input_data = preprocess(test_img)\n",
"result = service.run(input_data=json.dumps({'data': input_data.tolist()}))\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete web service\n",
"Once you no longer need the web service, you should delete it."
"## Clean up\n",
"Once you no longer need the web service, you can delete it with a simple API call."
]
},
{
@@ -735,6 +783,11 @@
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -750,7 +803,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.6.2"
},
"msauthor": "minxia"
},
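
The `preprocess` cell added in this notebook now runs on the client, so for an RGB image the request body carries a normalized `1 x 3 x 224 x 224` array rather than a base64-encoded JPEG. As a rough illustration of what `transforms.Normalize(mean, std)` computes per channel, namely `(x - mean) / std`, here is a dependency-free sketch. `normalize_pixel` is a hypothetical name and the sample pixel is made up:

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means used in the notebook
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (x - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]


# A pixel equal to the channel means maps to zeros.
print(normalize_pixel(MEAN))
```

One consequence worth keeping in mind: calling `.tolist()` on the full array serializes 3 x 224 x 224 = 150,528 floats per image as JSON text, which is likely larger than the original JPEG.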

View File

@@ -5,35 +5,10 @@ import torch
import torch.nn as nn
from torchvision import transforms
import json
import base64
from io import BytesIO
from PIL import Image
from azureml.core.model import Model
def preprocess_image(image_file):
    """Preprocess the input image."""
    data_transforms = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image = Image.open(image_file)
    image = data_transforms(image).float()
    image = torch.tensor(image)
    image = image.unsqueeze(0)
    return image
def base64ToImg(base64ImgString):
    base64Img = base64ImgString.encode('utf-8')
    decoded_img = base64.b64decode(base64Img)
    return BytesIO(decoded_img)
def init():
    global model
    model_path = Model.get_model_path('pytorch-hymenoptera')
@@ -42,16 +17,15 @@ def init():
def run(input_data):
    img = base64ToImg(json.loads(input_data)['data'])
    img = preprocess_image(img)
    input_data = torch.tensor(json.loads(input_data)['data'])
    # get prediction
    output = model(img)
    with torch.no_grad():
        output = model(input_data)
    classes = ['ants', 'bees']
    softmax = nn.Softmax(dim=1)
    pred_probs = softmax(model(img)).detach().numpy()[0]
    pred_probs = softmax(output).numpy()[0]
    index = torch.argmax(output, 1)
    result = json.dumps({"label": classes[index], "probability": str(pred_probs[index])})
    result = {"label": classes[index], "probability": str(pred_probs[index])}
    return result
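
To see the new request/response contract in isolation: the client posts `{"data": <nested list>}` and `run` returns a plain dict instead of a JSON string. The sketch below mimics that flow with the standard library only. `fake_model` is a made-up stand-in for the loaded PyTorch model and `run_sketch` is a hypothetical name, so this illustrates the data flow under those assumptions, not the actual scoring script.

```python
import json
import math

CLASSES = ['ants', 'bees']


def fake_model(batch):
    """Made-up stand-in for the PyTorch model: one row of logits per input."""
    return [[0.0, 1.0] for _ in batch]


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_sketch(input_data):
    """Parse the JSON body, score the first item, return a dict (not a JSON string)."""
    batch = json.loads(input_data)['data']
    logits = fake_model(batch)[0]
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return {"label": CLASSES[index], "probability": str(probs[index])}


body = json.dumps({'data': [[0.1, 0.2, 0.3]]})
print(run_sketch(body)["label"])
```

Returning a dict rather than a pre-serialized string leaves serialization to the caller, which matches the notebook change from `print(json.loads(result))` to `print(result)`.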

View File

@@ -17,7 +17,7 @@ import argparse
from azureml.core.run import Run
# get the Azure ML run object
run = Run.get_submitted_run()
run = Run.get_context()
def load_data(data_dir):
@@ -59,6 +59,7 @@ def train_model(model, criterion, optimizer, scheduler, num_epochs, data_dir):
    dataloaders, dataset_sizes, class_names = load_data(data_dir)
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
@@ -146,12 +147,15 @@ def fine_tune_model(num_epochs, data_dir, learning_rate, momentum):
    criterion = nn.CrossEntropyLoss()
    # Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=learning_rate, momentum=momentum)
    optimizer_ft = optim.SGD(model_ft.parameters(),
                             lr=learning_rate, momentum=momentum)
    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
    exp_lr_scheduler = lr_scheduler.StepLR(
        optimizer_ft, step_size=7, gamma=0.1)
    model = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs, data_dir)
    model = train_model(model_ft, criterion, optimizer_ft,
                        exp_lr_scheduler, num_epochs, data_dir)
    return model
@@ -159,15 +163,19 @@ def fine_tune_model(num_epochs, data_dir, learning_rate, momentum):
def main():
    # get command-line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, help='directory of training data')
    parser.add_argument('--num_epochs', type=int, default=25, help='number of epochs to train')
    parser.add_argument('--data_dir', type=str,
                        help='directory of training data')
    parser.add_argument('--num_epochs', type=int, default=25,
                        help='number of epochs to train')
    parser.add_argument('--output_dir', type=str, help='output directory')
    parser.add_argument('--learning_rate', type=float, help='learning rate')
    parser.add_argument('--momentum', type=float, help='momentum')
    parser.add_argument('--learning_rate', type=float,
                        default=0.001, help='learning rate')
    parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
    args = parser.parse_args()
    print("data directory is: " + args.data_dir)
    model = fine_tune_model(args.num_epochs, args.data_dir, args.learning_rate, args.momentum)
    model = fine_tune_model(args.num_epochs, args.data_dir,
                            args.learning_rate, args.momentum)
    os.makedirs(args.output_dir, exist_ok=True)
    torch.save(model, os.path.join(args.output_dir, 'model.pt'))
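
The new `default=` values mean the script can be launched without `--learning_rate` or `--momentum`; previously both parsed to `None` when omitted. A small sketch of the parser on its own, using the argument names and defaults from the hunk above. `build_parser` is a hypothetical wrapper added only so the parser can be exercised without running training:

```python
import argparse


def build_parser():
    """Parser mirroring the arguments defined in main()."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str,
                        help='directory of training data')
    parser.add_argument('--num_epochs', type=int, default=25,
                        help='number of epochs to train')
    parser.add_argument('--output_dir', type=str, help='output directory')
    parser.add_argument('--learning_rate', type=float,
                        default=0.001, help='learning rate')
    parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
    return parser


# Only --data_dir is supplied; the optimizer settings fall back to defaults.
args = build_parser().parse_args(['--data_dir', './data'])
print(args.learning_rate, args.momentum)
```

Note that `--data_dir` and `--output_dir` still have no defaults, so omitting either leaves `None` in `args`, and the later string concatenation or `os.makedirs` call would fail.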

View File

@@ -18,7 +18,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -42,6 +41,28 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -82,7 +103,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
@@ -93,10 +114,8 @@
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=6)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
@@ -243,7 +262,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
@@ -265,6 +284,11 @@
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -0,0 +1 @@
/data/

View File

@@ -17,7 +17,7 @@
}
},
"source": [
"# 03. Training MNIST dataset with hyperparameter tuning & deploy to ACI\n",
"# 03. Training, hyperparameter tune, and deploy with TensorFlow\n",
"\n",
"## Introduction\n",
"This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing number from 0 to 9. The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n",
@@ -72,6 +72,28 @@
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -203,9 +225,23 @@
"metadata": {},
"source": [
"## Upload MNIST dataset to default datastore \n",
"A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share.\n",
"\n",
"In this next step, we will upload the training and test set into the workspace's default datastore, which we will then later be mount on a Batch AI cluster for training.\n"
"A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = ws.get_default_datastore()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this next step, we will upload the training and test sets into the workspace's default datastore, which we will later mount on a Batch AI cluster for training."
]
},
{
@@ -214,7 +250,6 @@
"metadata": {},
"outputs": [],
"source": [
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
]
},
@@ -230,7 +265,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name in the previous cell, then we will create a new cluster here. We will create a Batch AI Cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:\n",
"If we could not find the cluster with the given name in the previous cell, then we will create a new cluster here. We will create a Batch AI Cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the Batch AI cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output while it runs. Please make sure to wait until the call returns before moving to the next cell."
@@ -242,29 +277,27 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"batchai_cluster_name = \"gpucluster\"\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" # look for the existing cluster by name\n",
" compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n",
" if type(compute_target) is BatchAiCompute:\n",
" print('found compute target {}, just use it.'.format(batchai_cluster_name))\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" if type(compute_target) is AmlCompute:\n",
" print('Found existing compute target {}.'.format(cluster_name))\n",
" else:\n",
" print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n",
" print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(cluster_name))\n",
"except ComputeTargetException:\n",
" print('creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\", # GPU-based VM\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\", # GPU-based VM\n",
" #vm_priority='lowpriority', # optional\n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" max_nodes=6)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
@@ -278,7 +311,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets()` function returns. You should now see one entry named 'cpucluster' of type BatchAI."
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpucluster' of type BatchAI."
]
},
{
@@ -287,8 +320,9 @@
"metadata": {},
"outputs": [],
"source": [
"for ct in ws.compute_targets():\n",
" print(ct.name, ct.type, ct.provisioning_state)"
"compute_targets = ws.compute_targets\n",
"for name, ct in compute_targets.items():\n",
" print(name, ct.type, ct.provisioning_state)"
]
},
{
@@ -338,7 +372,7 @@
" parser = argparse.ArgumentParser()\n",
" parser.add_argument('--data_folder')\n",
"```\n",
"2. The script accesses the Azure ML `Run` object by executing `run = Run.get_submitted_run()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n",
"2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n",
"```\n",
" run.log('training_acc', np.float(acc_train))\n",
" run.log('validation_acc', np.float(acc_val))\n",
@@ -409,7 +443,7 @@
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(config=est)"
"run = exp.submit(est)"
]
},
{
@@ -437,7 +471,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
@@ -457,6 +491,15 @@
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -705,9 +748,10 @@
"source": [
"htc = HyperDriveRunConfig(estimator=est, \n",
" hyperparameter_sampling=ps, \n",
" policy=policy, \n",
" primary_metric_name='validation_acc', \n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
" max_total_runs=20,\n",
" max_total_runs=8,\n",
" max_concurrent_runs=4)"
]
},
@@ -743,6 +787,15 @@
"RunDetails(htr).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"htr.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -773,7 +826,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(best_run.get_file_names()"
"print(best_run.get_file_names())"
]
},
{
@@ -836,7 +889,7 @@
" # make prediction\n",
" out = output.eval(session=sess, feed_dict={X: data})\n",
" y_hat = np.argmax(out, axis=1)\n",
" return json.dumps(y_hat.tolist())"
" return y_hat.tolist()"
]
},
{
@@ -988,7 +1041,7 @@
"test_samples = bytes(test_samples, encoding='utf8')\n",
"\n",
"# predict using the deployed model\n",
"result = json.loads(service.run(input_data = test_samples))\n",
"result = service.run(input_data=test_samples)\n",
"\n",
"# compare actual value vs. the predicted values:\n",
"i = 0\n",
@@ -1056,14 +1109,17 @@
"metadata": {},
"outputs": [],
"source": [
"for model in ws.models():\n",
" print(\"Model:\", model.name, model.id)\n",
"models = ws.models\n",
"for name, model in models.items():\n",
" print(\"Model: {}, ID: {}\".format(name, model.id))\n",
" \n",
"for image in ws.images():\n",
" print(\"Image:\", image.name, image.image_location)\n",
"images = ws.images\n",
"for name, image in images.items():\n",
" print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
" \n",
"for webservice in ws.webservices():\n",
" print(\"Webservice:\", webservice.name, webservice.scoring_uri)"
"webservices = ws.webservices\n",
"for name, webservice in webservices.items():\n",
" print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
]
},
{
@@ -1082,26 +1138,14 @@
"source": [
"service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also delete the compute cluster. But remember that if you set the `cluster_min_nodes` value to 0 when you created the cluster, all nodes are deleted automatically once the jobs are finished. So you don't have to delete the cluster itself, since it won't incur any cost. The next time you submit jobs to it, the cluster will automatically \"grow\" up to `cluster_max_nodes`, which is set to 4."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# delete the cluster if you need to.\n",
"compute_target.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -1119,505 +1163,7 @@
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"nbpresent": {
"slides": {
"05bb34ad-74b0-42b3-9654-8357d1ba9c99": {
"id": "05bb34ad-74b0-42b3-9654-8357d1ba9c99",
"prev": "851089af-9725-40c9-8f0b-9bf892b2b1fe",
"regions": {
"23fb396d-50f9-4770-adb3-0d6abcb40767": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "2039d2d5-aca6-4f25-a12f-df9ae6529cae",
"part": "whole"
},
"id": "23fb396d-50f9-4770-adb3-0d6abcb40767"
}
}
},
"11bebe14-d1dc-476d-a31a-5828b9c3adf0": {
"id": "11bebe14-d1dc-476d-a31a-5828b9c3adf0",
"prev": "502648cb-26fe-496b-899f-84c8fe1dcbc0",
"regions": {
"a42499db-623e-4414-bea2-ff3617fd8fc5": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "4788c040-27a2-4dc1-8ed0-378a99b3a255",
"part": "whole"
},
"id": "a42499db-623e-4414-bea2-ff3617fd8fc5"
}
}
},
"134f92d0-6389-4226-af51-1134ae8e8278": {
"id": "134f92d0-6389-4226-af51-1134ae8e8278",
"prev": "36b8728c-32ad-4941-be03-5cef51cdc430",
"regions": {
"b6d82a77-2d58-4b9e-a375-3103214b826c": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "7ab0e6d0-1f1c-451b-8ac5-687da44a8287",
"part": "whole"
},
"id": "b6d82a77-2d58-4b9e-a375-3103214b826c"
}
}
},
"282a2421-697b-4fd0-9485-755abf5a0c18": {
"id": "282a2421-697b-4fd0-9485-755abf5a0c18",
"prev": "a8b9ceb9-b38f-4489-84df-b644c6fe28f2",
"regions": {
"522fec96-abe7-4a34-bd34-633733afecc8": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "d58e7785-c2ee-4a45-8e3d-4c538bf8075a",
"part": "whole"
},
"id": "522fec96-abe7-4a34-bd34-633733afecc8"
}
}
},
"2dfec088-8a70-411a-9199-904ef3fa2383": {
"id": "2dfec088-8a70-411a-9199-904ef3fa2383",
"prev": "282a2421-697b-4fd0-9485-755abf5a0c18",
"regions": {
"0535fcb6-3a2b-4b46-98a7-3ebb1a38c47e": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3",
"part": "whole"
},
"id": "0535fcb6-3a2b-4b46-98a7-3ebb1a38c47e"
}
}
},
"36a814c9-c540-4a6d-92d9-c03553d3d2c2": {
"id": "36a814c9-c540-4a6d-92d9-c03553d3d2c2",
"prev": "b52e4d09-5186-44e5-84db-3371c087acde",
"regions": {
"8bfba503-9907-43f0-b1a6-46a0b4311793": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "d5e4a56c-dfac-4346-be83-1c15b503deac",
"part": "whole"
},
"id": "8bfba503-9907-43f0-b1a6-46a0b4311793"
}
}
},
"36b8728c-32ad-4941-be03-5cef51cdc430": {
"id": "36b8728c-32ad-4941-be03-5cef51cdc430",
"prev": "05bb34ad-74b0-42b3-9654-8357d1ba9c99",
"regions": {
"a36a5bdf-7f62-49b0-8634-e155a98851dc": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "e33dfc47-e7df-4623-a7a6-ab6bcf944629",
"part": "whole"
},
"id": "a36a5bdf-7f62-49b0-8634-e155a98851dc"
}
}
},
"3f136f2a-f14c-4a4b-afea-13380556a79c": {
"id": "3f136f2a-f14c-4a4b-afea-13380556a79c",
"prev": "54cb8dfd-a89c-4922-867b-3c87d8b67cd3",
"regions": {
"80ecf237-d1b0-401e-83d2-6d04b7fcebd3": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "7debeb2b-ecea-414f-9b50-49657abb3e6a",
"part": "whole"
},
"id": "80ecf237-d1b0-401e-83d2-6d04b7fcebd3"
}
}
},
"502648cb-26fe-496b-899f-84c8fe1dcbc0": {
"id": "502648cb-26fe-496b-899f-84c8fe1dcbc0",
"prev": "3f136f2a-f14c-4a4b-afea-13380556a79c",
"regions": {
"4c83bb4d-2a52-41ba-a77f-0c6efebd83a6": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "dbd22f6b-6d49-4005-b8fe-422ef8ef1d42",
"part": "whole"
},
"id": "4c83bb4d-2a52-41ba-a77f-0c6efebd83a6"
}
}
},
"54cb8dfd-a89c-4922-867b-3c87d8b67cd3": {
"id": "54cb8dfd-a89c-4922-867b-3c87d8b67cd3",
"prev": "aa224267-f885-4c0c-95af-7bacfcc186d9",
"regions": {
"0848f0a7-032d-46c7-b35c-bfb69c83f961": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "3c32c557-d0e8-4bb3-a61a-aa51a767cd4e",
"part": "whole"
},
"id": "0848f0a7-032d-46c7-b35c-bfb69c83f961"
}
}
},
"636b563c-faee-4c9e-a6a3-f46a905bfa82": {
"id": "636b563c-faee-4c9e-a6a3-f46a905bfa82",
"prev": "c5f59b98-a227-4344-9d6d-03abdd01c6aa",
"regions": {
"9c64f662-05dc-4b14-9cdc-d450b96f4368": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "70640ac0-7041-47a8-9a7f-e871defd74b2",
"part": "whole"
},
"id": "9c64f662-05dc-4b14-9cdc-d450b96f4368"
}
}
},
"793cec2f-8413-484d-aa1e-388fd2b53a45": {
"id": "793cec2f-8413-484d-aa1e-388fd2b53a45",
"prev": "c66f3dfd-2d27-482b-be78-10ba733e826b",
"regions": {
"d08f9cfa-3b8d-4fb4-91ba-82d9858ea93e": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "dd56113e-e3db-41ae-91b7-2472ed194308",
"part": "whole"
},
"id": "d08f9cfa-3b8d-4fb4-91ba-82d9858ea93e"
}
}
},
"83e912ff-260a-4391-8a12-331aba098506": {
"id": "83e912ff-260a-4391-8a12-331aba098506",
"prev": "fe5a0732-69f5-462a-8af6-851f84a9fdec",
"regions": {
"2fefcf5f-ea20-4604-a528-5e6c91bcb100": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea",
"part": "whole"
},
"id": "2fefcf5f-ea20-4604-a528-5e6c91bcb100"
}
}
},
"851089af-9725-40c9-8f0b-9bf892b2b1fe": {
"id": "851089af-9725-40c9-8f0b-9bf892b2b1fe",
"prev": "636b563c-faee-4c9e-a6a3-f46a905bfa82",
"regions": {
"31c9dda5-fdf4-45e2-bcb7-12aa0f30e1d8": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "8408b90e-6cdd-44d1-86d3-648c23f877ac",
"part": "whole"
},
"id": "31c9dda5-fdf4-45e2-bcb7-12aa0f30e1d8"
}
}
},
"87ab653d-e804-470f-bde9-c67caaa0f354": {
"id": "87ab653d-e804-470f-bde9-c67caaa0f354",
"prev": "a8c2d446-caee-42c8-886a-ed98f4935d78",
"regions": {
"bc3aeb56-c465-4868-a1ea-2de82584de98": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "59f52294-4a25-4c92-bab8-3b07f0f44d15",
"part": "whole"
},
"id": "bc3aeb56-c465-4868-a1ea-2de82584de98"
}
}
},
"8b887c97-83bc-4395-83ac-f6703cbe243d": {
"id": "8b887c97-83bc-4395-83ac-f6703cbe243d",
"prev": "36a814c9-c540-4a6d-92d9-c03553d3d2c2",
"regions": {
"9d0bc72a-cb13-483f-a572-2bf60d0d145f": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "75499c85-d0a1-43db-8244-25778b9b2736",
"part": "whole"
},
"id": "9d0bc72a-cb13-483f-a572-2bf60d0d145f"
}
}
},
"a8b9ceb9-b38f-4489-84df-b644c6fe28f2": {
"id": "a8b9ceb9-b38f-4489-84df-b644c6fe28f2",
"prev": null,
"regions": {
"f741ed94-3f24-4427-b615-3ab8753e5814": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "bf74d2e9-2708-49b1-934b-e0ede342f475",
"part": "whole"
},
"id": "f741ed94-3f24-4427-b615-3ab8753e5814"
}
}
},
"a8c2d446-caee-42c8-886a-ed98f4935d78": {
"id": "a8c2d446-caee-42c8-886a-ed98f4935d78",
"prev": "2dfec088-8a70-411a-9199-904ef3fa2383",
"regions": {
"f03457d8-b2a7-4e14-9a73-cab80c5b815d": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "edaa7f2f-2439-4148-b57a-8c794c0945ec",
"part": "whole"
},
"id": "f03457d8-b2a7-4e14-9a73-cab80c5b815d"
}
}
},
"aa224267-f885-4c0c-95af-7bacfcc186d9": {
"id": "aa224267-f885-4c0c-95af-7bacfcc186d9",
"prev": "793cec2f-8413-484d-aa1e-388fd2b53a45",
"regions": {
"0d7ac442-5e1d-49a5-91b3-1432d72449d8": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "4d6826fe-2cb8-4468-85ed-a242a1ce7155",
"part": "whole"
},
"id": "0d7ac442-5e1d-49a5-91b3-1432d72449d8"
}
}
},
"b52e4d09-5186-44e5-84db-3371c087acde": {
"id": "b52e4d09-5186-44e5-84db-3371c087acde",
"prev": "134f92d0-6389-4226-af51-1134ae8e8278",
"regions": {
"7af7d997-80b2-497d-bced-ef8341763439": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "376882ec-d469-4fad-9462-18e4bbea64ca",
"part": "whole"
},
"id": "7af7d997-80b2-497d-bced-ef8341763439"
}
}
},
"c5f59b98-a227-4344-9d6d-03abdd01c6aa": {
"id": "c5f59b98-a227-4344-9d6d-03abdd01c6aa",
"prev": "83e912ff-260a-4391-8a12-331aba098506",
"regions": {
"7268abff-0540-4c06-aefc-c386410c0953": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "396d478b-34aa-4afa-9898-cdce8222a516",
"part": "whole"
},
"id": "7268abff-0540-4c06-aefc-c386410c0953"
}
}
},
"c66f3dfd-2d27-482b-be78-10ba733e826b": {
"id": "c66f3dfd-2d27-482b-be78-10ba733e826b",
"prev": "8b887c97-83bc-4395-83ac-f6703cbe243d",
"regions": {
"6cbe8e0e-8645-41a1-8a38-e44acb81be4b": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "7594c7c7-b808-48f7-9500-d7830a07968a",
"part": "whole"
},
"id": "6cbe8e0e-8645-41a1-8a38-e44acb81be4b"
}
}
},
"d22045e5-7e3e-452e-bc7b-c6c4a893da8e": {
"id": "d22045e5-7e3e-452e-bc7b-c6c4a893da8e",
"prev": "ec41f96a-63a3-4825-9295-f4657a440ddb",
"regions": {
"24e2a3a9-bf65-4dab-927f-0bf6ffbe581d": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "defe921f-8097-44c3-8336-8af6700804a7",
"part": "whole"
},
"id": "24e2a3a9-bf65-4dab-927f-0bf6ffbe581d"
}
}
},
"d24c958c-e419-4e4d-aa9c-d228a8ca55e4": {
"id": "d24c958c-e419-4e4d-aa9c-d228a8ca55e4",
"prev": "11bebe14-d1dc-476d-a31a-5828b9c3adf0",
"regions": {
"25312144-9faa-4680-bb8e-6307ea71370f": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "bed09a92-9a7a-473b-9464-90e479883a3e",
"part": "whole"
},
"id": "25312144-9faa-4680-bb8e-6307ea71370f"
}
}
},
"ec41f96a-63a3-4825-9295-f4657a440ddb": {
"id": "ec41f96a-63a3-4825-9295-f4657a440ddb",
"prev": "87ab653d-e804-470f-bde9-c67caaa0f354",
"regions": {
"22e8be98-c254-4d04-b0e4-b9b5ae46eefe": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "bc70f780-c240-4779-96f3-bc5ef9a37d59",
"part": "whole"
},
"id": "22e8be98-c254-4d04-b0e4-b9b5ae46eefe"
}
}
},
"fe5a0732-69f5-462a-8af6-851f84a9fdec": {
"id": "fe5a0732-69f5-462a-8af6-851f84a9fdec",
"prev": "d22045e5-7e3e-452e-bc7b-c6c4a893da8e",
"regions": {
"671b89f5-fa9c-4bc1-bdeb-6e0a4ce8939b": {
"attrs": {
"height": 0.8,
"width": 0.8,
"x": 0.1,
"y": 0.1
},
"content": {
"cell": "fd46e2ab-4ab6-4001-b536-1f323525d7d3",
"part": "whole"
},
"id": "671b89f5-fa9c-4bc1-bdeb-6e0a4ce8939b"
}
}
}
},
"themes": {}
}
"msauthor": "minxia"
},
"nbformat": 4,
"nbformat_minor": 2
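
Note on the scoring hunks above: `run()` in the scoring script now returns `y_hat.tolist()` instead of `json.dumps(y_hat.tolist())`, and the client no longer wraps `service.run(...)` in `json.loads`. The sketch below illustrates why the caller ends up with the same list either way. It assumes the request payload carries its rows under a `data` key (not shown in this diff), and `argmax_rows`, `run_old`, and `run_new` are hypothetical stand-ins that avoid NumPy and TensorFlow so the example runs on its own.

```python
import json


def argmax_rows(rows):
    """Index of the largest value in each row (pure-Python stand-in for np.argmax(axis=1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def run_old(raw_data):
    """Old style: the scoring function serializes its own result to a JSON string."""
    data = json.loads(raw_data)["data"]  # assumed payload layout
    return json.dumps(argmax_rows(data))


def run_new(raw_data):
    """New style: the scoring function returns a plain, JSON-serializable list."""
    data = json.loads(raw_data)["data"]  # assumed payload layout
    return argmax_rows(data)


payload = json.dumps({"data": [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]})

old_result = json.loads(run_old(payload))  # caller has to decode the string
new_result = run_new(payload)              # caller gets the list directly
```

Returning the list keeps serialization in one place, which is presumably why the client-side `json.loads` call is dropped in the same commit.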

View File

@@ -39,7 +39,7 @@ n_h1 = args.n_hidden_1
n_h2 = args.n_hidden_2
n_outputs = 10
learning_rate = args.learning_rate
n_epochs = 50
n_epochs = 20
batch_size = args.batch_size
with tf.name_scope('network'):
@@ -64,7 +64,7 @@ init = tf.global_variables_initializer()
saver = tf.train.Saver()
# start an Azure ML run
run = Run.get_submitted_run()
run = Run.get_context()
with tf.Session() as sess:
init.run()
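
Note on the hunk above: the accessor changes from `Run.get_submitted_run()` to `Run.get_context()`. For a training script that might have to run against either SDK version, one option is to resolve the accessor at runtime. This is only a sketch: `OldRun`, `NewRun`, and `resolve_run` are hypothetical stand-ins, not part of the Azure ML SDK, so that the example runs without the SDK installed.

```python
class OldRun:
    """Stand-in for a Run class that only exposes the old accessor."""

    @staticmethod
    def get_submitted_run():
        return "run-from-get_submitted_run"


class NewRun:
    """Stand-in for a Run class that exposes the new accessor."""

    @staticmethod
    def get_context():
        return "run-from-get_context"


def resolve_run(run_cls):
    """Prefer get_context() when present, otherwise fall back to get_submitted_run()."""
    accessor = getattr(run_cls, "get_context", None)
    if accessor is None:
        accessor = getattr(run_cls, "get_submitted_run")
    return accessor()
```

In a real script, `run = resolve_run(Run)` would replace the direct call, at the cost of hiding which SDK version the script was written for.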

View File

@@ -0,0 +1,2 @@
/data/
/tf-distr-hvd/

View File

@@ -18,7 +18,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -42,6 +41,28 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -81,7 +102,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
@@ -92,10 +113,8 @@
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=6)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
@@ -148,7 +167,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore. The below code will upload the contents of the data directory to the path `./data` on the default datastore."
"Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore."
]
},
{
@@ -158,8 +177,22 @@
"outputs": [],
"source": [
"ds = ws.get_default_datastore()\n",
"print(ds.datastore_type, ds.account_name, ds.container_name)\n",
"\n",
"print(ds.datastore_type, ds.account_name, ds.container_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Upload the contents of the data directory to the path `./data` on the default datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)"
]
},
@@ -202,6 +235,8 @@
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './tf-distr-hvd'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
@@ -314,7 +349,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
@@ -336,6 +371,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -222,7 +222,7 @@ with tf.Session(graph=graph, config=config) as session:
init.run()
bcast.run()
print('Initialized')
run = Run.get_submitted_run()
run = Run.get_context()
average_loss = 0
for step in xrange(num_steps):
# simulate various sentence length by randomization

View File

@@ -0,0 +1 @@
/tf-distr-ps/

View File

@@ -41,6 +41,28 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -80,7 +102,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
@@ -91,10 +113,8 @@
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=6)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
@@ -187,7 +207,8 @@
"from azureml.train.dnn import TensorFlow\n",
"\n",
"script_params={\n",
" '--num_gpus': 1\n",
" '--num_gpus': 1,\n",
" '--train_steps': 500\n",
"}\n",
"\n",
"estimator = TensorFlow(source_directory=project_folder,\n",
@@ -223,7 +244,7 @@
"outputs": [],
"source": [
"run = experiment.submit(estimator)\n",
"print(run.get_details())"
"print(run)"
]
},
{
@@ -240,7 +261,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
@@ -262,6 +283,11 @@
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -263,7 +263,7 @@ def main(unused_argv):
print("After %d training step(s), validation cross entropy = %g" %
(FLAGS.train_steps, val_xent))
if job_name == "worker" and task_index == 0:
run = Run.get_submitted_run()
run = Run.get_context()
run.log("CrossEntropy", val_xent)
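
Note on the hunk above: only the first worker (`job_name == "worker"` and `task_index == 0`) logs the metric, so a distributed job reports a single `CrossEntropy` value instead of one per process. A minimal sketch of that guard follows; `RecordingRun`, `is_chief`, and `report` are hypothetical names used in place of the real `Run` object.

```python
class RecordingRun:
    """Stand-in for the Azure ML Run object; it just records what is logged."""

    def __init__(self):
        self.metrics = []

    def log(self, name, value):
        self.metrics.append((name, value))


def is_chief(job_name, task_index):
    """True only for the first worker, mirroring the condition in the script."""
    return job_name == "worker" and task_index == 0


def report(run, job_name, task_index, val_xent):
    """Log the validation cross entropy from the chief worker only."""
    if is_chief(job_name, task_index):
        run.log("CrossEntropy", val_xent)


run = RecordingRun()
report(run, "worker", 0, 0.25)  # chief worker: logged
report(run, "worker", 1, 0.30)  # other worker: skipped
report(run, "ps", 0, 0.40)      # parameter server: skipped
```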

View File

@@ -40,6 +40,28 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -80,7 +102,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
@@ -91,10 +113,8 @@
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=6)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
@@ -261,7 +281,7 @@
"from azureml.train.estimator import *\n",
"\n",
"script_params = {\n",
" '--num_epochs': 50,\n",
" '--num_epochs': 20,\n",
" '--data_dir': ds_data.as_mount(),\n",
" '--output_dir': './outputs'\n",
"}\n",
@@ -319,7 +339,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
@@ -341,6 +361,11 @@
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -1,321 +0,0 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Script adapted from:
# 1. https://github.com/Microsoft/CNTK/blob/v2.0/Tutorials/CNTK_103A_MNIST_DataLoader.ipynb
# 2. https://github.com/Microsoft/CNTK/blob/v2.0/Tutorials/CNTK_103C_MNIST_MultiLayerPerceptron.ipynb
# ===================================================================================================
"""Train a CNTK multi-layer perceptron on the MNIST dataset."""
from __future__ import print_function
import gzip
import numpy as np
import os
import shutil
import struct
import sys
import time
import cntk as C
from azureml.core.run import Run
import argparse
run = Run.get_submitted_run()
parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.001, help='learning rate')
parser.add_argument('--num_hidden_layers', type=int, default=2, help='number of hidden layers')
parser.add_argument('--minibatch_size', type=int, default=64, help='minibatchsize')
args = parser.parse_args()
# Functions to load MNIST images and unpack into train and test set.
# - loadData reads image data and formats into a 28x28 long array
# - loadLabels reads the corresponding labels data, 1 for each image
# - load packs the downloaded image and labels data into a combined format to be read later by
# CNTK text reader
def loadData(src, cimg):
print('Downloading ' + src)
gzfname, h = urlretrieve(src, './delete.me')
print('Done.')
try:
with gzip.open(gzfname) as gz:
n = struct.unpack('I', gz.read(4))
# Read magic number.
if n[0] != 0x3080000:
raise Exception('Invalid file: unexpected magic number.')
# Read number of entries.
n = struct.unpack('>I', gz.read(4))[0]
if n != cimg:
raise Exception('Invalid file: expected {0} entries.'.format(cimg))
crow = struct.unpack('>I', gz.read(4))[0]
ccol = struct.unpack('>I', gz.read(4))[0]
if crow != 28 or ccol != 28:
raise Exception('Invalid file: expected 28 rows/cols per image.')
# Read data.
res = np.fromstring(gz.read(cimg * crow * ccol), dtype=np.uint8)
finally:
os.remove(gzfname)
return res.reshape((cimg, crow * ccol))
def loadLabels(src, cimg):
print('Downloading ' + src)
gzfname, h = urlretrieve(src, './delete.me')
print('Done.')
try:
with gzip.open(gzfname) as gz:
n = struct.unpack('I', gz.read(4))
# Read magic number.
if n[0] != 0x1080000:
raise Exception('Invalid file: unexpected magic number.')
# Read number of entries.
n = struct.unpack('>I', gz.read(4))
if n[0] != cimg:
raise Exception('Invalid file: expected {0} rows.'.format(cimg))
# Read labels.
res = np.fromstring(gz.read(cimg), dtype=np.uint8)
finally:
os.remove(gzfname)
return res.reshape((cimg, 1))
def try_download(dataSrc, labelsSrc, cimg):
data = loadData(dataSrc, cimg)
labels = loadLabels(labelsSrc, cimg)
return np.hstack((data, labels))
# Save the data files into a format compatible with CNTK text reader
def savetxt(filename, ndarray):
dir = os.path.dirname(filename)
if not os.path.exists(dir):
os.makedirs(dir)
if not os.path.isfile(filename):
print("Saving", filename)
with open(filename, 'w') as f:
labels = list(map(' '.join, np.eye(10, dtype=np.uint).astype(str)))
for row in ndarray:
row_str = row.astype(str)
label_str = labels[row[-1]]
feature_str = ' '.join(row_str[:-1])
f.write('|labels {} |features {}\n'.format(label_str, feature_str))
else:
print("File already exists", filename)
# Read a CTF formatted text (as mentioned above) using the CTF deserializer from a file
def create_reader(path, is_training, input_dim, num_label_classes):
return C.io.MinibatchSource(C.io.CTFDeserializer(path, C.io.StreamDefs(
labels=C.io.StreamDef(field='labels', shape=num_label_classes, is_sparse=False),
features=C.io.StreamDef(field='features', shape=input_dim, is_sparse=False)
)), randomize=is_training, max_sweeps=C.io.INFINITELY_REPEAT if is_training else 1)
# Defines a utility that prints the training progress
def print_training_progress(trainer, mb, frequency, verbose=1):
training_loss = "NA"
eval_error = "NA"
if mb % frequency == 0:
training_loss = trainer.previous_minibatch_loss_average
eval_error = trainer.previous_minibatch_evaluation_average
if verbose:
print("Minibatch: {0}, Loss: {1:.4f}, Error: {2:.2f}%".format(mb, training_loss, eval_error * 100))
return mb, training_loss, eval_error
# Create the network architecture
def create_model(features):
with C.layers.default_options(init=C.layers.glorot_uniform(), activation=C.ops.relu):
h = features
for _ in range(num_hidden_layers):
h = C.layers.Dense(hidden_layers_dim)(h)
r = C.layers.Dense(num_output_classes, activation=None)(h)
return r
if __name__ == '__main__':
    run = Run.get_submitted_run()
    try:
        from urllib.request import urlretrieve
    except ImportError:
        from urllib import urlretrieve
    # Select the right target device when this script is being used:
    if 'TEST_DEVICE' in os.environ:
        if os.environ['TEST_DEVICE'] == 'cpu':
            C.device.try_set_default_device(C.device.cpu())
        else:
            C.device.try_set_default_device(C.device.gpu(0))
    # URLs for the train image and labels data
    url_train_image = 'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz'
    url_train_labels = 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz'
    num_train_samples = 60000
    print("Downloading train data")
    train = try_download(url_train_image, url_train_labels, num_train_samples)
    url_test_image = 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz'
    url_test_labels = 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
    num_test_samples = 10000
    print("Downloading test data")
    test = try_download(url_test_image, url_test_labels, num_test_samples)
    # Save the train and test files (prefer our default path for the data)
    rank = os.environ.get("OMPI_COMM_WORLD_RANK")
    data_dir = os.path.join("outputs", "MNIST")
    sentinel_path = os.path.join(data_dir, "complete.txt")
    if rank == '0':
        print('Writing train text file...')
        savetxt(os.path.join(data_dir, "Train-28x28_cntk_text.txt"), train)
        print('Writing test text file...')
        savetxt(os.path.join(data_dir, "Test-28x28_cntk_text.txt"), test)
        with open(sentinel_path, 'w+') as f:
            f.write("download complete")
        print('Done with downloading data.')
    else:
        while not os.path.exists(sentinel_path):
            time.sleep(0.01)
    # Ensure we always get the same amount of randomness
    np.random.seed(0)
    # Define the data dimensions
    input_dim = 784
    num_output_classes = 10
    # Ensure the training and test data is generated and available for this tutorial.
    # We search in two locations in the toolkit for the cached MNIST data set.
    data_found = False
    for data_dir in [os.path.join("..", "Examples", "Image", "DataSets", "MNIST"),
                     os.path.join("data_" + str(rank), "MNIST"),
                     os.path.join("outputs", "MNIST")]:
        train_file = os.path.join(data_dir, "Train-28x28_cntk_text.txt")
        test_file = os.path.join(data_dir, "Test-28x28_cntk_text.txt")
        if os.path.isfile(train_file) and os.path.isfile(test_file):
            data_found = True
            break
    if not data_found:
        raise ValueError("Please generate the data by completing CNTK 103 Part A")
    print("Data directory is {0}".format(data_dir))
    num_hidden_layers = args.num_hidden_layers
    hidden_layers_dim = 400
    input = C.input_variable(input_dim)
    label = C.input_variable(num_output_classes)
    z = create_model(input)
    # Scale the input to 0-1 range by dividing each pixel by 255.
    z = create_model(input / 255.0)
    loss = C.cross_entropy_with_softmax(z, label)
    label_error = C.classification_error(z, label)
    # Instantiate the trainer object to drive the model training
    learning_rate = args.learning_rate
    lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)
    learner = C.sgd(z.parameters, lr_schedule)
    trainer = C.Trainer(z, (loss, label_error), [learner])
    # Initialize the parameters for the trainer
    minibatch_size = args.minibatch_size
    num_samples_per_sweep = 60000
    num_sweeps_to_train_with = 10
    num_minibatches_to_train = (num_samples_per_sweep * num_sweeps_to_train_with) / minibatch_size
    # Create the reader to training data set
    reader_train = create_reader(train_file, True, input_dim, num_output_classes)
    # Map the data streams to the input and labels.
    input_map = {
        label: reader_train.streams.labels,
        input: reader_train.streams.features
    }
    # Run the trainer and perform model training
    training_progress_output_freq = 500
    errors = []
    losses = []
    for i in range(0, int(num_minibatches_to_train)):
        # Read a mini batch from the training data file
        data = reader_train.next_minibatch(minibatch_size, input_map=input_map)
        trainer.train_minibatch(data)
        batchsize, loss, error = print_training_progress(trainer, i, training_progress_output_freq, verbose=1)
        if (error != 'NA') and (loss != 'NA'):
            errors.append(float(error))
            losses.append(float(loss))
    # log the losses
    if rank == '0':
        run.log_list("Loss", losses)
        run.log_list("Error", errors)
    # Read the test data
    reader_test = create_reader(test_file, False, input_dim, num_output_classes)
    test_input_map = {
        label: reader_test.streams.labels,
        input: reader_test.streams.features,
    }
    # Test data for trained model
    test_minibatch_size = 512
    num_samples = 10000
    num_minibatches_to_test = num_samples // test_minibatch_size
    test_result = 0.0
    for i in range(num_minibatches_to_test):
        # We are loading test data in batches specified by test_minibatch_size
        # Each data point in the minibatch is a MNIST digit image of 784 dimensions
        # with one pixel per dimension that we will encode / decode with the
        # trained model.
        data = reader_test.next_minibatch(test_minibatch_size,
                                          input_map=test_input_map)
        eval_error = trainer.test_minibatch(data)
        test_result = test_result + eval_error
    # Average of evaluation errors of all test minibatches
    print("Average test error: {0:.2f}%".format((test_result * 100) / num_minibatches_to_test))
    out = C.softmax(z)
    # Read the data for evaluation
    reader_eval = create_reader(test_file, False, input_dim, num_output_classes)
    eval_minibatch_size = 25
    eval_input_map = {input: reader_eval.streams.features}
    data = reader_test.next_minibatch(eval_minibatch_size, input_map=test_input_map)
    img_label = data[label].asarray()
    img_data = data[input].asarray()
    predicted_label_prob = [out.eval(img_data[i]) for i in range(len(img_data))]
    # Find the index with the maximum value for both predicted as well as the ground truth
    pred = [np.argmax(predicted_label_prob[i]) for i in range(len(predicted_label_prob))]
    gtlabel = [np.argmax(img_label[i]) for i in range(len(img_label))]
    print("Label :", gtlabel[:25])
    print("Predicted:", pred)
    # save model to outputs folder
    z.save('outputs/cntk.model')
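
The script above coordinates its workers through a sentinel file: rank 0 writes the data files and then `complete.txt`, while every other rank polls until that file exists. A minimal sketch of that pattern follows. The helper names `write_with_sentinel` and `wait_for_sentinel`, the `os.makedirs` call, and the timeout are assumptions added for illustration; the script itself polls indefinitely.

```python
import os
import tempfile
import time


def write_with_sentinel(data_dir, sentinel_name="complete.txt"):
    """Rank-0 side: create the data directory, then mark it complete.

    Hypothetical helper; the real script writes its data files before this point.
    """
    os.makedirs(data_dir, exist_ok=True)
    sentinel_path = os.path.join(data_dir, sentinel_name)
    # Write the sentinel last so that its presence implies the data is ready.
    with open(sentinel_path, "w") as f:
        f.write("download complete")
    return sentinel_path


def wait_for_sentinel(sentinel_path, timeout_s=60.0, poll_s=0.01):
    """Other ranks: poll until the sentinel exists or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(sentinel_path):
        if time.monotonic() > deadline:
            return False
        time.sleep(poll_s)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_with_sentinel(os.path.join(tmp, "MNIST"))
        print(wait_for_sentinel(path))
```

Bounding the wait is a design choice rather than something the script does: without it, a failure on rank 0 leaves the remaining ranks spinning until the job is cancelled.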

View File

@@ -41,6 +41,28 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -154,13 +176,13 @@
"from azureml.core.script_run_config import ScriptRunConfig\n",
"import tensorflow as tf\n",
"\n",
"logs_dir = os.curdir + os.sep + \"logs\"\n",
"tensorflow_logs_dir = os.path.join(logs_dir, \"tensorflow\")\n",
"logs_dir = os.path.join(os.curdir, \"logs\")\n",
"data_dir = os.path.abspath(os.path.join(os.curdir, \"mnist_data\"))\n",
"\n",
"if not path.exists(tensorflow_logs_dir):\n",
" makedirs(tensorflow_logs_dir)\n",
"if not path.exists(data_dir):\n",
" makedirs(data_dir)\n",
"\n",
"os.environ[\"TEST_TMPDIR\"] = logs_dir\n",
"os.environ[\"TEST_TMPDIR\"] = data_dir\n",
"\n",
"# Writing logs to ./logs results in their being uploaded to Artifact Service,\n",
"# and thus, made accessible to our Tensorboard instance.\n",
@@ -169,15 +191,15 @@
"# Create an experiment\n",
"exp = Experiment(ws, experiment_name)\n",
"\n",
"script = ScriptRunConfig(exp_dir,\n",
" script=\"mnist_with_summaries.py\",\n",
" run_config=run_config)\n",
"\n",
"# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n",
"# arguments_list += [\"--max_steps\", \"5000\"]\n",
"kwargs = {}\n",
"kwargs['arguments_list'] = arguments_list\n",
"run = exp.submit(script, kwargs)\n",
"\n",
"script = ScriptRunConfig(exp_dir,\n",
" script=\"mnist_with_summaries.py\",\n",
" run_config=run_config,\n",
" arguments=arguments_list)\n",
"\n",
"run = exp.submit(script)\n",
"# You can also wait for the run to complete\n",
"# run.wait_for_completion(show_output=True)\n",
"runs.append(run)"
@@ -345,23 +367,21 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import BatchAiCompute\n",
"from azureml.core.compute import AmlCompute\n",
"\n",
"clust_name = ws.name + \"cpu\"\n",
"clust_name = \"cpucluster\"\n",
"\n",
"try:\n",
" # If you already have a cluster named this, we don't need to make a new one.\n",
" cts = ws.compute_targets() \n",
" cts = ws.compute_targets \n",
" compute_target = cts[clust_name]\n",
" assert compute_target.type == 'BatchAI'\n",
"except:\n",
" # Let's make a new one here.\n",
" provisioning_config = BatchAiCompute.provisioning_configuration(cluster_max_nodes=2, \n",
" autoscale_enabled=True, \n",
" cluster_min_nodes=1,\n",
" vm_size='Standard_D11_V2')\n",
" provisioning_config = AmlCompute.provisioning_configuration(max_nodes=6, \n",
" vm_size='STANDARD_D2_V2')\n",
" \n",
" compute_target = BatchAiCompute.create(ws, clust_name, provisioning_config)\n",
" compute_target = AmlCompute.create(ws, clust_name, provisioning_config)\n",
"compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)\n",
"print(compute_target.name)\n",
"# For a more detailed view of current BatchAI cluster status, use the 'status' property \n",
@@ -481,6 +501,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -220,6 +220,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",

View File

@@ -57,10 +57,14 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"check version"
]
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%matplotlib notebook\n",
"import numpy as np\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
@@ -84,7 +88,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"load workspace"
]
},
"outputs": [],
"source": [
"# load workspace configuration from the config.json file in the current folder.\n",
@@ -104,7 +112,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"create experiment"
]
},
"outputs": [],
"source": [
"experiment_name = 'sklearn-mnist'\n",
@@ -119,9 +131,9 @@
"source": [
"### Create remote compute target\n",
"\n",
"Azure Azure ML Managed Compute is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure Managed Compute cluster as your training environment. This code creates a cluster for you if it does not already exist in your workspace. \n",
"Azure Machine Learning Managed Compute (AmlCompute) is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create AmlCompute as your training environment. This code creates compute for you if it does not already exist in your workspace. \n",
"\n",
" **Creation of the cluster takes approximately 5 minutes.** If the cluster is already in the workspace this code uses it and skips the creation process."
" **Creation of the compute takes approximately 5 minutes.** If the compute is already in the workspace this code uses it and skips the creation process."
]
},
{
@@ -135,35 +147,37 @@
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"import os\n",
"\n",
"# choose a name for your cluster\n",
"batchai_cluster_name = \"traincluster\"\n",
"compute_name = os.environ.get(\"BATCHAI_CLUSTER_NAME\", \"cpucluster\")\n",
"compute_min_nodes = os.environ.get(\"BATCHAI_CLUSTER_MIN_NODES\", 0)\n",
"compute_max_nodes = os.environ.get(\"BATCHAI_CLUSTER_MAX_NODES\", 4)\n",
"\n",
"try:\n",
" # look for the existing cluster by name\n",
" compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n",
" if type(compute_target) is BatchAiCompute:\n",
" print('found compute target {}, just use it.'.format(batchai_cluster_name))\n",
"# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6\n",
"vm_size = os.environ.get(\"BATCHAI_CLUSTER_SKU\", \"STANDARD_D2_V2\")\n",
"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + compute_name)\n",
"else:\n",
" print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n",
"except ComputeTargetException:\n",
" print('creating a new compute target...')\n",
" compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\", # small CPU-based VM\n",
" #vm_priority='lowpriority', # optional\n",
" autoscale_enabled=True,\n",
" cluster_min_nodes=0, \n",
" cluster_max_nodes=4)\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,\n",
" min_nodes = compute_min_nodes, \n",
" max_nodes = compute_max_nodes)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n",
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
" # Use the 'status' property to get a detailed status for the current cluster. \n",
" # For a more detailed view of current BatchAI cluster status, use the 'status' property \n",
" print(compute_target.status.serialize())"
]
},
@@ -186,13 +200,6 @@
"Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -265,7 +272,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"use datastore"
]
},
"outputs": [],
"source": [
"ds = ws.get_default_datastore()\n",
@@ -394,7 +405,7 @@
"print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n",
"\n",
"# get hold of the current run\n",
"run = Run.get_submitted_run()\n",
"run = Run.get_context()\n",
"\n",
"print('Train a logistic regression model with regularization rate of', args.reg)\n",
"clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n",
@@ -473,7 +484,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"configure estimator"
]
},
"outputs": [],
"source": [
"from azureml.train.estimator import Estimator\n",
@@ -502,7 +517,13 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"remote run",
"batchai",
"scikit-learn"
]
},
"outputs": [],
"source": [
"run = exp.submit(config=est)\n",
@@ -549,7 +570,7 @@
},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
@@ -565,7 +586,13 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"remote run",
"batchai",
"scikit-learn"
]
},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=False) # specify True for a verbose log"
@@ -609,7 +636,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"query history"
]
},
"outputs": [],
"source": [
"print(run.get_file_names())"
@@ -625,7 +656,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"register model from history"
]
},
"outputs": [],
"source": [
"# register model \n",
@@ -633,27 +668,6 @@
"print(model.name, model.id, model.version, sep = '\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up resources\n",
"\n",
"If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n",
"\n",
"You can also just delete the Azure Managed Compute cluster. But even if you don't delete it, since `autoscale_enabled` is set to `True`, and `cluster_min_nodes` is set to `0`, when the jobs are done, all cluster nodes will be shut down and you will not incur any additional compute charges. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optionally, delete the Azure Managed Compute cluster\n",
"compute_target.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -675,6 +689,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -690,7 +709,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.6.2"
},
"msauthor": "sgilley"
},
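
The compute cell in this notebook reads `BATCHAI_CLUSTER_MIN_NODES` and `BATCHAI_CLUSTER_MAX_NODES` with `os.environ.get(name, default)`. When a variable is set, `os.environ.get` returns a string rather than an integer, so a caller may want to coerce the value before using it as a node count. A sketch under that assumption follows; `env_int` is a hypothetical helper and not part of the Azure ML SDK.

```python
import os


def env_int(name, default, environ=os.environ):
    """Return the environment variable `name` as an int, or `default` if unset or blank."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)  # raises ValueError if the value is not numeric


# Same variable names and defaults as the notebook cell.
compute_min_nodes = env_int("BATCHAI_CLUSTER_MIN_NODES", 0)
compute_max_nodes = env_int("BATCHAI_CLUSTER_MAX_NODES", 4)
```

The `environ` parameter exists only so the helper can be exercised with a plain dictionary instead of the real process environment.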

View File

@@ -39,7 +39,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"# If you did NOT complete the tutorial, you can instead run this cell \n",
@@ -86,10 +90,14 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"check version"
]
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%matplotlib notebook\n",
"import numpy as np\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
@@ -113,7 +121,12 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"load workspace",
"download model"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
@@ -121,7 +134,7 @@
"\n",
"ws = Workspace.from_config()\n",
"model=Model(ws, 'sklearn_mnist')\n",
"model.download(target_dir = '.')\n",
"model.download(target_dir='.', exists_ok=True)\n",
"import os \n",
"# verify the downloaded model file\n",
"os.stat('./sklearn_mnist_model.pkl')"
@@ -283,7 +296,8 @@
" data = np.array(json.loads(raw_data)['data'])\n",
" # make prediction\n",
" y_hat = model.predict(data)\n",
" return json.dumps(y_hat.tolist())"
" # you can return any data type as long as it is JSON-serializable\n",
" return y_hat.tolist()"
]
},
{
@@ -298,7 +312,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"set conda dependencies"
]
},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
@@ -339,7 +357,12 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"configure web service",
"aci"
]
},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
@@ -372,7 +395,14 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"configure image",
"create image",
"deploy web service",
"aci"
]
},
"outputs": [],
"source": [
"%%time\n",
@@ -403,7 +433,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"get scoring uri"
]
},
"outputs": [],
"source": [
"print(service.scoring_uri)"
@@ -430,7 +464,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"score web service"
]
},
"outputs": [],
"source": [
"import json\n",
@@ -443,7 +481,7 @@
"test_samples = bytes(test_samples, encoding='utf8')\n",
"\n",
"# predict using the deployed model\n",
"result = json.loads(service.run(input_data=test_samples))\n",
"result = service.run(input_data=test_samples)\n",
"\n",
"# compare actual value vs. the predicted values:\n",
"i = 0\n",
@@ -475,7 +513,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"score web service"
]
},
"outputs": [],
"source": [
"import requests\n",
@@ -511,7 +553,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"delete web service"
]
},
"outputs": [],
"source": [
"service.delete()"
@@ -540,6 +586,11 @@
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
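
In this notebook the scoring function now returns `y_hat.tolist()` instead of `json.dumps(y_hat.tolist())`, and the client no longer wraps `service.run(...)` in `json.loads`. The new comment states the constraint: the return value must be JSON-serializable. The sketch below illustrates that constraint with the standard library only. The `run` function here is a stand-in with a placeholder "prediction", and `is_json_serializable` is a hypothetical helper; neither reproduces the deployed service.

```python
import json


def run(raw_data):
    """Stand-in scoring function: parse the request and return a plain list."""
    data = json.loads(raw_data)["data"]
    # Placeholder "prediction": row sums, not a real model.
    return [sum(row) for row in data]


def is_json_serializable(value):
    """Report whether json.dumps accepts the value."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


payload = json.dumps({"data": [[1, 2, 3], [4, 5, 6]]})
result = run(payload)
print(is_json_serializable(result))   # a list of numbers serializes
print(is_json_serializable({1, 2}))   # a set does not
```

Converting the prediction with `.tolist()` serves the same purpose in the notebook: it turns the model output into plain Python lists that a JSON encoder accepts.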

View File

@@ -15,7 +15,7 @@
"source": [
"# Tutorial: Train a classification model with automated machine learning\n",
"\n",
"In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform data preprocessing, algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n",
"In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n",
"\n",
"[flow diagram](./imgs/flow2.png)\n",
"\n",
@@ -132,13 +132,9 @@
"\n",
"digits = datasets.load_digits()\n",
"\n",
"# only take the first 100 rows if you want the training steps to run faster\n",
"X_digits = digits.data[100:,:]\n",
"y_digits = digits.target[100:]\n",
"\n",
"# use full dataset\n",
"#X_digits = digits.data\n",
"#y_digits = digits.target"
"# Exclude the first 100 rows from training so that they can be used for test.\n",
"X_train = digits.data[100:,:]\n",
"y_train = digits.target[100:]"
]
},
{
@@ -159,13 +155,13 @@
"count = 0\n",
"sample_size = 30\n",
"plt.figure(figsize = (16, 6))\n",
"for i in np.random.permutation(X_digits.shape[0])[:sample_size]:\n",
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
" count = count + 1\n",
" plt.subplot(1, sample_size, count)\n",
" plt.axhline('')\n",
" plt.axvline('')\n",
" plt.text(x = 2, y = -2, s = y_digits[i], fontsize = 18)\n",
" plt.imshow(X_digits[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
" plt.text(x = 2, y = -2, s = y_train[i], fontsize = 18)\n",
" plt.imshow(X_train[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
"plt.show()"
]
},
@@ -191,15 +187,18 @@
"|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n",
"|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n",
"|**n_cross_validations**|3|Number of cross validation splits|\n",
"|**preprocess**|False| *True/False* Enables experiment to perform preprocessing on the input. Preprocessing handles *missing data*, and performs some common *feature extraction*|\n",
"|**exit_score**|0.995|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
"|**exit_score**|0.9985|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
"|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.|\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"configure automl"
]
},
"outputs": [],
"source": [
"from azureml.train.automl import AutoMLConfig\n",
@@ -210,11 +209,10 @@
" max_time_sec = 12000,\n",
" iterations = 20,\n",
" n_cross_validations = 3,\n",
" preprocess = False,\n",
" exit_score = 0.995,\n",
" exit_score = 0.9985,\n",
" blacklist_algos = ['kNN','LinearSVM'],\n",
" X = X_digits,\n",
" y = y_digits,\n",
" X = X_train,\n",
" y = y_train,\n",
" path=project_folder)"
]
},
@@ -230,7 +228,12 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"local submitted run",
"automl"
]
},
"outputs": [],
"source": [
"from azureml.core.experiment import Experiment\n",
@@ -254,10 +257,14 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"use notebook widget"
]
},
"outputs": [],
"source": [
"from azureml.train.widgets import RunDetails\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
@@ -273,7 +280,12 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"get metrics",
"query history"
]
},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
@@ -300,7 +312,12 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": [
"query history",
"register model from history"
]
},
"outputs": [],
"source": [
"# find the run with the highest accuracy value.\n",
@@ -332,8 +349,10 @@
"source": [
"# find 30 random samples from test set\n",
"n = 30\n",
"sample_indices = np.random.permutation(X_digits.shape[0])[0:n]\n",
"test_samples = X_digits[sample_indices]\n",
"X_test = digits.data[:100, :]\n",
"y_test = digits.target[:100]\n",
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
"test_samples = X_test[sample_indices]\n",
"\n",
"\n",
"# predict using the model\n",
@@ -349,11 +368,11 @@
" plt.axvline('')\n",
" \n",
" # use different color for misclassified sample\n",
" font_color = 'red' if y_digits[s] != result[i] else 'black'\n",
" clr_map = plt.cm.gray if y_digits[s] != result[i] else plt.cm.Greys\n",
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
" \n",
" plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n",
" plt.imshow(X_digits[s].reshape(8, 8), cmap = clr_map)\n",
" plt.imshow(X_test[s].reshape(8, 8), cmap = clr_map)\n",
" \n",
" i = i + 1\n",
"plt.show()"
@@ -374,11 +393,16 @@
"> * Review training results\n",
"> * Register the best model\n",
"\n",
"Learn more about [how to configure settings for automatic training]() or [how to use automatic training on a remote resource]()."
"Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-to-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)."
]
}
],
"metadata": {
"authors": [
{
"name": "jeffshep"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",